Robots Exclusion Standard data for johncordes.ca

Resource Scan

Scan Details

Site Domain johncordes.ca
Base Domain johncordes.ca
Scan Status Ok
Last Scan 2025-11-02T04:21:35+00:00
Next Scan 2025-12-02T04:21:35+00:00

Last Scan

Scanned 2025-11-02T04:21:35+00:00
URL https://johncordes.ca/robots.txt
Domain IPs 198.72.126.220
Response IP 198.72.126.220
Found Yes
Hash 0581ff7a14d1d3cf3aa13e38af57dd621c4d9bf8abbbdd2de69942f34d79cc24
SimHash de205011c917

Groups

google

Rule Path
Disallow *.jpg

googlebot

Rule Path
Disallow *.jpg

bingbot

Rule Path
Disallow *.jpg

Other Records

Field Value
crawl-delay 120

msnbot

Rule Path
Disallow *.jpg

Other Records

Field Value
crawl-delay 120

slurp

Rule Path
Disallow *.jpg

Other Records

Field Value
crawl-delay 120

adidxbot

Rule Path
Disallow *.jpg
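
The `Disallow *.jpg` rules in the groups above rely on wildcard matching, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL path (the convention described in RFC 9309 and honored by the major search crawlers; not every crawler supports it). A minimal sketch of that matching, assuming hypothetical helper names `rule_to_regex` and `is_blocked` and made-up example paths:

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


rules = ["*.jpg"]  # the rule listed for google, googlebot, bingbot, etc.
print(is_blocked("/photos/church.jpg", rules))    # hypothetical image path
print(is_blocked("/genealogy/index.php", rules))  # hypothetical page path
```

Because the rule has no trailing `$`, it also matches paths that merely contain `.jpg`, such as an image URL with a query string. Matching is case-sensitive, so a file ending in `.JPG` would not be covered by this sketch.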

semrushbot

Rule Path
Disallow /

semrushbot/7

Rule Path
Disallow /

semrushbot/7~bl

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

*

Rule Path
Disallow /cgi-bin/
Disallow /restricted/
Disallow /genealogy2/cordes/
Disallow /genealogy/thomas/
Disallow /GANS/
Disallow /nsancestors.ca/templates/
Disallow /nsancestors.ca/forms/
Disallow /novascotiaancestors.ca/
Disallow /rnshs.ca/
Disallow /rnshs/
Disallow /rnshs/?p=2304
Disallow /testbed/
Disallow /nschess/
Disallow /BoutilierGroup/
Disallow /formtest.html
Disallow /analytics.html
Disallow /dalhousie.html
Disallow /hotlists.html
Disallow /medical.html
Disallow /webcams.html
Disallow /travel.html
Disallow /linux.html
Disallow /gans.html
Disallow /misc.html
Disallow /news.html
Disallow /hrm.html
Disallow /utilities.html
Disallow /games-sports.html
Disallow /semrush.com
Disallow /bot.semrush.com
Disallow /bl.bot.semrush.com
Disallow /seostar.co
Disallow /timeline.php
Disallow /suggest.php
Disallow /rev.poneytelecom.eu

*

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.nsancestors.ca/nscumber/sitemap_05-09-p1.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_05-09-p2.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_05-09-p3.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_10-14-p1a.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_10-14-p1b.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_10-14-p2a.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_10-14-p2b.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_10-14-p3.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_15-19-p1.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_15-19-p2.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_99-04-p1.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_99-04-p2.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_gap.txt
sitemap https://www.nsancestors.ca/nscumber/sitemap_nonmessages.txt
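
The groups listed above can be spot-checked with Python's standard `urllib.robotparser`. The sketch below is an assumption-laden illustration: `SAMPLE` is a hypothetical reconstruction of three of the scanned groups, not the file's actual text, and the URLs passed to `can_fetch` are examples only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a few groups from the scan above;
# the real file's exact layout is not shown in this report.
SAMPLE = """\
User-agent: bingbot
Disallow: *.jpg
Crawl-delay: 120

User-agent: semrushbot
Disallow: /

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Crawl-delay recorded for the bingbot group (seconds between requests).
print(parser.crawl_delay("bingbot"))

# Crawlers whose group contains "Disallow: /" may not fetch anything.
print(parser.can_fetch("GPTBot", "https://johncordes.ca/"))
print(parser.can_fetch("SemrushBot", "https://johncordes.ca/genealogy/"))
```

Agent names are compared case-insensitively here, so `GPTBot` and `gptbot` select the same group.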

Comments

  • http://johncordes.ca/genealogy/tngsitemapindex.xml
  • /nsancestors.ca/nscumber/sitemap.txt
  • Sitemap: https://www.nsancestors.ca/nscumber/sitemap_various_99-04_05_09_15-19_gap.txt
  • /nsancestors.ca/nscumber/sitemap_10-14.txt
  • 2024-08-01 try adding the stuff to this file that I put in the tng/robots.txt file a few days ago (yesterday?)
  • 2024-07-31 try these entries, from a 4 yr old post by William J. Watson on the TNG FB page
  • The first entry that matches gets used.
  • Allow google to access anything but images
  • Allow bing to access anything but images.
  • Ask it to wait a full two minutes between requests.
  • We don't care if it's slow.
  • This is the old name of the bing bot.
  • This is yahoo's spider
  • I don't know whose this is, but we'll allow it.
  • 2024-08-01 comment out these lines
  • User-agent: *
  • Disallow: http://www.nsancestors.ca/nsa-newspapers.html
  • Disallow: http://www.nsancestors.ca/taylor/advocatecemetery.html
  • Allow: /
  • 2023-01-27 jgc from Roger Moffat on tng list
  • 2023-09-15 https://community.realmacsoftware.com/c/knowledge-sharing/how-to-prevent-chatgpt-from-crawling-your-website
  • 2024-08-01
  • User-agent: *
  • Disallow: /bot-trap # 2024-08-01 only want bot-trap in the tng/robots.txt file (I think)
  • 2024-08-01 Ask everyone else to go away. (not relevant, given the above ordering?)
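
The ordering question raised in the comments ("The first entry that matches gets used", and whether the final "go away" group is relevant) depends on the parser. As a rough sketch, Python's standard `urllib.robotparser` keeps only the first `User-agent: *` group and ignores later ones, whereas RFC 9309 says groups that match the same user agent should be combined, which would make the later `Disallow: /` apply to every unnamed crawler. `SAMPLE` below is a shortened, hypothetical stand-in for the file, and `ExampleBot` is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical stand-in for the scanned file: one named group
# followed by two separate "User-agent: *" groups, as in the report.
SAMPLE = """\
User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /testbed/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# A named group takes priority over the catch-all groups.
gptbot_allowed = parser.can_fetch("GPTBot", "/genealogy/")

# An unnamed crawler falls through to the catch-all group that this
# parser retained (the first one it encountered).
testbed_allowed = parser.can_fetch("ExampleBot", "/testbed/page.html")
genealogy_allowed = parser.can_fetch("ExampleBot", "/genealogy/")

print(gptbot_allowed, testbed_allowed, genealogy_allowed)
```

With this parser the second `*` group has no effect, so `/genealogy/` stays fetchable for unnamed crawlers. A crawler that merges the groups would instead treat the whole site as disallowed, so the two groups are best combined into one if a single behavior is intended.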