familysearch.org
robots.txt

Robots Exclusion Standard data for familysearch.org

Resource Scan

Scan Details

Site Domain familysearch.org
Base Domain familysearch.org
Scan Status Ok
Last Scan 2024-09-26T16:41:08+00:00
Next Scan 2024-10-10T16:41:08+00:00

Last Scan

Scanned 2024-09-26T16:41:08+00:00
URL https://familysearch.org/robots.txt
Redirect https://www.familysearch.org:443/robots.txt
Redirect Domain www.familysearch.org
Redirect Base familysearch.org
Domain IPs 3.212.42.78, 3.93.111.247
Redirect IPs 45.223.168.251
Response IP 45.223.168.251
Found Yes
Hash bce233c45e5cb6cd6fa9aa29c0907b442b0deb518b02a4f5fdca2bce9af0f561
SimHash 74344949e5c7
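
The scanned file can be re-fetched and fingerprinted with a short script. Below is a minimal sketch, assuming the Hash field above is a SHA-256 digest of the response body; the URLs come from the scan details, everything else is illustrative:

```python
import hashlib
import urllib.request

# Fetch the file scanned above; urllib follows the redirect to
# https://www.familysearch.org:443/robots.txt automatically.
with urllib.request.urlopen("https://familysearch.org/robots.txt") as resp:
    body = resp.read()
    final_url = resp.geturl()  # the redirect target actually served

# Assumption: the report's Hash field is a SHA-256 digest of the body.
digest = hashlib.sha256(body).hexdigest()
print(final_url)
print(digest)
```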

Groups

*

Rule Path
Disallow /ark%3A/61903/1%3A
Disallow /ark%3A/61903/2%3A
Disallow /campaign/
Disallow /cgi-bin/
Disallow /Eng/
Disallow /frontier/
Disallow /identity/settings/
Disallow /learningcenter
Disallow /mgmt/
Disallow /pal%3A/
Disallow /photos/album/
Disallow /photos/person/
Disallow /photos/view/
Disallow /profile/
Disallow /records/pal%3A/
Disallow /Search/
Disallow /service/temple/cards
Disallow /tree
Allow /tree/
Disallow /tree/contributions
Disallow /tree/find
Disallow /tree/following
Disallow /tree/import
Disallow /tree/improve-place-names
Disallow /tree/pedigree
Disallow /tree/person
Disallow /tree/sources
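
For a generic crawler (the `*` group above), these rules can be checked with Python's standard urllib.robotparser. The paths below are taken from the rule list; the user agent name is illustrative:

```python
from urllib.robotparser import RobotFileParser

AGENT = "ExampleBot"  # illustrative user agent
rp = RobotFileParser("https://www.familysearch.org/robots.txt")
rp.read()

# /tree is disallowed while /tree/ is allowed, with specific /tree/ pages
# (person, find, ...) disallowed again. Note that parsers resolve
# Allow/Disallow conflicts differently: urllib.robotparser applies rules
# in file order, whereas Google-style parsers pick the longest matching path.
for path in ("/tree", "/tree/", "/tree/person", "/profile/", "/campaign/"):
    ok = rp.can_fetch(AGENT, "https://www.familysearch.org" + path)
    print(path, ok)
```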

algolia crawler

Rule Path
Allow /frontier

mediapartners-google*

Rule Path
Disallow /wiki/

israbot

Rule Path
Disallow /wiki/

orthogaffe

Rule Path
Disallow /wiki/

ubicrawler

Rule Path
Disallow /wiki/

doc

Rule Path
Disallow /wiki/

zao

Rule Path
Disallow /wiki/

sitecheck.internetseer.com

Rule Path
Disallow /wiki/

zealbot

Rule Path
Disallow /wiki/

msiecrawler

Rule Path
Disallow /wiki/

sitesnagger

Rule Path
Disallow /wiki/

webstripper

Rule Path
Disallow /wiki/

webcopier

Rule Path
Disallow /wiki/

fetch

Rule Path
Disallow /wiki/

offline explorer

Rule Path
Disallow /wiki/

teleport

Rule Path
Disallow /wiki/

teleportpro

Rule Path
Disallow /wiki/

webzip

Rule Path
Disallow /wiki/

linko

Rule Path
Disallow /wiki/

httrack

Rule Path
Disallow /wiki/

microsoft.url.control

Rule Path
Disallow /wiki/

xenu

Rule Path
Disallow /wiki/

larbin

Rule Path
Disallow /wiki/

libwww

Rule Path
Disallow /wiki/

zyborg

Rule Path
Disallow /wiki/

download ninja

Rule Path
Disallow /wiki/

fast

Rule Path
Disallow /wiki/

wget

Rule Path
Disallow /wiki/

grub-client

Rule Path
Disallow /wiki/

k2spider

Rule Path
Disallow /wiki/

npbot

Rule Path
Disallow /wiki/

webreaper

Rule Path
Disallow /wiki/

ia_archiver

Rule Path
Allow /wiki/*%26action%3Draw

*

Rule Path
Allow /wiki/w/api.php?action=mobileview&
Allow /wiki/w/load.php?
Allow /wiki/api/rest_v1/?doc
Disallow /wiki/w/
Disallow /wiki/api/
Disallow /wiki/trap/
Disallow /wiki/Special%3A*
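
This second `*` group carves query-string exceptions (mobileview, load.php, REST API docs) out of the blanket /wiki/w/ and /wiki/api/ disallows. A minimal sketch of how such rules resolve under Google-style longest-match precedence; the rule list is copied from the group (with the Special: wildcard simplified), and the test paths are illustrative:

```python
# Longest matching rule wins; ties here fall to whichever rule appears
# first in the list. No match at all means the path is allowed.
RULES = [
    ("allow",    "/wiki/w/api.php?action=mobileview&"),
    ("allow",    "/wiki/w/load.php?"),
    ("allow",    "/wiki/api/rest_v1/?doc"),
    ("disallow", "/wiki/w/"),
    ("disallow", "/wiki/api/"),
    ("disallow", "/wiki/trap/"),
    ("disallow", "/wiki/Special:"),  # "%3A*" decoded and wildcard dropped for simplicity
]

def allowed(path: str) -> bool:
    best = ("allow", "")  # default: allowed
    for rule, prefix in RULES:
        if path.startswith(prefix) and len(prefix) > len(best[1]):
            best = (rule, prefix)
    return best[0] == "allow"

for path in ("/wiki/w/load.php?modules=startup",
             "/wiki/w/index.php?title=Main_Page",
             "/wiki/api/rest_v1/?doc",
             "/wiki/England_Genealogy"):
    print(path, allowed(path))
```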

Other Records

Field Value
sitemap https://www.familysearch.org/photos/sitemapIndex?category=artifacts&artifactCategory=IMAGE
sitemap https://www.familysearch.org/photos/sitemapIndex?category=artifacts&artifactCategory=TEXT
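
Each sitemap record points to a sitemap index. A minimal sketch of listing the child sitemaps it references, assuming the endpoint returns a standard sitemaps.org index document:

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
url = ("https://www.familysearch.org/photos/sitemapIndex"
       "?category=artifacts&artifactCategory=IMAGE")

with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

# A sitemap index lists child sitemaps under <sitemap><loc> elements.
for loc in tree.iter(SITEMAP_NS + "loc"):
    print(loc.text)
```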

Comments

  • LAST CHANGED: Tue Mar 29 2022, at 11:00:00 GMT+0000 (GMT)
  • Version 1.0.10
  • Allow Algolia to search /frontier
  • Specific rules for /wiki/
  • Please note: There are a lot of pages on this site, and there are some misbehaved spiders out there that go _way_ too fast. If you're irresponsible, your access to the site may be blocked.
  • advertising-related bots:
  • Wikipedia work bots:
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Wayback Machine
  • User-agent: archive.org_bot
  • Treated like anyone else
  • Allow the Internet Archiver to index action=raw and thereby store the raw wikitext of pages
  • Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please.
  • Inktomi's "Slurp" can read a minimum delay between hits; if your bot supports such a thing using the 'Crawl-delay' or another instruction, please let us know.
  • There is a special exception for API mobileview to allow dynamic mobile web & app views to load section content. These views aren't HTTP-cached but use parser cache aggressively and don't expose special: pages etc.
  • Another exception is for REST API documentation, located at /api/rest_v1/?doc.
  • Disallow indexing of non-article content
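
The comments above ask crawlers to pace themselves (wget's --wait, the Crawl-delay hint). A minimal sketch of a polite fetch loop using the standard library, with an illustrative user agent and page list; the fallback delay is an assumption, not a site requirement:

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # illustrative bot name
rp = RobotFileParser("https://www.familysearch.org/robots.txt")
rp.read()

# crawl_delay() returns the Crawl-delay value for this agent, or None if
# the file does not set one; fall back to a conservative pause.
delay = rp.crawl_delay(USER_AGENT) or 5

for path in ("/wiki/Main_Page", "/wiki/England_Genealogy"):  # illustrative pages
    url = "https://www.familysearch.org" + path
    if rp.can_fetch(USER_AGENT, url):
        print("fetch", url)
        # ... download and process the page here ...
    time.sleep(delay)
```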