scholar.archive.org
robots.txt

Robots Exclusion Standard data for scholar.archive.org

Resource Scan

Scan Details

Site Domain scholar.archive.org
Base Domain archive.org
Scan Status Ok
Last Scan 2025-03-03T10:41:39+00:00
Next Scan 2025-04-02T10:41:39+00:00

Last Scan

Scanned 2025-03-03T10:41:39+00:00
URL https://scholar.archive.org/robots.txt
Domain IPs 207.241.225.8, 207.241.232.8
Response IP 207.241.232.8
Found Yes
Hash 497d356a937fcdeff680e3f9d9a7cdeae67e23633a4624957c21aecb9206556c
SimHash be275950c7d5

Groups

User agents: semrushbot, yandexbot, bingbot, googlebot, semanticscholarbot, yacybot, petalbot, yeti, riddler
Rule: Disallow /search

User agent: *
Rule: Disallow /search

User agent: *
Rule: Allow /
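
The groups above are standard Robots Exclusion Protocol rules, which a polite crawler would check before fetching any page. A minimal sketch with Python's stdlib `urllib.robotparser`, using a condensed inline copy of the rules so it runs offline (the agent name `mybot` is illustrative; use `rfp.set_url(...)` and `rfp.read()` to check the live file instead):

```python
from urllib.robotparser import RobotFileParser

# Condensed reproduction of the groups listed above: named search bots
# and the default (*) group are both barred from /search.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /search

User-agent: *
Disallow: /search
"""

rfp = RobotFileParser()
rfp.parse(ROBOTS_TXT.splitlines())

# Search result pages are off-limits for every agent.
print(rfp.can_fetch("mybot", "https://scholar.archive.org/search?q=x"))

# Paths not matched by any Disallow rule remain crawlable by default.
print(rfp.can_fetch("mybot", "https://scholar.archive.org/work/abc123"))
```

Note that `urllib.robotparser` matches `Disallow: /search` as a simple path prefix, so query strings on `/search` are also blocked.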

Other Records

Sitemap: https://scholar.archive.org/sitemap.xml
Sitemap: https://scholar.archive.org/sitemap-index-works.xml
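
The second record is a sitemap index, which points at child sitemaps rather than pages directly. A sketch of reading one with the stdlib XML parser, using an inline sample fragment in the standard sitemaps.org schema (the child filenames here are illustrative, not taken from the live index):

```python
import xml.etree.ElementTree as ET

# Minimal sitemap-index fragment; the real index lives at the
# sitemap-index-works.xml URL listed above.
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://scholar.archive.org/sitemap-works-0.xml</loc></sitemap>
  <sitemap><loc>https://scholar.archive.org/sitemap-works-1.xml</loc></sitemap>
</sitemapindex>
"""

# All sitemap elements live in the sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemaps(index_xml):
    """Return the <loc> URL of every child sitemap in an index file."""
    root = ET.fromstring(index_xml)
    return [loc.text for loc in root.findall("sm:sitemap/sm:loc", NS)]

print(child_sitemaps(SAMPLE_INDEX))
```

Each child URL can then be fetched and parsed the same way for its `<url><loc>` entries.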

Comments

  • Hello friends!
  • If you are considering large or automated crawling, you may want to look at our catalog API (https://api.fatcat.wiki) or bulk database snapshots instead.
  • large-scale bots should not index search pages
  • crawling search result pages is expensive, so we do specify a long crawl delay for those (for bots other than the above broad search bots)
  • UPDATE: actually, just block all robots from search page, we are overwhelmed as of 2022-10-31
  • Allow: /search
  • Crawl-delay: 5
  • by default, can crawl anything on this domain. HTTP 429 ("backoff") status codes are used for rate-limiting instead of any crawl delay specified here. Up to a handful of concurrent requests should be fine.
  • same info as sitemap-index-works.xml plus following citation_pdf_url
  • Sitemap: https://scholar.archive.org/sitemap-index-access.xml
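
The comment about HTTP 429 implies a client-side backoff loop rather than a fixed crawl delay. A minimal sketch of such a loop (the function name, retry counts, and delays are illustrative assumptions, not part of the site's guidance):

```python
import time
import urllib.request
import urllib.error

def fetch_with_backoff(url, max_retries=5, base_delay=1.0,
                       opener=urllib.request.urlopen):
    """Fetch `url`, retrying on HTTP 429 responses.

    Honors a Retry-After header (in seconds) when the server sends one;
    otherwise waits base_delay * 2**attempt between attempts.
    Any non-429 HTTP error is re-raised immediately.
    """
    for attempt in range(max_retries):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
            time.sleep(delay)
    raise RuntimeError(f"giving up on {url} after {max_retries} retries")
```

Combined with a small cap on concurrent requests, this matches the site's stated preference for 429-driven rate limiting over a robots.txt crawl delay.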