worldwidearchive.org
robots.txt

Robots Exclusion Standard data for worldwidearchive.org

Resource Scan

Scan Details

Site Domain	worldwidearchive.org
Base Domain	worldwidearchive.org
Scan Status	Ok
Last Scan	2025-08-19T06:01:00+00:00
Next Scan	2025-08-26T06:01:00+00:00
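
The two timestamps above imply a scan interval, which the report does not state directly. As a small sketch, the interval can be derived from the ISO 8601 values using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table.
last_scan = datetime.fromisoformat("2025-08-19T06:01:00+00:00")
next_scan = datetime.fromisoformat("2025-08-26T06:01:00+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)  # number of whole days between the two scans
```

For these two values the difference is exactly seven days, which suggests (but does not prove) a weekly rescan schedule.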

Last Scan

Scanned	2025-08-19T06:01:00+00:00
URL	https://worldwidearchive.org/robots.txt
Domain IPs	104.21.59.196, 172.67.182.241, 2606:4700:3033::6815:3bc4, 2606:4700:3037::ac43:b6f1
Response IP	104.21.59.196
Found	Yes
Hash	2fa526570acba59d29971e10359f4f666f8c3c4226017e750a33e20439d1a1c4
SimHash	501d0142f209
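
The report does not say which algorithm produces the Hash field. Its length (64 hexadecimal digits) is consistent with SHA-256, so the sketch below assumes SHA-256 over the raw response body. That is an assumption, and the helper name `robots_digest` and the sample body are hypothetical. A digest like this is mainly useful for detecting whether the file changed between scans.

```python
import hashlib

def robots_digest(body: bytes) -> str:
    """Return a hex digest of a robots.txt body (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()

# Hypothetical sample body, not the real file content.
sample = b"User-agent: *\nAllow: /\n"
digest = robots_digest(sample)

# Any change to the body, even one byte, yields a different digest.
changed = robots_digest(sample + b"Disallow: /edit/\n")
print(digest != changed)
```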

Groups

daumoa

Rule	Path
Disallow	/

cliqzbot

Rule	Path
Disallow	/

crawler4j

Rule	Path
Disallow	/

getintent

Rule	Path
Disallow	/

coccoc

Rule	Path
Disallow	/

proximic

Rule	Path
Disallow	/

grapeshot

Rule	Path
Disallow	/

ltx71

Rule	Path
Disallow	/

jamesbot

Rule	Path
Disallow	/

smtbot

Rule	Path
Disallow	/

scrapy

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

facebookexternalhit

Rule	Path
Disallow	/

serpstatbot

Rule	Path
Disallow	/

dataforseobot

Rule	Path
Disallow	/

barkrowler

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/search?
Disallow	/edit/
Disallow	/cdn-cgi/
Disallow	/dynjs/
Disallow	/dyn/actions/
Disallow	/en/search?
Disallow	/fr/search?
Disallow	/ta/search?
Disallow	/de/search?
Allow	/
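
To illustrate how these groups behave for a crawler, the sketch below rebuilds a fragment of the file from the tables above and evaluates it with Python's standard `urllib.robotparser`. The robots.txt text is a reconstruction, not a copy of the live file: only two of the fully blocked groups are included, the user-agent casing follows the report, and `ExampleBot` is a hypothetical crawler name. Note that Python's parser applies rules in file order (first match wins) rather than the longest-match rule of RFC 9309; for the paths checked here both approaches should agree.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the report; abbreviated to two blocked groups plus "*".
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: *
Disallow: /search?
Disallow: /edit/
Disallow: /cdn-cgi/
Disallow: /dynjs/
Disallow: /dyn/actions/
Disallow: /en/search?
Disallow: /fr/search?
Disallow: /ta/search?
Disallow: /de/search?
Allow: /
"""

BASE = "https://worldwidearchive.org"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the reconstructed rules."""
    return parser.can_fetch(agent, BASE + path)

# A named group with "Disallow: /" blocks the whole site for that crawler.
print(allowed("GPTBot", "/about"))
# Crawlers without their own group fall back to the "*" group.
print(allowed("ExampleBot", "/about"))
print(allowed("ExampleBot", "/search?q=test"))
print(allowed("ExampleBot", "/edit/page"))
```

Under these assumptions, the named crawlers are excluded from every path, while all other crawlers may fetch anything except the search, edit, and dynamic endpoints listed in the `*` group.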

Other Records

Field	Value
sitemap	https://worldwidearchive.org/sitemaps/sitemap_index.xml
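
A crawler can read this record from the same parse pass as the rules. The sketch below uses `RobotFileParser.site_maps()` from the Python standard library (available since Python 3.8) on a minimal reconstructed fragment; the fragment is an assumption and omits the groups listed earlier.

```python
from urllib.robotparser import RobotFileParser

# Minimal reconstructed fragment containing only the sitemap record.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://worldwidearchive.org/sitemaps/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```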
