thewholeinternet.net
robots.txt

Robots Exclusion Standard data for thewholeinternet.net

Resource Scan

Scan Details

Site Domain	thewholeinternet.net
Base Domain	thewholeinternet.net
Scan Status	Ok
Last Scan	2026-02-07T14:49:41+00:00
Next Scan	2026-02-14T14:49:41+00:00

Last Scan

Scanned	2026-02-07T14:49:41+00:00
URL	https://thewholeinternet.net/robots.txt
Domain IPs	162.210.96.121
Response IP	162.210.96.121
Found	Yes
Hash	850f26c321e98237b80fb48f9e9f7c325c50ba5dcd03af5403f966f59ce033ad
SimHash	f197285082d2

Groups

applebot-extended

Rule	Path
Disallow	/

ai2bot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

cohere-training-data-crawler

Rule	Path
Disallow	/

criteobot

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

kangaroo bot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

pangubot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

terracotta

Rule	Path
Disallow	/web-hosting-articles/
Disallow	/wp-admin/

timpibot

Rule	Path
Disallow	/

webzio-extended

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://thewholeinternet.net/sitemap.xml
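To illustrate how these groups apply, the sketch below rebuilds an approximate robots.txt from the scan data and queries it with Python's standard-library urllib.robotparser. The file layout is an assumption: the scan lists only the parsed groups, so the sketch assumes one group per user agent with no comments or other directives. The helper name build_robots_lines is hypothetical. Note that urllib.robotparser matches a group when the group's token is a case-insensitive substring of the requester's product token, which may differ from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://thewholeinternet.net"

# Agents whose group in the scan is "Disallow: /", listed in scan order.
FULLY_BLOCKED = [
    "applebot-extended", "ai2bot", "bytespider", "ccbot", "claude-web",
    "claudebot", "cohere-training-data-crawler", "criteobot", "diffbot",
    "gptbot", "imagesiftbot", "kangaroo bot", "meta-externalagent",
    "pangubot", "perplexitybot", "timpibot", "webzio-extended", "youbot",
]


def build_robots_lines() -> list[str]:
    """Hypothetical reconstruction of the file (assumes one group per agent)."""
    lines: list[str] = []
    for agent in FULLY_BLOCKED:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # terracotta is the only group with path-specific rules.
    lines += [
        "User-agent: terracotta",
        "Disallow: /web-hosting-articles/",
        "Disallow: /wp-admin/",
        "",
        f"Sitemap: {SITE}/sitemap.xml",
    ]
    return lines


parser = RobotFileParser()
parser.parse(build_robots_lines())  # parse() also marks the data as fetched

for agent, path in [
    ("GPTBot/1.0", "/"),          # False: matches the gptbot group
    ("terracotta", "/"),          # True: only two paths are disallowed
    ("terracotta", "/wp-admin/"), # False
    ("Googlebot", "/"),           # True: no matching group, no "*" group
]:
    print(agent, path, parser.can_fetch(agent, SITE + path))

print(parser.site_maps())  # ['https://thewholeinternet.net/sitemap.xml']
```

Crawlers with no matching group, such as Googlebot here, are allowed because the scanned file has no `User-agent: *` group.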
