thewholeinternet.net
robots.txt

Robots Exclusion Standard data for thewholeinternet.net

Resource Scan

Scan Details

Site Domain thewholeinternet.net
Base Domain thewholeinternet.net
Scan Status Ok
Last Scan 2026-02-07T14:49:41+00:00
Next Scan 2026-02-14T14:49:41+00:00

Last Scan

Scanned 2026-02-07T14:49:41+00:00
URL https://thewholeinternet.net/robots.txt
Domain IPs 162.210.96.121
Response IP 162.210.96.121
Found Yes
Hash 850f26c321e98237b80fb48f9e9f7c325c50ba5dcd03af5403f966f59ce033ad
SimHash f197285082d2
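The Hash field lets you tell whether the file has changed since this scan. The scan does not say how the hash is computed. The sketch below assumes it is a SHA-256 digest of the raw response body. A 64-character hex value is consistent with that, but SHA3-256 or another 256-bit hash would look the same. A mismatch may therefore mean either that the file changed or that the assumption is wrong.

```python
import hashlib
import urllib.request

ROBOTS_URL = "https://thewholeinternet.net/robots.txt"
# Value copied from the "Hash" field of the 2026-02-07 scan.
RECORDED_HASH = "850f26c321e98237b80fb48f9e9f7c325c50ba5dcd03af5403f966f59ce033ad"


def sha256_hex(body: bytes) -> str:
    """Hex SHA-256 of the raw body (ASSUMPTION: this is what the scan hashes)."""
    return hashlib.sha256(body).hexdigest()


def fetch_body(url: str = ROBOTS_URL, timeout: float = 10.0) -> bytes:
    """Download the robots.txt body as raw bytes, without decoding or normalizing."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    current = sha256_hex(fetch_body())
    if current == RECORDED_HASH:
        print("unchanged since the recorded scan")
    else:
        print("differs: file changed, or the scan hashes differently")
```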

Groups

applebot-extended

Rule Path
Disallow /

ai2bot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

cohere-training-data-crawler

Rule Path
Disallow /

criteobot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

kangaroo bot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

pangubot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

terracotta

Rule Path
Disallow /web-hosting-articles/
Disallow /wp-admin/

timpibot

Rule Path
Disallow /

webzio-extended

Rule Path
Disallow /

youbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://thewholeinternet.net/sitemap.xml
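The groups and the sitemap record above can be checked mechanically. This is a minimal sketch that uses Python's standard-library `urllib.robotparser`. It runs against an equivalent robots.txt rebuilt from the listed groups, assuming one `User-agent` line per group. The live file may differ in casing, grouping, order and comments, so this text will not reproduce the recorded hash.

Eighteen agents are disallowed from `/` entirely. `terracotta` is kept only out of `/web-hosting-articles/` and `/wp-admin/`. There is no `*` group, so an agent with no matching group, such as Googlebot, falls through to "allowed". Note that `robotparser` matches agents with a case-insensitive substring test on the name before any `/`. The test paths `/about/` and `/web-hosting-articles/some-post/` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Agents whose group is "Disallow: /" in the scan above.
FULLY_BLOCKED = [
    "applebot-extended", "ai2bot", "bytespider", "ccbot", "claude-web",
    "claudebot", "cohere-training-data-crawler", "criteobot", "diffbot",
    "gptbot", "imagesiftbot", "kangaroo bot", "meta-externalagent",
    "pangubot", "perplexitybot", "timpibot", "webzio-extended", "youbot",
]


def build_robots_txt() -> str:
    """Rebuild an equivalent robots.txt (ASSUMPTION: one agent per group)."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in FULLY_BLOCKED]
    blocks.append(
        "User-agent: terracotta\n"
        "Disallow: /web-hosting-articles/\n"
        "Disallow: /wp-admin/"
    )
    blocks.append("Sitemap: https://thewholeinternet.net/sitemap.xml")
    return "\n\n".join(blocks) + "\n"


def make_parser() -> RobotFileParser:
    parser = RobotFileParser()
    # parse() loads the rules and marks them as read, so can_fetch() works offline.
    parser.parse(build_robots_txt().splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser()
    base = "https://thewholeinternet.net"
    checks = [
        ("GPTBot", "/"),
        ("terracotta", "/web-hosting-articles/some-post/"),  # hypothetical path
        ("terracotta", "/about/"),                           # hypothetical path
        ("Googlebot", "/web-hosting-articles/"),             # no group, no "*" group
    ]
    for agent, path in checks:
        print(f"{agent:12} {path:35} allowed={rp.can_fetch(agent, base + path)}")
    print("sitemaps:", rp.site_maps())
```

`site_maps()` requires Python 3.8 or later. Crawlers that follow RFC 9309 match product tokens more strictly than this substring test. For the agent names listed here, both approaches give the same answers.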