cap.news
robots.txt

Robots Exclusion Standard data for cap.news

Resource Scan

Scan Details

Site Domain cap.news
Base Domain cap.news
Scan Status Ok
Last Scan 2026-02-05T19:41:24+00:00
Next Scan 2026-03-07T19:41:24+00:00

Last Scan

Scanned 2026-02-05T19:41:24+00:00
URL https://www.cap.news/robots.txt
Domain IPs 104.18.68.40, 104.18.69.40, 2606:4700::6812:4428, 2606:4700::6812:4528
Response IP 104.18.68.40
Found Yes
Hash 46d87c76f66c60881138397f15b68205866c397bbce64906555bc561cf3f1d72
SimHash 6f1d9c20ab11

Groups

amazonbot

Rule Path
Disallow /

googlebot

Rule Path
Disallow /nogooglebot/

*

Rule Path
Disallow /login

adsbot-google

Rule Path
Disallow /login

nutch

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /login

Other Records

Field Value
crawl-delay 10

ahrefssiteaudit

Rule Path
Disallow /login

Other Records

Field Value
crawl-delay 10

mj12bot

Rule Path
Disallow /login

Other Records

Field Value
crawl-delay 10

Other Records

Field Value
sitemap https://www.cap.news/sitemap.xml
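The groups listed above can be checked with Python's standard urllib.robotparser. The following is a minimal sketch, not the site's actual file: ROBOTS_TXT is a reconstruction from the groups in this report, and the order of the groups, the lowercase agent names, and the helper names build_parser and is_allowed are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the scanned file, built only from the groups
# listed in this report. The real file's casing, ordering, and grouping
# of user-agent lines may differ.
ROBOTS_TXT = """\
User-agent: amazonbot
Disallow: /

User-agent: googlebot
Disallow: /nogooglebot/

User-agent: *
Disallow: /login

User-agent: adsbot-google
Disallow: /login

User-agent: nutch
Disallow: /

User-agent: ahrefsbot
Disallow: /login
Crawl-delay: 10

User-agent: ahrefssiteaudit
Disallow: /login
Crawl-delay: 10

User-agent: mj12bot
Disallow: /login
Crawl-delay: 10

Sitemap: https://www.cap.news/sitemap.xml
"""

BASE = "https://www.cap.news"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, BASE + path)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # "examplebot" is a hypothetical crawler with no group of its own,
    # so it falls back to the "*" group.
    checks = [
        ("amazonbot", "/"),
        ("googlebot", "/login"),
        ("googlebot", "/nogooglebot/page"),
        ("examplebot", "/login"),
        ("examplebot", "/"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(parser, agent, path))
    print("mj12bot crawl-delay:", parser.crawl_delay("mj12bot"))
```

One point the sketch makes visible: a crawler with its own group uses only that group, so under this reconstruction googlebot is not bound by the /login rule in the * group. Python's parser matches agent names by substring and in file order, so its results can differ in edge cases from a crawler that follows RFC 9309 strictly.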

Comments

  • beehiiv default robots.txt
  • This is automatically used when you leave custom content empty
  • Customize below or upload your own robots.txt file