headline.com
robots.txt

Robots Exclusion Standard data for headline.com

Resource Scan

Scan Details

Site Domain headline.com
Base Domain headline.com
Scan Status Ok
Last Scan 2026-02-18T16:54:39+00:00
Next Scan 2026-03-20T16:54:39+00:00

Last Scan

Scanned 2026-02-18T16:54:39+00:00
URL https://headline.com/robots.txt
Domain IPs 76.76.21.21
Response IP 76.76.21.21
Found Yes
Hash b7bbf8d997209bd6f096cbac58da0a0ec96aa883d30da1097496075816da0e8f
SimHash 785c4868e792
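
A content hash like the one above is typically used to tell whether the file changed between scans. The sketch below assumes the 64-character value is a SHA-256 digest of the raw response body; the report does not name the algorithm, so that is a guess based on the digest length. The function names are hypothetical, and the SimHash field is not reproduced here.

```python
import hashlib


def robots_fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt bytes.

    Assumption: SHA-256, chosen because the reported hash has 64 hex characters.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest against the digest of a freshly fetched body."""
    return robots_fingerprint(body) != previous_hash
```

A scanner could store the digest after each fetch and call `has_changed` on the next scan to decide whether the groups need to be parsed again.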

Groups

googlebot
bingbot
uptimebot
better-stack
betteruptimebot

Rule Path
Disallow

ahrefsbot
semrushbot
mj12bot
majesticseo
dotbot
baiduspider
yandexbot
semrushbot
blexbot
dataforseobot
petalbot
blexbot
bytespider
seznambot
duckduckbot
facebookexternalhit
facebookbot
claudebot
claude-web
gptbot
chatgpt-user
ccbot
cohere-ai
diffbot
anthropic-ai

Rule Path
Disallow /
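
The two groups above differ only in the rule path: an empty `Disallow` permits everything, while `Disallow /` blocks the whole site. A minimal sketch with Python's standard `urllib.robotparser` illustrates the difference. The robots.txt text is a hypothetical reconstruction from this report with shortened agent lists, not the file as served, and `allowed` is a helper name introduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the two groups (agent lists shortened).
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: bingbot
Disallow:

User-agent: gptbot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Ask the parsed rules whether `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)
```

With these rules, `allowed("Googlebot", "https://headline.com/api/status")` is true because the empty path matches nothing to block, and `allowed("GPTBot", "https://headline.com/")` is false.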

*

Rule Path
Disallow /api/
Disallow /asia/*/search
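
The `*` in `/asia/*/search` is a path wildcard, and Python's `urllib.robotparser` treats rule paths as plain prefixes, so it would not expand it. The sketch below is one way to evaluate the two catch-all rules, assuming the common convention that `*` matches any run of characters, a trailing `$` anchors the end, and rules otherwise match as prefixes. It covers `Disallow` rules only and ignores `Allow` precedence; the function names are hypothetical.

```python
import re

# The two rules reported for the catch-all (*) group.
DISALLOW_RULES = ["/api/", "/asia/*/search"]


def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    # Escape literal segments and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """True if any rule matches the start of `path`."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW_RULES)
```

Under these assumptions `/asia/japan/search?q=tokyo` is blocked, while `/asia/japan/news` and `/api` (no trailing slash) are not.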

Other Records

Field Value
sitemap https://headline.com/sitemap.xml

Comments

  • Host
  • ✅ ALLOW: Good search engines and monitoring
  • ❌ BLOCK: Aggressive SEO scrapers and bandwidth hogs
  • ✅ ALLOW: Everything else (standard browsers, legitimate crawlers)
  • Sitemap (for Google & Bing SEO)

Warnings

  • `host` is not a known field.
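
A warning like this one can be reproduced by listing every field name in the file and flagging those outside a known set. The sketch below is an assumption about how such a check might work, not the scanner's actual logic: the `KNOWN_FIELDS` set, the function name, and the `Host` value in the sample are all hypothetical, since the report shows neither the scanner's field list nor the original `Host` line.

```python
# Assumed set of recognised fields; the scanner's real list is not shown in the report.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def unknown_fields(robots_txt: str) -> list[str]:
    """Return unrecognised field names in order of first appearance."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        # Drop comments, then skip blank lines and lines without a field separator.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


# Hypothetical sample; the Host value is invented for illustration.
SAMPLE = """\
# Host
Host: headline.com
User-agent: *
Disallow: /api/
Sitemap: https://headline.com/sitemap.xml
"""
```

Calling `unknown_fields(SAMPLE)` returns a list containing only `host`, which matches the warning above.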