spectrumnoticias.com
robots.txt

Robots Exclusion Standard data for spectrumnoticias.com

Resource Scan

Scan Details

Site Domain spectrumnoticias.com
Base Domain spectrumnoticias.com
Scan Status Ok
Last Scan 2025-12-28T00:01:09+00:00
Next Scan 2026-01-04T00:01:09+00:00

Last Scan

Scanned 2025-12-28T00:01:09+00:00
URL https://spectrumnoticias.com/robots.txt
Domain IPs 34.224.210.193, 34.235.28.141, 52.21.98.4, 54.225.124.191
Response IP 34.235.28.141
Found Yes
Hash a9a5953c687b592f79fbcbc45499d11baba937adb4e4b200de7f20b8efd88936
SimHash 629e895ace94
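
Note on the Hash field: the value is 64 hexadecimal characters, which is consistent with a SHA-256 digest. The report does not name the algorithm or say exactly which bytes are hashed, so the sketch below assumes SHA-256 over the raw response body. The helper name `sha256_hex` is hypothetical.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a robots.txt body as lowercase hex.

    Assumption: the scan's Hash field is SHA-256 of the raw response bytes.
    The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Hash value copied from the scan details above.
REPORTED_HASH = "a9a5953c687b592f79fbcbc45499d11baba937adb4e4b200de7f20b8efd88936"


def matches_reported(body: bytes) -> bool:
    """Compare a freshly fetched body against the reported hash."""
    return sha256_hex(body) == REPORTED_HASH
```

If the assumption holds, passing the bytes fetched from the URL above to `matches_reported` would show whether the file has changed since the last scan.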

Groups

*

Rule Path
Allow /$
Allow /ny/nyc
Allow /ny/nyc/*
Allow /tx/texas
Allow /tx/texas/*
Allow /ca/los-angeles
Allow /ca/los-angeles/*
Allow /us/noticias
Allow /us/noticias/*
Allow /fl/florida
Allow /fl/florida/*
Allow /sitemap.xml
Allow /services/*
Allow /content/*
Allow /etc/*
Allow /.well-known/assetlinks.json
Allow /local
Allow /splash
Allow /etc.clientlibs/*
Disallow /*
Disallow /*/*/partner-content/*
Disallow /content/news/stories/*

Other Records

Field Value
crawl-delay 1
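
Note on rule precedence: the `*` group mixes a blanket `Disallow /*` with narrower Allow rules, so the outcome for a given path depends on which rule wins. Under the longest-match convention of RFC 9309 the most specific (longest) matching pattern applies, and an Allow wins a tie. The sketch below applies that convention to the rules listed above. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, and individual crawlers may evaluate rules differently.

```python
import re

# Rules copied from the "*" group above, as (kind, path pattern) pairs.
RULES = [
    ("allow", "/$"),
    ("allow", "/ny/nyc"), ("allow", "/ny/nyc/*"),
    ("allow", "/tx/texas"), ("allow", "/tx/texas/*"),
    ("allow", "/ca/los-angeles"), ("allow", "/ca/los-angeles/*"),
    ("allow", "/us/noticias"), ("allow", "/us/noticias/*"),
    ("allow", "/fl/florida"), ("allow", "/fl/florida/*"),
    ("allow", "/sitemap.xml"),
    ("allow", "/services/*"), ("allow", "/content/*"), ("allow", "/etc/*"),
    ("allow", "/.well-known/assetlinks.json"),
    ("allow", "/local"), ("allow", "/splash"),
    ("allow", "/etc.clientlibs/*"),
    ("disallow", "/*"),
    ("disallow", "/*/*/partner-content/*"),
    ("disallow", "/content/news/stories/*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; on equal length, Allow beats Disallow."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        # re.match anchors at the start, so patterns act as path prefixes.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]
```

Under these assumptions `is_allowed("/ny/nyc/news")` is true and `is_allowed("/content/news/stories/x")` is false, because the longer Disallow pattern outranks `Allow /content/*`.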

twitterbot

Rule Path
Disallow /.well-known/

gptbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

yeti

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

timpibot

Rule Path
Disallow /

ai2bot

Rule Path
Disallow /

mistralai-user

Rule Path
Disallow /

anthropicbot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

omgili

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /
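
Note on group selection: a crawler that finds a group for its own product token follows only that group and ignores the `*` group; other crawlers fall back to `*`. The sketch below illustrates that lookup with a few of the groups listed above. `GROUPS` is an abbreviated, hand-copied subset, and the function names are hypothetical.

```python
# Abbreviated subset of the groups in this report; keys are lowercase tokens.
GROUPS = {
    "*": [("allow", "/$"), ("allow", "/ny/nyc"), ("disallow", "/*")],
    "twitterbot": [("disallow", "/.well-known/")],
    "gptbot": [("disallow", "/")],
    "claudebot": [("disallow", "/")],
    "google-extended": [("disallow", "/")],
}


def select_group(product_token: str, groups=GROUPS):
    """Return the rules for a crawler: its own group if present, else the '*' group.

    Product tokens are compared case-insensitively.
    """
    return groups.get(product_token.lower(), groups["*"])


def is_fully_blocked(product_token: str, groups=GROUPS) -> bool:
    """True when the selected group consists solely of 'Disallow /'."""
    return select_group(product_token, groups) == [("disallow", "/")]
```

One consequence worth noting: because `twitterbot` has its own group, only `/.well-known/` is off limits to it, and the Allow and Disallow rules of the `*` group do not apply.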

Other Records

Field Value
sitemap https://spectrumnoticias.com/sitemap.xml

Comments

  • Allowed Paths
  • Excluded Paths
  • Additional Config