feeds.npr.org
robots.txt

Robots Exclusion Standard data for feeds.npr.org

Resource Scan

Scan Details

Site Domain feeds.npr.org
Base Domain npr.org
Scan Status Ok
Last Scan 2024-09-14T23:37:44+00:00
Next Scan 2024-10-14T23:37:44+00:00

Last Scan

Scanned 2024-09-14T23:37:44+00:00
URL https://feeds.npr.org/robots.txt
Domain IPs 23.215.7.10, 23.215.7.4, 2600:1413:b000:1b::17d7:704, 2600:1413:b000:1b::17d7:70a
Response IP 96.17.180.32
Found Yes
Hash 9df237fd6ef88590acb9ba6a8df6d3e98bcb81940efc47afa11491d1ece77885
SimHash 09195604aca3
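
The Hash value above is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The report does not name the algorithm, so the following is only a minimal sketch under that assumption; `body_digest` and the sample body are hypothetical and are not the scanner's actual code or the real file contents.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a raw robots.txt response body.

    Assumption: the report's Hash field is SHA-256 over the response bytes.
    """
    return hashlib.sha256(body).hexdigest()


# Illustrative body only, not the real feeds.npr.org file.
sample = b"User-agent: *\nDisallow: /search\n"
digest = body_digest(sample)
print(len(digest))  # 64 hex characters, the same length as the Hash field
```

Comparing such a digest between scans would show whether the file changed byte for byte.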

Groups

*

Rule Path
Disallow /mpx/
Disallow /cgi-bin
Disallow /ramfiles/
Disallow /oauth2/
Disallow /account/
Disallow /*.smil
Disallow /*.asx
Disallow /*.ram
Disallow /*.wav
Disallow /*.rmm
Disallow /*.js
Disallow /*.au
Disallow /stations/force/force_localization.php?
Disallow /rundowns/segment.php?
Disallow /templates/search/
Disallow /search
Disallow /2013/03/21/174840895/
Disallow /sections/ombudsman/2008/01/frequently_asked_questions_1.html
Disallow /sections/health-shots/2013/03/11/173816690/new-voices-for-the-voiceless-synthetic-speech-gets-an-upgrade
Disallow /transcripts/470280334*
Disallow /tags*
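
Several of the paths above use `*` as a wildcard (for example `/*.js` and `/tags*`). The sketch below shows one way such patterns could be matched against a URL path, following the convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. `rule_matches` is a hypothetical helper and the test paths are made up for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the given URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so unanchored patterns act as prefixes.
    return re.match(regex, path) is not None


print(rule_matches("/*.js", "/static/app.js"))    # True
print(rule_matches("/*.js", "/static/app.json"))  # True (no "$", prefix match)
print(rule_matches("/tags*", "/tags/politics"))   # True
print(rule_matches("/search", "/sections/news"))  # False
```

Because none of the rules listed here end in `$`, a pattern such as `/*.js` would also cover paths that merely contain `.js`, as the second call shows.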

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /
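
The `gptbot` and `chatgpt-user` groups each disallow the whole site, while other crawlers fall back to the `*` group. A minimal sketch of how that plays out, using Python's standard `urllib.robotparser`: the robots.txt text below is reconstructed from the tables in this report (with only one `*` rule kept for brevity), and `ExampleBot` and the URLs are made-up test values.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt; the exact wording of the real file is an assumption.
lines = """\
User-agent: *
Disallow: /search

User-agent: gptbot
Disallow: /

User-agent: chatgpt-user
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(lines)

gptbot_ok = parser.can_fetch("GPTBot", "https://feeds.npr.org/any/path")
other_ok = parser.can_fetch("ExampleBot", "https://feeds.npr.org/any/path")
other_search = parser.can_fetch("ExampleBot", "https://feeds.npr.org/search")

print(gptbot_ok)     # False: the gptbot group disallows everything
print(other_ok)      # True: only the "*" group applies, and no rule matches
print(other_search)  # False: "/search" is disallowed for all agents
```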

Other Records

Field Value
sitemap https://googlecrawl.npr.org/news/sitemap_news.xml
sitemap https://googlecrawl.npr.org/standard/sitemap_index.xml
sitemap https://googlecrawl.npr.org/video/sitemap_video.xml
sitemap https://www.npr.org/live-updates/sitemap.xml
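
The sitemap records point at hosts other than feeds.npr.org. As a rough sketch, the standard library can read such records back out of robots.txt lines; `RobotFileParser.site_maps()` requires Python 3.8 or later, and the `Sitemap:` line syntax below is assumed from the field names in this report.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Sitemap URLs copied from the records above; the line syntax is assumed.
lines = [
    "Sitemap: https://googlecrawl.npr.org/news/sitemap_news.xml",
    "Sitemap: https://googlecrawl.npr.org/standard/sitemap_index.xml",
    "Sitemap: https://googlecrawl.npr.org/video/sitemap_video.xml",
    "Sitemap: https://www.npr.org/live-updates/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(lines)

sitemaps = parser.site_maps() or []  # site_maps() returns None when empty
hosts = sorted({urlparse(url).netloc for url in sitemaps})

print(len(sitemaps))  # 4
print(hosts)          # ['googlecrawl.npr.org', 'www.npr.org']
```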

Comments

  • robots.txt for www.npr.org
  • Changes are tracked in npr_seamus
  • Disallowing the OpenAI web crawler
  • Disallowing OpenAI plugins