npr.org
robots.txt

Robots Exclusion Standard data for npr.org

Resource Scan

Scan Details

Site Domain npr.org
Base Domain npr.org
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-03-09T23:13:26+00:00
Next Scan 2024-06-07T23:13:26+00:00

Last Successful Scan

Scanned 2023-01-09T10:06:38+00:00
URL https://npr.org/robots.txt
Redirect https://www.npr.org/robots.txt
Redirect Domain www.npr.org
Redirect Base npr.org
Domain IPs 13.35.8.6, 13.35.8.68, 13.35.8.84, 13.35.8.89
Redirect IPs 184.31.4.152, 2600:1413:b000:483::1155, 2600:1413:b000:492::1155
Response IP 23.76.230.105
Found Yes
Hash de8a54317526a0a7bee0b16038fcae54e82db0ab607f1dd455e3a7379629861b
SimHash 8d151603ada3
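
The Hash field above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the page does not name the algorithm. Assuming SHA-256 over the raw response body, a minimal sketch of how such a fingerprint could be used to detect changes between scans might look like this. The function names and the sample bodies are hypothetical and are not taken from the scan.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return a hex digest of the response body (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a newly fetched body."""
    return content_hash(body) != previous_hash


# Hypothetical bodies from two scans, for illustration only.
old_body = b"User-agent: *\nDisallow: /mpx/\n"
new_body = b"User-agent: *\nDisallow: /mpx/\nDisallow: /proxy/\n"

stored = content_hash(old_body)
changed = has_changed(stored, new_body)
unchanged = has_changed(stored, old_body)
```

An exact digest only reports whether the file changed at all; a similarity hash such as the SimHash field is the kind of value that can indicate how much it changed.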

Groups

*

Rule Path
Disallow /mpx/
Disallow /cgi-bin
Disallow /ramfiles/
Disallow /oauth2/
Disallow /account/
Disallow /proxy/
Disallow /*.smil
Disallow /*.asx
Disallow /*.ram
Disallow /*.wav
Disallow /*.rmm
Disallow /*.js
Disallow /*.au
Disallow /stations/force/force_localization.php?
Disallow /rundowns/segment.php?
Disallow /templates/search/*
Disallow /2013/03/21/174840895/
Disallow /sections/ombudsman/2008/01/frequently_asked_questions_1.html
Disallow /sections/health-shots/2013/03/11/173816690/new-voices-for-the-voiceless-synthetic-speech-gets-an-upgrade
Disallow /transcripts/470280334*
Disallow /2015/07/04/419570939/chasing-memories-in-their-refugee-camp-40-years-after-they-fled-vietnam
Disallow /transcripts/419570939*
Disallow /sections/parallels/2016/08/15/480128005/for-french-teens-smoking-still-has-more-allure-than-stigma
Disallow /transcripts/480128005*
Disallow /tags*
Disallow /sureroute
Disallow /*/partials*
Disallow /*?live=*
Disallow /*?cacheKill=*
Disallow /*?skipCache=*
Disallow /*?forceCds=*
Disallow /*?forceXml=*
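
Several of the rules above use the `*` wildcard. As a rough sketch, assuming the common interpretation in which a rule is a prefix match against the URL path and query, `*` matches any run of characters, and a trailing `$` anchors the end, the rules can be evaluated as shown here. Only a subset of the listed rules is included, and the helper names are hypothetical. The group contains no Allow rules, so Allow/Disallow precedence is not modeled.

```python
import re

# A subset of the Disallow rules listed above for the "*" group.
DISALLOW_RULES = [
    "/mpx/",
    "/cgi-bin",
    "/*.js",
    "/templates/search/*",
    "/transcripts/470280334*",
    "/tags*",
    "/*/partials*",
    "/*?live=*",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into an anchored-prefix regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape literal pieces and let each "*" match any run of characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(path: str, rules=DISALLOW_RULES) -> bool:
    """Return True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)
```

Under these assumptions, `is_disallowed("/tags/politics")` and `is_disallowed("/2024/01/01/story?live=1")` are both true, while `is_disallowed("/sections/news/")` is false. Because `/*.js` has no `$` anchor, it also matches paths such as `/data/feed.json`.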

Other Records

Field Value
sitemap https://legacy.npr.org/googlecrawl/sitemap_index.xml
sitemap https://legacy.npr.org/googlecrawl/sitemap_news.xml
sitemap https://legacy.npr.org/googlecrawl/sitemap_video.xml
sitemap https://www.npr.org/live-updates/sitemap.xml
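
The sitemap records above point at two different hosts. A minimal sketch of how a crawler might pull sitemap URLs out of a robots.txt body and group them by host is shown below. The sample text is a hypothetical reconstruction, since the raw file is not reproduced on this page, and the function names are illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical robots.txt excerpt built from the records listed above.
SAMPLE = """\
# robots.txt for www.npr.org
User-agent: *
Disallow: /mpx/
Sitemap: https://legacy.npr.org/googlecrawl/sitemap_index.xml
Sitemap: https://www.npr.org/live-updates/sitemap.xml
"""


def sitemap_urls(text: str) -> list:
    """Collect the values of all Sitemap fields (field name is case-insensitive)."""
    urls = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def sitemap_hosts(urls: list) -> list:
    """Return the distinct hostnames that serve the sitemaps, sorted."""
    return sorted({urlsplit(url).hostname for url in urls})


urls = sitemap_urls(SAMPLE)
hosts = sitemap_hosts(urls)
```

Here `hosts` contains both `legacy.npr.org` and `www.npr.org`, which is the cross-host arrangement the file's own comments describe.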

Comments

  • robots.txt for www.npr.org
  • Changes are tracked in www-render
  • Ensures that we're using the correct sitemap. The fact that this is legacy*.npr.org is OK because the crawler will only accept URLs in this sitemap to match www*.npr.org