npr.com
robots.txt

Robots Exclusion Standard data for npr.com

Resource Scan

Scan Details

Site Domain npr.com
Base Domain npr.com
Scan Status Failed
Failure Stage Fetching resource
Failure Reason Couldn't establish SSL connection.
Last Scan 2024-10-29T01:15:25+00:00
Next Scan 2025-01-27T01:15:25+00:00

Last Successful Scan

Scanned 2023-01-04T23:47:49+00:00
URL http://npr.com/robots.txt
Redirect https://www.npr.org/robots.txt
Redirect Domain www.npr.org
Redirect Base npr.org
Domain IPs 216.35.221.76
Redirect IPs 125.56.235.108, 2600:1413:b000:886::1155, 2600:1413:b000:89c::1155
Response IP 23.50.118.149
Found Yes
Hash de8a54317526a0a7bee0b16038fcae54e82db0ab607f1dd455e3a7379629861b
SimHash 8d151603ada3

Groups

*

Rule Path
Disallow /mpx/
Disallow /cgi-bin
Disallow /ramfiles/
Disallow /oauth2/
Disallow /account/
Disallow /proxy/
Disallow /*.smil
Disallow /*.asx
Disallow /*.ram
Disallow /*.wav
Disallow /*.rmm
Disallow /*.js
Disallow /*.au
Disallow /stations/force/force_localization.php?
Disallow /rundowns/segment.php?
Disallow /templates/search/*
Disallow /2013/03/21/174840895/
Disallow /sections/ombudsman/2008/01/frequently_asked_questions_1.html
Disallow /sections/health-shots/2013/03/11/173816690/new-voices-for-the-voiceless-synthetic-speech-gets-an-upgrade
Disallow /transcripts/470280334*
Disallow /2015/07/04/419570939/chasing-memories-in-their-refugee-camp-40-years-after-they-fled-vietnam
Disallow /transcripts/419570939*
Disallow /sections/parallels/2016/08/15/480128005/for-french-teens-smoking-still-has-more-allure-than-stigma
Disallow /transcripts/480128005*
Disallow /tags*
Disallow /sureroute
Disallow /*/partials*
Disallow /*?live=*
Disallow /*?cacheKill=*
Disallow /*?skipCache=*
Disallow /*?forceCds=*
Disallow /*?forceXml=*
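Several of the rules above rely on the `*` wildcard (for example `/*.js` and `/*?live=*`), which Python's standard `urllib.robotparser` does not interpret; it only does plain prefix matching. The sketch below translates a path pattern into a regular expression following the matching rules in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, and otherwise the pattern is a prefix match. Because this group contains no `Allow` lines, any matching `Disallow` blocks the path. The function names and the test paths are illustrative assumptions, not part of the scanned file.

```python
import re

# A subset of the Disallow patterns from the "*" group above.
DISALLOW = [
    "/mpx/",
    "/cgi-bin",
    "/*.js",
    "/tags*",
    "/*/partials*",
    "/*?live=*",
    "/transcripts/470280334*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path.
    All other characters are matched literally. Without '$', the
    pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=DISALLOW) -> bool:
    """Return True if any pattern matches from the start of the path.

    re.match anchors at the beginning of the string, which gives the
    prefix-match semantics robots.txt rules use.
    """
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in ["/mpx/player", "/static/app.json", "/sections/news/"]:
        print(path, is_disallowed(path))
```

One consequence of prefix matching is that `/*.js` has no trailing `$`, so it also blocks paths such as `/static/app.json`; a pattern written as `/*.js$` would restrict the rule to paths that end in `.js`.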

Other Records

Field Value
sitemap https://legacy.npr.org/googlecrawl/sitemap_index.xml
sitemap https://legacy.npr.org/googlecrawl/sitemap_news.xml
sitemap https://legacy.npr.org/googlecrawl/sitemap_video.xml
sitemap https://www.npr.org/live-updates/sitemap.xml
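`Sitemap` records are independent of any `User-agent` group, so a parser can collect them from anywhere in the file. A minimal sketch using the standard library's `urllib.robotparser` (its `site_maps()` method is available from Python 3.8) is shown below; the input lines are a reconstruction of part of the scanned file from the tables above, not a verbatim copy of it.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Partial reconstruction of the scanned robots.txt (assumed layout).
ROBOTS_LINES = [
    "# robots.txt for www.npr.org",
    "User-agent: *",
    "Disallow: /mpx/",
    "Disallow: /cgi-bin",
    "Sitemap: https://legacy.npr.org/googlecrawl/sitemap_index.xml",
    "Sitemap: https://legacy.npr.org/googlecrawl/sitemap_news.xml",
    "Sitemap: https://legacy.npr.org/googlecrawl/sitemap_video.xml",
    "Sitemap: https://www.npr.org/live-updates/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# site_maps() returns None when the file declares no sitemaps.
sitemaps = parser.site_maps() or []

if __name__ == "__main__":
    for url in sitemaps:
        # Show which host serves each sitemap.
        print(urlsplit(url).hostname, url)
```

Three of the four sitemaps are served from `legacy.npr.org` rather than `www.npr.org`, which is the situation the file's own comments address.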

Comments

  • robots.txt for www.npr.org
  • Changes are tracked in www-render
  • Ensures that we're using the correct sitemap. The fact that this is legacy*.npr.org is OK because the crawler will only accept URLs in this sitemap to match www*.npr.org
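The last comment relies on a host filter: the sitemap lives on `legacy.npr.org`, but only the URLs listed inside it whose host matches `www*.npr.org` are accepted. How a given crawler actually implements that check is not stated in the file; the sketch below is an assumption that treats the comment's pattern as a shell-style glob. The function name and example URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Host glob quoted from the robots.txt comment; applying it with
# fnmatch is an assumption about crawler behavior, not documented logic.
ACCEPTED_HOST_GLOB = "www*.npr.org"


def accept_sitemap_url(url: str) -> bool:
    """Keep a sitemap <loc> URL only if its host matches the glob."""
    host = urlsplit(url).hostname or ""  # hostname is lowercased
    return fnmatchcase(host, ACCEPTED_HOST_GLOB)


if __name__ == "__main__":
    for url in [
        "https://www.npr.org/sections/news/",
        "https://legacy.npr.org/sections/news/",
    ]:
        print(url, accept_sitemap_url(url))
```

Under this reading, hosting the sitemap file on the legacy domain is harmless: entries pointing at `www.npr.org` pass the filter, while any stray `legacy.npr.org` page URLs would be dropped.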