economist.com
robots.txt

Robots Exclusion Standard data for economist.com

Resource Scan

Scan Details

Site Domain economist.com
Base Domain economist.com
Scan Status Ok
Last Scan 2024-11-02T11:56:41+00:00
Next Scan 2024-11-09T11:56:41+00:00

Last Scan

Scanned 2024-11-02T11:56:41+00:00
URL https://economist.com/robots.txt
Redirect https://www.economist.com/robots.txt
Redirect Domain www.economist.com
Redirect Base economist.com
Domain IPs 104.18.42.19, 172.64.145.237
Redirect IPs 104.18.42.19, 172.64.145.237
Response IP 104.18.42.19
Found Yes
Hash 788bf7e963a05e7c2dfae660fd14a2557ec2642169ad3152c0005458ba4feef3
SimHash d034f1109cf0
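The Hash value is 64 hexadecimal characters, consistent with a SHA-256 digest of the fetched file; comparing digests across scans is a cheap way to detect that the robots.txt changed. A minimal sketch, assuming the digest is taken over the raw response body:

```python
import hashlib

def robots_hash(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body, for change detection between scans."""
    return hashlib.sha256(body).hexdigest()

# Identical bodies always produce identical digests; any edit changes the digest.
print(robots_hash(b"User-agent: *\nDisallow: /search/\n"))
```

SimHash, by contrast, is a locality-sensitive fingerprint: near-identical files yield near-identical values, whereas SHA-256 changes completely on any edit.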

Groups

mediapartners-google*

Rule Path
Allow /

grapeshot

Rule Path
Allow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

moodlebot

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

*

Rule Path
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /script/
Disallow /sites/
Disallow /digitaledition/
Disallow /search/apachesolr_search/
Disallow /search/ec_solr/
Disallow /search/google/
Disallow /rpx/
Disallow /report-abuse/
Disallow /user/
Disallow /users/
Disallow /esi/
Disallow /5605/
Disallow /pubads.g.doubleclick.net/
Disallow /subscribe/getstarted/
Disallow /assets/infographic/
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /geoip.php
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /admin/
Disallow /comment/reply/
Disallow /contact/
Disallow /logout/
Disallow /node/add/
Disallow /search/
Disallow /semantic-homepage/
Disallow /vote/
Disallow /taxonomy/term/
Disallow /admin
Disallow /comment/reply
Disallow /contact
Disallow /lab
Disallow /logout
Disallow /node/add
Disallow /semantic-homepage
Disallow /user
Disallow /uspod
Disallow /which-mba
Disallow /whichmba/webinars?
Disallow /checkout
Disallow /?q=admin%2F
Disallow /?q=comment%2Freply%2F
Disallow /?q=contact%2F
Disallow /?q=logout%2F
Disallow /?q=node%2Fadd%2F
Disallow /search?q=
Disallow /?q=user
Disallow /?q=vote%2F
Disallow *?story_id=
Disallow *?RefID=
Disallow /members/
Disallow /subscriptions/
Disallow /*/print$
Disallow /hidden-content/
Allow /sites/default/files/
Allow /sites/all/themes/
Allow /whichmba/webinars?page=
Disallow /whichmba/forum
Disallow /ajax/comment/reply
Disallow /ajax/comment/edit
Disallow /ajax/comment/add
Disallow /ajax/comment/reply/form
Disallow /ajax/report-abuse/comment
Disallow /audio-edition-podcast/*/index.xml
Disallow /bookmarks
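The groups above can be exercised with Python's standard-library robots.txt parser; the snippet below is a hypothetical excerpt reconstructed from the rules listed. Note that `urllib.robotparser` implements the original exclusion standard only: Google-style wildcards such as `*?story_id=` and the `$` end-anchor in `/*/print$` are not interpreted, and the parser applies the first matching rule rather than the longest match, which is why the `Allow` line is placed before the broader `Disallow` here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the groups shown above, reordered for
# robotparser's first-match semantics.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /sites/default/files/
Disallow: /sites/
Disallow: /search/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked everywhere; other agents fall through to the * group.
print(rp.can_fetch("GPTBot", "https://www.economist.com/"))             # False
print(rp.can_fetch("ExampleBot", "https://www.economist.com/search/"))  # False
print(rp.can_fetch("ExampleBot",
                   "https://www.economist.com/sites/default/files/logo.png"))  # True
```

Agent tokens are matched case-insensitively, so `gptbot` in the report above and `GPTBot` in a crawler's User-Agent header refer to the same group.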

Other Records

Field Value
sitemap https://www.economist.com/sitemap.xml
sitemap https://www.economist.com/googlenews.xml
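The two sitemap records are also exposed by the same standard-library parser (Python 3.8+ adds `site_maps()`); a small sketch:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "Sitemap: https://www.economist.com/sitemap.xml",
    "Sitemap: https://www.economist.com/googlenews.xml",
])
# Sitemap records apply file-wide, outside any user-agent group.
print(rp.site_maps())
# ['https://www.economist.com/sitemap.xml', 'https://www.economist.com/googlenews.xml']
```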

Comments

  • robots.txt
  • Specific robot directives:
  • Description: Google AdSense delivers advertisements to a broad network of affiliated sites.
  • A robot analyses the pages that display the ads in order to target the ads to the page content.
  • Description: The Grapeshot crawler is an automated robot that visits pages to examine and analyse the content.
  • This adds an exception to crawl delay while preserving disallows.
  • GPTBot is OpenAI’s web crawler
  • Google-Extended lets publishers block use of their content by Google's AI models (Bard, now Gemini)
  • ChatGPT-User is OpenAI’s user agent for user-initiated browsing from ChatGPT
  • Common Crawl bot
  • PiplBot is Pipl’s web crawler
  • anthropic-ai is Anthropic's web crawler
  • Claude-Web is Anthropic’s web crawler for Claude
  • TurnitinBot is Turnitin’s web crawler
  • PetalBot is Huawei’s Petal Search web crawler
  • MoodleBot is Moodle’s web crawler
  • magpie-crawler is Brandwatch.com’s web crawler
  • ia_archiver is the Internet Archive’s Wayback Machine web crawler
  • Applebot-Extended is Apple's secondary user agent
  • PerplexityBot is Perplexity AI’s web crawler
  • Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs including those powering ChatGPT competitor Doubao.
  • No robots are allowed to index private paths:
  • Sitemap
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no trailing /; beware this will also stop files like /admin.html from being indexed, if we had any)
  • Paths (no clean URLs)
  • Coldfusion paths
  • Print pages
  • Hidden articles
  • Allowed items
  • Comment urls deprecation
  • Prevent crawling podcast RSS file
  • Reading list