economist.com
robots.txt

Robots Exclusion Standard data for economist.com

Resource Scan

Scan Details

Site Domain economist.com
Base Domain economist.com
Scan Status Ok
Last Scan 2024-11-02T11:56:41+00:00
Next Scan 2024-11-09T11:56:41+00:00

Last Scan

Scanned 2024-11-02T11:56:41+00:00
URL https://economist.com/robots.txt
Redirect https://www.economist.com/robots.txt
Redirect Domain www.economist.com
Redirect Base economist.com
Domain IPs 104.18.42.19, 172.64.145.237
Redirect IPs 104.18.42.19, 172.64.145.237
Response IP 104.18.42.19
Found Yes
Hash 788bf7e963a05e7c2dfae660fd14a2557ec2642169ad3152c0005458ba4feef3
SimHash d034f1109cf0
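The Hash value is 64 hexadecimal characters, consistent with a SHA-256 digest of the fetched file; comparing digests across scans is a cheap way to detect that the robots.txt changed. A minimal sketch, assuming the digest is taken over the raw response body:

```python
import hashlib

def robots_hash(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body, for change detection between scans."""
    return hashlib.sha256(body).hexdigest()

# Identical bodies always produce identical digests; any edit changes the digest.
print(robots_hash(b"User-agent: *\nDisallow: /search/\n"))
```

SimHash, by contrast, is a locality-sensitive fingerprint: near-identical files yield near-identical values, whereas SHA-256 changes completely on any edit.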

Groups

mediapartners-google*

Rule Path
Allow /

grapeshot

Rule Path
Allow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

moodlebot

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

*

Rule Path
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /script/
Disallow /sites/
Disallow /digitaledition/
Disallow /search/apachesolr_search/
Disallow /search/ec_solr/
Disallow /search/google/
Disallow /rpx/
Disallow /report-abuse/
Disallow /user/
Disallow /users/
Disallow /esi/
Disallow /5605/
Disallow /pubads.g.doubleclick.net/
Disallow /subscribe/getstarted/
Disallow /assets/infographic/
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /geoip.php
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /admin/
Disallow /comment/reply/
Disallow /contact/
Disallow /logout/
Disallow /node/add/
Disallow /search/
Disallow /semantic-homepage/
Disallow /vote/
Disallow /taxonomy/term/
Disallow /admin
Disallow /comment/reply
Disallow /contact
Disallow /lab
Disallow /logout
Disallow /node/add
Disallow /semantic-homepage
Disallow /user
Disallow /uspod
Disallow /which-mba
Disallow /whichmba/webinars?
Disallow /checkout
Disallow /?q=admin%2F
Disallow /?q=comment%2Freply%2F
Disallow /?q=contact%2F
Disallow /?q=logout%2F
Disallow /?q=node%2Fadd%2F
Disallow /search?q=
Disallow /?q=user
Disallow /?q=vote%2F
Disallow *?story_id=
Disallow *?RefID=
Disallow /members/
Disallow /subscriptions/
Disallow /*/print$
Disallow /hidden-content/
Allow /sites/default/files/
Allow /sites/all/themes/
Allow /whichmba/webinars?page=
Disallow /whichmba/forum
Disallow /ajax/comment/reply
Disallow /ajax/comment/edit
Disallow /ajax/comment/add
Disallow /ajax/comment/reply/form
Disallow /ajax/report-abuse/comment
Disallow /audio-edition-podcast/*/index.xml
Disallow /bookmarks
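The groups above can be exercised with Python's standard-library robots.txt parser; the snippet below is a hypothetical excerpt reconstructed from the rules listed. Note that `urllib.robotparser` implements the original exclusion standard only: Google-style wildcards such as `*?story_id=` and the `$` end-anchor in `/*/print$` are not interpreted, and the parser applies the first matching rule rather than the longest match, which is why the `Allow` line is placed before the broader `Disallow` here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the groups shown above, reordered for
# robotparser's first-match semantics.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /sites/default/files/
Disallow: /sites/
Disallow: /search/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked everywhere; other agents fall through to the * group.
print(rp.can_fetch("GPTBot", "https://www.economist.com/"))             # False
print(rp.can_fetch("ExampleBot", "https://www.economist.com/search/"))  # False
print(rp.can_fetch("ExampleBot",
                   "https://www.economist.com/sites/default/files/logo.png"))  # True
```

Agent tokens are matched case-insensitively, so `gptbot` in the report above and `GPTBot` in a crawler's User-Agent header refer to the same group.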

Other Records

Field Value
sitemap https://www.economist.com/sitemap.xml
sitemap https://www.economist.com/googlenews.xml
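The two sitemap records are also exposed by the same standard-library parser (Python 3.8+ adds `site_maps()`); a small sketch:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "Sitemap: https://www.economist.com/sitemap.xml",
    "Sitemap: https://www.economist.com/googlenews.xml",
])
# Sitemap records apply file-wide, outside any user-agent group.
print(rp.site_maps())
# ['https://www.economist.com/sitemap.xml', 'https://www.economist.com/googlenews.xml']
```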

Comments

  • robots.txt
  • Specific robot directives:
  • Description: Google AdSense delivers advertisements to a broad network of affiliated sites.
  • A robot analyses the pages that display the ads in order to target the ads to the page content.
  • Description: The Grapeshot crawler is an automated robot that visits pages to examine and analyse the content.
  • This adds an exception to crawl delay while preserving disallows.
  • GPTBot is OpenAI’s web crawler
  • Google-Extended lets publishers block use of their content by Google's AI models (Bard, now Gemini)
  • ChatGPT-User is OpenAI’s user agent for user-initiated browsing from ChatGPT
  • Common Crawl bot
  • PiplBot is Pipl’s web crawler
  • anthropic-ai is Anthropic's web crawler
  • Claude-Web is Anthropic’s web crawler for Claude
  • TurnitinBot is Turnitin’s web crawler
  • PetalBot is Huawei’s Petal Search web crawler
  • MoodleBot is Moodle’s web crawler
  • magpie-crawler is Brandwatch.com’s web crawler
  • ia_archiver is the Internet Archive’s Wayback Machine web crawler
  • Applebot-Extended is Apple's secondary user agent
  • PerplexityBot is Perplexity AI’s web crawler
  • Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs including those powering ChatGPT competitor Doubao.
  • No robots are allowed to index private paths:
  • Sitemap
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no trailing /; beware this will also stop files like /admin.html from being indexed, if we had any)
  • Paths (no clean URLs)
  • Coldfusion paths
  • Print pages
  • Hidden articles
  • Allowed items
  • Comment urls deprecation
  • Prevent crawling podcast RSS file
  • Reading list