theguardian.com
robots.txt

Robots Exclusion Standard data for theguardian.com

Resource Scan

Scan Details

Site Domain	theguardian.com
Base Domain	theguardian.com
Scan Status	Ok
Last Scan	2024-10-31T23:15:54+00:00
Next Scan	2024-11-07T23:15:54+00:00

Last Scan

Scanned	2024-10-31T23:15:54+00:00
URL	https://theguardian.com/robots.txt
Redirect	https://www.theguardian.com/robots.txt
Redirect Domain	www.theguardian.com
Redirect Base	theguardian.com
Domain IPs	151.101.1.111, 151.101.129.111, 151.101.193.111, 151.101.65.111, 2a04:4e42:200::367, 2a04:4e42:400::367, 2a04:4e42:600::367, 2a04:4e42::367
Redirect IPs	151.101.1.111, 151.101.129.111, 151.101.193.111, 151.101.65.111, 2a04:4e42:200::367, 2a04:4e42:400::367, 2a04:4e42:600::367, 2a04:4e42::367
Response IP	199.232.45.111
Found	Yes
Hash	1355a03101eb6a27a14d0f9e840bbec3c3e45e4b7104b12a8442f34b569efab9
SimHash	cf015509e7e6

Groups

*

Rule	Path
Disallow	/sendarticle/
Disallow	/Users/
Disallow	/users/
Disallow	/*/print$
Disallow	/email/
Disallow	/contactus/
Disallow	/share/
Disallow	/websearch
Disallow	/*?commentpage=
Disallow	/whsmiths/
Disallow	/external/overture/
Disallow	/discussion/report-abuse/*
Disallow	/discussion/report-abuse-ajax/*
Disallow	/discussion/comment-permalink/*
Disallow	/discussion/report-abuse/*
Disallow	/discussion/user-report-abuse/*
Disallow	/discussion/handlers/*
Disallow	/discussion/your-profile
Disallow	/discussion/your-comments
Disallow	/discussion/edit-profile
Disallow	/discussion/search/comments
Disallow	/discussion/*
Disallow	/search
Disallow	/music/artist/*
Disallow	/music/album/*
Disallow	/books/data/*
Disallow	/settings/
Disallow	/embed/
Disallow	/*styles/js-on.css$
Disallow	/sport/olympics/2008/events/*
Disallow	/sport/olympics/2008/medals/*
Disallow	/f/healthcheck
Disallow	/sections
Disallow	/top-stories
Disallow	/most-read/sport
Disallow	/articles
Disallow	/global$
Disallow	/*/feedarticle/*
Disallow	/travel/2013/aug/22/been-there-readers-competition?*
Disallow	/preference/*
Disallow	/59666047/
Disallow	/print/
Disallow	/info/tech-feedback
Disallow	/production-monitoring/
Disallow	*.emailjson
Disallow	*.emailtxt
Disallow	/headline.txt
Disallow	*?*dcr=apps*
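
Several of the paths above use the `*` wildcard and the `$` end anchor. The following is a minimal sketch of how such patterns could be evaluated, assuming the matching semantics described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the URL path, and everything else is a prefix match). The names `rule_to_regex`, `is_disallowed`, and `SAMPLE_RULES` are hypothetical, and the sketch ignores Allow rules and longest-match precedence.

```python
import re

# A few Disallow paths copied from the "*" group above.
SAMPLE_RULES = [
    "/search",
    "/*/print$",
    "/global$",
    "/*?commentpage=",
    "*.emailjson",
    "*?*dcr=apps*",
]

def rule_to_regex(rule_path):
    """Translate one robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' requires the URL path to end there.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, rules=SAMPLE_RULES):
    """Return True if any rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

if __name__ == "__main__":
    for candidate in ("/world/2024/oct/31/story/print",
                      "/global",
                      "/global-development",
                      "/politics/story?commentpage=2"):
        print(candidate, is_disallowed(candidate))
```

Under these assumptions `/global$` blocks only the exact path `/global`, while `/search` blocks every path that begins with `/search`.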

mediapartners-google

Rule	Path
Disallow

newsnow

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

moodlebot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

https://hada.news

Rule	Path
Disallow	/

https://www.imediaethics.org

Rule	Path
Disallow	/

mojeek

Rule	Path
Disallow	/

jenkersbot

Rule	Path
Disallow	/

seekr

Rule	Path
Disallow	/

turnitin

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/

arquivo-web-crawler

Rule	Path
Disallow	/

coccocbot-web

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

yacy

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

bingbot

Rule	Path
Disallow	/

awariorssbot

Rule	Path
Disallow	/

awariosmartbot

Rule	Path
Disallow	/

netvibes

Rule	Path
Disallow	/

sentione

Rule	Path
Disallow	/

uptimerobot

Rule	Path
Disallow	/

imagesift

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

yandexadditional

Rule	Path
Disallow	/

yandexadditionalbot

Rule	Path
Disallow	/

buck/2.4.2

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/
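
The groups above pair a user agent with its own rule set: most named crawlers get `Disallow /`, while `mediapartners-google` has an empty Disallow value, which permits everything. A rough way to check how a given agent would be treated is Python's standard-library `urllib.robotparser`, sketched below. The `EXCERPT` text is a hypothetical three-group reconstruction rather than the full file, and `ExampleBot` is a made-up agent name. Note that this parser matches paths as plain prefixes, so the wildcard rules in the `*` group would not be interpreted as wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt mirroring three of the groups listed in the scan.
EXCERPT = """\
User-agent: *
Disallow: /search

User-agent: Mediapartners-Google
Disallow:

User-agent: GPTBot
Disallow: /
"""

def build_parser(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(EXCERPT)
URL = "https://www.theguardian.com/world"

if __name__ == "__main__":
    # An agent with its own group uses that group; any other agent
    # falls back to the "*" group.
    for agent in ("GPTBot", "Mediapartners-Google", "ExampleBot"):
        print(agent, parser.can_fetch(agent, URL))
```

An agent that matches a named group is judged only by that group, which is why `GPTBot` is blocked from `/world` even though the `*` group would allow it.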

Other Records

Field	Value
sitemap	http://www.theguardian.com/sitemaps/news.xml
sitemap	http://www.theguardian.com/sitemaps/video.xml
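
The sitemap records sit outside any user-agent group, so they apply to every crawler. As a small sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available), they could be read back as follows. The helper name `list_sitemaps` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two sitemap records from the table above, written as robots.txt lines.
SITEMAP_LINES = [
    "Sitemap: http://www.theguardian.com/sitemaps/news.xml",
    "Sitemap: http://www.theguardian.com/sitemaps/video.xml",
]

def list_sitemaps(lines):
    """Return the Sitemap URLs found in the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    # site_maps() returns None when no Sitemap record is present.
    return parser.site_maps() or []

if __name__ == "__main__":
    for url in list_sitemaps(SITEMAP_LINES):
        print(url)
```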

Comments

This is the robots.txt file for theguardian.com
Guardian content is made available under our terms and conditions of use.
Any other uses are not permitted, incl. but not limited to: for large language
models (LLMs), machine learning and/or artificial intelligence-related
purposes; with any of the aforementioned technologies; and/or for any
commercial purposes. Contact licensing@theguardian.com for assistance

Warnings

2 invalid lines.
