guardianunlimited.co.uk
robots.txt

Robots Exclusion Standard data for guardianunlimited.co.uk

Resource Scan

Scan Details

Site Domain guardianunlimited.co.uk
Base Domain guardianunlimited.co.uk
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-04-28T12:02:25+00:00
Next Scan 2024-06-27T12:02:25+00:00

Last Successful Scan

Scanned 2021-10-19T20:56:22+00:00
URL http://guardianunlimited.co.uk/robots.txt
Redirect https://www.theguardian.com/robots.txt
Redirect Domain www.theguardian.com
Redirect Base theguardian.com
Found Yes
Hash 30d96e362ef58c65b94358afd43bda74ef8266733f6b33d2e5c06991baae543e
SimHash efa51fa1e7d4

Groups

*

Rule Path
Disallow /sendarticle/
Disallow /Users/
Disallow /users/
Disallow /*/print$
Disallow /email/
Disallow /contactus/
Disallow /share/
Disallow /websearch
Disallow /*?commentpage=
Disallow /whsmiths/
Disallow /external/overture/
Disallow /discussion/report-abuse/*
Disallow /discussion/report-abuse-ajax/*
Disallow /discussion/comment-permalink/*
Disallow /discussion/report-abuse/*
Disallow /discussion/user-report-abuse/*
Disallow /discussion/handlers/*
Disallow /discussion/your-profile
Disallow /discussion/your-comments
Disallow /discussion/edit-profile
Disallow /discussion/search/comments
Disallow /discussion/*
Disallow /search
Disallow /music/artist/*
Disallow /music/album/*
Disallow /books/data/*
Disallow /settings/
Disallow /embed/
Disallow /*styles/js-on.css$
Disallow /sport/olympics/2008/events/*
Disallow /sport/olympics/2008/medals/*
Disallow /f/healthcheck
Disallow /sections
Disallow /top-stories
Disallow /most-read/sport
Disallow /articles
Disallow /global$
Disallow /*/feedarticle/*
Disallow /travel/2013/aug/22/been-there-readers-competition?*
Disallow /preference/*
Disallow /59666047/
Disallow /print/
Disallow /info/tech-feedback
Disallow /production-monitoring/
Disallow *.emailjson
Disallow *.emailtxt
Disallow /headline.txt
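
Several of the paths above use the wildcard forms `*` (any run of characters) and a trailing `$` (end of path). Python's standard `urllib.robotparser` does plain prefix matching and does not interpret these, so the following is a minimal sketch of how such patterns could be tested against a path. The helper names `rule_to_regex` and `is_disallowed` are hypothetical, the `RULES` list is a small sample taken from the table above, and the sketch ignores Allow rules and longest-match precedence.

```python
import re
from typing import Iterable


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, rules: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Sample of the Disallow paths listed for the '*' group.
RULES = ["/*/print$", "/search", "/global$", "*.emailjson", "/discussion/*"]

if __name__ == "__main__":
    for candidate in ("/world/2013/print", "/global-development", "/foo.emailjson"):
        print(candidate, is_disallowed(candidate, RULES))
```

Because rules without `$` are prefix matches, `/search` also covers paths such as `/searching`, while `/global$` covers only the exact path `/global`.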

mediapartners-google

Rule Path
Disallow
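
A Disallow rule with an empty path blocks nothing, so this group leaves the whole site open to mediapartners-google. The sketch below checks that reading with Python's standard `urllib.robotparser`. The two-group file in `SAMPLE` is a cut-down illustration rather than the full scanned file, and `examplebot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Cut-down illustration: one rule from the '*' group plus the
# mediapartners-google group with its empty Disallow.
SAMPLE = """
User-agent: *
Disallow: /search

User-agent: mediapartners-google
Disallow:
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())

url = "https://www.theguardian.com/search"

# The named group applies to mediapartners-google and blocks nothing.
print(rp.can_fetch("mediapartners-google", url))

# Any other agent falls back to the '*' group, where /search is disallowed.
print(rp.can_fetch("examplebot", url))
```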

bingbot

Rule Path
Disallow /sendarticle/
Disallow /Users/
Disallow /users/
Disallow /*/print$
Disallow /email/
Disallow /contactus/
Disallow /share/
Disallow /websearch
Disallow /*?commentpage=
Disallow /whsmiths/
Disallow /external/overture/
Disallow /discussion/report-abuse/*
Disallow /discussion/report-abuse-ajax/*
Disallow /discussion/comment-permalink/*
Disallow /discussion/report-abuse/*
Disallow /discussion/user-report-abuse/*
Disallow /discussion/handlers/*
Disallow /discussion/your-profile
Disallow /discussion/your-comments
Disallow /discussion/edit-profile
Disallow /discussion/search/comments
Disallow /discussion/*
Disallow /search
Disallow /music/artist/*
Disallow /music/album/*
Disallow /books/data/*
Disallow /settings/
Disallow /embed/
Disallow /*styles/js-on.css$
Disallow /sport/olympics/2008/events/*
Disallow /sport/olympics/2008/medals/*
Disallow /f/healthcheck
Disallow /sections
Disallow /top-stories
Disallow /most-read/sport
Disallow /articles
Disallow /global$
Disallow /*/feedarticle/*
Disallow /travel/2013/aug/22/been-there-readers-competition?*
Disallow /preference/*
Disallow /59666047/
Disallow /print/
Disallow /info/tech-feedback
Disallow /production-monitoring/
Disallow *.emailjson
Disallow *.emailtxt
Disallow /headline.txt

Other Records

Field Value
crawl-delay 1
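
Assuming this crawl-delay record belongs to the bingbot group listed just above it, a crawler could read the value with `urllib.robotparser` and wait that many seconds between requests. The following is a sketch under that assumption. The `SAMPLE` text is abbreviated, and `polite_delay` and `examplebot` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated bingbot group; the scanned file lists many more Disallow rules.
SAMPLE = """
User-agent: bingbot
Crawl-delay: 1
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())


def polite_delay(parser: RobotFileParser, agent: str, default: float = 0.0) -> float:
    """Seconds to wait between requests for this agent.

    crawl_delay() returns None when no matching group declares a delay,
    in which case the caller-supplied default is used.
    """
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    print(polite_delay(rp, "bingbot"))
    print(polite_delay(rp, "examplebot", default=5.0))
```

A fetch loop would then call `time.sleep(polite_delay(rp, agent))` between requests.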

Other Records

Field Value
sitemap http://www.theguardian.com/sitemaps/news.xml
sitemap http://www.theguardian.com/sitemaps/video.xml
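
Sitemap records are not tied to any user-agent group. As a small sketch, `RobotFileParser.site_maps()` (available from Python 3.8) returns them as a list, or `None` when the file declares none. The `SAMPLE` text holds only the two records shown above.

```python
from urllib.robotparser import RobotFileParser

# Only the two sitemap records from the table above.
SAMPLE = """
Sitemap: http://www.theguardian.com/sitemaps/news.xml
Sitemap: http://www.theguardian.com/sitemaps/video.xml
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())

# site_maps() returns None rather than an empty list when nothing is declared.
sitemaps = rp.site_maps() or []

if __name__ == "__main__":
    for sitemap_url in sitemaps:
        print(sitemap_url)
```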

Comments

  • this is the robots.txt file for theguardian.com