
Robots Exclusion Standard data for findhereall.com

Resource Scan

Scan Details

Site Domain findhereall.com
Base Domain findhereall.com
Scan Status Ok
Last Scan 2024-05-19T02:42:13+00:00
Next Scan 2024-05-26T02:42:13+00:00

Last Scan

Scanned 2024-05-19T02:42:13+00:00
URL https://findhereall.com/robots.txt
Domain IPs 179.61.189.27, 2a02:4780:15:de10:2fc6:2f96:c095:aa31
Response IP 91.108.100.48
Found Yes
Hash 6c23e6a56995a2629f51ce1dfb13f9fecb2f2c12353f3621097d2f9803587a96
SimHash ef015529e7f4
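The Hash value is 64 hexadecimal characters, which is the length of a SHA-256 digest. Assuming the scanner hashes the raw robots.txt body with SHA-256 (the page does not say which algorithm it uses), a minimal Python sketch of how such a digest could flag a change between the Last Scan and the Next Scan might look like this; the helper names and the sample bodies are hypothetical.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (assumed hashing scheme)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """True when a freshly fetched body no longer matches the stored digest."""
    return robots_digest(body) != previous_digest


# Hypothetical bodies for illustration; not the real file contents.
old_body = b"User-agent: *\nDisallow: /search\n"
new_body = b"User-agent: *\nDisallow: /search\nDisallow: /print/\n"

stored = robots_digest(old_body)
print(len(stored))                    # 64
print(has_changed(stored, old_body))  # False
print(has_changed(stored, new_body))  # True
```

An exact digest only says whether the file changed at all; the separate SimHash field is presumably there for judging how similar two versions are, which a plain SHA-256 comparison cannot do.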

Groups

*

Rule Path
Disallow /sendarticle/
Disallow /Users/
Disallow /users/
Disallow /*/print$
Disallow /email/
Disallow /contactus/
Disallow /share/
Disallow /websearch
Disallow /*?commentpage=
Disallow /whsmiths/
Disallow /external/overture/
Disallow /discussion/report-abuse/*
Disallow /discussion/report-abuse-ajax/*
Disallow /discussion/comment-permalink/*
Disallow /discussion/report-abuse/*
Disallow /discussion/user-report-abuse/*
Disallow /discussion/handlers/*
Disallow /discussion/your-profile
Disallow /discussion/your-comments
Disallow /discussion/edit-profile
Disallow /discussion/search/comments
Disallow /discussion/*
Disallow /search
Disallow /music/artist/*
Disallow /music/album/*
Disallow /books/data/*
Disallow /settings/
Disallow /embed/
Disallow /*styles/js-on.css$
Disallow /sport/olympics/2008/events/*
Disallow /sport/olympics/2008/medals/*
Disallow /f/healthcheck
Disallow /sections
Disallow /top-stories
Disallow /most-read/sport
Disallow /articles
Disallow /global$
Disallow /*/feedarticle/*
Disallow /travel/2013/aug/22/been-there-readers-competition?*
Disallow /preference/*
Disallow /59666047/
Disallow /print/
Disallow /info/tech-feedback
Disallow /production-monitoring/
Disallow *.emailjson
Disallow *.emailtxt
Disallow /headline.txt
Disallow *?*dcr=apps*
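Several rules in the * group use wildcards: * stands for any run of characters and a trailing $ anchors the rule to the end of the path (for example /*/print$ and /global$). A minimal Python sketch of that matching, assuming the semantics described in RFC 9309 and using a hypothetical helper named rule_matches, could look like this; the sample paths are invented for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumes RFC 9309 semantics: '*' matches any run of characters
    (possibly empty), a trailing '$' means the path must end there, and
    without '$' the pattern only needs to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with a regex wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


# Rules taken from the * group; the paths are invented examples.
checks = [
    ("/*/print$", "/world/2024/may/19/story/print"),        # True
    ("/*/print$", "/world/2024/may/19/story/print/extra"),  # False
    ("/*?commentpage=", "/news/story?commentpage=2"),       # True
    ("/global$", "/global"),                                 # True
    ("/global$", "/global/2024"),                            # False
    ("*.emailjson", "/uk/article.emailjson"),                # True
    ("/*styles/js-on.css$", "/assets/styles/js-on.css"),     # True
]
for pattern, path in checks:
    print(pattern, path, rule_matches(pattern, path))
```

Translating each pattern into a regular expression keeps the sketch short; a production crawler would more likely precompile the patterns once per group.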

mediapartners-google

Rule Path
Disallow
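The mediapartners-google group has a Disallow rule with an empty path, which places no restriction on that crawler even though the * group blocks paths such as /search. The sketch below checks this with Python's standard urllib.robotparser against a short hand-written excerpt of the groups; the excerpt and the ExampleBot name are assumptions, not the full file. urllib.robotparser compares rule paths as plain prefixes and does not interpret * or $ inside them, so it is only suitable for simple rules like these.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt reconstructed from the groups listed above; the
# real file's exact spelling and line order may differ.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search
Disallow: /print/

User-agent: mediapartners-google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

url = "https://findhereall.com/search"
print(parser.can_fetch("Mediapartners-Google", url))  # True: empty Disallow
print(parser.can_fetch("ExampleBot", url))            # False: falls back to "*"
```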

newsnow

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

moodlebot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

https://hada.news

Rule Path
Disallow /

https://www.imediaethics.org

Rule Path
Disallow /

mojeek

Rule Path
Disallow /

jenkersbot

Rule Path
Disallow /

seekr

Rule Path
Disallow /

turnitin

Rule Path
Disallow /

youbot

Rule Path
Disallow /

arquivo-web-crawler

Rule Path
Disallow /

coccocbot-web

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

yacy

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

bingbot

Rule Path
Disallow /
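Each named group above applies only to crawlers whose product token matches it, and a crawler with no matching group falls back to the * group. A minimal sketch of that selection, assuming RFC 9309's case-insensitive token match and plain prefix matching for Disallow paths, using a hypothetical in-memory GROUPS table that covers only a few of the groups:

```python
from typing import Dict, List

# Hypothetical in-memory form of a few groups listed above:
# lower-case user-agent token -> Disallow paths ("" means no restriction).
GROUPS: Dict[str, List[str]] = {
    "*": ["/search", "/print/"],
    "mediapartners-google": [""],
    "gptbot": ["/"],
    "claudebot": ["/"],
    "bingbot": ["/"],
}


def select_group(groups: Dict[str, List[str]], product_token: str) -> List[str]:
    """Rules of the group matching the token case-insensitively, else '*'."""
    return groups.get(product_token.lower(), groups.get("*", []))


def is_allowed(groups: Dict[str, List[str]], product_token: str, path: str) -> bool:
    """Apply the selected group's Disallow paths as plain prefixes."""
    for rule in select_group(groups, product_token):
        # An empty Disallow value matches nothing, so it blocks nothing.
        if rule and path.startswith(rule):
            return False
    return True


print(is_allowed(GROUPS, "GPTBot", "/world"))     # False
print(is_allowed(GROUPS, "Googlebot", "/world"))  # True
print(is_allowed(GROUPS, "Googlebot", "/search")) # False
```

Under this reading, GPTBot and bingbot are kept out of every path, while a crawler such as Googlebot that has no group of its own is governed by the * rules.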

Other Records

Field Value
sitemap http://www.theguardian.com/sitemaps/news.xml
sitemap http://www.theguardian.com/sitemaps/video.xml
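The sitemap records sit outside any user-agent group and point to theguardian.com rather than findhereall.com. A short sketch that reads them back with the site_maps() method of urllib.robotparser (available since Python 3.8), run against a hand-written excerpt rather than the live file:

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt; only the two sitemap lines come from the records above.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search

Sitemap: http://www.theguardian.com/sitemaps/news.xml
Sitemap: http://www.theguardian.com/sitemaps/video.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# site_maps() returns None when the file lists no Sitemap lines.
for sitemap_url in parser.site_maps() or []:
    print(sitemap_url)
```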

Comments

  • This is the robots.txt file for theguardian.com
  • Guardian content is made available under our terms and conditions of use.
  • Any other uses are not permitted, incl. but not limited to: for large language models (LLMs), machine learning and/or artificial intelligence-related purposes; with any of the aforementioned technologies; and/or for any commercial purposes.
  • Contact licensing@theguardian.com for assistance