findhereall.com
robots.txt

Robots Exclusion Standard data for findhereall.com

Resource Scan

Scan Details

Site Domain	findhereall.com
Base Domain	findhereall.com
Scan Status	Ok
Last Scan	2024-05-19T02:42:13+00:00
Next Scan	2024-05-26T02:42:13+00:00

Last Scan

Scanned	2024-05-19T02:42:13+00:00
URL	https://findhereall.com/robots.txt
Domain IPs	179.61.189.27, 2a02:4780:15:de10:2fc6:2f96:c095:aa31
Response IP	91.108.100.48
Found	Yes
Hash	6c23e6a56995a2629f51ce1dfb13f9fecb2f2c12353f3621097d2f9803587a96
SimHash	ef015529e7f4
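
The Hash value is 64 hexadecimal digits. That length is consistent with a SHA-256 digest of the fetched file, but the report does not say which algorithm it uses or exactly which bytes it hashes. Assuming SHA-256 over the raw response body, a change check between scans could be sketched as follows. The constant and function names are hypothetical.

```python
import hashlib

# Assumption: "Hash" is the SHA-256 hex digest of the raw robots.txt bytes.
# The scan report does not state the algorithm or the exact input.
LAST_SCAN_HASH = "6c23e6a56995a2629f51ce1dfb13f9fecb2f2c12353f3621097d2f9803587a96"


def robots_changed(body: bytes, previous_hex: str = LAST_SCAN_HASH) -> bool:
    """Return True if the SHA-256 digest of body differs from previous_hex."""
    return hashlib.sha256(body).hexdigest() != previous_hex.lower()
```

Under that assumption, passing the bytes of a fresh fetch to `robots_changed` flags any edit made to the file since the 2024-05-19 scan.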

Groups

*

Rule	Path
Disallow	/sendarticle/
Disallow	/Users/
Disallow	/users/
Disallow	/*/print$
Disallow	/email/
Disallow	/contactus/
Disallow	/share/
Disallow	/websearch
Disallow	/*?commentpage=
Disallow	/whsmiths/
Disallow	/external/overture/
Disallow	/discussion/report-abuse/*
Disallow	/discussion/report-abuse-ajax/*
Disallow	/discussion/comment-permalink/*
Disallow	/discussion/report-abuse/*
Disallow	/discussion/user-report-abuse/*
Disallow	/discussion/handlers/*
Disallow	/discussion/your-profile
Disallow	/discussion/your-comments
Disallow	/discussion/edit-profile
Disallow	/discussion/search/comments
Disallow	/discussion/*
Disallow	/search
Disallow	/music/artist/*
Disallow	/music/album/*
Disallow	/books/data/*
Disallow	/settings/
Disallow	/embed/
Disallow	/*styles/js-on.css$
Disallow	/sport/olympics/2008/events/*
Disallow	/sport/olympics/2008/medals/*
Disallow	/f/healthcheck
Disallow	/sections
Disallow	/top-stories
Disallow	/most-read/sport
Disallow	/articles
Disallow	/global$
Disallow	/*/feedarticle/*
Disallow	/travel/2013/aug/22/been-there-readers-competition?*
Disallow	/preference/*
Disallow	/59666047/
Disallow	/print/
Disallow	/info/tech-feedback
Disallow	/production-monitoring/
Disallow	*.emailjson
Disallow	*.emailtxt
Disallow	/headline.txt
Disallow	*?*dcr=apps*
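
The `*` group mixes plain path prefixes with the two special characters that RFC 9309 defines: `*` matches any run of characters (including none), and a trailing `$` anchors the pattern to the end of the path. A minimal sketch of how a crawler might evaluate these rules follows. It assumes that the group contains only Disallow rules, as here, so no Allow/longest-match tie-breaking is needed, and that the path under test includes its query string. The names `pattern_to_regex`, `is_disallowed`, and `PATTERNS` are hypothetical.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any run of characters (including none) and a trailing '$'
    pins the match to the end of the path; every other character is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal segment, then rejoin the segments with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any pattern matches from the first character of path.

    path should include the query string. An empty pattern (a bare
    'Disallow:') matches nothing, so it never blocks.
    """
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns if p)


# A few patterns copied from the '*' group above.
PATTERNS = ["/*/print$", "/*?commentpage=", "/discussion/*", "/global$", "*.emailjson"]

for path in ["/world/2024/may/19/print", "/global", "/global/europe",
             "/books/review?commentpage=3", "/uk/story.emailjson"]:
    print(path, "blocked" if is_disallowed(path, PATTERNS) else "allowed")
```

Run as-is, this reports `/global/europe` as allowed, because `/global$` only matches the exact path `/global`. The other four sample paths are reported as blocked. Translating each pattern to a regex keeps the `*` and `$` semantics explicit rather than relying on plain prefix comparison.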

mediapartners-google

Rule	Path
Disallow

newsnow

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

moodlebot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

https://hada.news

Rule	Path
Disallow	/

https://www.imediaethics.org

Rule	Path
Disallow	/

mojeek

Rule	Path
Disallow	/

jenkersbot

Rule	Path
Disallow	/

seekr

Rule	Path
Disallow	/

turnitin

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/

arquivo-web-crawler

Rule	Path
Disallow	/

coccocbot-web

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

yacy

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

bingbot

Rule	Path
Disallow	/
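
Each named group above replaces the `*` rules for the crawler it names; it does not add to them. Under RFC 9309, a crawler looks for the group whose user-agent matches its product token case-insensitively. If several groups match, it merges them, and it falls back to `*` only when none matches. A single `Disallow /` therefore shuts out `gptbot` or `bingbot` entirely. The bare `Disallow` under `mediapartners-google`, by contrast, leaves that crawler unrestricted even though the `*` group blocks many paths. A minimal sketch of that selection follows. It assumes exact token matching and uses a hand-abridged dictionary of groups; `GROUPS`, `select_rules`, and the crawler name `ExampleBot` are all hypothetical.

```python
from __future__ import annotations

# Hypothetical in-memory copy of a few groups from this scan:
# group name -> Disallow patterns. The '*' list is abridged.
GROUPS: dict[str, list[str]] = {
    "*": ["/sendarticle/", "/search", "/discussion/*"],
    "mediapartners-google": [""],  # bare "Disallow" = nothing blocked
    "gptbot": ["/"],
    "claudebot": ["/"],
    "bingbot": ["/"],
}


def select_rules(product_token: str, groups: dict[str, list[str]]) -> list[str]:
    """Return the Disallow patterns that apply to one crawler.

    Named groups are matched case-insensitively against the product token
    and combined if several match; the '*' group applies only when no
    named group does.
    """
    token = product_token.lower()
    matched: list[str] = []
    found = False
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            matched.extend(rules)
            found = True
    return matched if found else list(groups.get("*", []))


for bot in ["GPTBot", "Mediapartners-Google", "ExampleBot"]:
    print(bot, select_rules(bot, GROUPS))
```

The `found` flag, rather than a check on whether the result is empty, ensures that a named group with no blocking rules still overrides `*`.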

Other Records

Field	Value
sitemap	http://www.theguardian.com/sitemaps/news.xml
sitemap	http://www.theguardian.com/sitemaps/video.xml
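
The two sitemap records sit outside every group. Under the sitemaps.org protocol, the `Sitemap` directive is independent of the user-agent line. A minimal sketch of pulling these records out of raw robots.txt text follows, assuming one record per line. The function name and the sample text in the check are illustrative and do not reproduce the exact file scanned here.

```python
from __future__ import annotations


def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect Sitemap URLs from robots.txt text.

    The field name is matched case-insensitively, '#' comments are dropped
    (so a URL containing '#' would be cut short), and records are read
    regardless of which user-agent group they appear near.
    """
    urls: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Split at the first ':' only, so the URL's own 'http:' stays intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```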

Comments

This is the robots.txt file for theguardian.com
Guardian content is made available under our terms and conditions of use.
Any other uses are not permitted, incl. but not limited to: for large language
models (LLMs), machine learning and/or artificial intelligence-related
purposes; with any of the aforementioned technologies; and/or for any
commercial purposes. Contact licensing@theguardian.com for assistance
