housedillon.com
robots.txt

Robots Exclusion Standard data for housedillon.com


Resource Scan

Scan Details

Site Domain	housedillon.com
Base Domain	housedillon.com
Scan Status	Ok
Last Scan	2025-03-07T11:49:56+00:00
Next Scan	2025-04-06T11:49:56+00:00

Last Scan

Scanned	2025-03-07T11:49:56+00:00
URL	https://housedillon.com/robots.txt
Domain IPs	104.21.2.115, 172.67.129.33, 2606:4700:3035::ac43:8121, 2606:4700:3037::6815:273
Response IP	104.21.2.115
Found	Yes
Hash	33916799a6f8af81547d87ac1c2f229bbbcd6aec7ecd69ea9f94341a2c578dc1
SimHash	964ccb1b88f0

Groups

adsbot

Rule	Path
Disallow	/
Allow	/ads.txt
Allow	/app-ads.txt
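The adsbot group blocks everything except the two ads files. It works because RFC 9309 crawlers pick the most specific (longest) matching path, and `Allow` wins a tie. The sketch below evaluates the three rules that way. It assumes plain prefix matching (no `*` or `$` wildcards, which this group does not use), and the function name `is_allowed` is hypothetical.

```python
# Rules of the adsbot group as listed in the scan above.
ADSBOT_RULES = [
    ("disallow", "/"),
    ("allow", "/ads.txt"),
    ("allow", "/app-ads.txt"),
]


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under longest-match semantics.

    Simplification (assumption): rule paths are treated as plain prefixes;
    the `*` and `$` wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule at all means the path is allowed
    for kind, rule_path in rules:
        if not path.startswith(rule_path):
            continue
        length = len(rule_path)
        # A longer (more specific) match wins; on an equal length, allow wins.
        if length > best_len or (length == best_len and kind == "allow"):
            best_len = length
            allowed = kind == "allow"
    return allowed


for p in ["/ads.txt", "/app-ads.txt", "/index.html", "/"]:
    print(p, is_allowed(p, ADSBOT_RULES))
```

The loop prints `True` for `/ads.txt` and `/app-ads.txt`, and `False` for `/index.html` and `/`. Python's standard `urllib.robotparser` applies rules in file order (first match) instead. It may therefore give a different answer for `/ads.txt` if `Disallow: /` appears first in the file.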

turnitinbot

Rule	Path
Disallow	/

npbot

Rule	Path
Disallow	/

slysearch

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

checkmarknetwork/1.0 (+https://www.checkmarknetwork.com/spider.html)

Rule	Path
Disallow	/

brandverity/1.0

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

google adsbot

Rule	Path
Disallow	/

taboola

Rule	Path
Disallow	/

googleother

Rule	Path
Disallow	/

google adsense

Rule	Path
Disallow	/

brandwatch

Rule	Path
Disallow	/

bing ads

Rule	Path
Disallow	/

amazon adbot

Rule	Path
Disallow	/

proximic

Rule	Path
Disallow	/

yahoo ad monitoring

Rule	Path
Disallow	/

outbrain

Rule	Path
Disallow	/

ias crawler

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/
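Every group except adsbot is a single `Disallow: /`, and there is no `*` group. A crawler whose product token matches none of the listed user agents therefore gets no rules at all. Under RFC 9309, a crawler selects its group by matching its product token against the `User-agent` values case-insensitively. The sketch below models that with exact, case-insensitive equality, which is an assumption. The group keys are the lowercase values shown in the scan, and the name `rules_for` is hypothetical. Under this exact-match assumption, multi-word entries such as `google adsbot` can never equal a single product token.

```python
# Groups from the scan above, keyed by the user-agent value as shown
# (lowercase); each maps to its list of (rule, path) pairs.
DISALLOW_ALL = [("disallow", "/")]

GROUPS: dict[str, list[tuple[str, str]]] = {
    "adsbot": [("disallow", "/"), ("allow", "/ads.txt"), ("allow", "/app-ads.txt")],
    **{
        agent: DISALLOW_ALL
        for agent in [
            "turnitinbot", "npbot", "slysearch", "blexbot",
            "checkmarknetwork/1.0 (+https://www.checkmarknetwork.com/spider.html)",
            "brandverity/1.0", "chatgpt-user", "gptbot", "ccbot",
            "google-extended", "omgilibot", "facebookbot", "applebot-extended",
            "amazonbot", "petalbot", "google adsbot", "taboola", "googleother",
            "google adsense", "brandwatch", "bing ads", "amazon adbot",
            "proximic", "yahoo ad monitoring", "outbrain", "ias crawler",
            "bytespider",
        ]
    },
}


def rules_for(product_token: str,
              groups: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return the rules that apply to a crawler's product token.

    Assumption: exact, case-insensitive equality between the token and a
    group's user-agent value, falling back to a "*" group if one exists.
    This file has no "*" group, so unmatched crawlers get an empty rule list.
    """
    token = product_token.lower()
    if token in groups:
        return groups[token]
    return groups.get("*", [])


print(rules_for("GPTBot", GROUPS))     # [('disallow', '/')]
print(rules_for("Googlebot", GROUPS))  # []
```

Here `Googlebot` stands in for any crawler that is not listed. It falls through to the empty rule list and may crawl the whole site.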

Comments

The following rules are borrowed from seirdy.one.
I opt out of online advertising so that malware injecting ads on my site won't get paid.
You should do the same. My ads.txt file contains a standard placeholder that forbids
compliant ad networks from paying for ad placement on my domain.
The next three are borrowed from https://www.videolan.org/robots.txt
> This robot collects content from the Internet for the sole purpose of helping educational institutions prevent plagiarism. [...] we compare student papers against the content we find on the Internet to see if we can find similarities. (http://www.turnitin.com/robot/crawlerinfo.html)
--> fuck off.
> NameProtect engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to our clients. (http://www.nameprotect.com/botinfo.html)
--> fuck off.
> iThenticate is a new service we have developed to combat the piracy of intellectual property and ensure the originality of written work for publishers, non-profit agencies, corporations, and newspapers. (http://www.slysearch.com/)
--> fuck off.
> BLEXBot assists internet marketers to get information on the link structure of sites and their interlinking on the web, to avoid any technical and possible legal issues and improve overall online experience. (http://webmeup-crawler.com/)
--> fuck off.
> Providing Intellectual Property professionals with superior brand protection services by artfully merging the latest technology with expert analysis. (https://www.checkmarknetwork.com/spider.html/)
"The Internet is just way to big to effectively police alone." (ACTUAL quote)
--> fuck off.
> Stop trademark violations and affiliate non-compliance in paid search. Automatically monitor your partner and affiliates' online marketing to protect yourself from harmful brand violations and regulatory risks. We regularly crawl websites on behalf of our clients to ensure content compliance with brand and regulatory guidelines. (https://www.brandverity.com/why-is-brandverity-visiting-me)
--> fuck off.
Eat shit, LLMs
These are other random bots that I don't want to help
