housedillon.com
robots.txt

Robots Exclusion Standard data for housedillon.com

Resource Scan

Scan Details

Site Domain housedillon.com
Base Domain housedillon.com
Scan Status Ok
Last Scan 2025-03-07T11:49:56+00:00
Next Scan 2025-04-06T11:49:56+00:00

Last Scan

Scanned 2025-03-07T11:49:56+00:00
URL https://housedillon.com/robots.txt
Domain IPs 104.21.2.115, 172.67.129.33, 2606:4700:3035::ac43:8121, 2606:4700:3037::6815:273
Response IP 104.21.2.115
Found Yes
Hash 33916799a6f8af81547d87ac1c2f229bbbcd6aec7ecd69ea9f94341a2c578dc1
SimHash 964ccb1b88f0
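
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. A minimal sketch for comparing a locally fetched copy against the reported value, assuming SHA-256 over the raw response body (the function names `sha256_hex` and `matches_report` are hypothetical, not part of the scanner):

```python
import hashlib

# Value copied from the "Hash" field of this report.
REPORTED_HASH = "33916799a6f8af81547d87ac1c2f229bbbcd6aec7ecd69ea9f94341a2c578dc1"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes) -> bool:
    """True if the bytes hash to the value shown in the report.

    Assumption: the scanner hashed the unmodified response body with
    SHA-256. If it normalised the text first, this will not match.
    """
    return sha256_hex(body) == REPORTED_HASH
```

A mismatch does not necessarily mean the file changed; it may only mean the assumption about the algorithm or the hashed bytes is wrong.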

Groups

adsbot

Rule Path
Disallow /
Allow /ads.txt
Allow /app-ads.txt
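
The adsbot group combines a blanket Disallow with two narrower Allow rules. Under the longest-match precedence described in RFC 9309, the most specific matching path wins, so /ads.txt and /app-ads.txt stay fetchable while everything else is blocked. A minimal sketch of that precedence, assuming plain prefix matching with no wildcard support (the names `ADSBOT_RULES` and `is_allowed` are hypothetical):

```python
# Rules for the adsbot group, in the order listed in this report.
ADSBOT_RULES = [
    ("Disallow", "/"),
    ("Allow", "/ads.txt"),
    ("Allow", "/app-ads.txt"),
]


def is_allowed(rules, path):
    """Apply longest-match precedence to a list of (directive, prefix) rules.

    Simplification: prefixes are compared literally; the '*' and '$'
    pattern characters are not handled. A path with no matching rule
    is allowed, and an Allow wins a tie between equally long prefixes.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed
```

For example, `is_allowed(ADSBOT_RULES, "/ads.txt")` is true and `is_allowed(ADSBOT_RULES, "/index.html")` is false. Parsers that apply the first matching rule instead of the longest one would block /ads.txt with the rule order shown here.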

turnitinbot

Rule Path
Disallow /

npbot

Rule Path
Disallow /

slysearch

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

checkmarknetwork/1.0 (+https://www.checkmarknetwork.com/spider.html)

Rule Path
Disallow /

brandverity/1.0

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

google adsbot

Rule Path
Disallow /

taboola

Rule Path
Disallow /

googleother

Rule Path
Disallow /

google adsense

Rule Path
Disallow /

brandwatch

Rule Path
Disallow /

bing ads

Rule Path
Disallow /

amazon adbot

Rule Path
Disallow /

proximic

Rule Path
Disallow /

yahoo ad monitoring

Rule Path
Disallow /

outbrain

Rule Path
Disallow /

ias crawler

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

Comments

  • The below is borrowed from seirdy.one...
  • I opt out of online advertising so malware that injects ads on my site won't get paid.
  • You should do the same. My ads.txt file contains a standard placeholder to forbid any compliant ad networks from paying for ad placement on my domain.
  • The next three are borrowed from https://www.videolan.org/robots.txt
  • > This robot collects content from the Internet for the sole purpose of helping educational institutions prevent plagiarism. [...] we compare student papers against the content we find on the Internet to see if we can find similarities. (http://www.turnitin.com/robot/crawlerinfo.html)
  • --> fuck off.
  • > NameProtect engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to our clients. (http://www.nameprotect.com/botinfo.html)
  • --> fuck off.
  • iThenticate is a new service we have developed to combat the piracy of intellectual property and ensure the originality of written work for publishers, non-profit agencies, corporations, and newspapers. (http://www.slysearch.com/)
  • --> fuck off.
  • BLEXBot assists internet marketers to get information on the link structure of sites and their interlinking on the web, to avoid any technical and possible legal issues and improve overall online experience. (http://webmeup-crawler.com/)
  • --> fuck off.
  • Providing Intellectual Property professionals with superior brand protection services by artfully merging the latest technology with expert analysis. (https://www.checkmarknetwork.com/spider.html/)
  • "The Internet is just way to big to effectively police alone." (ACTUAL quote)
  • --> fuck off.
  • Stop trademark violations and affiliate non-compliance in paid search. Automatically monitor your partner and affiliates’ online marketing to protect yourself from harmful brand violations and regulatory risks. We regularly crawl websites on behalf of our clients to ensure content compliance with brand and regulatory guidelines. (https://www.brandverity.com/why-is-brandverity-visiting-me)
  • --> fuck off.
  • Eat shit, LLMs
  • These are other random bots that I don't want to help