attrition.org
robots.txt

Robots Exclusion Standard data for attrition.org

Resource Scan

Scan Details

Site Domain attrition.org
Base Domain attrition.org
Scan Status Ok
Last Scan 2024-09-22T21:45:58+00:00
Next Scan 2024-10-06T21:45:58+00:00

Last Scan

Scanned 2024-09-22T21:45:58+00:00
URL https://attrition.org/robots.txt
Domain IPs 185.148.129.110
Response IP 185.148.129.110
Found Yes
Hash 8b3bded437d69059d5766c0d933e2c8691a88a975699699b1f406ebf953bcc98
SimHash 7d1cc2100854
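
The Hash field above is 64 hexadecimal characters, which is consistent with a SHA-256 digest. The report does not say which algorithm or which bytes were hashed, so the sketch below assumes SHA-256 over the raw response body. The function name `content_hash` is hypothetical, and the SimHash field is not covered here.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes as served, with no
    normalization of line endings or whitespace.
    """
    return hashlib.sha256(body).hexdigest()


# Usage: hash an illustrative body (not the real attrition.org file).
digest = content_hash(b"User-agent: *\nDisallow: /cgi-bin\n")
print(len(digest))  # a SHA-256 hex digest is always 64 characters long
```

Comparing this digest between two scans is a cheap way to tell whether the file changed at all. Any single-byte difference produces a different digest.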

Groups

ia_archiver

Rule Path
Allow /

*

Rule Path
Disallow /cgi-bin
Disallow /rss/
Disallow /mirror/
Disallow /HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow
Disallow /harming/humans
Disallow /ignoring/human/orders
Disallow /harm/to/self

slysearch

Rule Path
Disallow /

arianna.iol.it linux/2.2.17-14smp (linux)

Rule Path
Disallow /

webclipping.com

Rule Path
Disallow /

casper bot search

Rule Path
Disallow /

dex bot search

Rule Path
Disallow /

blekkobot

Rule Path
Disallow /

http://blekko.com/about/blekkobot

Rule Path
Disallow /

scoutjet

Rule Path
Disallow /

tinybottestua

Rule Path
Disallow /

krzana-rss-bot

Rule Path
Disallow /

daum

Rule Path
Disallow /

larbin_2.6.2 larbin2.6.2@unspecified.mail

Rule Path
Disallow /

larbin_2.6.2

Rule Path
Disallow /

larbin_2.6.1 larbin2.6.2@unspecified.mail

Rule Path
Disallow /

larbin_2.6.1

Rule Path
Disallow /

java1.3.0

Rule Path
Disallow /

java1.4.0

Rule Path
Disallow /

java

Rule Path
Disallow /

winampmpeg/2.00 larbin@unspecified.mail

Rule Path
Disallow /

winampmpeg

Rule Path
Disallow /

msie-5.13 larbin@unspecified.mail

Rule Path
Disallow /

opera/6.01 larbin2.6.2@unspecified.mail

Rule Path
Disallow /

opera/6.01 larbin@unspecified.mail

Rule Path
Disallow /

mozilla/5.0 larbin2.6.2@unspecified.mail

Rule Path
Disallow /

bumblebee@relevare.com

Rule Path
Disallow /

zeus 64087 webster pro v2.9 win32

Rule Path
Disallow /

netresearchserver/2.3(loopimprovements.com/robot.html)

Rule Path
Disallow /

netresearchserver

Rule Path
Disallow /

openfind data gatherer, openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)

Rule Path
Disallow /

openbot

Rule Path
Disallow /

netmechanic v3.0

Rule Path
Disallow /

netmechanic

Rule Path
Disallow /

ariadne rpt-httpclient/0.3-3

Rule Path
Disallow /

ariadne

Rule Path
Disallow /

mozilla/4.0 compatible zyborg/1.0 (zyborg@wisenutbot.com; http://www.wisenutbot.com)

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

appie 1.1 (www.walhello.com)

Rule Path
Disallow /

appie

Rule Path
Disallow /

search.ch v1.4.2 (spiderman@search.ch; http://www.search.ch)

Rule Path
Disallow /

search.ch

Rule Path
Disallow /

pingalink monitoring services 1.0 (http://www.pingalink.com)

Rule Path
Disallow /

pingalink monitoring services

Rule Path
Disallow /

pingalink

Rule Path
Disallow /

enterprise_search/1.0 (http://www.innerprise.net/es-spider.asp)

Rule Path
Disallow /

blitzbot

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .wonkz)

Rule Path
Disallow /

emailsiphon

Rule Path
Disallow /

emailwolf

Rule Path
Disallow /

extractorpro

Rule Path
Disallow /

cherrypicker

Rule Path
Disallow /

nicerspro

Rule Path
Disallow /

teleport

Rule Path
Disallow /

emailcollector

Rule Path
Disallow /

dirbuster-0.9.7 www.sittinglittleduck.com

Rule Path
Disallow /

compatible; adsbot/3.1

Rule Path
Disallow /

mozilla/5.0 (compatible; adsbot/3.1)

Rule Path
Disallow /

mozilla/5.0 (compatible; dotbot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)

Rule Path
Disallow /

magpie-crawler/1.1 (robots-txt-checker; +http://www.brandwatch.net)

Rule Path
Disallow /

ltx71 - (http://ltx71.com/)

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /

openai

Rule Path
Disallow /
Disallow /pipermail/attrition/2011-May/000357.html
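
As a rough illustration of how the groups above are evaluated, the sketch below rebuilds a small subset of them as robots.txt text and queries it with Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is a hand-written excerpt based on the groups listed here, not the file as served, and `ExampleBot` is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt of the groups listed in this report (assumption:
# the original file uses ordinary "User-agent:" / "Disallow:" syntax).
ROBOTS_TXT = """\
User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /cgi-bin
Disallow: /rss/
Disallow: /mirror/

User-agent: slysearch
Disallow: /

User-agent: openai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://attrition.org"

# ia_archiver has its own group with "Allow /", so the "*" rules do not apply.
print(parser.can_fetch("ia_archiver", BASE + "/mirror/"))   # True

# An agent with no group of its own falls back to the "*" group.
print(parser.can_fetch("ExampleBot", BASE + "/mirror/"))    # False
print(parser.can_fetch("ExampleBot", BASE + "/index.html")) # True

# Agents with "Disallow /" are blocked from every path.
print(parser.can_fetch("slysearch", BASE + "/index.html"))  # False
print(parser.can_fetch("openai", BASE + "/index.html"))     # False
```

Note that `urllib.robotparser` matches agent names by case-insensitive substring and applies the first matching rule in a group. Other crawlers may resolve the same file differently, so this shows one parser's reading only.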

Comments

  • domo arigato mr. roboto
  • we like these fine folks
  • since Google keeps hitting 404 urls and won't stop..
  • a mirror scraping trap will cause a request to this. if a bot honors this file no issue. if not, fail2ban will get them
  • from http://www.last.fm/robots.txt
  • much of the following taken from http://pigdog.org/robots.txt.
  • http://www.plagiarism.org/crawler/robotinfo.html Fuck off, snitchbot!
  • various lame bots
  • https://cs.daum.net/faq/15/4118.html?faqId=28966 requesting strange shit that is more than just a crawler, plonk!
  • People who just set up dork-ass library bots they downloaded
  • off the Innurnet, and don't even bother to ID themselves,
  • are ASS. Go fuck yourself.
  • Welcome to the Innernet. You might want to read the HTTP spec before
  • fucking with our Web site. Your user-agent ID is bad, which suggests
  • that you don't know shit, and you're a sloppy programmer. Go away.
  • http://www.ietf.org/rfc/rfc1945.txt
  • http://www.ietf.org/rfc/rfc2068.txt
  • this is not email, d00d.
  • whitespace, d00d.
  • ID first, d00d. Then a space, then a comment.
  • Gosh, so close.
  • No spaces. First the ID, then a space, then a comment.
  • You suck.
  • slash, not a space
  • bad version string
  • No spaces.
  • Other misc blocks
  • spoofing the href to sites that have annoying javascript and other stuff
  • http://www.clockwatchers.com/robots_list.html spammer bots
  • 2/9/2021
  • 2/10/2021
  • 7/3/2023
  • for Ilia
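
The comment about the mirror scraping trap describes a honeypot: a path that is disallowed in robots.txt, so only a bot that ignores the file will request it. The site's actual fail2ban filter is not shown in this report. The sketch below only illustrates the detection idea in Python, assuming the access log uses Common Log Format. The helper name `trap_hits` and the sample log lines are hypothetical.

```python
import re

# Trap path taken from the Disallow rules listed under the "*" group.
TRAP_PATH = "/HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow"

# Assumed Common Log Format prefix: client IP, identity, user, timestamp,
# then the quoted request line ("METHOD PATH PROTOCOL").
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)


def trap_hits(log_lines):
    """Return the set of client IPs that requested the trap path."""
    hits = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PATH):
            hits.add(match.group("ip"))
    return hits


# Made-up log lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [22/Sep/2024:21:45:58 +0000] '
    '"GET /HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow HTTP/1.1" 404 153',
    '198.51.100.2 - - [22/Sep/2024:21:46:01 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 4096',
]
print(trap_hits(sample))  # {'203.0.113.7'}
```

A bot that honors robots.txt never appears in the result, which matches the comment's "if a bot honors this file no issue".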