attrition.org
robots.txt

Robots Exclusion Standard data for attrition.org

Resource Scan

Scan Details

Site Domain	attrition.org
Base Domain	attrition.org
Scan Status	Ok
Last Scan	2024-09-22T21:45:58+00:00
Next Scan	2024-10-06T21:45:58+00:00

Last Scan

Scanned	2024-09-22T21:45:58+00:00
URL	https://attrition.org/robots.txt
Domain IPs	185.148.129.110
Response IP	185.148.129.110
Found	Yes
Hash	8b3bded437d69059d5766c0d933e2c8691a88a975699699b1f406ebf953bcc98
SimHash	7d1cc2100854
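
The Hash value above is 64 hexadecimal digits, which is consistent with a SHA-256 digest. The sketch below assumes, without confirmation from the scan data, that the digest is taken over the raw bytes of the fetched robots.txt; `fingerprint` is a hypothetical helper name. It shows how two fetches of the file could be compared for changes.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed hashing scheme)."""
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest against a freshly fetched body."""
    return fingerprint(body) != previous_hash.lower()
```

If the scanner normalises the file before hashing (line endings, trailing whitespace), the digest of the raw bytes will not match the value shown.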

Groups

ia_archiver

Rule	Path
Allow	/

*

Rule	Path
Disallow	/cgi-bin
Disallow	/rss/
Disallow	/mirror/
Disallow	/HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow
Disallow	/harming/humans
Disallow	/ignoring/human/orders
Disallow	/harm/to/self
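
To see how these groups are evaluated, the sketch below feeds a fragment of the file to Python's standard `urllib.robotparser`. The fragment is reconstructed from the ia_archiver, * and slysearch groups listed on this page; its exact wording and ordering in the live file are assumptions, and `hypotheticalbot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the group tables on this page; not a verbatim copy of the live file.
ROBOTS_TXT = """\
User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /cgi-bin
Disallow: /rss/
Disallow: /mirror/
Disallow: /HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self

User-agent: slysearch
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on attrition.org."""
    return parser.can_fetch(agent, "https://attrition.org" + path)

print(allowed("ia_archiver", "/mirror/"))                # named group, Allow /
print(allowed("hypotheticalbot", "/mirror/index.html"))  # falls back to the * group
print(allowed("hypotheticalbot", "/index.html"))         # no rule in * matches
print(allowed("slysearch", "/"))                         # named group, Disallow /
```

A crawler with its own group uses only that group, so ia_archiver is not bound by the * rules. Python's parser applies the first matching rule within a group, which can differ from the longest-match behaviour of other implementations.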

slysearch

Rule	Path
Disallow	/

arianna.iol.it linux/2.2.17-14smp (linux)

Rule	Path
Disallow	/

webclipping.com

Rule	Path
Disallow	/

casper bot search

Rule	Path
Disallow	/

dex bot search

Rule	Path
Disallow	/

blekkobot

Rule	Path
Disallow	/

http://blekko.com/about/blekkobot

Rule	Path
Disallow	/

scoutjet

Rule	Path
Disallow	/

tinybottestua

Rule	Path
Disallow	/

krzana-rss-bot

Rule	Path
Disallow	/

daum

Rule	Path
Disallow	/

larbin_2.6.2 larbin2.6.2@unspecified.mail

Rule	Path
Disallow	/

larbin_2.6.2

Rule	Path
Disallow	/

larbin_2.6.1 larbin2.6.2@unspecified.mail

Rule	Path
Disallow	/

larbin_2.6.1

Rule	Path
Disallow	/

java1.3.0

Rule	Path
Disallow	/

java1.4.0

Rule	Path
Disallow	/

java

Rule	Path
Disallow	/

winampmpeg/2.00 larbin@unspecified.mail

Rule	Path
Disallow	/

winampmpeg

Rule	Path
Disallow	/

msie-5.13 larbin@unspecified.mail

Rule	Path
Disallow	/

opera/6.01 larbin2.6.2@unspecified.mail

Rule	Path
Disallow	/

opera/6.01 larbin@unspecified.mail

Rule	Path
Disallow	/

mozilla/5.0 larbin2.6.2@unspecified.mail

Rule	Path
Disallow	/

bumblebee@relevare.com

Rule	Path
Disallow	/

zeus 64087 webster pro v2.9 win32

Rule	Path
Disallow	/

netresearchserver/2.3(loopimprovements.com/robot.html)

Rule	Path
Disallow	/

netresearchserver

Rule	Path
Disallow	/

openfind data gatherer, openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)

Rule	Path
Disallow	/

openbot

Rule	Path
Disallow	/

netmechanic v3.0

Rule	Path
Disallow	/

netmechanic

Rule	Path
Disallow	/

ariadne rpt-httpclient/0.3-3

Rule	Path
Disallow	/

ariadne

Rule	Path
Disallow	/

mozilla/4.0 compatible zyborg/1.0 (zyborg@wisenutbot.com; http://www.wisenutbot.com)

Rule	Path
Disallow	/

zyborg

Rule	Path
Disallow	/

appie 1.1 (www.walhello.com)

Rule	Path
Disallow	/

appie

Rule	Path
Disallow	/

search.ch v1.4.2 (spiderman@search.ch; http://www.search.ch)

Rule	Path
Disallow	/

search.ch

Rule	Path
Disallow	/

pingalink monitoring services 1.0 (http://www.pingalink.com)

Rule	Path
Disallow	/

pingalink monitoring services

Rule	Path
Disallow	/

pingalink

Rule	Path
Disallow	/

enterprise_search/1.0 (http://www.innerprise.net/es-spider.asp)

Rule	Path
Disallow	/

blitzbot

Rule	Path
Disallow	/

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .wonkz)

Rule	Path
Disallow	/

emailsiphon

Rule

Path

Disallow

emailwolf

Rule

Path

Disallow

extractorpro

Rule

Path

Disallow

cherrypicker

Rule

Path

Disallow

nicerspro

Rule

Path

Disallow

teleport

Rule

Path

Disallow

emailcollector

Rule

Path

Disallow

dirbuster-0.9.7 www.sittinglittleduck.com

Rule

Path

Disallow

compatible; adsbot/3.1

Rule

Path

Disallow

mozilla/5.0 (compatible; adsbot/3.1)

Rule

Path

Disallow

mozilla/5.0 (compatible; dotbot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)

Rule

Path

Disallow

magpie-crawler/1.1 (robots-txt-checker; +http://www.brandwatch.net)

Rule

Path

Disallow

ltx71 - (http://ltx71.com/)

Rule

Path

Disallow

chatgpt

Rule

Path

Disallow

openai

Rule	Path
Disallow	/pipermail/attrition/2011-May/000357.html

Comments

domo arigato mr. roboto
we like these fine folks
since Google keeps hitting 404 urls and won't stop..
a mirror scraping trap will cause a request to this. if a bot honors this file no issue. if not, fail2ban will get them
from http://www.last.fm/robots.txt
much of the following taken from http://pigdog.org/robots.txt.
http://www.plagiarism.org/crawler/robotinfo.html Fuck off, snitchbot!
various lame bots
https://cs.daum.net/faq/15/4118.html?faqId=28966 requesting strange shit that is more than just a crawler, plonk!
People who just set up dork-ass library bots they downloaded
off the Innurnet, and don't even bother to ID themselves,
are ASS. Go fuck yourself.
Welcome to the Innernet. You might want to read the HTTP spec before
fucking with our Web site. Your user-agent ID is bad, which suggests
that you don't know shit, and you're a sloppy programmer. Go away.
http://www.ietf.org/rfc/rfc1945.txt
http://www.ietf.org/rfc/rfc2068.txt
this is not email, d00d.
whitespace, d00d.
ID first, d00d. Then a space, then a comment.
Gosh, so close.
No spaces. First the ID, then a space, then a comment.
You suck.
slash, not a space
bad version string
No spaces.
Other misc blocks
spoofing the href to sites that have annoying javascript and other stuff
http://www.clockwatchers.com/robots_list.html spammer bots
2/9/2021
2/10/2021
7/3/2023
for Ilia
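
The trap described in the comments above (a disallowed path that only a bot ignoring robots.txt would request) can be illustrated with a short log scan. This is a sketch under stated assumptions, not the site's actual setup: the access log is assumed to be in Common Log Format, `trap_hits` is a hypothetical helper, and the sample IP addresses are documentation addresses. In practice the banning is left to fail2ban, whose configuration is not shown in the source.

```python
import re

# Trap path taken from the * group's Disallow rules.
TRAP_PATH = "/HelloImScrapingTheMirrorInsteadOfDownloadingOffGitHubBanMeNow"

# Assumed Common Log Format prefix: ip ident user [timestamp] "METHOD path ..."
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)')

def trap_hits(log_lines):
    """Return the set of client IPs that requested the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PATH):
            offenders.add(match.group("ip"))
    return offenders

# Made-up sample entries for illustration only.
sample = [
    f'203.0.113.7 - - [22/Sep/2024:21:45:58 +0000] "GET {TRAP_PATH} HTTP/1.1" 404 153',
    '198.51.100.4 - - [22/Sep/2024:21:46:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
]
print(trap_hits(sample))
```

A bot that honors the file never requests the trap path, so only clients that ignore the Disallow rule end up in the returned set.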
