iragination.com
robots.txt

Robots Exclusion Standard data for iragination.com

Resource Scan

Scan Details

Site Domain iragination.com
Base Domain iragination.com
Scan Status Ok
Last Scan 2025-08-20T00:36:12+00:00
Next Scan 2025-08-27T00:36:12+00:00

Last Scan

Scanned 2025-08-20T00:36:12+00:00
URL https://iragination.com/robots.txt
Domain IPs 104.21.19.199, 172.67.188.153, 2606:4700:3033::6815:13c7, 2606:4700:3034::ac43:bc99
Response IP 104.21.19.199
Found Yes
Hash c6eb57e4d9a258334701b4514b28f5b6636b9497dc274b46beb863b80d94fa95
SimHash b65c597ccd61

Groups

User-agent: mozilla/5.0 (compatible; loc-crawler
Disallow: /

User-agent: tagent
Disallow: /

User-agent: teleport pro
Disallow: /

User-agent: alkalinebot
Disallow: /

User-agent: whizbang
Disallow: /

User-agent: universebot
Disallow: /

User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /

User-agent: slysearch
Disallow: /

User-agent: ng/1.0
Disallow: /

User-agent: asterias
Disallow: /

User-agent: gaisbot
Disallow: /

User-agent: ubicrawler
Disallow: /

User-agent: wget
Disallow: /

User-agent: transgenikbot
Disallow: /

User-agent: ocelli
Disallow: /

User-agent: exabot
Disallow: /

User-agent: pompos
Disallow: /

User-agent: larbin
Disallow: /

User-agent: nutch
Disallow: /

User-agent: jetbot
Disallow: /

User-agent: slurp
Disallow: /

User-agent: cyotekwebcrawler
Disallow: /

User-agent: httrack
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: blexbot
Disallow: /

User-agent: serpstatbot
Disallow: /

User-agent: mj12bot
Disallow: /

User-agent: ahrefsbot
Disallow: /

User-agent: adsbot
Disallow: /

User-agent: dataforseobot
Disallow: /

User-agent: *
Disallow:
Disallow: /illust/addfav.php?*
Disallow: /illust/login.php?*
Disallow: /illust/thumbnails.php?album=*slideshow
Disallow: /cgi-bin/
Disallow: /api/
Disallow: /amfphp/
Disallow: /js/
Disallow: /css/
Disallow: /clips/bassabyss/game/
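
Every named crawler above is blocked site-wide, while the catch-all "*" group permits everything except the listed path prefixes. As a quick sanity check, here is a minimal sketch using Python's standard urllib.robotparser against an excerpt of these rules; the example agents and paths are hypothetical, not taken from the scan:

    # Minimal sketch (illustration, not scan output): parse an excerpt of
    # the rules above and test a few hypothetical fetches.
    from urllib.robotparser import RobotFileParser

    RULES = """\
    User-agent: mj12bot
    Disallow: /

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /api/
    """

    parser = RobotFileParser()
    parser.parse(RULES.splitlines())

    # mj12bot is blocked everywhere; other agents fall through to the
    # '*' group, which blocks only the listed path prefixes.
    print(parser.can_fetch("mj12bot", "/index.html"))       # False
    print(parser.can_fetch("SomeOtherBot", "/index.html"))  # True
    print(parser.can_fetch("SomeOtherBot", "/cgi-bin/x"))   # False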

Comments

  • Copied from https://www.gamers.org/robots.txt
  • Exclusions section for specific robots
  • Exclude loc-crawler - it issues GETs at high speed with no delay,
  • accessing from lx8.loc.gov (140.147.249.70) starting April 15, 2011
  • Exclude TAGENT - it requests robots.txt before every GET
  • and GETs files too quickly. Here is a sample from the access log:
  • sv.tkensaku.com - - [22/Jan/2002:11:38:05 -0500] "GET /robots.txt HTTP/1.0" 200 210 "TAGENT/V0.5"
  • sv.tkensaku.com - - [22/Jan/2002:11:38:06 -0500] "GET /reviews/ HTTP/1.0" 200 14750 "TAGENT/V0.5"
  • sv.tkensaku.com - - [22/Jan/2002:11:38:08 -0500] "GET /robots.txt HTTP/1.0" 200 210 "TAGENT/V0.5"
  • sv.tkensaku.com - - [22/Jan/2002:11:38:09 -0500] "GET /previews/ HTTP/1.0" 200 9163 "TAGENT/V0.5"
  • sv.tkensaku.com - - [22/Jan/2002:11:38:10 -0500] "GET /robots.txt HTTP/1.0" 200 210 "TAGENT/V0.5"
  • sv.tkensaku.com - - [22/Jan/2002:11:38:12 -0500] "GET /articles/ HTTP/1.0" 200 9489 "TAGENT/V0.5"
  • Exclude Teleport Pro
  • Teleport Pro has a bug where it interprets HREF=".." as a file and
  • constructs and submits bad URLs, resulting in many Not Found errors.
  • Apache should redirect URIs ending in ".." to the 'real' directory.
  • Exclude AlkalineBOT
  • On 10-Mar-2002 from remote host syr-24-95-161-196.twcny.rr.com
  • Exclude Whizbang (see http://www.whizbang.com/crawler)
  • Exclude UniverseBot
  • No delay between requests. It strips off trailing slash, thus
  • triggering redirects. It does both HEAD and GET. Sample:
  • 07:18:04 "HEAD /companies/ensemble HTTP/1.0" 301 0 "UniverseBot/1.0"
  • 07:18:06 "HEAD /companies/ensemble/ HTTP/1.0" 200 0 "UniverseBot/1.0"
  • 07:18:07 "GET /companies/ensemble HTTP/1.0" 301 247 "UniverseBot/1.0"
  • 07:18:09 "GET /companies/ensemble/ HTTP/1.0" 200 9961 "UniverseBot/1.0"
  • Exclude http://www.almaden.ibm.com/cs/crawler
  • We'd like to limit the sites crawling us to the main indexers.
  • Exclude "SlySearch/1.0 http://www.plagiarism.org/crawler/robotinfo.html"
  • This site indexes articles for plagiarism checks.
  • Exclude NG/1.0
  • On 18-Oct-2002 from remote host ng1.exabot.com
  • 13:11:35 "GET /news/more/1005254413/d/redir/cb_order/UNRET2003.IR HTTP/1.0" 404 244 "NG/1.0"
  • 13:11:37 "GET /news/more/1005254413/gi/tattletale/news/ HTTP/1.0" 404 234 "NG/1.0"
  • 13:11:38 "GET /news/more/1005254413/ews/ HTTP/1.0" 404 219 "NG/1.0"
  • Exclude spider from singingfish.com - no media to index.
  • Exclude spider from xo.net - no reason to index our files
  • Exclude UbiCrawler
  • On 27-Sep-2003 from remote host ubi1.iit.cnr.it
  • http://ubi.imc.pi.cnr.it/projects/ubicrawler/
  • Exclude Wget
  • It checks this only for recursive operations, not for individual files
  • Exclude TranSGeniKBot
  • Exclude Ocelli/1.1 (http://www.globalspec.com)
  • Exclude Exabot (http://www.exava.com/)
  • Doesn't honor global exclusions.
  • Exclude Pompos (http://www.dir.com/)
  • Obscure search site - 1/4 of the URLs have %00 appended.
  • Stupid thing requires *no* optional space after User-agent:
  • Exclude larbin (http://freshmeat.net/projects/larbin/)
  • Open source spider that can be used by anyone. :-/
  • Exclude Nutch (http://www.nutch.org/docs/en/bot.html)
  • Open source spider that can be used by anyone. :-/
  • Exclude Jetbot (http://www.jeteye.com/jetbot.html)
  • Doesn't honor global exclusions (it fetches /dl pages).
  • Exclude Yahoo Slurp (http://help.yahoo.com/l/us/yahoo/search/webcrawler/)
  • Slurps tons of binaries too, averaging 2 GB/day
  • Exclude http://crawler.007ac9.net/
  • We'd like to limit the sites crawling us to the main indexers.
  • Exclude http://www.cyotek.com/cyotek-webcopy
  • Offline viewing tool
  • Exclude https://www.httrack.com/
  • Offline viewing tool
  • Exclude dotbot (http://www.opensiteexplorer.org/dotbot -> https://moz.com/researchtools/ose/dotbot)
  • Exclude BLEXBot (http://webmeup-crawler.com/)
  • Exclude serpstatbot (https://serpstatbot.com/)
  • Exclude MJ12bot (http://mj12bot.com/)
  • Fetches lots of mangled (wrongly nested) paths.
  • Exclude AhrefsBot (http://ahrefs.com/robot/)
  • Exclude Adsbot (https://seostar.co/robot/)
  • Exclude DataForSeoBot (https://dataforseo.com/dataforseo-bot)
  • Exclusions section for ALL robots
  • These are plain string prefixes - not necessarily directory names -
  • so a directory should keep its trailing slash if its name is a prefix
  • of another path (e.g. /a is a prefix of /about); see the sketch after this list.
  • robots.txt generated at http://www.mcanerin.com
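
The trailing-slash caveat above comes from robots.txt's prefix matching: Disallow values are compared as plain path prefixes. A minimal sketch of the point, with hypothetical paths, again using Python's standard urllib.robotparser (which implements the same prefix semantics):

    # "Disallow: /a" also blocks /about, since /a is a prefix of /about;
    # "Disallow: /a/" blocks only paths under the /a/ directory.
    # All paths here are hypothetical examples.
    from urllib.robotparser import RobotFileParser

    def allowed(rules: str, path: str) -> bool:
        parser = RobotFileParser()
        parser.parse(rules.splitlines())
        return parser.can_fetch("SomeBot", path)

    print(allowed("User-agent: *\nDisallow: /a", "/about"))   # False: /a matches /about
    print(allowed("User-agent: *\nDisallow: /a/", "/about"))  # True: /a/ does not match
    print(allowed("User-agent: *\nDisallow: /a/", "/a/x"))    # False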

Warnings

  • 2 invalid lines.