ttadev.org
robots.txt

Robots Exclusion Standard data for ttadev.org

Resource Scan

Scan Details

Site Domain	ttadev.org
Base Domain	ttadev.org
Scan Status	Ok
Last Scan	2025-10-29T23:42:55+00:00
Next Scan	2025-11-28T23:42:55+00:00
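
The two timestamps above are exactly 30 days apart, which suggests (but does not confirm) a fixed 30-day rescan interval. A minimal sketch of that arithmetic, where `RESCAN_INTERVAL` and `next_scan` are hypothetical names and the interval is inferred from this one scan only:

```python
from datetime import datetime, timedelta

# Assumption: interval inferred from the Last Scan / Next Scan values above.
RESCAN_INTERVAL = timedelta(days=30)


def next_scan(last_scan_iso: str) -> str:
    """Return the next scan time as an ISO 8601 string."""
    last_scan = datetime.fromisoformat(last_scan_iso)
    return (last_scan + RESCAN_INTERVAL).isoformat()


print(next_scan("2025-10-29T23:42:55+00:00"))
```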

Last Scan

Scanned	2025-10-29T23:42:55+00:00
URL	https://ttadev.org/robots.txt
Domain IPs	104.21.10.87, 172.67.162.191, 2606:4700:3031::6815:a57, 2606:4700:3032::ac43:a2bf
Response IP	104.21.10.87
Found	Yes
Hash	60f9dfd086fca6a24b0a38bd31f95de3486477043109869a7fbc99f4a04c4e97
SimHash	64140d52ddd6
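
The Hash value is 64 hexadecimal digits, the length of a SHA-256 digest. The report does not name the algorithm or say which bytes are hashed, so the sketch below assumes SHA-256 over the raw response body; `body_digest` and `has_changed` are hypothetical helper names.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded_hash: str) -> bool:
    """True when the body no longer matches a previously recorded digest."""
    return body_digest(body) != recorded_hash.lower()


# Hypothetical body, not the real file from this scan.
sample = b"User-agent: *\nAllow: /\n"
digest = body_digest(sample)
print(len(digest))  # same length as the Hash field
```

Comparing digests between scans is one cheap way to detect that a robots.txt file was edited without diffing its contents.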

Groups

*

Rule	Path
Allow	/

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

timpibot

Rule	Path
Disallow	/

friendlycrawler

Rule	Path
Disallow	/

image2dataset

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

barkrowler

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

dataforseobot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

censysinspect

Rule	Path
Disallow	/

expanse

Rule	Path
Disallow	/

internet-measurement

Rule	Path
Disallow	/

scrapy

Rule	Path
Disallow	/

python-requests

Rule	Path
Disallow	/

java

Rule	Path
Disallow	/

go-http-client

Rule	Path
Disallow	/

news-please

Rule	Path
Disallow	/

dataprovider

Rule	Path
Disallow	/

orbbot

Rule	Path
Disallow	/

ioncrawl

Rule	Path
Disallow	/

isscyberriskcrawler

Rule	Path
Disallow	/

velenpublicwebcrawler

Rule	Path
Disallow	/

peer39_crawler

Rule	Path
Disallow	/
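
The groups above pair a user-agent token with its rules. As a rough illustration of how a crawler might resolve its own token against such groups, the sketch below feeds a hypothetical excerpt (modeled on three of the listed groups, not the full file) to Python's standard `urllib.robotparser`. The agent strings and the example URL are assumptions. Python's parser matches the group token as a case-insensitive substring of the agent name; other parsers may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modeled on the groups listed in this report.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /

User-agent: gptbot
Disallow: /

User-agent: semrushbot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
url = "https://ttadev.org/wiki/Main_Page"  # example URL, assumed
for agent in ("GPTBot", "SemrushBot", "ExampleBrowser"):
    # Named bots hit their own Disallow group; others fall back to "*".
    print(agent, parser.can_fetch(agent, url))
```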

Rule

Path

Disallow

zoominfobot

Rule	Path
Disallow

wp_is_mobile

Rule	Path
Disallow

fast

Rule	Path
Disallow

wget

Rule	Path
Disallow

grub-client

Rule	Path
Disallow

*

Rule	Path
Allow	/w/sitemap/
Allow	/w/api.php?action=mobileview&
Allow	/w/load.php?
Allow	/api/rest_v1/?doc
Disallow	/w/
Disallow	/api/
Disallow	/trap/
Disallow	/wiki/Special%3A
Disallow	/wiki/User%3A
Disallow	/wiki/User_talk%3A
Disallow	/wiki/MediaWiki%3A
Disallow	/wiki/MediaWiki_talk%3A
Disallow	/wiki/Template%3A
Disallow	/wiki/Template_talk%3A
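
This last group mixes Allow and Disallow rules whose paths overlap (for example `/w/load.php?` sits under `/w/`). RFC 9309 resolves such overlaps by the most specific (longest) matching path, with Allow winning ties. The sketch below applies that rule to the paths of this group only. It is a simplification: it does not handle `*` or `$` wildcards or percent-encoding normalization, and `is_allowed` is a hypothetical helper name.

```python
# Rules copied from the final "*" group above: (is_allow, path_prefix).
RULES = [
    (True, "/w/sitemap/"),
    (True, "/w/api.php?action=mobileview&"),
    (True, "/w/load.php?"),
    (True, "/api/rest_v1/?doc"),
    (False, "/w/"),
    (False, "/api/"),
    (False, "/trap/"),
    (False, "/wiki/Special%3A"),
    (False, "/wiki/User%3A"),
    (False, "/wiki/User_talk%3A"),
    (False, "/wiki/MediaWiki%3A"),
    (False, "/wiki/MediaWiki_talk%3A"),
    (False, "/wiki/Template%3A"),
    (False, "/wiki/Template_talk%3A"),
]


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for allow, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, best_allow = len(prefix), allow
    return best_allow


for path in ("/w/load.php?modules=site", "/w/index.php", "/wiki/Main_Page"):
    print(path, is_allowed(path))
```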

Other Records

Field	Value
sitemap	https://tunearch.org/w/sitemap/tta.xml

Comments

As a condition of accessing this website, you agree to abide by the following
content signals:
(a) If a content-signal = yes, you may collect content for the corresponding
use.
(b) If a content-signal = no, you may not collect content for the
corresponding use.
(c) If the website operator does not include a content signal for a
corresponding use, the website operator neither grants nor restricts
permission via content signal with respect to the corresponding use.
The content signals and their meanings are:
search: building a search index and providing search results (e.g., returning
hyperlinks and short excerpts from your website's contents). Search does not
include providing AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g., retrieval
augmented generation, grounding, or other real-time taking of content for
generative AI search answers).
ai-train: training or fine-tuning AI models.
ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
BEGIN Cloudflare Managed content
END Cloudflare Managed Content
robots.txt for http://www.tunearch.org/ and friends
Please note: There are a lot of pages on this site, and there are
some misbehaved spiders out there that go _way_ too fast. If you're
irresponsible, your access to the site may be blocked.
Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
and ignoring 429 ratelimit responses, claims to respect robots:
http://mj12bot.com/
=== AI CRAWLERS ===
=== SEO CRAWLERS ===
=== SEARCH ENGINES & DATA ===
=== SECURITY & NETWORK SCANNERS ===
=== GENERIC / ABUSIVE ===
Misbehaving: requests much too fast:
Sorry, wget in its recursive mode is a frequent problem.
Please read the man page and use it properly; there is a
--wait option you can use to set the delay between hits,
for instance.
The 'grub' distributed client has been *very* poorly behaved.
Friendly, low-speed bots are welcome viewing article pages, but not
dynamically-generated pages please.
Inktomi's "Slurp" can read a minimum delay between hits; if your
bot supports such a thing using the 'Crawl-delay' or another
instruction, please let us know.
There is a special exception for API mobileview to allow dynamic
mobile web & app views to load section content.
These views aren't HTTP-cached but use parser cache aggressively
and don't expose special: pages etc.
Another exception is for REST API documentation, located at
/api/rest_v1/?doc.
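
The comments above ask crawlers to slow down and mention `Crawl-delay` and wget's `--wait`. This scan lists no Crawl-delay record for the site, so the sketch below uses a hypothetical robots.txt that has one and falls back to an assumed default otherwise. `DEFAULT_DELAY_SECONDS`, `delay_for`, `crawl`, and the `examplebot` agent are illustrative names; `fetch` is whatever callable the caller supplies.

```python
import time
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY_SECONDS = 2.0  # assumed fallback, not a value from this site


def delay_for(parser: RobotFileParser, agent: str) -> float:
    """Use the declared Crawl-delay when present, else the fallback."""
    declared = parser.crawl_delay(agent)
    return float(declared) if declared is not None else DEFAULT_DELAY_SECONDS


def crawl(paths, parser, agent, fetch, sleep=time.sleep):
    """Fetch allowed paths one at a time, pausing after each request."""
    delay = delay_for(parser, agent)
    fetched = []
    for path in paths:
        if not parser.can_fetch(agent, path):
            continue  # skip anything the rules disallow
        fetched.append(fetch(path))
        sleep(delay)
    return fetched


# Hypothetical robots.txt with a Crawl-delay record, for illustration only.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Crawl-delay: 5", "Disallow: /w/"])
print(delay_for(parser, "examplebot"))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.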

Warnings

`content-signal` is not a known field.
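
The warning arises because the scanner only recognizes a fixed set of field names. The sketch below shows one way a parser could accept `Content-Signal` lines instead of flagging them. The report does not show the site's actual signal values, so the sample line is hypothetical, and the comma-separated `name=yes|no` syntax is an assumption based on the comment text above. `KNOWN_FIELDS` and `parse_fields` are illustrative names.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def parse_fields(text: str):
    """Split robots.txt text into records, content signals, and warnings."""
    records, signals, warnings = [], {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name == "content-signal":
            for item in value.split(","):
                key, _, setting = item.partition("=")
                if key.strip():
                    signals[key.strip()] = setting.strip().lower() == "yes"
        elif name in KNOWN_FIELDS:
            records.append((name, value))
        else:
            warnings.append(f"`{name}` is not a known field.")
    return records, signals, warnings


# Hypothetical input; the signal values are not taken from this scan.
SAMPLE = "User-agent: *\nContent-Signal: search=yes, ai-train=no\nAllow: /\nFoo: bar\n"
records, signals, warnings = parse_fields(SAMPLE)
print(signals)
print(warnings)
```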
