polcompball.wikitide.org
robots.txt

Robots Exclusion Standard data for polcompball.wikitide.org

Resource Scan

Scan Details

Site Domain	polcompball.wikitide.org
Base Domain	wikitide.org
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error (HTTP 4xx).
Last Scan	2025-10-22T19:22:54+00:00
Next Scan	2025-11-21T19:22:54+00:00
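The scan records only that the fetch failed with a client error. It does not record the exact status code. For context, RFC 9309 (the Robots Exclusion Protocol) defines how a crawler should react to each status class of a robots.txt fetch. The Python sketch below encodes that mapping. `RobotsPolicy` and `policy_for_status` are illustrative names, and the sketch says nothing about how this particular scanner handles its own failure.

```python
from enum import Enum


class RobotsPolicy(Enum):
    """How a crawler treats a site after fetching /robots.txt (RFC 9309, section 2.3.1)."""
    FOLLOW_RULES = "parse and obey the file"
    ALLOW_ALL = "no crawl restrictions"
    DISALLOW_ALL = "assume complete disallow"


def policy_for_status(status: int) -> RobotsPolicy:
    """Map the HTTP status of a robots.txt fetch to the RFC 9309 crawler behavior."""
    if 200 <= status <= 299:
        return RobotsPolicy.FOLLOW_RULES
    if 300 <= status <= 399:
        # Redirects are followed (at least five hops) before a verdict is reached.
        raise ValueError("follow the redirect and re-evaluate the final response")
    if 400 <= status <= 499:
        # "Unavailable": the crawler MAY access any resource on the server.
        return RobotsPolicy.ALLOW_ALL
    if 500 <= status <= 599:
        # "Unreachable": the crawler MUST assume a complete disallow.
        return RobotsPolicy.DISALLOW_ALL
    raise ValueError(f"unexpected HTTP status {status}")
```

Under this reading, a 4xx response like the one logged above leaves a standards-following crawler free to crawl. A 5xx response would instead block it entirely.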

Last Successful Scan

Scanned	2025-05-16T00:30:29+00:00
URL	https://polcompball.wikitide.org/robots.txt
Domain IPs	2602:294:0:b13::110, 38.46.223.205
Response IP	38.46.223.205
Found	Yes
Hash	0e5e8f41a755391e9e8c9b7053b10fbc63b49587db9021995dc51af4f5803ddf
SimHash	3207e85a85d0
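The 64-hex-digit Hash has the length of a SHA-256 digest. The report does not say what was hashed. Assuming it is the SHA-256 of the raw robots.txt bytes, a later copy of the file could be checked for changes as sketched below. `has_changed` is a hypothetical helper. The SimHash (a near-duplicate fingerprint) is not reproduced here.

```python
import hashlib

# Digest recorded by the last successful scan (2025-05-16T00:30:29+00:00).
RECORDED_SHA256 = "0e5e8f41a755391e9e8c9b7053b10fbc63b49587db9021995dc51af4f5803ddf"


def has_changed(body: bytes, recorded_hex: str = RECORDED_SHA256) -> bool:
    """Return True if the SHA-256 of `body` differs from the recorded digest.

    Assumption: the scanner hashed the exact response bytes. If it normalized
    line endings or whitespace first, an unchanged file could still differ here.
    """
    return hashlib.sha256(body).hexdigest() != recorded_hex.lower()
```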

Groups

*

Rule	Path
Allow	/w/api.php?action=mobileview&
Allow	/w/load.php?
Disallow	/w/
Disallow	/geoip$
Disallow	/rest_v1/
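Under RFC 9309 (and Google's parser), the rules in a group are not applied first-match. The longest matching pattern wins, and an Allow wins a tie. That is how the two Allow lines carve exceptions out of the broader `Disallow /w/`. `$` anchors the end of the path, so `/geoip$` blocks `/geoip` but not `/geoip/anything`. The Python sketch below is a longest-match evaluator written from scratch under those rules, not any crawler's actual code. `STAR_RULES`, `pattern_to_regex` and `is_allowed` are illustrative names, and percent-encoding normalization is deliberately left out.

```python
import re

# Rules of the "*" group, as listed in the scan.
STAR_RULES = [
    ("allow", "/w/api.php?action=mobileview&"),
    ("allow", "/w/load.php?"),
    ("disallow", "/w/"),
    ("disallow", "/geoip$"),
    ("disallow", "/rest_v1/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters, a trailing '$' anchors the end of the
    path, and every other character is literal. Matching starts at the path's start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed.
    `path` is the URL path including any query string."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty value matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed
```

For example, `/w/load.php?modules=site` is allowed because the 12-character Allow pattern beats the 3-character `/w/`. `/w/index.php?title=Main_Page&action=edit` is disallowed because only `/w/` matches it.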

Rule

Path

Allow

/w/api.php?action=mobileview&

Allow

/w/load.php?

Disallow

/w/

Disallow

/geoip$

Disallow

/rest_v1/

semrushbot

Rule	Path
Disallow	/

Rule

Path

Disallow

ahrefsbot

Rule	Path
Disallow	/

Rule

Path

Disallow

bytespider

Rule	Path
Disallow	/

Rule

Path

Disallow

petalbot

Rule	Path
Disallow	/

Rule

Path

Disallow

dotbot

Rule	Path
Disallow	/

Rule

Path

Disallow

megaindex

Rule	Path
Disallow	/

Rule

Path

Disallow

serpstatbot

Rule	Path
Disallow	/

Rule

Path

Disallow

barkrowler

Rule	Path
Disallow	/

Rule

Path

Disallow

seekportbot

Rule	Path
Disallow	/
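A crawler obeys only the single group that names its product token, matched case-insensitively. It falls back to `*` only when no group names it. As a result, the nine blocked crawlers see nothing but `Disallow /`. The mj12bot, yandexbot and bingbot groups contain no Allow or Disallow lines, so those crawlers are not bound by the `*` rules at all. The Python sketch below illustrates this selection step. `GROUPS` and `select_group` are hypothetical names, and the table is rebuilt from the scan output rather than from the raw file.

```python
# Group table rebuilt from the scan (product tokens, lowercased).
BLOCKED = ["semrushbot", "ahrefsbot", "bytespider", "petalbot", "dotbot",
           "megaindex", "serpstatbot", "barkrowler", "seekportbot"]

GROUPS = {token: [("disallow", "/")] for token in BLOCKED}
GROUPS["*"] = [
    ("allow", "/w/api.php?action=mobileview&"),
    ("allow", "/w/load.php?"),
    ("disallow", "/w/"),
    ("disallow", "/geoip$"),
    ("disallow", "/rest_v1/"),
]
# Throttled crawlers have their own groups, but with no path rules.
for token in ("mj12bot", "yandexbot", "bingbot"):
    GROUPS[token] = []


def select_group(groups, product_token: str):
    """Return the rules a crawler must obey: its own group if one names its
    product token (case-insensitively), otherwise the '*' group."""
    return groups.get(product_token.strip().lower(), groups.get("*", []))
```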

Rule

Path

Disallow

mj12bot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10

Field

Value

crawl-delay

yandexbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	2.5

Field

Value

crawl-delay

2.5

bingbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	20
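Crawl-delay is not part of RFC 9309, and Google ignores it. Crawlers that honor it commonly read the value as the number of seconds to wait between successive requests. Support varies by crawler, and the file's own comments note that YandexBot has not respected it since 2018. The Python sketch below collects Crawl-delay per user agent, parsing values as floats so that `2.5` is accepted. `THROTTLE_LINES` is a hypothetical reconstruction of the three throttling groups from the scan, not the verbatim file, and `crawl_delays` is an illustrative helper.

```python
# Hypothetical reconstruction of the throttling groups listed in the scan.
THROTTLE_LINES = """\
User-agent: mj12bot
Crawl-delay: 10

User-agent: yandexbot
Crawl-delay: 2.5

User-agent: bingbot
Crawl-delay: 20
""".splitlines()


def crawl_delays(lines):
    """Collect Crawl-delay values (in seconds) per user-agent token.

    Consecutive User-agent lines share one group; a User-agent line that
    follows any other record starts a new group. Malformed values are skipped.
    """
    delays = {}
    agents = []
    in_body = False
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_body:  # a User-agent after other records opens a new group
                agents, in_body = [], False
            agents.append(value.lower())
        else:
            in_body = True
            if field == "crawl-delay":
                try:
                    seconds = float(value)
                except ValueError:
                    continue
                for agent in agents:
                    delays[agent] = seconds
    return delays
```

Read this way, bingbot's 20-second delay caps it at three requests per minute, mj12bot's 10 seconds at six, and yandexbot's 2.5 seconds at twenty-four.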

Field

Value

crawl-delay

Other Records

Field	Value
sitemap	https://polcompball.wikitide.org/sitemap.xml
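Sitemap records are not tied to any user-agent group, which is why the scan lists this one under a separate Other Records heading. They apply to every crawler that reads the file. The Python sketch below, using the illustrative helper name `sitemap_urls`, shows that extracting them needs no group tracking.

```python
def sitemap_urls(lines):
    """Return every Sitemap URL in a robots.txt file, wherever it appears.

    The field name is matched case-insensitively; the value is split on the
    first ':' only, so the URL's own 'https:' survives intact.
    """
    urls = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```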

Field

Value

sitemap

https://polcompball.wikitide.org/sitemap.xml

Comments

robots.txt for Miraheze
Throttle access to certain pages
Do not include special pages and other pages where indexing is undesirable if they are likely to be linked to; use noindex instead.
That's because Google can still index pages in here without crawling them if the pages are linked to
See https://developers.google.com/search/docs/crawling-indexing/robots/intro
Block SemrushBot
Block AhrefsBot
Block Bytespider
Block PetalBot
Block DotBot
Block MegaIndex
Block serpstatbot
Block Barkrowler
Block SeekportBot
Keep Crawl-Delay rules at the bottom
Bots that don't understand Crawl-Delay might break when encountering it
See https://github.com/otwcode/otwarchive/pull/4411#discussion_r1044351129 (English) and https://webtan.impress.co.jp/e/2022/11/04/43611 (Japanese)
Throttle MJ12Bot
Throttle YandexBot
TODO: Crawl-delay is not respected since 2018
Throttle BingBot
----------------------------------------------------------
Dynamic sitemap url
----------------------------------------------------------

polcompball.wikitide.orgrobots.txt

Resource Scan

Scan Details

Last Successful Scan

Groups

*

semrushbot

ahrefsbot

bytespider

petalbot

dotbot

megaindex

serpstatbot

barkrowler

seekportbot

mj12bot

Other Records

yandexbot

Other Records

bingbot

Other Records

Other Records

Comments

polcompball.wikitide.org
robots.txt