portalsofphereon.com
robots.txt

Robots Exclusion Standard data for portalsofphereon.com

Resource Scan

Scan Details

Site Domain	portalsofphereon.com
Base Domain	portalsofphereon.com
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error.
Last Scan	2025-11-16T00:32:06+00:00
Next Scan	2025-11-30T00:32:06+00:00

Last Successful Scan

Scanned	2025-10-23T03:09:36+00:00
URL	https://www.portalsofphereon.com/robots.txt
Domain IPs	104.21.70.31, 172.67.218.208, 2606:4700:3031::ac43:dad0, 2606:4700:3037::6815:461f
Response IP	104.21.70.31
Found	Yes
Hash	6e75715ac9f255d6b4782fc240b7fc5a2fd175b06462b4f87a66d518d4df10aa
SimHash	3007c85ac5c0

Groups

*

Rule	Path
Allow	/w/api.php?action=mobileview&
Allow	/w/load.php?
Disallow	/w/
Disallow	/geoip$
Disallow	/rest_v1/

semrushbot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

megaindex

Rule	Path
Disallow	/

serpstatbot

Rule	Path
Disallow	/

barkrowler

Rule	Path
Disallow	/

seekportbot

Rule	Path
Disallow	/

mj12bot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10

yandexbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	2.5

bingbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	20
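
The groups above fall into three kinds: bots that are blocked outright (`Disallow /`), bots that are only throttled with `crawl-delay`, and everything else, which falls back to `*`. A rough sketch of how a client might pick its group and delay is shown below. It assumes case-insensitive substring matching of the group name against the User-Agent string, which is a common convention rather than something this report specifies. `BLOCKED`, `CRAWL_DELAY`, `group_for`, and `policy_for` are hypothetical names.

```python
# Data transcribed from the group tables above.
BLOCKED = {
    "semrushbot", "ahrefsbot", "bytespider", "petalbot", "dotbot",
    "megaindex", "serpstatbot", "barkrowler", "seekportbot",
}
CRAWL_DELAY = {"mj12bot": 10.0, "yandexbot": 2.5, "bingbot": 20.0}  # seconds


def group_for(user_agent):
    """Return the group name that applies to a User-Agent string.

    Assumption: a group applies when its name occurs anywhere in the
    lower-cased User-Agent; otherwise the "*" group is used.
    """
    ua = user_agent.lower()
    for name in sorted(BLOCKED | set(CRAWL_DELAY)):
        if name in ua:
            return name
    return "*"


def policy_for(user_agent):
    """Summarise what the scanned robots.txt asks of this crawler."""
    group = group_for(user_agent)
    return {
        "group": group,
        "fully_blocked": group in BLOCKED,
        # None means no crawl-delay record exists for the group.
        "crawl_delay": CRAWL_DELAY.get(group),
    }
```

For example, `policy_for("Mozilla/5.0 (compatible; bingbot/2.0)")` selects the `bingbot` group with a delay of 20 seconds. Because that group defines no rules of its own, the `Disallow` entries of `*` do not apply to it, which matches the "All paths allowed" note in the report.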

Other Records

Field	Value
sitemap	https://www.portalsofphereon.com/sitemap.xml

Comments

robots.txt for Miraheze
Throttle access to certain pages
Do not include special pages and other pages where indexing is undesirable if they are likely to be linked to; use noindex instead.
That's because Google can still index pages in here without crawling them if the pages are linked to
See https://developers.google.com/search/docs/crawling-indexing/robots/intro
Block SemrushBot
Block AhrefsBot
Block Bytespider
Block PetalBot
Block DotBot
Block MegaIndex
Block serpstatbot
Block Barkrowler
Block SeekportBot
Keep Crawl-Delay rules at the bottom
Bots that don't understand Crawl-Delay might break when encountering it
See https://github.com/otwcode/otwarchive/pull/4411#discussion_r1044351129 (English) and https://webtan.impress.co.jp/e/2022/11/04/43611 (Japanese)
Throttle MJ12Bot
Throttle YandexBot
TODO: Crawl-delay is not respected since 2018
Throttle BingBot
----------------------------------------------------------
Dynamic sitemap url
----------------------------------------------------------
