guerillacricket.com
robots.txt

Robots Exclusion Standard data for guerillacricket.com

Resource Scan

Scan Details

Site Domain	guerillacricket.com
Base Domain	guerillacricket.com
Scan Status	Ok
Last Scan	2024-11-11T22:28:11+00:00
Next Scan	2024-11-18T22:28:11+00:00

Last Scan

Scanned	2024-11-11T22:28:11+00:00
URL	https://guerillacricket.com/robots.txt
Redirect	https://www.guerillacricket.com/robots.txt
Redirect Domain	www.guerillacricket.com
Redirect Base	guerillacricket.com
Domain IPs	35.197.243.217
Redirect IPs	35.197.243.217
Response IP	35.197.243.217
Found	Yes
Hash	3c8cc4227bdb68017b5c75a939e6f15bae0be58c2bdb9a28056eaa55d33d640d
SimHash	21240cf5453d
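
The Hash field is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. Under that assumption, a minimal sketch of how a stored hash could be used to detect a changed robots.txt between scans might look like the following. The function names `content_hash` and `has_changed` are hypothetical, and whether the scanner normalises the body before hashing is unknown.

```python
import hashlib

# Hash value recorded in the Last Scan table above.
RECORDED_HASH = "3c8cc4227bdb68017b5c75a939e6f15bae0be58c2bdb9a28056eaa55d33d640d"


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str = RECORDED_HASH) -> bool:
    """True when the freshly fetched body no longer matches the stored hash."""
    return content_hash(body) != previous_hash.lower()
```

A caller would pass the raw bytes of a newly fetched robots.txt to `has_changed` and rescan the rules only when it returns `True`.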

Groups

*

Rule	Path	Comment
Disallow	/wp-admin	-
Allow	/wp-admin/admin-ajax.php	-
Disallow	/wp-login	-
Disallow	/xmlrpc.php	-
Disallow	/wp-content/themes/*/$	prevents just the theme dir being crawled, which they have been since mid-2024 for some reason
Disallow	/trackback	-
Disallow	*/trackback	-
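
As a rough illustration of how the rules in this group interact, the sketch below evaluates a path using the longest-match convention described in RFC 9309, with `*` matching any run of characters and a trailing `$` anchoring the end of the path. It is an assumption that a given crawler follows this convention; some parsers (Python's `urllib.robotparser`, for example) apply rules in file order and do not expand wildcards. The helper names `pattern_to_regex` and `is_allowed` are hypothetical.

```python
import re

# Rules copied from the "*" group in the table above, in file order.
RULES = [
    ("disallow", "/wp-admin"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-login"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/wp-content/themes/*/$"),
    ("disallow", "/trackback"),
    ("disallow", "*/trackback"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Pick the longest matching pattern; on a tie, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed
```

Under these assumptions, `/wp-admin/admin-ajax.php` is allowed because the Allow pattern is longer than `/wp-admin`, and `/wp-content/themes/*/$` blocks only a bare theme directory URL, not the files inside it.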


Other Records

Field	Value
sitemap	https://www.guerillacricket.com/sitemap/sitemap.xml
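
Records such as `sitemap` sit outside any user-agent group. A minimal sketch of how they could be pulled out of a robots.txt body follows; the function name `sitemap_urls` is hypothetical, and `SAMPLE` is a shortened stand-in built from the values on this page, not the site's actual file.

```python
def sitemap_urls(robots_txt):
    """Collect the values of every Sitemap record, ignoring case and comments."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Shortened stand-in for the scanned file (assumption, not the real body).
SAMPLE = """\
User-agent: *
Disallow: /wp-admin
Sitemap: https://www.guerillacricket.com/sitemap/sitemap.xml
"""
```

Calling `sitemap_urls(SAMPLE)` returns a list containing the single sitemap URL shown in the table above.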

