guerillacricket.com
robots.txt

Robots Exclusion Standard data for guerillacricket.com

Resource Scan

Scan Details

Site Domain guerillacricket.com
Base Domain guerillacricket.com
Scan Status Ok
Last Scan 2024-11-11T22:28:11+00:00
Next Scan 2024-11-18T22:28:11+00:00

Last Scan

Scanned 2024-11-11T22:28:11+00:00
URL https://guerillacricket.com/robots.txt
Redirect https://www.guerillacricket.com/robots.txt
Redirect Domain www.guerillacricket.com
Redirect Base guerillacricket.com
Domain IPs 35.197.243.217
Redirect IPs 35.197.243.217
Response IP 35.197.243.217
Found Yes
Hash 3c8cc4227bdb68017b5c75a939e6f15bae0be58c2bdb9a28056eaa55d33d640d
SimHash 21240cf5453d

Groups

*

Rule Path Comment
Disallow /wp-admin -
Allow /wp-admin/admin-ajax.php -
Disallow /wp-login -
Disallow /xmlrpc.php -
Disallow /wp-content/themes/*/$ prevents just the theme dir being crawled, which they have been since mid-2024 for some reason
Disallow /trackback -
Disallow */trackback -
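
How a crawler might resolve the rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie); individual crawlers may behave differently. The helper names `pattern_to_regex` and `is_allowed` and the sample paths are hypothetical and are not part of the scan data.

```python
import re

# Rules as listed for the "*" group in the scan above: (directive, path pattern).
RULES = [
    ("disallow", "/wp-admin"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-login"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/wp-content/themes/*/$"),
    ("disallow", "/trackback"),
    ("disallow", "*/trackback"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled (assumed longest-match precedence)."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a robots.txt prefix match.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            # Longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical sample paths, chosen only to exercise each rule.
for sample in [
    "/wp-admin/",
    "/wp-admin/admin-ajax.php",
    "/wp-content/themes/mytheme/",
    "/wp-content/themes/mytheme/style.css",
    "/2024/post/trackback",
]:
    print(sample, is_allowed(sample))
```

Under these assumptions, the Allow entry for /wp-admin/admin-ajax.php overrides the shorter /wp-admin Disallow, and the `$` anchor on the themes rule blocks only the bare theme directory, not the files inside it. Note that Python's built-in `urllib.robotparser` applies rules in file order and does not expand `*` or `$`, so it would give different answers for this group.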

Other Records

Field Value
sitemap https://www.guerillacricket.com/sitemap/sitemap.xml