guerillacricket.com
robots.txt
Robots Exclusion Standard data for guerillacricket.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | guerillacricket.com |
Base Domain | guerillacricket.com |
Scan Status | Ok |
Last Scan | 2024-11-11T22:28:11+00:00 |
Next Scan | 2024-11-18T22:28:11+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-11-11T22:28:11+00:00 |
URL | https://guerillacricket.com/robots.txt |
Redirect | https://www.guerillacricket.com/robots.txt |
Redirect Domain | www.guerillacricket.com |
Redirect Base | guerillacricket.com |
Domain IPs | 35.197.243.217 |
Redirect IPs | 35.197.243.217 |
Response IP | 35.197.243.217 |
Found | Yes |
Hash | 3c8cc4227bdb68017b5c75a939e6f15bae0be58c2bdb9a28056eaa55d33d640d |
SimHash | 21240cf5453d |
Groups
*
Rule | Path | Comment |
---|---|---|
Disallow | /wp-admin | - |
Allow | /wp-admin/admin-ajax.php | - |
Disallow | /wp-login | - |
Disallow | /xmlrpc.php | - |
Disallow | /wp-content/themes/*/$ | prevents just the theme dir being crawled, which they have been since mid-2024 for some reason |
Disallow | /trackback | - |
Disallow | */trackback | - |
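
The rules in the `*` group use the `*` wildcard and the `$` end anchor, which Python's standard `urllib.robotparser` does not interpret. The following is a minimal sketch of how a crawler might evaluate these rules, assuming the common longest-match convention (the most specific matching pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical and are not part of the scan report or of any library.

```python
import re

# Rules transcribed from the "*" group table above: (rule, path pattern).
RULES = [
    ("disallow", "/wp-admin"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-login"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/wp-content/themes/*/$"),
    ("disallow", "/trackback"),
    ("disallow", "*/trackback"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumption: the longest matching pattern wins, and an Allow rule
    beats a Disallow rule of the same length. Paths that match no
    rule are allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        # re.match anchors at the start of the path, as robots.txt patterns do.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in (
        "/wp-admin/admin-ajax.php",
        "/wp-admin/options.php",
        "/wp-content/themes/example/",
        "/wp-content/themes/example/style.css",
        "/2024/some-post/trackback",
    ):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions, the `/wp-content/themes/*/$` rule blocks only a bare theme directory URL, while files inside a theme directory remain crawlable, which matches the intent stated in the rule's comment. The example paths are invented for illustration.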
Other Records
Field | Value |
---|---|
sitemap | https://www.guerillacricket.com/sitemap/sitemap.xml |
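
The report splits the file into per-user-agent groups and other records such as `sitemap`. Below is a rough sketch of how such a split could be produced from raw robots.txt text. The `SAMPLE` text is a partial reconstruction from the tables in this report, not the verbatim file served by the site, and `parse_robots` is a hypothetical helper, not the scanner's actual implementation.

```python
# Partial reconstruction from the report tables; the real file's layout,
# ordering, and whitespace are not known from this report.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-content/themes/*/$ # prevents just the theme dir being crawled, which they have been since mid-2024 for some reason
Disallow: */trackback

Sitemap: https://www.guerillacricket.com/sitemap/sitemap.xml
"""


def parse_robots(text):
    """Split robots.txt text into groups and other records.

    Returns (groups, other) where `groups` maps each user-agent to a
    list of (rule, path, comment) tuples and `other` is a list of
    (field, value) pairs for records that are not group rules.
    """
    groups = {}
    other = []
    agents = []            # user-agents of the group currently being read
    reading_rules = False  # True once the current group has a rule line
    for raw in text.splitlines():
        line, _, comment = raw.partition("#")
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or comment-only line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if reading_rules:
                # A user-agent line after rules starts a new group.
                agents, reading_rules = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            for agent in agents:
                groups[agent].append((field, value, comment.strip() or "-"))
        else:
            other.append((field, value))
    return groups, other


if __name__ == "__main__":
    groups, other = parse_robots(SAMPLE)
    for agent, rules in groups.items():
        print(agent)
        for rule, path, comment in rules:
            print(f"  {rule} | {path} | {comment}")
    for field, value in other:
        print(f"{field} | {value}")
```

Splitting on the first `:` only keeps the sitemap URL intact, since the URL itself contains colons.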