getlighthouse.com
robots.txt

Robots Exclusion Standard data for getlighthouse.com

Resource Scan

Scan Details

Site Domain getlighthouse.com
Base Domain getlighthouse.com
Scan Status Ok
Last Scan 2025-09-12T09:47:55+00:00
Next Scan 2025-10-12T09:47:55+00:00

Last Scan

Scanned 2025-09-12T09:47:55+00:00
URL https://getlighthouse.com/robots.txt
Domain IPs 104.21.5.80, 172.67.133.49, 2606:4700:3032::ac43:8531, 2606:4700:3034::6815:550
Response IP 104.21.5.80
Found Yes
Hash 3defc7df492ad816db4dbb8de7b5bab73712b8eb0603da7ef2e6892d65f45808
SimHash 82850c07e4d0

Groups

*

Rule Path
Disallow /blog/wp-admin/
Disallow *?s=*
Disallow */?s
Disallow *%26s%3D*
Disallow *?p=*
Disallow *%26p%3D*
Disallow *%26preview%3D*
Allow /blog/wp-admin/admin-ajax.php
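
The rules above rely on `*` wildcards, which Python's standard `urllib.robotparser` does not interpret. The following is a minimal sketch of how a crawler might evaluate this group, assuming the conventions of RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, and the sketch compares paths literally without normalizing percent-encoding, which real crawlers may do differently.

```python
import re

# Rules copied from the "*" group listed above.
RULES = [
    ("disallow", "/blog/wp-admin/"),
    ("disallow", "*?s=*"),
    ("disallow", "*/?s"),
    ("disallow", "*%26s%3D*"),
    ("disallow", "*?p=*"),
    ("disallow", "*%26p%3D*"),
    ("disallow", "*%26preview%3D*"),
    ("allow", "/blog/wp-admin/admin-ajax.php"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' means "any characters" and a trailing '$' means
    "end of path"; everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` (path plus query string) may be fetched."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for kind, pattern in rules:
        # re.match anchors at the start, giving prefix semantics.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; on equal length, allow (True) beats disallow.
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]
```

Under these assumptions, `is_allowed("/blog/wp-admin/admin-ajax.php")` returns `True` because the Allow pattern is longer than the Disallow on `/blog/wp-admin/`, while `is_allowed("/blog/?s=test")` and `is_allowed("/blog/?p=123")` return `False`.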

Other Records

Field Value
sitemap https://getlighthouse.com/blog/sitemap_index.xml

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
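
The effect described in these comments can be checked offline with Python's standard `urllib.robotparser`. This is a small sketch, assuming the two lines appear in the original file prefixed with `#`; the helper `can_fetch` and the agent name `ExampleBot` are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: this is how the two lines look in the raw file.
commented = ["# User-Agent: *", "# Disallow: /"]
# "Uncommenting" here means stripping the leading '#' and space.
uncommented = [line.lstrip("# ") for line in commented]


def can_fetch(lines, url, agent="ExampleBot"):
    """Parse robots.txt lines in memory and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


url = "https://getlighthouse.com/blog/"
print(can_fetch(commented, url))    # commented out: no rule applies
print(can_fetch(uncommented, url))  # uncommented: every path is disallowed
```

While the lines stay commented out they have no effect; once uncommented, the `Disallow: /` rule under `User-Agent: *` blocks every path for every crawler that honors the file.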