getlighthouse.com
robots.txt

Robots Exclusion Standard data for getlighthouse.com

Resource Scan

Scanned	2025-09-12T09:47:55+00:00
URL	https://getlighthouse.com/robots.txt
Domain IPs	104.21.5.80, 172.67.133.49, 2606:4700:3032::ac43:8531, 2606:4700:3034::6815:550
Response IP	104.21.5.80
Found	Yes
Hash	3defc7df492ad816db4dbb8de7b5bab73712b8eb0603da7ef2e6892d65f45808
SimHash	82850c07e4d0

Rule	Path
Disallow	/blog/wp-admin/
Disallow	*?s=*
Disallow	*/?s
Disallow	*%26s%3D*
Disallow	*?p=*
Disallow	*%26p%3D*
Disallow	*%26preview%3D*
Allow	/blog/wp-admin/admin-ajax.php
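
The rules listed above mix plain path prefixes with `*` wildcards, so it can help to see how a crawler might evaluate them. The following is a minimal sketch, not the scanner's or any crawler's actual implementation. It assumes the common convention where `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. It also assumes the rules apply to every user agent, and it compares raw strings without percent-decoding. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical.

```python
import re

# Rules copied from the table above, in file order.
RULES = [
    ("disallow", "/blog/wp-admin/"),
    ("disallow", "*?s=*"),
    ("disallow", "*/?s"),
    ("disallow", "*%26s%3D*"),
    ("disallow", "*?p=*"),
    ("disallow", "*%26p%3D*"),
    ("disallow", "*%26preview%3D*"),
    ("allow", "/blog/wp-admin/admin-ajax.php"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str) -> bool:
    """Return True if the path (including any query string) may be crawled.

    Assumption: the longest matching pattern wins, Allow wins a tie,
    and a path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in RULES:
        # re.match anchors at the start of the path, like a prefix rule.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/blog/wp-admin/", "/blog/wp-admin/admin-ajax.php",
                 "/blog/?s=feedback", "/blog/?p=123", "/pricing"):
        print(path, is_allowed(path))
```

Under these assumptions the longer Allow pattern for `admin-ajax.php` overrides the shorter `/blog/wp-admin/` Disallow, while search (`?s=`) and post-ID (`?p=`) URLs stay blocked. Real crawlers differ in how they handle wildcards and percent-encoding, so results from a specific crawler may not match this sketch.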

Field	Value
sitemap	https://getlighthouse.com/blog/sitemap_index.xml

# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
# To ban all spiders from the entire site uncomment the next two lines:
# User-Agent: *
# Disallow: /
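
The effect of uncommenting the two lines mentioned in the note above can be checked with Python's standard `urllib.robotparser`. This is a small sketch for illustration only. The helper `parse_rules`, the user agent name `ExampleBot`, and the test URL are assumptions, and the two rule sets are typed in by hand rather than fetched from the site.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(lines):
    """Build a parser from robots.txt lines held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# With the two lines commented out, the parser sees no rules at all.
commented = parse_rules(["# User-Agent: *", "# Disallow: /"])

# With the two lines uncommented, every path is disallowed for every agent.
banned = parse_rules(["User-Agent: *", "Disallow: /"])

URL = "https://getlighthouse.com/blog/"  # illustrative URL
print(commented.can_fetch("ExampleBot", URL))
print(banned.can_fetch("ExampleBot", URL))
```

Note that `urllib.robotparser` does plain prefix matching and does not interpret `*` wildcards inside paths, so it suits this blanket rule but not the wildcard patterns in the rule table.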
