leafhaus.com
robots.txt

Robots Exclusion Standard data for leafhaus.com

Resource Scan

Scan Details

Site Domain leafhaus.com
Base Domain leafhaus.com
Scan Status Ok
Last Scan 2025-10-02T18:12:17+00:00
Next Scan 2025-11-01T18:12:17+00:00

Last Scan

Scanned 2025-10-02T18:12:17+00:00
URL https://leafhaus.com/robots.txt
Domain IPs 104.21.26.3, 172.67.168.52, 2606:4700:3030::ac43:a834, 2606:4700:3034::6815:1a03
Response IP 172.67.168.52
Found Yes
Hash 6fbcb65a1e2e9f6814161a2decd3f08b2e3aa98728b56dfcb21d7c232cbca5e5
SimHash 4a349bca64b3

Groups

*

Rule Path
Disallow /wp-admin/
Disallow /wp-includes/
Disallow /cgi-bin/
Disallow /wp-content/plugins/
Disallow /wp-content/themes/
Disallow /trackback/
Disallow /xmlrpc.php
Disallow /readme.html
Disallow /license.txt
Disallow /wp-login.php
Disallow /wp-signup.php
Disallow /search/
Disallow /private/
Disallow /tmp/
Disallow /backup/
Disallow /staging/
Disallow /?s=
Disallow /?author=*
Allow /wp-content/uploads/
Allow /wp-includes/js/
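
The `*` group mixes Disallow and Allow rules that overlap (for example `/wp-includes/` and `/wp-includes/js/`). Here is a minimal sketch of how a crawler following RFC 9309 might resolve them, assuming longest-match precedence with Allow winning a tie. The names `RULES`, `_pattern_to_regex`, and `is_allowed` are hypothetical and not part of the scan, and real crawlers may differ in details such as percent-encoding normalization.

```python
import re

# Rules copied from the "*" group listed above, in their original order.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/cgi-bin/"),
    ("disallow", "/wp-content/plugins/"),
    ("disallow", "/wp-content/themes/"),
    ("disallow", "/trackback/"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/readme.html"),
    ("disallow", "/license.txt"),
    ("disallow", "/wp-login.php"),
    ("disallow", "/wp-signup.php"),
    ("disallow", "/search/"),
    ("disallow", "/private/"),
    ("disallow", "/tmp/"),
    ("disallow", "/backup/"),
    ("disallow", "/staging/"),
    ("disallow", "/?s="),
    ("disallow", "/?author=*"),
    ("allow", "/wp-content/uploads/"),
    ("allow", "/wp-includes/js/"),
]


def _pattern_to_regex(path):
    """Translate a rule path into a regex: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules=RULES):
    """Return True if url_path may be crawled under longest-match precedence.

    Assumption: no percent-encoding normalization is performed here.
    """
    best = None  # (rule path length, is_allow); tuple order makes Allow win ties
    for kind, path in rules:
        if _pattern_to_regex(path).match(url_path):
            candidate = (len(path), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule is allowed by default.
    return True if best is None else best[1]
```

Under this sketch, a made-up path such as `/wp-includes/js/jquery.js` is allowed because the Allow rule is longer than the Disallow on `/wp-includes/`. Parsers that apply rules in file order instead, which Python's standard `urllib.robotparser` appears to do, may give a different answer for that path.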

ahrefsbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

spbot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

openai

Rule Path
Disallow /

gptbot

Rule Path
Disallow /
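
Each of the named groups above carries a single `Disallow /`, so a crawler that identifies with one of those tokens is shut out entirely, while every other crawler falls back to the `*` group. A rough sketch of that group selection, assuming case-insensitive matching on the crawler's product token as RFC 9309 describes. The function names and the example tokens in the usage note are hypothetical.

```python
# User-agent tokens of the groups listed above that carry "Disallow /".
FULLY_BLOCKED = {
    "ahrefsbot", "semrushbot", "mj12bot", "baiduspider", "dotbot",
    "spbot", "yandex", "chatgpt-user", "openai", "gptbot",
}


def select_group(product_token):
    """Return the name of the group that applies to a crawler's product token."""
    token = product_token.lower()  # group matching is case-insensitive
    return token if token in FULLY_BLOCKED else "*"


def is_fully_blocked(product_token):
    """True when the crawler's own group disallows every path."""
    return select_group(product_token) != "*"
```

For instance, `select_group("GPTBot")` picks the `gptbot` group, while a token that has no group of its own, such as `Googlebot`, is handled by `*`.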

Other Records

Field Value
sitemap https://leafhaus.com/sitemap_index.xml
sitemap https://dutchie.com/embedded-menu/leaf-haus-retail-franklin-twp/sitemap.xml
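
One of the two sitemap records points at a different host (`dutchie.com`) than the scanned site. A small sketch of how a consumer of this report might separate same-host from cross-host sitemaps before fetching them. `SITE_HOST`, `SITEMAPS`, and `split_by_host` are hypothetical names, and whether a given crawler honors a cross-host sitemap is not something the scan states.

```python
from urllib.parse import urlsplit

SITE_HOST = "leafhaus.com"

# Sitemap values copied from the "Other Records" table above.
SITEMAPS = [
    "https://leafhaus.com/sitemap_index.xml",
    "https://dutchie.com/embedded-menu/leaf-haus-retail-franklin-twp/sitemap.xml",
]


def split_by_host(urls, site_host=SITE_HOST):
    """Partition sitemap URLs into (same-host, cross-host) lists."""
    same, other = [], []
    for url in urls:
        host = urlsplit(url).hostname
        (same if host == site_host else other).append(url)
    return same, other
```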

Comments

  • General rules for all user agents
  • Disallow access to sensitive directories and files
  • Allow crawlers to index critical assets
  • Block specific known bad bots
  • Block GPT-based bots
  • Sitemaps