henitan.com
robots.txt

Robots Exclusion Standard data for henitan.com

Resource Scan

Scan Details

Site Domain henitan.com
Base Domain henitan.com
Scan Status Ok
Last Scan 5/19/2025, 9:02:31 AM
Next Scan 5/26/2025, 9:02:31 AM

Last Scan

Scanned 5/19/2025, 9:02:31 AM
URL https://henitan.com/robots.txt
Domain IPs 104.21.79.172, 172.67.146.160, 2606:4700:3033::ac43:92a0, 2606:4700:3037::6815:4fac
Response IP 104.21.79.172
Found Yes
Hash 768bde52814028f14dcdbe0b883d87f1ee8a5d8e05d53ff1c70ddd0e60b20051
SimHash 6635d85367e0

Groups

*

Rule Path
Disallow /wp-admin/
Disallow /wp-includes/
Disallow /wp-content/plugins/
Disallow /wp-content/cache/
Disallow /tmp/
Disallow /private/
Disallow /backup/
Disallow /scripts/

googlebot

Rule Path
Allow /

bingbot

Rule Path
Allow /

badbot

Rule Path
Disallow /

adsbot-google

Rule Path
Allow /
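How a crawler chooses among these groups can be sketched in Python. This is a minimal illustration that assumes the group-selection rules of RFC 9309. User-agent tokens are compared case-insensitively. A crawler obeys only the groups that name its own token, and every group naming that token is combined into one. The `*` group applies only when no named group matches. `GROUPS` and `select_rules` are hypothetical names. The rule lists are transcribed from this scan, including the second `*` group listed further down.

```python
# Groups transcribed from this scan as (user-agent tokens, [(rule, path), ...]).
# The two "*" groups are kept separate here to show how they get combined.
GROUPS = [
    (["*"], [
        ("disallow", "/wp-admin/"),
        ("disallow", "/wp-includes/"),
        ("disallow", "/wp-content/plugins/"),
        ("disallow", "/wp-content/cache/"),
        ("disallow", "/tmp/"),
        ("disallow", "/private/"),
        ("disallow", "/backup/"),
        ("disallow", "/scripts/"),
    ]),
    (["googlebot"], [("allow", "/")]),
    (["bingbot"], [("allow", "/")]),
    (["badbot"], [("disallow", "/")]),
    (["adsbot-google"], [("allow", "/")]),
    (["*"], [
        ("disallow", "/112924tpgealegtw-.html"),
        ("disallow", "/145460tpgetokyo/aleqtg-.htm"),
        ("disallow", "/41446tpgealecgs-el.html"),
        ("disallow", "/*.html$"),
        ("disallow", "/*.htm$"),
    ]),
]


def select_rules(product_token: str) -> list[tuple[str, str]]:
    """Return the combined rules a crawler with this product token should obey."""
    token = product_token.lower()  # group matching is case-insensitive
    matched = [rules for agents, rules in GROUPS if token in agents]
    if not matched:
        # No group names this crawler, so it falls back to every "*" group.
        matched = [rules for agents, rules in GROUPS if "*" in agents]
    combined: list[tuple[str, str]] = []
    for rules in matched:
        combined.extend(rules)
    return combined


if __name__ == "__main__":
    print(select_rules("Googlebot"))
    print(len(select_rules("ExampleBot")))
```

Under these semantics, Googlebot, Bingbot and AdsBot-Google each match their own `Allow /` group. As a result, none of the `*` disallow rules apply to them. A crawler not named anywhere (here the hypothetical `ExampleBot`) falls back to the 13 rules of both `*` groups combined.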

*

Rule Path
Disallow /112924tpgealegtw-.html
Disallow /145460tpgetokyo/aleqtg-.htm
Disallow /41446tpgealecgs-el.html
Disallow /*.html$
Disallow /*.htm$
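The last two rules use the `*` wildcard and the `$` end anchor. The following is a minimal Python sketch of how such rules are commonly evaluated, assuming RFC 9309 semantics:

- `*` matches any run of characters.
- A trailing `$` pins the match to the end of the path.
- Rules otherwise match as prefixes.
- The longest matching rule wins, and `Allow` wins a tie.

`RULES`, `rule_to_regex` and `is_allowed` are hypothetical names. The rules are those of the two `*` groups combined. Percent-encoding normalization is omitted.

```python
import re
from typing import Optional

# Rules of the two "*" groups in this scan, combined (hypothetical transcription).
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/plugins/"),
    ("disallow", "/wp-content/cache/"),
    ("disallow", "/tmp/"),
    ("disallow", "/private/"),
    ("disallow", "/backup/"),
    ("disallow", "/scripts/"),
    ("disallow", "/112924tpgealegtw-.html"),
    ("disallow", "/145460tpgetokyo/aleqtg-.htm"),
    ("disallow", "/41446tpgealecgs-el.html"),
    ("disallow", "/*.html$"),
    ("disallow", "/*.htm$"),
]


def rule_to_regex(rule_path: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal text; each '*' becomes '.*' (any run of characters).
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # Without '$' the rule is a prefix match; with it, the path must end here.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching rule wins; on equal length, allow beats disallow."""
    best: Optional[tuple[int, bool]] = None
    for kind, rule_path in rules:
        if not rule_path:
            continue  # an empty value ("Disallow:") blocks nothing
        if rule_to_regex(rule_path).match(path):
            candidate = (len(rule_path), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for p in ["/", "/about.html", "/about.html?ref=1", "/wp-admin/", "/blog/"]:
        print(p, is_allowed(p, RULES))
```

This sketch assumes that the matched path includes the query string, as Google's documentation describes for `$` rules. On that reading, `/*.html$` blocks `/about.html` but not `/about.html?ref=1`, because the query string keeps the path from ending in `.html`.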

Other Records

Field Value
sitemap https://henitan.com/sitemap_index.xml

Comments

  • robots.txt for https://henitan.com
  • Allow all user agents to crawl the entire site
  • Allow Googlebot and other major search engines to crawl
  • Block specific bots that may harm your site
  • Sitemap file for better indexing
  • Allow crawling of AdSense-related content
  • Crawl-delay settings for less aggressive bots (optional)
  • User-agent: *
  • Crawl-delay:
  • Block any URLs that end with .html or .htm