yaccintv.com
robots.txt

Robots Exclusion Standard data for yaccintv.com

Resource Scan

Scan Details

Site Domain yaccintv.com
Base Domain yaccintv.com
Scan Status Ok
Last Scan 2026-02-06T03:05:14+00:00
Next Scan 2026-03-08T03:05:14+00:00

Last Scan

Scanned 2026-02-06T03:05:14+00:00
URL https://yaccintv.com/robots.txt
Domain IPs 104.21.34.166, 172.67.163.21, 2606:4700:3036::6815:22a6, 2606:4700:3036::ac43:a315
Response IP 104.21.34.166
Found Yes
Hash d7572a10b4a4b0e269da91a905c9b9c3cf5d0c2e7dc2a59971ab7218ba36e883
SimHash 000cdce256b2

Groups

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-news

Rule Path
Disallow

googlebot-video

Rule Path
Disallow

adsbot-google

Rule Path
Disallow

ahrefsbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

sogou

Rule Path
Disallow /

spbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

*

Rule Path
Disallow /tmp/
Disallow /private/
Disallow /admin/

Other Records

Field Value
crawl-delay 10
sitemap https://www.example.com/sitemap.xml
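The groups and records above can be checked with Python's standard urllib.robotparser module. The following is a minimal sketch, not the scanned file itself: ROBOTS_TXT is a hypothetical reconstruction of three of the listed groups, the placement of Crawl-delay inside the * group is an assumption (the scan reports it only as a separate record), and "ExampleBot" is a made-up crawler name used to exercise the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of three scanned groups. The original file's
# exact layout, and Crawl-delay belonging to the "*" group, are assumptions.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: AhrefsBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /tmp/
Disallow: /private/
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
base = "https://yaccintv.com"

# An empty Disallow path blocks nothing, so Googlebot may fetch any URL.
print(rules.can_fetch("Googlebot", base + "/admin/"))        # True
# "Disallow: /" blocks the whole site for the named crawler.
print(rules.can_fetch("AhrefsBot", base + "/"))              # False
# Crawlers without their own group fall back to the "*" group.
print(rules.can_fetch("ExampleBot", base + "/admin/login"))  # False
print(rules.can_fetch("ExampleBot", base + "/index.html"))   # True
# Crawl-delay and Sitemap as seen by this parser (site_maps needs Python 3.8+).
print(rules.crawl_delay("ExampleBot"))                       # 10
print(rules.site_maps())
```

Other parsers may treat Crawl-delay differently or ignore it entirely, so the delay value shown here reflects only how urllib.robotparser reads the reconstructed text.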

Comments

  • robots.txt for https://www.example.com/
  • Purpose: Block bad bots, allow Google, and rate-limit basic crawlers
  • Allow Google Crawlers
  • Block Common Bad Crawlers
  • Slow Down Basic Crawlers
  • Sitemap (important for SEO)