dcuguide.com
robots.txt

Robots Exclusion Standard data for dcuguide.com

Resource Scan

Scan Details

Site Domain dcuguide.com
Base Domain dcuguide.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-08-31T11:15:47+00:00
Next Scan 2024-11-29T11:15:47+00:00

Last Successful Scan

Scanned 2023-10-14T09:50:19+00:00
URL https://dcuguide.com/robots.txt
Domain IPs 104.21.36.208, 172.67.199.153, 2606:4700:3035::6815:24d0, 2606:4700:3036::ac43:c799
Response IP 104.21.36.208
Found Yes
Hash 272e76ff04c97306d607892d5678451b2af7774f764fc714bf9f936c1981ac6a
SimHash 6f48d0816415

Groups

*

Rule Path
Disallow /forums/
Disallow /who.php
Disallow /chronology.php
Disallow /w/Batman%3A_The_Detective_Title_Index
Disallow /w/Batman_Vol._3_Title_Index
Disallow /w/I_Am_Batman_Title_Index
Allow /sitemap/
Allow /sitemap.xml

Other Records

Field Value
crawl-delay 200

bingbot
googlebot
googlebot-image
mediapartners-google
msnbot
msnbot-media
slurp
yahoo-blogs
yahoo-mmcrawler

Rule Path
Disallow /cgi-bin/
Disallow /wiki/
Disallow /wiki2/
Disallow /guide2wiki/
Disallow /forums/

Other Records

Field Value
crawl-delay 100

baiduspider
yandex
ahrefsbot
semrushbot
the knowledge ai
mj12bot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://dcuguide.com/sitemap/sitemap.xml
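
The groups above can be checked programmatically. The following is a minimal sketch, assuming the records were laid out in the original file roughly as reconstructed in `ROBOTS_TXT` (the exact ordering, blank lines, and comment placement of the real file are not recoverable from this report). It uses Python's standard `urllib.robotparser`; the `build_parser` helper and the `examplebot` user agent are hypothetical names introduced only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the scanned file from the groups listed above.
# The real file's ordering and comments may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/
Disallow: /who.php
Disallow: /chronology.php
Disallow: /w/Batman%3A_The_Detective_Title_Index
Disallow: /w/Batman_Vol._3_Title_Index
Disallow: /w/I_Am_Batman_Title_Index
Allow: /sitemap/
Allow: /sitemap.xml
Crawl-delay: 200

User-agent: bingbot
User-agent: googlebot
User-agent: googlebot-image
User-agent: mediapartners-google
User-agent: msnbot
User-agent: msnbot-media
User-agent: slurp
User-agent: yahoo-blogs
User-agent: yahoo-mmcrawler
Disallow: /cgi-bin/
Disallow: /wiki/
Disallow: /wiki2/
Disallow: /guide2wiki/
Disallow: /forums/
Crawl-delay: 100

User-agent: baiduspider
User-agent: yandex
User-agent: ahrefsbot
User-agent: semrushbot
User-agent: the knowledge ai
User-agent: mj12bot
Disallow: /

Sitemap: https://dcuguide.com/sitemap/sitemap.xml
"""


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text offline, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser()
    base = "https://dcuguide.com"
    # (user agent, path) pairs; "examplebot" is a made-up agent that
    # matches no named group and so falls back to the "*" group.
    checks = [
        ("googlebot", "/wiki/Batman"),
        ("googlebot", "/w/Batman"),
        ("mj12bot", "/"),
        ("examplebot", "/who.php"),
        ("examplebot", "/sitemap.xml"),
    ]
    for agent, path in checks:
        print(agent, path, parser.can_fetch(agent, base + path))
    print(parser.crawl_delay("googlebot"), parser.crawl_delay("examplebot"))
    print(parser.site_maps())  # site_maps() requires Python 3.8 or later
```

Note that `urllib.robotparser` applies the first matching rule within a group, while some crawlers use longest-match precedence instead, so results for overlapping Allow and Disallow paths may differ between implementations. The rules listed here do not overlap in that way.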

Comments

  • but allow only important bots
  • Directories
  • Disallow certain spiders