legacy.wral.com
robots.txt

Robots Exclusion Standard data for legacy.wral.com

Resource Scan

Scan Details

Site Domain legacy.wral.com
Base Domain wral.com
Scan Status Ok
Last Scan 2024-06-29T04:39:10+00:00
Next Scan 2024-07-06T04:39:10+00:00

Last Scan

Scanned 2024-06-29T04:39:10+00:00
URL https://legacy.wral.com/robots.txt
Domain IPs 3.160.188.36, 3.160.188.59, 3.160.188.9, 3.160.188.94
Response IP 18.165.171.9
Found Yes
Hash b1d17d7500cde81d111b8216d737af1d4ff92162f9f3239e0f9777f397e088b7
SimHash d40d1b50a780

Groups

twitterbot

Rule Path
Disallow

amazonbot
anthropic-ai
awariorssbot
awariosmartbot
bytespider
ccbot
chatgpt-user
claudebot
claude-web
cohere-ai
dataforseobot
diffbot
facebookbot
google-extended
gptbot
magpie-crawler
newsnow
news-please
omgili
omgilibot
peer39_crawler
peer39_crawler/1.0
perplexitybot
scrapy
turnitinbot

Rule Path
Disallow /
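The two groups above can be checked with Python's standard-library parser. This is a minimal sketch, not the site's actual file: the robots.txt text below is reconstructed by hand from the scan, lists only three of the blocked agents for brevity, and omits the catch-all group. The helper name `may_fetch` and the sample URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hand-reconstructed excerpt (assumption: the real file may differ in
# ordering, comments, and whitespace). Several User-agent lines in a row
# share the rules that follow them.
ROBOTS_TXT = """\
User-agent: twitterbot
Disallow:

User-agent: gptbot
User-agent: claudebot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url="https://legacy.wral.com/news/"):
    """Return True if `agent` may fetch `url` under the excerpt above."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("twitterbot", "GPTBot", "ClaudeBot"):
        print(agent, may_fetch(agent))
```

An empty `Disallow` value blocks nothing, so twitterbot may crawl everything, while every agent in the second group is shut out of the whole site by `Disallow /`.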

*

Rule Path
Disallow /
Allow /favicons/
Allow /images/content
Allow *.js
Allow *.css
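How the catch-all group resolves depends on the matching strategy. Under RFC 9309, the most specific (longest) matching rule wins and an Allow beats a Disallow of equal length, so the Allow lines carve exceptions out of `Disallow /`. Python's `urllib.robotparser` instead applies rules in file order, so the sketch below implements longest-match by hand. The names `STAR_GROUP`, `pattern_to_regex`, and `is_allowed` are hypothetical, and treating a pattern without a leading slash (such as `*.js`) as a valid wildcard is an assumption that not every crawler shares.

```python
import re

# Rules of the "*" group, transcribed from the scan above.
STAR_GROUP = [
    ("disallow", "/"),
    ("allow", "/favicons/"),
    ("allow", "/images/content"),
    ("allow", "*.js"),
    ("allow", "*.css"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_allowed(path, rules):
    """Longest-match evaluation: the longest matching pattern decides,
    and Allow wins a tie. With no matching rule the path is allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern == "":
            continue  # an empty rule matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/news/story", "/favicons/icon.png", "/static/app.js"):
        print(path, is_allowed(path, STAR_GROUP))
```

Because rules are prefix matches, `/images/content` also permits paths such as `/images/content-old/x.jpg`, and `*.js` matches any path containing `.js`, not only paths ending in it.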

Comments

  • WRAL.com robots.txt