newscaf.com
robots.txt

Robots Exclusion Standard data for newscaf.com

Resource Scan

Scan Details

Site Domain newscaf.com
Base Domain newscaf.com
Scan Status Ok
Last Scan 2026-01-09T08:03:51+00:00
Next Scan 2026-01-16T08:03:51+00:00

Last Scan

Scanned 2026-01-09T08:03:51+00:00
URL https://newscaf.com/robots.txt
Redirect https://www.newscaf.com/robots.txt
Redirect Domain www.newscaf.com
Redirect Base newscaf.com
Domain IPs 198.72.102.251
Redirect IPs 198.72.102.251
Response IP 198.72.102.251
Found Yes
Hash a22adabb915b9b5b1bcc0f48e718fd3fa90ad8720ccec0d461ef29df590421e2
SimHash 4a35fd61c211
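
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. The following is a minimal sketch, assuming SHA-256 over the raw response body, of how a rescan might detect that robots.txt has changed. The function names are hypothetical and not taken from the scanner.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "a22adabb915b9b5b1bcc0f48e718fd3fa90ad8720ccec0d461ef29df590421e2"


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str = REPORTED_HASH) -> bool:
    """Return True when the digest of the fetched body differs from the stored one."""
    return body_digest(body) != previous_hash.lower()
```

If the scanner hashes a normalized form of the file rather than the raw bytes, the digests would not line up and the comparison would need the same normalization.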

Groups

*

Rule Path
Allow /

googlebot-image

Rule Path
Disallow /

*

Rule Path
Disallow /out/

slurp

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

aboundexbot

Rule Path
Disallow /

jikespider

Rule Path
Disallow /

sentibot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

semrushbot-sa

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

linguee

Rule Path
Disallow /

applebot

Rule Path
Disallow /

*

No rules defined. All paths allowed.
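
The report lists three separate `*` groups and does not say how they are resolved. The following is a minimal sketch, assuming RFC 9309 semantics: groups for the same user agent are merged, the longest matching path wins, and an allow rule wins a tie. User-agent matching is simplified to an exact, case-insensitive token comparison, and `examplebot` is a made-up crawler name used only for illustration.

```python
# Agents that the report lists with a single "Disallow /" rule.
BLOCKED = [
    "slurp", "mj12bot", "wotbox", "aboundexbot", "jikespider", "sentibot",
    "ahrefsbot", "blexbot", "rogerbot", "semrushbot", "semrushbot-sa",
    "baiduspider", "yandexbot", "linguee", "applebot",
]

# Groups in report order: (user-agent token, [(rule, path), ...]).
GROUPS = (
    [("*", [("allow", "/")]),
     ("googlebot-image", [("disallow", "/")]),
     ("*", [("disallow", "/out/")])]
    + [(agent, [("disallow", "/")]) for agent in BLOCKED]
    + [("*", [])]  # final "*" group with no rules
)


def rules_for(agent):
    """Merge all groups naming this agent; fall back to the merged "*" groups."""
    agent = agent.lower()
    if any(ua == agent for ua, _ in GROUPS):
        return [r for ua, rules in GROUPS if ua == agent for r in rules]
    return [r for ua, rules in GROUPS if ua == "*" for r in rules]


def is_allowed(agent, path):
    """Longest matching path prefix wins; an allow rule wins a tie."""
    best = ("allow", "")  # no matching rule means the path is allowed
    for kind, prefix in rules_for(agent):
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best[1])
        tie_allow = len(prefix) == len(best[1]) and kind == "allow"
        if longer or tie_allow:
            best = (kind, prefix)
    return best[0] == "allow"


print(is_allowed("examplebot", "/"))          # unlisted crawler, site root
print(is_allowed("examplebot", "/out/link"))  # unlisted crawler, under /out/
print(is_allowed("AhrefsBot", "/"))           # crawler with its own group
```

Under these assumptions an unlisted crawler may fetch everything except paths under /out/. A parser that keeps only the first `*` group instead of merging them would allow /out/ as well, so results can differ between crawlers.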

Other Records

Field Value
crawl-delay 15
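
Crawl-delay is not defined by RFC 9309, and crawlers differ in whether they honor it. The report lists it under Other Records rather than inside a group, so which agents it targets is not stated. The following is a minimal sketch, assuming a crawler chooses to read the value as 15 seconds between requests. The `fetch` callable and the function name are hypothetical.

```python
import time

# Value taken from the crawl-delay record in the report.
CRAWL_DELAY_SECONDS = 15


def polite_fetch(urls, fetch, delay=CRAWL_DELAY_SECONDS, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between requests.

    `sleep` is injectable so the pacing can be checked without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause before the first request
        results.append(fetch(url))
    return results
```

Three URLs fetched this way would incur two pauses, so at least 30 seconds of waiting in total.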