correiomanha.pt
robots.txt

Robots Exclusion Standard data for correiomanha.pt

Resource Scan

Scan Details

Site Domain correiomanha.pt
Base Domain correiomanha.pt
Scan Status Ok
Last Scan 2024-11-16T12:34:35+00:00
Next Scan 2024-11-23T12:34:35+00:00

Last Scan

Scanned 2024-11-16T12:34:35+00:00
URL https://correiomanha.pt/robots.txt
Redirect https://www.cmjornal.pt/robots.txt
Redirect Domain www.cmjornal.pt
Redirect Base cmjornal.pt
Domain IPs 195.23.36.47
Redirect IPs 88.157.217.146
Response IP 88.157.217.146
Found Yes
Hash 8ec52e08194290f26d08d9883aab9f2b607e60a64de4f722b7f74be45051a04d
SimHash 1806585489d3
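
The Hash value above is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The scan does not state which algorithm it uses or exactly which bytes it hashes, so the following is only a sketch under that assumption. `body_digest` is a hypothetical helper name and the sample body is made up for illustration; it is not the content of the scanned file.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the scanner's Hash field is a SHA-256 digest of the raw
    response body. The source does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Illustrative body only, not the real file served by www.cmjornal.pt.
sample_body = b"User-agent: *\nDisallow: /i/\n"
digest = body_digest(sample_body)
print(len(digest), digest)
```

Comparing the digest of a freshly fetched body with a stored value would be one way to detect that the file changed between scans.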

Groups

openai-crawler

Rule Path
Disallow /

googlebot-bard

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

*

Rule Path
Disallow /i/
Disallow /fonts/
Disallow /css/
Disallow /Error/
Disallow /4196/
Disallow /js/
Disallow /Async
Disallow /site/
Disallow /lib/
Disallow /Comentarios/

Other Records

Field Value
sitemap https://www.cmjornal.pt/sitemap
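
The groups and records listed above can be checked with Python's standard `urllib.robotparser`. The sketch below rebuilds a robots.txt from the scan data and asks whether a given user agent may fetch a URL. The reconstructed text is an assumption: the scan shows user-agent names in lowercase and does not preserve the original order, casing, or comments of the file. `ExampleBot` is a hypothetical crawler name used to exercise the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan's Groups and Other Records sections.
# Assumption: one group per listed user agent, in the listed order.
ROBOTS_TXT = """\
User-agent: openai-crawler
Disallow: /

User-agent: googlebot-bard
Disallow: /

User-agent: gptbot
Disallow: /

User-agent: chatgpt-user
Disallow: /

User-agent: google-extended
Disallow: /

User-agent: ccbot
Disallow: /

User-agent: *
Disallow: /i/
Disallow: /fonts/
Disallow: /css/
Disallow: /Error/
Disallow: /4196/
Disallow: /js/
Disallow: /Async
Disallow: /site/
Disallow: /lib/
Disallow: /Comentarios/

Sitemap: https://www.cmjornal.pt/sitemap
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.cmjornal.pt"

# Agents with their own group are blocked from the whole site.
gptbot_home = parser.can_fetch("GPTBot", BASE + "/")
ccbot_home = parser.can_fetch("CCBot", BASE + "/")

# Any other agent falls through to the "*" group.
other_home = parser.can_fetch("ExampleBot", BASE + "/")
other_comments = parser.can_fetch("ExampleBot", BASE + "/Comentarios/123")
other_async = parser.can_fetch("ExampleBot", BASE + "/AsyncLoad")

# Sitemap records are exposed separately (Python 3.8+).
sitemaps = parser.site_maps()

print(gptbot_home, ccbot_home, other_home, other_comments, other_async)
print(sitemaps)
```

Note that `urllib.robotparser` matches paths by case-sensitive prefix, so `/Async` also covers paths such as `/AsyncLoad`, while a lowercase `/comentarios/` would not be matched by the `/Comentarios/` rule. Other parsers may differ in detail.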