catanzarotoday.it
robots.txt

Robots Exclusion Standard data for catanzarotoday.it

Resource Scan

Scan Details

Site Domain catanzarotoday.it
Base Domain catanzarotoday.it
Scan Status Ok
Last Scan 2024-11-13T05:32:34+00:00
Next Scan 2024-11-20T05:32:34+00:00

Last Scan

Scanned 2024-11-13T05:32:34+00:00
URL https://catanzarotoday.it/robots.txt
Redirect https://www.catanzarotoday.it/robots.txt
Redirect Domain www.catanzarotoday.it
Redirect Base catanzarotoday.it
Domain IPs 217.182.55.155
Redirect IPs 217.182.55.155
Response IP 217.182.55.155
Found Yes
Hash 522681ac5fe4d312d2ddc7664cdf557b2352fc12eb1ba457eae97202fd216c6a
SimHash 5f49815347f1
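The report does not say how the Hash and SimHash values are computed. The Hash has 64 hex digits, which is consistent with a SHA-256 digest, and the SimHash has 12 hex digits (48 bits). The sketch below assumes the Hash is SHA-256 over the raw robots.txt body and shows a Charikar-style SimHash. The tokenization, the per-token hash and the 48-bit width are assumptions, so its output will not reproduce the report's values. The idea is that small edits to the file flip only a few SimHash bits, while the SHA-256 digest changes completely.

```python
import hashlib
import re


def sha256_hex(body: bytes) -> str:
    """Hex SHA-256 digest of the raw robots.txt body (assumed input)."""
    return hashlib.sha256(body).hexdigest()


def simhash(text: str, bits: int = 48) -> int:
    """Charikar-style SimHash over lowercase word tokens.

    Assumptions: tokens are runs of [a-z0-9_~/-], each token is hashed with
    SHA-256, and only the low `bits` bits are used (48 to match 12 hex digits).
    """
    weights = [0] * bits
    for token in re.findall(r"[a-z0-9_~/\-]+", text.lower()):
        h = int.from_bytes(hashlib.sha256(token.encode()).digest(), "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    result = 0
    for i, w in enumerate(weights):
        if w > 0:  # bit is set when the positive votes win
            result |= 1 << i
    return result


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    v1 = "User-agent: *\nAllow: /\nDisallow: /do/\n"
    v2 = v1 + "Disallow: /sf/\n"
    print(sha256_hex(v1.encode()))
    print(f"{simhash(v1):012x} {simhash(v2):012x}")
    print(hamming(simhash(v1), simhash(v2)))
```

A scanner can compare the Hamming distance between two scans' SimHash values to tell a minor edit from a rewritten file.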

Groups

*
googlebot

Rule Path
Allow /
Disallow /~vda/
Disallow /~shared/do/
Disallow /~shared/cgi-bin/
Disallow /~do/
Disallow /~cgi-bin/
Disallow /do/
Disallow /cgi-bin/
Disallow /~test/
Disallow /~api/
Disallow /~ajax/
Disallow /~otp/
Disallow /~pixel/
Disallow /~empty/
Disallow /captcha/
Disallow /form/
Disallow /signup/
Disallow /commento/
Disallow /user/login/
Disallow /user/logout/
Disallow /user/sso/
Disallow /user/oauth/
Disallow /user/activate/
Disallow /user/reset/
Disallow /user/delete/
Disallow /user/unsubscribe/
Disallow /user/contents/
Disallow /user/news/
Disallow /user/relation/
Disallow /user/subscription/
Disallow /user/edit/
Disallow /user/self/
Disallow /~shared/styles/
Disallow /styles/
Disallow /~shared/scripts/
Disallow /scripts/
Disallow /medias/
Disallow /uploads/
Disallow /sf/
Allow /~shared/do/api/google/
Allow /~shared/do/api/google-newsstand/
Allow /~shared/do/api/amazon/
Allow /~shared/do/api/facebook/
Allow /~shared/do/api/samsung/
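In this group, `Allow /` and several narrower `Disallow` lines overlap, and the `/~shared/do/api/...` allows sit inside the disallowed `/~shared/do/`. Under RFC 9309 (section 2.2.2), the matching rule with the longest path wins, and on a tie the allow rule wins. The sketch below applies that rule to plain prefix paths, using rules copied from the table above. It does not handle the `*` and `$` wildcards, because this group uses none. The path `/cronaca/article.html` in the usage note is hypothetical.

```python
from typing import List, Tuple

# A subset of the "*" / googlebot rules listed above, copied verbatim.
RULES: List[Tuple[str, str]] = [
    ("allow", "/"),
    ("disallow", "/~shared/do/"),
    ("disallow", "/do/"),
    ("disallow", "/user/login/"),
    ("disallow", "/medias/"),
    ("allow", "/~shared/do/api/google/"),
    ("allow", "/~shared/do/api/google-newsstand/"),
    ("allow", "/~shared/do/api/amazon/"),
    ("allow", "/~shared/do/api/facebook/"),
    ("allow", "/~shared/do/api/samsung/"),
]


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Longest-match evaluation for prefix-only rules (RFC 9309, 2.2.2).

    The most specific (longest) matching rule path decides; when an allow
    and a disallow rule of equal length both match, allow wins.
    """
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, rule_path in rules:
        if path.startswith(rule_path):
            allow = kind == "allow"
            if len(rule_path) > best_len or (len(rule_path) == best_len and allow):
                best_len = len(rule_path)
                best_allow = allow
    return best_allow


if __name__ == "__main__":
    for p in ("/~shared/do/api/google/feed.xml", "/~shared/do/other",
              "/user/login/", "/cronaca/article.html"):
        print(p, is_allowed(p, RULES))
```

With these rules, `/~shared/do/api/google/feed.xml` is allowed because the 23-character allow path outranks the 12-character `/~shared/do/` disallow. `/~shared/do/other` is blocked, and any path that matches only `Allow /` stays crawlable.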

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

awariorssbot
awariosmartbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

newsnow

Rule Path
Disallow /

news-please

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

peer39_crawler
peer39_crawler/1.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /
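Each crawler named in the groups above gets `Disallow /`, while every other crawler falls back to the `*` group. RFC 9309 (section 2.2.1) specifies that the group is chosen by case-insensitive matching of the crawler's product token. If no group matches, the `*` group applies, and if that is absent too, no rules apply. The sketch below shows that lookup over an abbreviated, hypothetical `GROUPS` table built from a few entries in this report. It takes a bare product token such as `GPTBot`, not a full User-Agent header, and it does not merge multiple groups that name the same token.

```python
from typing import Dict, List, Tuple

Rules = List[Tuple[str, str]]

# Hypothetical, abbreviated group table built from entries in this report.
GROUPS: Dict[str, Rules] = {
    "*": [("allow", "/"), ("disallow", "/do/")],
    "googlebot": [("allow", "/"), ("disallow", "/do/")],
    "gptbot": [("disallow", "/")],
    "ccbot": [("disallow", "/")],
    "claudebot": [("disallow", "/")],
}


def select_group(product_token: str, groups: Dict[str, Rules]) -> Rules:
    """Return the rules that apply to a crawler's product token.

    Matching is case-insensitive. Without a named match the "*" group is
    used; without a "*" group, an empty rule list (nothing disallowed).
    """
    token = product_token.strip().lower()
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            return rules
    return groups.get("*", [])


if __name__ == "__main__":
    print(select_group("GPTBot", GROUPS))
    print(select_group("SomeOtherBot", GROUPS))
```

Here `GPTBot` resolves to the blanket `Disallow /`, and an unlisted crawler inherits the `*` rules.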

Other Records

Field Value
crawl-delay 3
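`crawl-delay` is not part of RFC 9309, and crawlers interpret it differently. Googlebot, for example, ignores it. One common reading is a minimum gap, in seconds, between successive requests to the host. The sketch below assumes that reading and spaces request start times at least 3 seconds apart. The `CrawlDelayScheduler` class is hypothetical, and it takes the current time as an argument so that its behaviour is deterministic.

```python
class CrawlDelayScheduler:
    """Spaces requests to one host at least `delay` seconds apart.

    Treating crawl-delay as a minimum gap between request start times is
    an assumption; the directive has no standardized meaning.
    """

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.next_allowed = float("-inf")

    def schedule(self, now: float) -> float:
        """Return the time at which a request wanted at `now` may start."""
        start = max(now, self.next_allowed)
        self.next_allowed = start + self.delay
        return start


if __name__ == "__main__":
    s = CrawlDelayScheduler(3)
    print([s.schedule(t) for t in (0, 1, 10, 11)])  # [0, 3, 10, 13]
```

A request wanted at t=1 is pushed back to t=3, while one wanted at t=10 goes out immediately because the previous gap has already elapsed.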

Other Records

Field Value
sitemap https://www.sassaritoday.it/sitemaps/sitemap.xml
sitemap https://www.sassaritoday.it/sitemaps/sitemap_news.xml
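Python's standard `urllib.robotparser` can read both records from a robots.txt body. `site_maps()` returns the `Sitemap` URLs, which apply regardless of group, and `crawl_delay()` returns a group's delay. The file below is a hypothetical, abbreviated reconstruction. The report does not show which group the `crawl-delay` line belongs to, so placing it in the `*` group is an assumption, and the ordering of lines is also assumed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: group membership of Crawl-delay and the
# line order are assumptions; only a few rules are included.
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: *
Crawl-delay: 3
Allow: /
Disallow: /do/

Sitemap: https://www.sassaritoday.it/sitemaps/sitemap.xml
Sitemap: https://www.sassaritoday.it/sitemaps/sitemap_news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.site_maps())   # both sitemap URLs
print(parser.crawl_delay("*"))
print(parser.can_fetch("GPTBot", "https://www.catanzarotoday.it/"))
# First-match semantics: "Allow: /" precedes "Disallow: /do/", so this is True.
print(parser.can_fetch("SomeBot", "https://www.catanzarotoday.it/do/x"))
```

Note that `urllib.robotparser` applies the first matching rule in file order rather than RFC 9309's longest match. With `Allow: /` listed first, it reports `/do/x` as fetchable even though the longest-match rule would block it.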

Comments

  • COPYRIGHT NOTICE. The contents of this website are available only for personal, non-commercial use. Use of any kind of device, tool, or process designed to data mine or scrape the content using automated means is prohibited without prior written permission from Citynews SpA. Prohibited uses include but are not limited to:
  • (1) text and data mining activities under Art. 4 of the EU Directive on Copyright in the Digital Single Market;
  • (2) the development of any software, machine learning, artificial intelligence (AI), and/or large language models (LLMs);
  • (3) creating or providing archived or cached data sets containing our content to others; and/or
  • (4) any commercial purposes.
  • Contact https://citynews.it for licensing.

Warnings

  • `host` is not a known field.
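The warning means the file contains a `Host` line. This directive was historically used by Yandex to name a preferred mirror, and it is not among the fields the scanner recognizes. The report does not list the scanner's known fields. The sketch below assumes they are the three RFC 9309 fields plus the `sitemap` and `crawl-delay` extensions it reports without complaint, and it flags any other field name. The `Host` value in the sample is a hypothetical placeholder, since the report does not show the real line.

```python
from typing import List, Tuple

# Assumed set of recognized fields: RFC 9309's user-agent/allow/disallow
# plus the widely used sitemap and crawl-delay extensions.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_field_warnings(robots_txt: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for unrecognized field names."""
    warnings = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            warnings.append((lineno, f"`{field}` is not a known field."))
    return warnings


# Hypothetical sample; the Host value is a placeholder.
sample = "User-agent: *\nDisallow: /do/\nHost: www.catanzarotoday.it\n"

if __name__ == "__main__":
    print(unknown_field_warnings(sample))  # [(3, '`host` is not a known field.')]
```

Field names are lowercased before comparison, matching the case-insensitive handling of field names in RFC 9309.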