independent.ie
robots.txt

Robots Exclusion Standard data for independent.ie

Resource Scan

Scan Details

Site Domain independent.ie
Base Domain independent.ie
Scan Status Ok
Last Scan 2024-11-15T23:38:04+00:00
Next Scan 2024-11-22T23:38:04+00:00

Last Scan

Scanned 2024-11-15T23:38:04+00:00
URL https://independent.ie/robots.txt
Redirect https://www.independent.ie/robots.txt
Redirect Domain www.independent.ie
Redirect Base independent.ie
Domain IPs 104.18.30.138, 104.18.31.138, 2606:4700::6812:1e8a, 2606:4700::6812:1f8a
Redirect IPs 104.18.30.138, 104.18.31.138, 2606:4700::6812:1e8a, 2606:4700::6812:1f8a
Response IP 104.18.31.138
Found Yes
Hash 5f3b30ad52f2aa7808a05dd83dbf0d60e25f210e48f2e92426edfc2cb96014f2
SimHash 683897718c75

Groups

*

Rule Path
Disallow /search/
Disallow /qwerty/
Disallow /*.ece$
Disallow /utils/
Disallow /account/
Disallow /LoadTest/
Disallow /api/
Disallow /qa/
Disallow /ad-test
Disallow /service-archive
Disallow /subscribe-archive
Disallow /messagent/
Disallow /extra/messagent/
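The paths in this group mix plain prefixes (such as /search/) with a wildcard pattern (/*.ece$). A minimal sketch of how such patterns are commonly interpreted, assuming the matching rules of RFC 9309 ('*' matches any run of characters, a trailing '$' anchors the end of the path, everything else is a prefix match). The function name `rule_matches` and the example paths are hypothetical and are not part of the scan data:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics: '*' matches any run of characters, a trailing
    '$' anchors the match at the end of the path, and a pattern
    without '$' is treated as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start only, which gives prefix behaviour.
    return re.match(regex, path) is not None


# Hypothetical paths, used only to illustrate the rules listed above.
print(rule_matches("/*.ece$", "/news/article123.ece"))
print(rule_matches("/*.ece$", "/news/article123.ece?page=2"))
print(rule_matches("/ad-test", "/ad-testing/page"))
```

Under these assumptions, a rule without a trailing slash such as /ad-test also covers longer paths that merely start with that string, which may or may not be what the site intends.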

googlebot-news

Rule Path
Disallow /storyplus/*
Disallow /sponsored-features/*

mediapartners-google

Rule Path
Disallow
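A Disallow rule with no path is conventionally read as "nothing is disallowed" for that group. The sketch below checks this with Python's standard `urllib.robotparser`, using a hand-written two-group fragment rather than the site's actual file; the agent name `SomeOtherBot` is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hand-written fragment modelled on two groups from the scan; the real
# file's exact layout is an assumption here.
fragment = """\
User-agent: *
Disallow: /account/

User-agent: Mediapartners-Google
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(fragment)

url = "https://www.independent.ie/account/"
# The empty Disallow leaves the AdSense crawler unrestricted.
adsense_allowed = parser.can_fetch("Mediapartners-Google", url)
# A crawler without its own group falls back to the '*' group.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(adsense_allowed, other_allowed)
```

Because the crawler matches its own group, the rules of the * group do not apply to it in this parser.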

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /
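The groups from amazonbot through perplexitybot each carry a single blanket rule. As a rough check of what that means for a crawler, the sketch below rebuilds an equivalent fragment and queries it with `urllib.robotparser`. The helper `build_fragment`, the shortened * group, and the agent name `ExampleBot` are assumptions made for illustration:

```python
from urllib.robotparser import RobotFileParser

# User agents that the scan lists with a blanket "Disallow /" rule.
BLOCKED_AGENTS = [
    "amazonbot", "anthropic-ai", "bytespider", "ccbot", "chatgpt-user",
    "claudebot", "claude-web", "cohere-ai", "diffbot", "facebookbot",
    "google-extended", "gptbot", "magpie-crawler", "omgili", "omgilibot",
    "perplexitybot",
]


def build_fragment(agents):
    """Build robots.txt lines: one 'Disallow: /' group per agent,
    followed by a shortened '*' group (only one of its rules)."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    groups.append("User-agent: *\nDisallow: /search/\n")
    return "\n".join(groups).splitlines()


parser = RobotFileParser()
parser.parse(build_fragment(BLOCKED_AGENTS))

home = "https://www.independent.ie/"
# Map each listed agent to whether it may fetch the home page.
verdicts = {agent: parser.can_fetch(agent, home) for agent in BLOCKED_AGENTS}
# A crawler with no dedicated group is governed by the '*' group only.
generic_allowed = parser.can_fetch("ExampleBot", home)

print(any(verdicts.values()), generic_allowed)
```

Note that robots.txt is advisory: the result only describes what a compliant crawler should do.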

Other Records

Field Value
sitemap https://www.independent.ie/sitemap/sitemap_googlenews.xml
sitemap https://www.independent.ie/sitemap/sitemap_channels.xml
sitemap https://www.independent.ie/sitemap/sitemap.xml
sitemap https://www.independent.ie/sitemap/sitemap_video.xml
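Sitemap records sit outside any user-agent group and apply to all crawlers. A small sketch of reading them with `urllib.robotparser` (its `site_maps()` method is available from Python 3.8); the `Sitemap:` field spelling in the fragment is an assumption, since the scan only shows the normalized field name:

```python
from urllib.robotparser import RobotFileParser

# Fragment rebuilt from the four records listed above.
fragment = [
    "Sitemap: https://www.independent.ie/sitemap/sitemap_googlenews.xml",
    "Sitemap: https://www.independent.ie/sitemap/sitemap_channels.xml",
    "Sitemap: https://www.independent.ie/sitemap/sitemap.xml",
    "Sitemap: https://www.independent.ie/sitemap/sitemap_video.xml",
]

parser = RobotFileParser()
parser.parse(fragment)

# site_maps() returns the URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
for url in sitemaps or []:
    print(url)
```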

Comments

  • All copyrights, neighbouring rights and database rights in the content and layout of this website/app are explicitly reserved and are for personal, non-commercial use only.
  • In accordance with Article 4 of the Directive on Copyright in the Digital Single Market (CDSM) and its transposition into the law of the applicable Member State, all content of this website on which it is made available is not to be used for the purposes of text and data mining, extraction, scraping and/or the use of programs or robots for automatic data collection and/or extraction of digital data, whether for machine learning or artificial intelligence purposes or otherwise.
  • See also the Terms and Conditions of this website.
  • All Robots
  • Disallow unwanted URL patterns to be crawled and indexed
  • Disallow Sponsored Articles for Google News
  • Sitemap Files
  • Allow Adsense
  • Rules for robots