beta.thehindu.com
robots.txt

Robots Exclusion Standard data for beta.thehindu.com

Resource Scan

Scan Details

Site Domain beta.thehindu.com
Base Domain thehindu.com
Scan Status Ok
Last Scan 2024-11-08T22:36:09+00:00
Next Scan 2024-11-15T22:36:09+00:00

Last Scan

Scanned 2024-11-08T22:36:09+00:00
URL https://beta.thehindu.com/robots.txt
Redirect https://www.thehindu.com/robots.txt
Redirect Domain www.thehindu.com
Redirect Base thehindu.com
Domain IPs 104.18.39.235, 172.64.148.21, 2606:4700:4400::6812:27eb, 2606:4700:4400::ac40:9415
Redirect IPs 104.18.39.235, 172.64.148.21, 2606:4700:4400::6812:27eb, 2606:4700:4400::ac40:9415
Response IP 104.18.39.235
Found Yes
Hash 5eddfb36e19b4033dec6df823e573c5110b8404d57d9e5945545c7576bc2f4d2
SimHash 38006be56461

Groups

*

Rule Path
Disallow /cgi-bin/
Disallow /cdn-cgi/*
Disallow /config/
Disallow */analysis-logger/*
Disallow /todays-paper/*article*.ece*
Disallow /migration_catalog/
Disallow */getapplink*
Disallow /summercamp/
Disallow */wf.fragment/*
Disallow /poll/vote.do
Disallow /photo/10000/
Disallow */photo/10000/
Disallow /news-service/*
Disallow /search/
Disallow /SEARCH/
Disallow /Search/
Disallow /22390678/
Disallow /walkathon
Disallow /heelsonwheels
Disallow /archive/print/
Disallow /static/content/
Disallow /todayspaper/
Disallow /today-paper/
Disallow /thseweekend
Disallow /ywsummercamp
Disallow /frnf2014
Disallow /summermusic
Disallow /summercamp
Disallow /yw25
Disallow /thischess
Disallow /ywquiz
Disallow /mumbaiedition
Disallow /marathon
Disallow /flndrmk-spl
Disallow /system/
Disallow /sachin
Disallow /supermom2016
Disallow /ywq2015
Disallow /fooddirectory
Disallow /chennaicoastalcleanup
Disallow /onam-1
Disallow /onam-2
Disallow /hohkidscarnival
Disallow /HOW2014
Disallow /how2014
Disallow /cookingcontest
Disallow /youngworld
Disallow /impulse2014
Disallow /mssaward2015
Disallow /mssaward2016
Disallow /wow2015
Disallow /wow2017/chennai/
Disallow /wow2017/bangalore/
Disallow /tickets2016/
Disallow /tickets2017/
Disallow /tickets2018/
Disallow /ticket2018/
Disallow /ywsummercamp/
Disallow /tags/TopicRoot_TH/
Disallow /thehindu/
Disallow /200*
Disallow /201*
Disallow */http%3A*
Disallow */https%3A*
Disallow */mailto%3A*
Disallow *.ecehttp*
Disallow *.ece1http*
Disallow *.ece2http*
Disallow /brandhub/sponsored-content/
Disallow /coupons/
Disallow */?_ptid=*
Disallow */?_gl=*
Disallow /sitemap/archive/picture/
Disallow *%3Bhttp%3A*
Disallow *%3Bhttps%3A*
Disallow *%20http%3A*
Disallow *%20https%3A*
Disallow */couponRedirect
Disallow *?redirect=
Disallow *?store=
Disallow /tag/*/*/
Disallow /profile/
Allow /profile/*article*.ece
Allow /profile/contributor/?
Allow /profile/photographers/?
Allow /profile/author/?
Allow /profile/contributor/$
Allow /profile/photographers/$
Allow /profile/author/$
Allow /profile/$
Disallow /sitemap/archive*
Disallow */fragment/*
Disallow /sitemap/
Allow /sitemap/$
Allow /sitemap/update/all.xml$
Allow /sitemap/update/section.xml$
Allow /sitemap/googlenews/all/all.xml$
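
The Allow lines above only make sense alongside the broader Disallow lines they carve exceptions out of (for example /profile/ and /sitemap/). The following is a minimal sketch of how a crawler could evaluate such rules, assuming the longest-match precedence described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names `pattern_to_regex`, `is_allowed`, and `RULES` are hypothetical, and `RULES` is only a small subset of the group listed above, so results for other paths may differ from the full file.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece (so '?' and '.' stay literal), then join on '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# Illustrative subset of the '*' group from this report (not the full rule set).
RULES = [
    ("disallow", "/profile/"),
    ("allow", "/profile/*article*.ece"),
    ("allow", "/profile/author/?"),
    ("allow", "/profile/author/$"),
    ("allow", "/profile/$"),
    ("disallow", "/sitemap/"),
    ("allow", "/sitemap/$"),
    ("allow", "/sitemap/update/all.xml$"),
    ("disallow", "/search/"),
]

def is_allowed(path, rules=RULES):
    """Return True if the path may be crawled under longest-match precedence."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and allow
            if longer or tie_for_allow:
                best_len, best_allow = len(pattern), allow
    return best_allow

if __name__ == "__main__":
    for p in ["/profile/author/", "/profile/author/?page=1",
              "/profile/author/Someone-123/", "/sitemap/update/all.xml"]:
        print(p, is_allowed(p))
```

Under these assumptions, /profile/author/ and /profile/author/?page=1 are crawlable because the longer Allow patterns outrank Disallow /profile/, while a deeper path such as /profile/author/Someone-123/ matches only the Disallow rule.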

ia_archiver

Rule Path
Disallow /

gptbot

Rule Path
Disallow /
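
The three groups above (`*`, `ia_archiver`, `gptbot`) apply to different crawlers. As a rough sketch, assuming the RFC 9309 convention that a crawler uses the group whose user-agent token matches its own product token case-insensitively and falls back to `*` otherwise, group selection could look like this. `GROUPS` and `select_group` are hypothetical names, and the `*` entry is abbreviated to two rules for illustration.

```python
# Groups keyed by lowercase user-agent token; the '*' group is abbreviated here.
GROUPS = {
    "*": [("disallow", "/cgi-bin/"), ("disallow", "/search/")],
    "ia_archiver": [("disallow", "/")],
    "gptbot": [("disallow", "/")],
}

def select_group(product_token, groups=GROUPS):
    """Pick the rule group for a crawler, falling back to the '*' group."""
    token = product_token.lower()  # user-agent matching is case-insensitive
    if token in groups:
        return groups[token]
    return groups.get("*", [])

if __name__ == "__main__":
    print(select_group("GPTBot"))     # dedicated group: everything disallowed
    print(select_group("Googlebot"))  # no dedicated group: uses '*'
```

A crawler that finds its own group uses only that group, so `gptbot` and `ia_archiver` are blocked from the whole site regardless of the Allow lines in the `*` group.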

Other Records

Field Value
sitemap https://www.thehindu.com/sitemap/googlenews/all/all.xml
sitemap https://www.thehindu.com/sitemap/update/all.xml
sitemap https://www.thehindu.com/sitemap/update/section.xml
sitemap https://www.thehindu.com/feeder/default.rss
sitemap https://www.thehindu.com/?service=videositemap

Comments

  • Disallow: /newsletter/*
  • Disallow: /ebooks/
  • Disallow: /todays-paper/
  • Disallow: /static/
  • Blocking ptid and gl parameter URLs that are causing crawl budget wastage
  • Block picture sitemap because urls are noindex therein
  • Block all space or %20 suffixed with http or https protocols
  • Blocked until duplicate profile bylines fixed
  • profile articles allowed
  • profile exceptions allowed for pages with page=1 for rediects
  • Disallow Wayback machine which is a problem crawler
  • Disallow ChatGPT from extracting or interpreting our content

Warnings

  • 3 invalid lines.