static.theprint.in
robots.txt

Robots Exclusion Standard data for static.theprint.in

Resource Scan

Scan Details

Site Domain static.theprint.in
Base Domain theprint.in
Scan Status Ok
Last Scan 2024-05-03T00:08:19+00:00
Next Scan 2024-06-02T00:08:19+00:00
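The two timestamps are exactly 30 days apart, which suggests a monthly rescan schedule. The check below infers that interval from these two values only and assumes nothing else about how the scanner schedules rescans.

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details above (ISO 8601, UTC).
last_scan = datetime.fromisoformat("2024-05-03T00:08:19+00:00")
next_scan = datetime.fromisoformat("2024-06-02T00:08:19+00:00")

# Difference between the scheduled and the last scan.
interval = next_scan - last_scan
print(interval.days)  # 30
```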

Last Scan

Scanned 2024-05-03T00:08:19+00:00
URL https://static.theprint.in/robots.txt
Domain IPs 108.156.133.11, 108.156.133.36, 108.156.133.45, 108.156.133.67, 2600:9000:2755:2c00:1e:3acb:8080:93a1, 2600:9000:2755:6000:1e:3acb:8080:93a1, 2600:9000:2755:8200:1e:3acb:8080:93a1, 2600:9000:2755:9200:1e:3acb:8080:93a1, 2600:9000:2755:9c00:1e:3acb:8080:93a1, 2600:9000:2755:aa00:1e:3acb:8080:93a1, 2600:9000:2755:bc00:1e:3acb:8080:93a1, 2600:9000:2755:c00:1e:3acb:8080:93a1
Response IP 108.156.133.67
Found Yes
Hash aad80a92e5e9011812ad8cb5e18762d75cb2736ee672f93d0408103539abe335
SimHash 6d455650c313

Groups

*

Rule Path
Allow /
Disallow *LINK*
Disallow *newnewssitemap.xml?yyyy*
Disallow *?p=*
Disallow */search/*
Disallow *?s*
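The `*` group combines a blanket Allow / with wildcard Disallow patterns, so which rule applies to a given URL depends on the matching semantics. The sketch below assumes RFC 9309 semantics: `*` matches any sequence of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. It is an illustration, not the implementation used by any particular crawler. The names `RULES`, `pattern_to_regex` and `is_allowed` are hypothetical. Python's standard `urllib.robotparser` does plain prefix matching and would not interpret these mid-path wildcards.

```python
import re

# Rules of the "*" group, copied from the scan above.
RULES = [
    ("allow", "/"),
    ("disallow", "*LINK*"),
    ("disallow", "*newnewssitemap.xml?yyyy*"),
    ("disallow", "*?p=*"),
    ("disallow", "*/search/*"),
    ("disallow", "*?s*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' becomes '.*', a trailing '$' anchors the end, and every other
    character is matched literally (so '?' is a literal question mark).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if `path` (path plus query string) may be crawled.

    Assumed precedence (RFC 9309): the longest matching pattern wins,
    and on a tie in length an allow rule beats a disallow rule.
    """
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

With these rules, `/?s=test`, `/page?p=123` and `/news/search/x` are blocked while `/about` stays crawlable. Because `*?s*` matches `?s` anywhere in the URL, it also blocks query strings such as `/x?sort=new`.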

gptbot

Rule Path
Disallow /
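A crawler first selects one group by user-agent and then applies only that group's rules. Under the RFC 9309 convention, the crawler's product token is compared case-insensitively with each group name, and `*` is used only when no named group matches. A crawler identifying as GPTBot is therefore bound by the `gptbot` group's Disallow /, not by the `*` group. A minimal sketch; `GROUPS` and `select_group` are hypothetical names, and the function assumes the caller passes the product token (e.g. "GPTBot") rather than a full User-Agent header.

```python
# Groups from the scan above: group name -> list of (kind, path) rules.
GROUPS = {
    "*": [
        ("allow", "/"),
        ("disallow", "*LINK*"),
        ("disallow", "*newnewssitemap.xml?yyyy*"),
        ("disallow", "*?p=*"),
        ("disallow", "*/search/*"),
        ("disallow", "*?s*"),
    ],
    "gptbot": [("disallow", "/")],
}


def select_group(product_token: str, groups=GROUPS):
    """Return (group_name, rules) for a crawler's product token.

    Named groups are matched case-insensitively; if none matches,
    fall back to the '*' group (or an empty rule list if there is none).
    """
    token = product_token.strip().lower()
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            return name, rules
    return "*", groups.get("*", [])
```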

Other Records

Field Value
sitemap https://theprint.in/googlenews.xml
sitemap https://theprint.in/sitemap_index.xml
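The scan stores parsed records rather than the raw file. As a rough sketch, those records can be rendered back into robots.txt syntax. The line order, capitalization and blank-line layout used here are assumptions, so the output is a reconstruction and would not necessarily reproduce the Hash recorded above. `GROUPS`, `SITEMAPS` and `render_robots_txt` are hypothetical names.

```python
# Records from the scan above, in the order they were listed.
GROUPS = [
    ("*", [
        ("Allow", "/"),
        ("Disallow", "*LINK*"),
        ("Disallow", "*newnewssitemap.xml?yyyy*"),
        ("Disallow", "*?p=*"),
        ("Disallow", "*/search/*"),
        ("Disallow", "*?s*"),
    ]),
    ("gptbot", [("Disallow", "/")]),
]
SITEMAPS = [
    "https://theprint.in/googlenews.xml",
    "https://theprint.in/sitemap_index.xml",
]


def render_robots_txt(groups, sitemaps) -> str:
    """Render groups and sitemap records as robots.txt text."""
    lines = []
    for agent, rules in groups:
        lines.append(f"User-agent: {agent}")
        lines.extend(f"{kind}: {path}" for kind, path in rules)
        lines.append("")  # blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"


robots_txt = render_robots_txt(GROUPS, SITEMAPS)
print(robots_txt)
```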