the-express.com
robots.txt

Robots Exclusion Standard data for the-express.com

Resource Scan

Scan Details

Site Domain the-express.com
Base Domain the-express.com
Scan Status Ok
Last Scan 2024-06-13T16:51:16+00:00
Next Scan 2024-06-20T16:51:16+00:00
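The Next Scan timestamp is exactly seven days after the Last Scan timestamp. The page does not state the scanner's schedule, so the check below assumes a fixed weekly interval and simply confirms the arithmetic with Python's datetime module:

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details above (ISO 8601, UTC offset included).
last_scan = datetime.fromisoformat("2024-06-13T16:51:16+00:00")
next_scan = datetime.fromisoformat("2024-06-20T16:51:16+00:00")

# Assumption: the scanner re-checks the file on a fixed weekly schedule.
interval = next_scan - last_scan
print(interval)                       # 7 days, 0:00:00
print(interval == timedelta(days=7))  # True
```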

Last Scan

Scanned 2024-06-13T16:51:16+00:00
URL https://the-express.com/robots.txt
Domain IPs 3.165.82.128, 3.165.82.41, 3.165.82.60, 3.165.82.70
Response IP 3.165.82.60
Found Yes
Hash cb1f9a3b2879c263319f5b5e251d9fa1299e58f1262000078fb6a1337d0c7f27
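The Hash value is 64 hex digits, the length of a SHA-256 digest. The page does not say which algorithm produced it, so the sketch below assumes SHA-256 over the raw robots.txt bytes; `body_changed` is a hypothetical helper for checking whether a re-fetched file differs from the scanned one:

```python
import hashlib

# Digest recorded in the scan details above.
RECORDED_HASH = "cb1f9a3b2879c263319f5b5e251d9fa1299e58f1262000078fb6a1337d0c7f27"


def body_changed(body: bytes, recorded_hex: str = RECORDED_HASH) -> bool:
    """Return True if SHA-256(body) differs from the recorded hex digest.

    Assumption: the scanner hashes the raw robots.txt response body with SHA-256.
    """
    return hashlib.sha256(body).hexdigest() != recorded_hex.lower()


# Hypothetical usage: pass the bytes of a freshly fetched robots.txt.
print(len(RECORDED_HASH))  # 64 hex digits = 256 bits
print(body_changed(b"User-agent: *\n"))
```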
SimHash a8166a064783

Groups

*

Rule Path Comment
Disallow /myexpress/ -
Disallow /printer/ We'll keep the print version for our newspaper
Disallow /users/ -
Disallow /sponsored/ Advertorials
Disallow /trackings/ Adserving
Disallow /34722903/ Adserving
Disallow /search?* -
Disallow /videos/get_video_by_uid/ -
Disallow /videos/viewmeta/ -
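Most rules in this group are plain path prefixes, but `/search?*` uses the `*` wildcard, which matches any run of characters (RFC 9309 also defines `$` to anchor the end of a URL). A minimal sketch of that matching for the `*` group, assuming only Disallow rules are present so Allow/Disallow precedence can be ignored; `rule_to_regex` and `is_disallowed` are illustrative names, not part of any library:

```python
import re

# Disallow paths from the "*" group above.
STAR_GROUP_DISALLOW = [
    "/myexpress/", "/printer/", "/users/", "/sponsored/", "/trackings/",
    "/34722903/", "/search?*", "/videos/get_video_by_uid/", "/videos/viewmeta/",
]


def rule_to_regex(path: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url_path: str, rules: list) -> bool:
    # Empty rules are skipped: an empty Disallow value blocks nothing.
    # With only Disallow rules in play, any match blocks the path.
    return any(rule_to_regex(r).match(url_path) for r in rules if r)


for path in ("/search?q=brexit", "/search", "/printer/12345", "/sponsoredfeatures"):
    print(path, is_disallowed(path, STAR_GROUP_DISALLOW))
```

With these rules, `/search?q=brexit` and `/printer/12345` come out as disallowed, while `/search` (no `?`) and `/sponsoredfeatures` (no slash after `sponsored`) do not.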

grapeshot

Rule Path
Disallow
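The grapeshot record has a Disallow line with no path. In robots.txt an empty Disallow value blocks nothing, and because grapeshot has its own group it does not inherit the `*` rules, so it may fetch every URL, including the advertorial section. Python's standard `urllib.robotparser` handles it the same way; the snippet reconstructs only part of the file for illustration, and `SomeOtherBot` is a made-up agent name:

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction: one rule from the "*" group plus the grapeshot record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sponsored/

User-agent: grapeshot
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "https://the-express.com/sponsored/some-advertorial"
print(rp.can_fetch("grapeshot", url))     # True: empty Disallow blocks nothing
print(rp.can_fetch("SomeOtherBot", url))  # False: falls back to the "*" group
```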

googlebot-news

Rule Path Comment
Disallow /myexpress/ -
Disallow /printer/ We'll keep the print version for our newspaper
Disallow /users/ -
Disallow /fun/ -
Disallow /sponsored/ Advertorials
Disallow /trackings/ Adserving
Disallow /34722903/ Adserving
Disallow /sponsoredfeatures -
Disallow /search?* -
Disallow /videos/get_video_by_uid/ -
Disallow /videos/viewmeta/ -
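Because googlebot-news has a group of its own, a crawler identifying with that token follows this group and ignores `*` (the usual group-selection rule in RFC 9309). The group repeats the `*` rules and adds `/fun/` and `/sponsoredfeatures`. A sketch with `urllib.robotparser`, leaving out the `/search?*` line because that parser matches plain prefixes and does not interpret `*` inside a path; `SomeOtherBot` is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Prefix rules from the "*" and googlebot-news groups above ("/search?*" omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /myexpress/
Disallow: /printer/
Disallow: /users/
Disallow: /sponsored/
Disallow: /trackings/
Disallow: /34722903/
Disallow: /videos/get_video_by_uid/
Disallow: /videos/viewmeta/

User-agent: googlebot-news
Disallow: /myexpress/
Disallow: /printer/
Disallow: /users/
Disallow: /fun/
Disallow: /sponsored/
Disallow: /trackings/
Disallow: /34722903/
Disallow: /sponsoredfeatures
Disallow: /videos/get_video_by_uid/
Disallow: /videos/viewmeta/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

fun_url = "https://the-express.com/fun/quiz"
promo_url = "https://the-express.com/sponsoredfeatures/promo"
print(rp.can_fetch("Googlebot-News", fun_url))    # False: /fun/ is in its group
print(rp.can_fetch("Googlebot-News", promo_url))  # False: /sponsoredfeatures
print(rp.can_fetch("SomeOtherBot", fun_url))      # True: "*" has no /fun/ rule
print(rp.can_fetch("SomeOtherBot", promo_url))    # True: "*" only blocks /sponsored/
```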

ia_archiver

Rule Path
Disallow /

nutch

Rule Path
Disallow /
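`Disallow: /` matches every path, so ia_archiver and nutch are excluded from the whole site. A reduced sketch containing only these two records (so an agent not named here, such as the hypothetical `SomeOtherBot`, falls through to "allowed"):

```python
from urllib.robotparser import RobotFileParser

# Only the ia_archiver and nutch records from the file above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: nutch
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "https://the-express.com/news/"
for agent in ("ia_archiver", "Nutch", "SomeOtherBot"):
    # Agent matching in urllib.robotparser is case-insensitive.
    print(agent, rp.can_fetch(agent, url))
```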

Other Records

Field Value
sitemap https://www.the-express.com/sitemap.xml
sitemap https://www.the-express.com/googlenews.xml
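Sitemap lines are independent of any user-agent group. Python 3.8 and later expose them through `RobotFileParser.site_maps()`, which returns the URLs in file order:

```python
from urllib.robotparser import RobotFileParser

# The two Sitemap records listed above.
ROBOTS_TXT = """\
Sitemap: https://www.the-express.com/sitemap.xml
Sitemap: https://www.the-express.com/googlenews.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
print(rp.site_maps())
```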

Comments

  • 170820-DXD-6728