Robots Exclusion Standard data for newsru.ca

Resource Scan

Scan Details

Site Domain newsru.ca
Base Domain newsru.ca
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-09-07T09:27:33+00:00
Next Scan 2024-12-06T09:27:33+00:00
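
The gap between the two timestamps above can be checked with a short calculation. This is only a sketch of the arithmetic: the variable names are illustrative, and the page does not say whether the interval is fixed or how the scanner chooses it.

```python
from datetime import datetime

# Timestamps copied from the Scan Details listing above.
last_scan = datetime.fromisoformat("2024-09-07T09:27:33+00:00")
next_scan = datetime.fromisoformat("2024-12-06T09:27:33+00:00")

# Difference between the scheduled next scan and the last attempt.
interval = next_scan - last_scan
print(interval.days)
```

For these two values the difference is a whole number of days, because both timestamps share the same time of day.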

Last Successful Scan

Scanned 2022-11-10T20:53:50+00:00
URL https://newsru.ca/robots.txt
Response IP 172.67.170.121, 104.21.95.146
Found Yes
Hash fad591f50d8030b4c05616f41ec2346621962ed3c0b608ef5514b57259d1dea1
SimHash 7305dc332393

Groups

scrapy

Rule Path
Allow /

*

Rule Path
Allow */*.css
Allow */*.js*
Allow /wp-admin/admin-ajax.php
Disallow /wp-admin
Disallow */*wp-json/
Disallow /wp-login.php
Disallow /wp-register.php
Disallow */feed*
Disallow /cgi-bin
Disallow /xmlrpc.php
Disallow */*comments
Disallow */*trackback/
Disallow */embed*

Other Records

Field Value
crawl-delay 10
sitemap https://newsru.ca/sitemap.xml

Comments

  • Disallow: */*?*=*
  • Disallow: */*?p*=*
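
To illustrate how the Allow and Disallow rules of the `*` group listed above might be evaluated, here is a minimal sketch in Python. It assumes the matching semantics described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical and not part of the scanned site or of any library. The two commented-out Disallow lines under Comments are not active rules, so they are left out.

```python
import re

# Rules copied from the "*" group above, as (kind, path pattern) pairs.
RULES = [
    ("allow", "*/*.css"),
    ("allow", "*/*.js*"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-admin"),
    ("disallow", "*/*wp-json/"),
    ("disallow", "/wp-login.php"),
    ("disallow", "/wp-register.php"),
    ("disallow", "*/feed*"),
    ("disallow", "/cgi-bin"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "*/*comments"),
    ("disallow", "*/*trackback/"),
    ("disallow", "*/embed*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for kind, pattern in rules:
        # re.match anchors at the start of the path, as prefix matching requires.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer patterns win; on equal length, allow (True) beats disallow.
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule is allowed by default.
    return True if best is None else best[1]


if __name__ == "__main__":
    for sample in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/news/feed/"]:
        print(sample, is_allowed(sample))
```

Note that Python's standard `urllib.robotparser` does not interpret `*` wildcards inside paths, which is why this sketch does its own pattern translation. Real crawlers may differ in details such as percent-encoding and tie-breaking.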