Robots Exclusion Standard data for usask.ca

Resource Scan

Scan Details

Site Domain usask.ca
Base Domain usask.ca
Scan Status Ok
Last Scan 2024-08-30T21:30:33+00:00
Next Scan 2024-09-29T21:30:33+00:00
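
The two timestamps above are exactly 30 days apart. The report does not state a rescan interval, so the 30-day figure is only inferred from this one pair of values. A minimal sketch of the arithmetic, using the standard library:

```python
from datetime import datetime

# Timestamps copied from the scan details (ISO 8601, UTC offset +00:00).
last_scan = datetime.fromisoformat("2024-08-30T21:30:33+00:00")
next_scan = datetime.fromisoformat("2024-09-29T21:30:33+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)  # 30
```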

Last Scan

Scanned 2024-08-30T21:30:33+00:00
URL https://usask.ca/robots.txt
Redirect https://www.usask.ca/robots.txt
Redirect Domain www.usask.ca
Redirect Base usask.ca
Domain IPs 128.233.195.103
Redirect IPs 128.233.198.205, 2620:ae:0:1172:2840:d79f:ea47:75a4
Response IP 128.233.198.205
Found Yes
Hash 341d9820265a95a670cda0b9fab7bf12078b5ef04cfd628c0488d39ec772ed5a
SimHash be9d912287f2
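
The Hash value is 64 hexadecimal digits long, which is consistent with a SHA-256 digest of the fetched file. The report does not name the algorithm, so SHA-256 is an assumption here, and `sha256_hex` is a hypothetical helper. A sketch of how such a digest could be computed to detect changes between scans:

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


# Placeholder body, not the real file contents from usask.ca.
sample_body = b"User-agent: *\nDisallow: /test/\n"
digest = sha256_hex(sample_body)

# A SHA-256 hex digest is always 64 characters long.
print(len(digest))
```

Comparing the digest from one scan with the digest from the next is a cheap way to tell whether the file changed byte for byte. The reported digest cannot be reproduced from this page alone, because the exact bytes of the file are not shown.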

Groups

*

Rule      Path                  Comment
Disallow  /cgi-bin/             includes some large virtual spaces
Disallow  /test/                -
Disallow  /test.php             -
Disallow  /_uofs-codebase/      -
Disallow  /_uofs-site-basic/    -
Disallow  /_usask/              -
Disallow  /arts-sandbox/        -
Disallow  /wcs-sandbox/         -
Disallow  /wcms-sandbox/        -
Disallow  /usaskcdn-sandbox/    -
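
The rules listed above can be checked against candidate URLs with Python's standard `urllib.robotparser`. The sketch below rebuilds a robots.txt text from the listed rules for the `*` group; the rebuilt text is an approximation rather than the exact file served by the site, and the agent name `ExampleBot` and the `/programs/` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the rule listing; the real file may differ in
# comments, ordering, or whitespace.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /test/
Disallow: /test.php
Disallow: /_uofs-codebase/
Disallow: /_uofs-site-basic/
Disallow: /_usask/
Disallow: /arts-sandbox/
Disallow: /wcs-sandbox/
Disallow: /wcms-sandbox/
Disallow: /usaskcdn-sandbox/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# Paths under a disallowed prefix are blocked for every agent in the * group.
print(parser.can_fetch("ExampleBot", "https://www.usask.ca/test/page.html"))  # False
# Paths that match no rule are allowed.
print(parser.can_fetch("ExampleBot", "https://www.usask.ca/programs/"))  # True
```

Matching in this parser is by path prefix, so `/test/` blocks `/test/page.html` but not a differently named path such as `/testing/`.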

Comments

  • This file lists local URLs that well-behaved robots should ignore
  • Sandboxes and code bases - WCS-1562