sc.edu
robots.txt

Robots Exclusion Standard data for sc.edu

Resource Scan

Scan Details

Site Domain sc.edu
Base Domain sc.edu
Scan Status Ok
Last Scan 2024-06-18T10:36:55+00:00
Next Scan 2024-07-18T10:36:55+00:00

Last Scan

Scanned 2024-06-18T10:36:55+00:00
URL https://sc.edu/robots.txt
Domain IPs 129.252.90.108
Response IP 129.252.90.108
Found Yes
Hash b1ed0e25322ae5ed982d76f5dfd44993a06f3f56b3b4ba1807eb67f6b606b752
SimHash 7df6e4197dfb

Groups

*

Rule Path Comment
Disallow ~ -
Disallow /reports/ server statistics
Disallow /stats/ server statistics
Disallow /gifs/ images
Disallow /icon/ images
Disallow /icons/ images
Disallow /images/ images
Disallow /cgi-bin/ cgi-bin
Disallow /policies/athl200.pdf -
Disallow /policies/eop100.pdf -
Disallow /policies/hr178.pdf -
Disallow /policies/ppm/athl200.pdf -
Disallow /policies/ppm/eop100.pdf -
Disallow /policies/ppm/hr178.pdf -
Disallow /archives/ -
Disallow /about/contact/unsubscribe/ -
Disallow /*.xml$ -
Disallow /*.inc$ -
Disallow /ardc/sashtml/ -
Disallow /ardc/saspdf/ -
Disallow /usctest -
Disallow /library/godort/ -
Disallow /library/pubserv/ala/ -
Disallow /library/pubserv/images/ -
Disallow /library/pubserv/podreport/ -
Disallow /library/pubserv/test/ -
Disallow /library/pubserv/tutorial/ -
Disallow /library/test/ -

twitterbot

Rule Path
Disallow

facebot

Rule Path
Disallow

siteimprovebot-crawler

Rule Path
Disallow

googlebot

Rule Path
Disallow

piplbot

Rule Path
Disallow /
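The groups above can be read as follows: an empty Disallow (twitterbot, facebot, siteimprovebot-crawler, googlebot) blocks nothing for that agent, "Disallow /" (piplbot) blocks the whole site, and every other crawler falls under the * group. The sketch below illustrates this with Python's standard urllib.robotparser. It uses only a subset of the listed paths, and the robots.txt text is reassembled from the scan, so its exact layout is an assumption rather than a copy of the live file. The helper name allowed and the agent name "examplebot" are hypothetical. The wildcard rules (/*.xml$, /*.inc$) are left out because urllib.robotparser only does prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Subset of the scanned rules, reassembled as robots.txt text.
# Agent names and paths come from the scan; the layout is assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/
Disallow: /stats/
Disallow: /cgi-bin/
Disallow: /usctest

User-agent: googlebot
Disallow:

User-agent: piplbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` on sc.edu?"""
    return parser.can_fetch(agent, "https://sc.edu" + path)


if __name__ == "__main__":
    for agent, path in [
        ("googlebot", "/stats/index.html"),   # own group, empty Disallow
        ("piplbot", "/"),                     # own group, Disallow /
        ("examplebot", "/stats/index.html"),  # falls back to the * group
        ("examplebot", "/about/"),            # no matching rule in *
    ]:
        print(agent, path, allowed(agent, path))
```

A crawler that has its own group is matched against that group only, which is why googlebot may fetch /stats/ here even though the * group disallows it.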

Comments

  • ARDC
  • USC Test
  • University Libraries