cse.unl.edu
robots.txt

Robots Exclusion Standard data for cse.unl.edu

Resource Scan

Scan Details

Site Domain cse.unl.edu
Base Domain unl.edu
Scan Status Ok
Last Scan 2024-06-11T13:43:35+00:00
Next Scan 2024-07-11T13:43:35+00:00

Last Scan

Scanned 2024-06-11T13:43:35+00:00
URL https://cse.unl.edu/robots.txt
Domain IPs 129.93.1.30
Response IP 129.93.1.30
Found Yes
Hash c3f969719a688b841c9adcdfd68199126db115908fecae6b1dcd7c3ce14749aa
SimHash e844bd99a7f0

Groups

*

Rule Path
Disallow /htdocs.old
Disallow /BACKUP
Disallow /Templates
Disallow /2del

sitemaster/1.0

Rule Path
Disallow /~*
Disallow /faculty/protected
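
As a rough illustration of how a crawler might evaluate the two groups listed above, here is a minimal Python sketch. It is not the scanner's or the site's actual logic. The `GROUPS` dict and the helper names (`product_token`, `select_group`, `rule_matches`, `is_allowed`) are hypothetical. The sketch assumes that a crawler matching a named group follows only that group, that agent names are compared on the token before any "/version" suffix, and that `*` in a path matches any run of characters.

```python
import re

# Rules transcribed from the groups listed above.
# The dict layout itself is an assumption made for this sketch.
GROUPS = {
    "*": ["/htdocs.old", "/BACKUP", "/Templates", "/2del"],
    "sitemaster/1.0": ["/~*", "/faculty/protected"],
}


def product_token(agent):
    """Return the lowercased agent name before any '/version' suffix."""
    return agent.split("/", 1)[0].strip().lower()


def select_group(agent):
    """Pick the named group whose token matches the agent, else the '*' group."""
    token = product_token(agent)
    for name, rules in GROUPS.items():
        if name != "*" and product_token(name) == token:
            return rules
    return GROUPS["*"]


def rule_matches(rule, path):
    """Prefix match where '*' matches any characters and a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None


def is_allowed(agent, path):
    """True when no Disallow rule in the agent's group matches the path."""
    return not any(rule_matches(rule, path) for rule in select_group(agent))


if __name__ == "__main__":
    for agent, path in [
        ("ExampleBot", "/BACKUP/old.html"),
        ("ExampleBot", "/~jdoe/"),
        ("sitemaster/1.0", "/~jdoe/"),
        ("sitemaster/1.0", "/BACKUP"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Under these assumptions, only sitemaster/1.0 is kept out of paths beginning with `/~`, while every other agent is bound by the four rules in the `*` group. Path comparison is case-sensitive, so `/BACKUP` and `/backup` are treated as different paths.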

Comments

  • I've commented out the below. - CD April 6th 2015.
  • 1) Pretty sure robots.txt has not been served since we switched to UNLCMS
  • 2) Need to UNCMS sitescan to run thru to find 404 on new site.
  • Request-rate: 2/1 # maximum rate is one page every 5 seconds
  • Crawl-delay: 5
  • Visit-time: 2300-0715 UTC-06 # only visit between
  • Keep unlcms from scanning user home directories, and faculty protected