Robots Exclusion Standard data for cse.unl.edu

Resource Scan

Scan Details

Site Domain	cse.unl.edu
Base Domain	unl.edu
Scan Status	Ok
Last Scan	2024-09-09T14:18:53+00:00
Next Scan	2024-10-09T14:18:53+00:00

Last Scan

Scanned	2024-09-09T14:18:53+00:00
URL	https://cse.unl.edu/robots.txt
Domain IPs	129.93.1.30
Response IP	129.93.1.30
Found	Yes
Hash	c3f969719a688b841c9adcdfd68199126db115908fecae6b1dcd7c3ce14749aa
SimHash	e844bd99a7f0

Groups

*

Rule	Path
Disallow	/htdocs.old
Disallow	/BACKUP
Disallow	/Templates
Disallow	/2del

sitemaster/1.0

Rule	Path
Disallow	/~*
Disallow	/faculty/protected
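
As a rough illustration of how a crawler might apply the two groups listed above, here is a minimal Python sketch. The `GROUPS` dictionary, the `pattern_to_regex` and `is_disallowed` helpers, and the sample paths are all hypothetical names introduced for this example. The matching is simplified: exact user-agent lookup with a fallback to `*`, prefix matching, and `*` treated as a wildcard. It is not the scanner's own logic.

```python
import re

# Rules copied from the Groups listing; the dict layout is an assumption
# made for illustration only.
GROUPS = {
    "*": ["/htdocs.old", "/BACKUP", "/Templates", "/2del"],
    "sitemaster/1.0": ["/~*", "/faculty/protected"],
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(user_agent, path):
    """Return True if `path` matches a Disallow rule for `user_agent`.

    Simplification: the agent must match a group name exactly, otherwise
    the catch-all '*' group is used. A matching specific group replaces
    the '*' group rather than adding to it.
    """
    rules = GROUPS.get(user_agent, GROUPS["*"])
    return any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for agent, path in [
        ("*", "/BACKUP/site.tar"),
        ("*", "/~jdoe/"),
        ("sitemaster/1.0", "/~jdoe/"),
        ("sitemaster/1.0", "/BACKUP/site.tar"),
    ]:
        print(agent, path, is_disallowed(agent, path))
```

Under these assumptions, a generic crawler is kept out of `/BACKUP` but not out of home directories, while `sitemaster/1.0` is kept out of `/~...` paths and `/faculty/protected` but is not bound by the `*` group's rules.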

Comments

I've commented out the below. - CD April 6th 2015.
1) Pretty sure robots.txt has not been served since we switched to UNLCMS
2) Need to UNCMS sitescan to run thru to find 404 on new site.
Request-rate: 2/1 # maximum rate is one page every 5 seconds
Crawl-delay: 5
Visit-time: 2300-0715 UTC-06 # only visit between
Keep unlcms from scanning user home directories, and faculty protected
