
Robots Exclusion Standard data for cs.columbia.edu

Resource Scan

Scan Details

Site Domain cs.columbia.edu
Base Domain columbia.edu
Scan Status Ok
Last Scan 2025-12-24T23:50:44+00:00
Next Scan 2026-01-23T23:50:44+00:00

Last Scan

Scanned 2025-12-24T23:50:44+00:00
URL https://cs.columbia.edu/robots.txt
Redirect https://www.cs.columbia.edu/robots.txt
Redirect Domain www.cs.columbia.edu
Redirect Base columbia.edu
Domain IPs 128.59.11.206
Redirect IPs 128.59.11.206
Response IP 128.59.11.206
Found Yes
Hash 14375ab4bdfba79371e1dcfb44dc3ed93fa87a2c8746db00aa83a9fc178fb00a
SimHash 0660de25db84

Groups

*

Rule Path
Disallow /crf/phone/3com/
Disallow /CAVE/private
Disallow /CAVE/exclude
Disallow /CAVE/phpadmin
Disallow /nlp/newsblaster/archives/
Disallow /?plugin=attach&pcmd=open&file=*
Disallow /mice
Disallow /crftest
Disallow /crfnew
Disallow /crf/support_charge.shtml
Disallow /webtest
Disallow /~nayar/akash

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10
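
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard urllib.robotparser, not the scanner's own method. It rests on several assumptions: the robots.txt text is reconstructed from the tables in this report rather than fetched from the site, the crawl-delay record is assumed to belong to the slurp group (the report does not state this explicitly), and "ExampleBot" is a hypothetical crawler name used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the report's tables; the live file's exact text,
# ordering, and comments are not shown in the report and may differ.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /crf/phone/3com/",
    "Disallow: /CAVE/private",
    "Disallow: /CAVE/exclude",
    "Disallow: /CAVE/phpadmin",
    "Disallow: /nlp/newsblaster/archives/",
    "Disallow: /?plugin=attach&pcmd=open&file=*",
    "Disallow: /mice",
    "Disallow: /crftest",
    "Disallow: /crfnew",
    "Disallow: /crf/support_charge.shtml",
    "Disallow: /webtest",
    "Disallow: /~nayar/akash",
    "",
    # Assumption: the crawl-delay record is attached to the slurp group.
    "User-agent: slurp",
    "Crawl-delay: 10",
]

BASE = "https://www.cs.columbia.edu"

# parse() accepts an iterable of lines, so no network access is needed.
parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

if __name__ == "__main__":
    # "ExampleBot" is hypothetical; it falls under the "*" group.
    for path in ("/mice", "/CAVE/private/data", "/index.html"):
        allowed = parser.can_fetch("ExampleBot", BASE + path)
        print(f"ExampleBot {path}: {'allowed' if allowed else 'disallowed'}")

    # slurp has its own group with no Disallow rules, so the "*" rules
    # do not apply to it under this reconstruction.
    print("slurp /mice:", parser.can_fetch("slurp", BASE + "/mice"))
    print("slurp crawl delay:", parser.crawl_delay("slurp"))
```

Note that urllib.robotparser matches rule paths by simple prefix and does not expand the * wildcard in the /?plugin=... rule, so results for that rule may differ from crawlers that implement wildcard matching.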