cs.nyu.edu
robots.txt

Robots Exclusion Standard data for cs.nyu.edu

Resource Scan

Scan Details

Site Domain cs.nyu.edu
Base Domain nyu.edu
Scan Status Ok
Last Scan 2024-05-31T15:37:14+00:00
Next Scan 2024-06-30T15:37:14+00:00

Last Scan

Scanned 2024-05-31T15:37:14+00:00
URL https://cs.nyu.edu/robots.txt
Domain IPs 216.165.22.203
Response IP 216.165.22.203
Found Yes
Hash 67ed20a693736ea708562548c5e3e54ec23b98106da82cf6d4408f642e6075fc
SimHash b01a8113c7fd
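
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm or say which bytes were hashed, so the sketch below only assumes SHA-256 over the raw response body. The helper names (sha256_hex, matches_recorded_hash) are hypothetical and shown only to illustrate how a freshly fetched robots.txt could be compared against the recorded value.

```python
import hashlib

# Value copied from the "Hash" row of the scan report.
RECORDED_HASH = "67ed20a693736ea708562548c5e3e54ec23b98106da82cf6d4408f642e6075fc"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a robots.txt body against a recorded digest.

    Assumption: the scanner hashed the unmodified response body with
    SHA-256. If it normalized the text first, this check will not match.
    """
    return sha256_hex(body) == recorded.lower()
```

A mismatch may simply mean the file changed after the scan, or that the assumption about the algorithm or input bytes is wrong.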

Groups

*

Rule Path
Disallow /cgi-bin/
Disallow /systems/platforms/linux/software/package/
Disallow /webapps-dev/
Allow /pipermail/smt-lib
Allow /pipermail/smt-comp
Allow /pipermail/fom
Disallow /pipermail/
Disallow /computing/
Disallow /dynamic/admin/
Disallow /archive201511/
Allow /web/
Allow /webapps/
Disallow /cs/review/
Disallow /webapps/classrooms/
Disallow /*%3D$
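
The Allow and Disallow rules above overlap (for example /pipermail/ versus /pipermail/smt-lib), so the outcome depends on how a crawler resolves conflicts. The sketch below assumes the longest-match rule described in RFC 9309, with * as a wildcard and a trailing $ as an end anchor, and with Allow winning a tie. Not every crawler follows that convention, and the sketch skips percent-encoding normalization. The names RULES, pattern_to_regex, and is_allowed are illustrative only.

```python
import re

# Rules copied from the "*" group of the scan report, in file order.
RULES = [
    ("disallow", "/cgi-bin/"),
    ("disallow", "/systems/platforms/linux/software/package/"),
    ("disallow", "/webapps-dev/"),
    ("allow", "/pipermail/smt-lib"),
    ("allow", "/pipermail/smt-comp"),
    ("allow", "/pipermail/fom"),
    ("disallow", "/pipermail/"),
    ("disallow", "/computing/"),
    ("disallow", "/dynamic/admin/"),
    ("disallow", "/archive201511/"),
    ("allow", "/web/"),
    ("allow", "/webapps/"),
    ("disallow", "/cs/review/"),
    ("disallow", "/webapps/classrooms/"),
    ("disallow", "/*%3D$"),
]


def pattern_to_regex(path):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules=RULES):
    """Apply longest-match precedence; paths with no matching rule are allowed."""
    best = None  # (pattern length, is_allow); True > False breaks ties for Allow
    for kind, path in rules:
        if pattern_to_regex(path).match(url_path):
            candidate = (len(path), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

Under these assumptions, is_allowed("/pipermail/smt-lib/2020/") is True because the Allow pattern is longer than /pipermail/, while is_allowed("/webapps/classrooms/x") is False because the Disallow pattern is longer than /webapps/.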

ahrefsbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2

Comments

  • This restricts access to only known and registered robots.
  • Modified by Daniel - took out all whitelisted bots, we can add blacklists here and in web server if needed...
  • page fragments included by CMS
  • Disallow: /webapps/page_body/
  • Disallow: /webapps/page_title/
  • Staging server
  • note: Someone complained that their email to cvc-users was picked up by google (because their signature line had their phone number). I thought it a reasonable expectation that such email not be picked up by google, since folks very often use a signature line that includes such information. So i agree with the person that complained and called it "Bad Practice".
  • For most lists, this is not allowed, so these exceptions must have been requested by the list owners. So if they complain, i guess we can explain then? But, for now, i commented out all 4 Allows below.
  • -aph- 11/29/2018
  • Changes reverted. smt-lib, smt-comp, and fom lists are crawlable again.
  • -robb- 07/25/2019
  • Added 2018/11/27 by NF
  • Do not crawl computing.nyu.edu test site
  • Added 2015/11/25 by NF
  • Do not crawl Django Admin site
  • Added 2015/11/25 by NF
  • Added 2015/11/24 by NF
  • CS website has moved from /web/ to /home/
  • Added 2016/03/03 by NF. See Ticket#2016030210001058.
  • Added 2015/06/09
  • Google Search Appliance seems to abuse classroom calendar.
  • I believe the following will eliminate many of the 404's that result from crawling javascript such as var AUTH_TOKEN = 'm0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA=';