cs.nyu.edu
robots.txt

Robots Exclusion Standard data for cs.nyu.edu


Resource Scan

Scan Details

Site Domain	cs.nyu.edu
Base Domain	nyu.edu
Scan Status	Ok
Last Scan	2024-05-31T15:37:14+00:00
Next Scan	2024-06-30T15:37:14+00:00

Last Scan

Scanned	2024-05-31T15:37:14+00:00
URL	https://cs.nyu.edu/robots.txt
Domain IPs	216.165.22.203
Response IP	216.165.22.203
Found	Yes
Hash	67ed20a693736ea708562548c5e3e54ec23b98106da82cf6d4408f642e6075fc
SimHash	b01a8113c7fd

Groups

*

Rule	Path
Disallow	/cgi-bin/
Disallow	/systems/platforms/linux/software/package/
Disallow	/webapps-dev/
Allow	/pipermail/smt-lib
Allow	/pipermail/smt-comp
Allow	/pipermail/fom
Disallow	/pipermail/
Disallow	/computing/
Disallow	/dynamic/admin/
Disallow	/archive201511/
Allow	/web/
Allow	/webapps/
Disallow	/cs/review/
Disallow	/webapps/classrooms/
Disallow	/*%3D$
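Several rules in this table overlap: /pipermail/ versus /pipermail/smt-lib, and /webapps/ versus /webapps/classrooms/. Under the longest-match rule of RFC 9309, the most specific matching rule decides, and allow wins a tie. The following is a minimal Python sketch of that evaluation for the * group. It assumes the URL path (plus any query string) is compared exactly as requested, already percent-encoded. STAR_GROUP, pattern_to_regex and is_allowed are illustrative names, not part of any library.

```python
import re

# Rules of the "*" group, copied from the table above in the order listed.
STAR_GROUP = [
    ("disallow", "/cgi-bin/"),
    ("disallow", "/systems/platforms/linux/software/package/"),
    ("disallow", "/webapps-dev/"),
    ("allow", "/pipermail/smt-lib"),
    ("allow", "/pipermail/smt-comp"),
    ("allow", "/pipermail/fom"),
    ("disallow", "/pipermail/"),
    ("disallow", "/computing/"),
    ("disallow", "/dynamic/admin/"),
    ("disallow", "/archive201511/"),
    ("allow", "/web/"),
    ("allow", "/webapps/"),
    ("disallow", "/cs/review/"),
    ("disallow", "/webapps/classrooms/"),
    ("disallow", "/*%3D$"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path: '*' matches any run of characters and a
    trailing '$' anchors the end; everything else is a literal prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=STAR_GROUP) -> bool:
    """Longest matching pattern wins; on equal length, allow wins;
    a path that matches no rule is allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len, allowed = len(pattern), kind == "allow"
    return allowed
```

With these rules, /pipermail/smt-lib/... is allowed because the 18-character Allow outranks the 11-character /pipermail/ Disallow, while /webapps/classrooms/... is blocked even though /webapps/ is allowed. Python's standard urllib.robotparser, by contrast, takes the first matching rule in file order and does not interpret * or $ inside paths, so it would treat /*%3D$ as a literal prefix.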

ahrefsbot

No rules defined. All paths allowed.
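Because AhrefsBot has a group of its own, it does not fall back to the * rules. RFC 9309 has a crawler obey the group whose user-agent matches its product token (case-insensitively) and use * only when no such group exists. An empty group therefore leaves every path open to that bot, including /cgi-bin/ and /pipermail/. Below is a minimal Python sketch of that group selection. It omits the RFC's merging of multiple groups for the same agent, and the * list is abbreviated. GROUPS, select_rules and disallows_anything are illustrative names.

```python
# Groups as reported above; the "*" list is abbreviated (full list in the table).
GROUPS = {
    "*": [("disallow", "/cgi-bin/"), ("disallow", "/pipermail/")],
    "ahrefsbot": [],  # reported as "No rules defined"
}


def select_rules(product_token: str, groups: dict) -> list:
    """Return the rules a crawler should obey: a named group matching the
    product token case-insensitively takes precedence over '*', and '*'
    applies only when no named group matches."""
    token = product_token.lower()
    for agent, rules in groups.items():
        if agent != "*" and agent.lower() == token:
            return rules
    return groups.get("*", [])


def disallows_anything(rules: list) -> bool:
    """True if the selected group contains at least one Disallow rule."""
    return any(kind == "disallow" for kind, _ in rules)
```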

Other Records

Field	Value
crawl-delay	2
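crawl-delay is not part of RFC 9309. Crawlers that honor it commonly read the value as the minimum number of seconds between successive requests to the host, so 2 would mean at most one request every two seconds. Others, Googlebot among them, ignore it. The report does not say which group this record belongs to. The Python sketch below shows one way a polite crawler could enforce such a gap. CrawlDelayThrottle is a hypothetical helper, and its clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class CrawlDelayThrottle:
    """Enforce a minimum gap (in seconds) between successive requests to one host."""

    def __init__(self, delay_seconds: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the delay since the previous request has elapsed;
        return how many seconds were spent sleeping."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self._last + self.delay - now
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


# Usage: call wait() before every request to cs.nyu.edu.
throttle = CrawlDelayThrottle(2)
```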

Comments

This restricts access to only known and registered robots.
Modified by Daniel: took out all whitelisted bots; we can
add blacklists here and in the web server if needed...
page fragments included by CMS
Disallow: /webapps/page_body/
Disallow: /webapps/page_title/
Staging server
Note: Someone complained that their email to cvc-users was picked up
by Google (because their signature line had their phone number).
I thought it a reasonable expectation that such email not be picked
up by Google, since folks very often use a signature line that
includes such information. So I agree with the person who
complained and called it "Bad Practice".
For most lists, this is not allowed, so these exceptions must
have been requested by the list owners.
So if they complain, I guess we can explain then?
But, for now, I commented out all 4 Allows below.
-aph- 11/29/2018
Changes reverted. smt-lib, smt-comp, and fom lists are crawlable again.
-robb- 07/25/2019
Added 2018/11/27 by NF
Do not crawl computing.nyu.edu test site
Added 2015/11/25 by NF
Do not crawl Django Admin site
Added 2015/11/25 by NF
Added 2015/11/24 by NF
CS website has moved from /web/ to /home/
Added 2016/03/03 by NF. See Ticket#2016030210001058.
Added 2015/06/09
Google Search Appliance seems to abuse the classroom calendar.
I believe the following will eliminate many of the 404s that result from
crawling JavaScript such as
var AUTH_TOKEN = 'm0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA=';
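This last comment is the rationale for the /*%3D$ rule. One plausible reading, which is an assumption the comment does not spell out, is this: a crawler mistakes the quoted token in the script for a relative link, percent-encodes its trailing "=" as "%3D", and requests a path that does not exist. The Python sketch below reproduces that path for a hypothetical page URL and checks it against the rule's pattern. PAGE_URL, RULE_3D and bogus_request_path are illustrative names.

```python
import re
from urllib.parse import quote, urljoin, urlsplit

AUTH_TOKEN = "m0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA="  # from the comment above
PAGE_URL = "https://cs.nyu.edu/webapps/example/page"  # hypothetical page embedding the script

# Regex form of the rule "/*%3D$": any path ending in the literal text "%3D".
RULE_3D = re.compile(r"/.*%3D$")


def bogus_request_path(token: str, page_url: str) -> str:
    """Path a naive crawler would request if it treated `token` as a relative
    link and percent-encoded it (an assumed failure mode)."""
    absolute = urljoin(page_url, quote(token, safe=""))  # "=" becomes "%3D"
    return urlsplit(absolute).path


path = bogus_request_path(AUTH_TOKEN, PAGE_URL)
print(path)  # /webapps/example/m0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA%3D
print(bool(RULE_3D.match(path)))  # True
```

The trailing $ keeps the rule narrow: a path that merely contains %3D somewhere in the middle is still crawlable.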
