cims.nyu.edu
robots.txt

Robots Exclusion Standard data for cims.nyu.edu

Resource Scan

Scan Details

Site Domain cims.nyu.edu
Base Domain nyu.edu
Scan Status Ok
Last Scan 2024-08-29T05:15:56+00:00
Next Scan 2024-09-28T05:15:56+00:00

Last Scan

Scanned 2024-08-29T05:15:56+00:00
URL https://cims.nyu.edu/robots.txt
Domain IPs 216.165.22.202
Response IP 216.165.22.202
Found Yes
Hash 1a18a99cc6f38880d76f2b4ae0e434f5f2f5132bc6937e6271d860c02b3911c5
SimHash f81e3352477f

Groups

*

Rule Path
Disallow /cgi-bin/
Disallow /cgi-comment/
Disallow /cgi-systems/
Disallow /systems/platforms/linux/software/package/
Disallow /webapps/page_body/
Disallow /webapps/page_title/
Disallow /webapps/cms
Disallow /webapps/directory
Disallow /webapps-dev/
Disallow /webcalendar/
Disallow /webcalendar?*
Disallow /webapps/classrooms/
Disallow /*%3D$
Disallow /people/profiles/WALFISH_Michael.html
Disallow /webapps/content/systems/platforms/linux/softare/dropbox-workaround
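
The rules above mix plain path prefixes with the `*` wildcard and the `$` end anchor. The following is a minimal sketch of how a crawler might test a URL path against a few of them, assuming the usual convention that a rule matches from the start of the path, `*` matches any run of characters, and a trailing `$` pins the match to the end of the path. The function names `rule_to_regex` and `is_disallowed` are hypothetical, and real parsers also normalize percent-encoding, which this sketch skips.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_disallowed(path: str, rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the start of the path."""
    # re.match anchors at the beginning, giving prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in rules)

# A subset of the Disallow rules listed for the '*' group.
RULES = [
    "/cgi-bin/",
    "/webapps/cms",
    "/webcalendar/",
    "/webcalendar?*",
    "/*%3D$",
]

if __name__ == "__main__":
    for path in ["/cgi-bin/test.pl", "/webcalendar?month=5",
                 "/webcalendar", "/foo/abc%3D", "/people/"]:
        print(path, is_disallowed(path, RULES))
```

Because the group contains only Disallow rules, the sketch does not need the longest-match precedence that applies when Allow and Disallow rules compete.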

Comments

  • This restricts access to only known and registered robots.
  • Modified by Daniel - took out all whitelisted bots, we can
  • add blacklists here and in web server if needed...
  • page fragments included by CMS
  • cms is editor, only content is public
  • Staging server
  • googlebot was abusing the webcalendar
  • Other bots were abusing it as well
  • User-agent: Googlebot
  • Added 2015/06/09
  • Google Search Appliance seems to abuse classroom calendar.
  • I believe the following will eliminate many of the 404's that result from
  • crawling javascript such as
  • var AUTH_TOKEN = 'm0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA=';
  • Added 2016/09/06
  • Block Michael Walfish's old profile which contained some information he
  • wanted kept private. New profile is at WALFISH__Michael.html
  • Added 2018/10/26
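
The comment about crawled JavaScript and the `/*%3D$` rule can be illustrated with a short sketch. It assumes the 404s came from a crawler treating the quoted `AUTH_TOKEN` value as a link and percent-encoding it; the comments do not spell out that mechanism. A base64-style token ends in `=`, which percent-encodes to `%3D`, so the resulting URL would end in the suffix the rule blocks.

```python
from urllib.parse import quote

# Token value copied from the robots.txt comment.
token = "m0IBKGTI83RXdNSm25OtcWWCyfDE6SLQWkkBosLVvmA="

# Percent-encode every reserved character, including the trailing '='.
encoded = quote(token, safe="")

# Hypothetical path a crawler might request after mistaking the token for a link.
path = "/" + encoded

print(path.endswith("%3D"))
```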