web.eecs.umich.edu
robots.txt

Robots Exclusion Standard data for web.eecs.umich.edu

Resource Scan

Scan Details

Site Domain web.eecs.umich.edu
Base Domain umich.edu
Scan Status Ok
Last Scan 2025-07-31T23:20:37+00:00
Next Scan 2025-08-30T23:20:37+00:00

Last Scan

Scanned 2025-07-31T23:20:37+00:00
URL https://web.eecs.umich.edu/robots.txt
Domain IPs 141.212.113.214
Response IP 141.212.113.214
Found Yes
Hash 0a2429062938c0f36e85f3fc6b76f1a5639e1f7db11e52b502234a42c7de96ad
SimHash a80859574cd9

Groups

*

Rule Path
Disallow /~smbrain
Disallow /robots.txt
Disallow /courses/eecs484
Disallow /courses
Disallow /etc/
Disallow /~imarkov/5
Disallow /etc/calendar
Disallow /eecs/etc/calendar
Disallow /techday
Disallow /vlsipool
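
Taken together, the rules above imply a rule section like the following. This is a reconstruction from the scan data, not the verbatim file: the scan lists every rule under the user-agent "*", but it does not show whether the file repeats the User-agent line before each rule or lists all rules under a single record.

User-agent: *
Disallow: /~smbrain
Disallow: /robots.txt
Disallow: /courses/eecs484
Disallow: /courses
Disallow: /etc/
Disallow: /~imarkov/5
Disallow: /etc/calendar
Disallow: /eecs/etc/calendar
Disallow: /techday
Disallow: /vlsipool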

Comments

  • Description: Search engine exclusion file for http://www.eecs.umich.edu/
  • Author(s): DCO staff
  • Organization: University of Michigan EECS DCO
  • Created: 1996-12-06 22:10 EDT
  • Version: $Revision$
  • RCS id: $Id$
  • -----------------------------------------------------------------------------
  • Summary of the file format, drawn from the documentation available at http://info.webcrawler.com/mak/projects/robots/.
  • Each record contains lines of the form <field>:<optionalspace><value><optionalspace>. The field name is case-insensitive.
  • Comments can be included in a file using UNIX Bourne shell conventions: the '#' character indicates that the preceding space (if any) and the remainder of the line up to the line termination are discarded. Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary.
  • A record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below; unrecognised headers are ignored. (A parsing sketch in this spirit follows the list.)
  • User-agent: the value of this field is the name of the robot whose access policy the record describes. If more than one User-agent field is present, the record describes an identical access policy for more than one robot; at least one field needs to be present per record. The robot should be liberal in interpreting this field: a case-insensitive substring match of the name, without version information, is recommended. If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
  • Disallow: the value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html. An empty value indicates that all URLs can be retrieved; at least one Disallow field needs to be present in a record.
  • The presence of an empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it were not present, i.e. all robots will consider themselves welcome.
  • Examples: the following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/" (checked with Python's standard-library parser after the list):
  • User-agent: *
  • Disallow: /cyberworld/map/ # This is an infinite virtual URL space
  • Disallow: /tmp/ # these will soon disappear
  • -----------------------------------------------------------------------------
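
The parsing and matching rules summarized above can be illustrated with a minimal Python sketch. This is only an illustration of the format summary, not the scanner's implementation or a complete robots.txt parser; the names parse_robots and is_allowed are invented for the example.

def parse_robots(text):
    # Split robots.txt text into records of ([user-agents], [disallow paths]).
    records, agents, disallows = [], [], []
    for raw in text.splitlines():
        if raw.strip().startswith("#"):
            continue                          # comment-only line: no record boundary
        line = raw.split("#", 1)[0].strip()   # strip a trailing comment
        if not line:
            if agents or disallows:           # blank line ends the current record
                records.append((agents, disallows))
                agents, disallows = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":             # field names are case-insensitive
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
        # unrecognised headers are ignored
    if agents or disallows:
        records.append((agents, disallows))
    return records

def is_allowed(records, robot_name, path):
    # Case-insensitive substring match on User-agent; '*' is the default record.
    robot = robot_name.lower()
    default = None
    for agents, disallows in records:
        if any(a == "*" for a in agents):
            default = disallows
        elif any(a.lower() in robot for a in agents):
            # Disallow is a prefix match; an empty value disallows nothing.
            return not any(path.startswith(d) for d in disallows if d)
    if default is not None:
        return not any(path.startswith(d) for d in default if d)
    return True                               # no matching record: allowed

Against the rule section reconstructed above, is_allowed(records, "ExampleBot/1.0", "/courses/eecs484/index.html") would return False, while a path such as "/research/" would be allowed.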
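
The example file from the comments can also be checked with Python's standard library, whose urllib.robotparser module implements the same exclusion rules:

from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: *",
    "Disallow: /cyberworld/map/  # This is an infinite virtual URL space",
    "Disallow: /tmp/  # these will soon disappear",
]
rp = RobotFileParser()
rp.parse(lines)

print(rp.can_fetch("AnyBot", "/cyberworld/map/index.html"))  # False
print(rp.can_fetch("AnyBot", "/index.html"))                 # True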