cs.washington.edu
robots.txt

Robots Exclusion Standard data for cs.washington.edu

Resource Scan

Scan Details

Site Domain cs.washington.edu
Base Domain washington.edu
Scan Status Ok
Last Scan 2024-05-10T15:41:54+00:00
Next Scan 2024-06-09T15:41:54+00:00

Last Scan

Scanned 2024-05-10T15:41:54+00:00
URL https://cs.washington.edu/robots.txt
Redirect https://www.cs.washington.edu/robots.txt
Redirect Domain www.cs.washington.edu
Redirect Base washington.edu
Domain IPs 34.215.139.216
Redirect IPs 34.215.139.216
Response IP 34.215.139.216
Found Yes
Hash a17f26e346dd121c1cbf4b1075852193a222891d238da247951ecb6c06ff38df
SimHash 84c701c61a04
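
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The report does not state the algorithm or exactly which bytes are hashed, so the sketch below is only an assumption: it hashes the raw response body with SHA-256. The function name content_digest and the sample body are hypothetical, and the result is not expected to reproduce the value shown above.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the scanner hashes the raw bytes of the response body.
    The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, not the real file contents.
sample = b"User-agent: *\nDisallow: /tmp\n"
digest = content_digest(sample)
print(len(digest), digest)
```

Comparing digests from two scans is a cheap way to tell whether the file changed at all; a similarity hash such as the SimHash field would be needed to tell how much it changed.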

Groups

*

Rule Path
Disallow /info/Review
Disallow /archive
Disallow /htbin-post
Disallow /logs
Disallow /tmp
Disallow /usage
Disallow /utils
Disallow /research/projects/lis
Disallow /research/projects/cecil/www/www
Disallow /research/jair
Disallow /includes
Disallow /modules
Disallow /themes
Disallow /scripts
Disallow /sites
Disallow /internal_files
Disallow /1999
Disallow /lab/facilities/hardware/sponsors
Disallow /biblio
Disallow /people/faculty/shapiro
Disallow /node
Disallow /taxonomy
Disallow /content
Disallow /articles
Disallow /faqs
Disallow /feed
Disallow /feed-items
Disallow /frontpage-slideshow-images
Disallow /grouping-containers
Disallow /Shibboleth.sso

emailsiphon

Rule Path
Disallow /

extractor pro

Rule Path
Disallow /

dlexpert

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

petalbot

Rule Path
Disallow /
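
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard urllib.robotparser module. The ROBOTS_TXT string is a hand-written excerpt reconstructed from a few of the rules listed above, not the full file, and the user agent name examplebot and the tested URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt reconstructed from the groups listed in this report (assumption:
# the original file uses the usual "User-agent:" / "Disallow:" syntax).
ROBOTS_TXT = """\
User-agent: *
Disallow: /info/Review
Disallow: /archive
Disallow: /tmp
Disallow: /node

User-agent: emailsiphon
Disallow: /

User-agent: petalbot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "https://www.cs.washington.edu"

# petalbot has its own group that disallows everything.
print(parser.can_fetch("petalbot", base + "/"))
# examplebot (hypothetical) falls back to the "*" group.
print(parser.can_fetch("examplebot", base + "/archive/x"))
print(parser.can_fetch("examplebot", base + "/education/"))
```

In this parser a Disallow path is a plain prefix match, so /archive also covers any path that merely starts with those characters. A crawler named in its own group is judged only by that group; every other crawler falls back to the * group.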

Comments

  • Robot behavior guidelines for www.cs.washington.edu
  • See http://info.webcrawler.com/mak/projects/robots/exclusion.html