rit.edu
robots.txt

Robots Exclusion Standard data for rit.edu

Resource Scan

Scan Details

Site Domain rit.edu
Base Domain rit.edu
Scan Status Ok
Last Scan 2024-06-07T09:48:01+00:00
Next Scan 2024-07-07T09:48:01+00:00
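
The two timestamps above are exactly 30 days apart. A minimal Python sketch of that arithmetic follows. The variable names are illustrative, and whether the scanner always schedules on a fixed 30-day interval is an assumption drawn only from these two values.

```python
from datetime import datetime

# Timestamps copied from the scan details (ISO 8601 with a +00:00 UTC offset).
last_scan = datetime.fromisoformat("2024-06-07T09:48:01+00:00")
next_scan = datetime.fromisoformat("2024-07-07T09:48:01+00:00")

# Subtracting two aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)
```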

Last Scan

Scanned 2024-06-07T09:48:01+00:00
URL https://rit.edu/robots.txt
Redirect https://www.rit.edu/robots.txt
Redirect Domain www.rit.edu
Redirect Base rit.edu
Domain IPs 129.21.1.40, 2620:8d:8000:0:aba:ca:daba:217
Redirect IPs 129.21.1.40, 2620:8d:8000:0:aba:ca:daba:217
Response IP 129.21.1.40
Found Yes
Hash 8c5a2919d31258824b5757bda0af4b5142d6384ef58b75f0f3f4f1ab5daf0ffb
SimHash 201d4335e7d2
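
The Hash value is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The report does not state which algorithm was used or which bytes were hashed, so the sketch below is only an assumption about how such a fingerprint could be computed. `sha256_hex` is a hypothetical helper and `sample_body` is placeholder content, not the actual rit.edu file.

```python
import hashlib

def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of body as lowercase hex (hypothetical helper)."""
    return hashlib.sha256(body).hexdigest()

# Placeholder content, NOT the actual rit.edu robots.txt.
sample_body = b"User-agent: *\nDisallow: /blog/\n"
digest = sha256_hex(sample_body)

# Value copied from the Hash field of the report.
reported = "8c5a2919d31258824b5757bda0af4b5142d6384ef58b75f0f3f4f1ab5daf0ffb"

# The lengths agree; the values would only match for the real file body,
# and only if the scanner hashes the raw body with SHA-256 (an assumption).
print(len(digest), len(reported))
```

A content hash of this kind lets a scanner detect that the file changed between scans without storing or diffing the full body.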

Groups

semrushbot

Rule Path
Disallow /

semanticscholarbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

yandex

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

*

Rule Path
Disallow /commencement/book/
Disallow /academicaffairs/commencement/book/
Disallow /cos/satoshi-takahashi
Disallow /cos/old
Disallow /directory*
Disallow /ntid/educationalmaterials
Disallow /study/former-ceramics-bfa
Disallow /study/former-fine-arts-studio-bfa
Disallow /study/former-furniture-design-bfa
Disallow /study/former-glass-bfa
Disallow /study/former-metals-and-jewelry-design-bfa
Disallow /cla/modernlanguages/
Disallow /blog/
Disallow /its/old
Disallow /its/new

rit storm crawler
googlebot
google

Rule Path
Disallow /ntid/educationalmaterials
Disallow /controller/newsite

siteimprovebot

Rule Path
Disallow

siteimprovebot-crawler

Rule Path
Disallow

ahc/1.0

Rule Path
Disallow /

ahc/2.0

Rule Path
Disallow /

ahc/2.1

Rule Path
Disallow /
Disallow /~w-oce/
Disallow /~pltw/
Disallow /~w-cosold/

Comments

  • Add Robots Exclusion Commands for www below this line
  • Baiduspider
  • Yandex
  • allow the RIT Stormcrawler and Google
  • allow the Siteimprove crawler
  • Removal of /~w-* URLS from search indexes
  • We can't do this globally, since many sites are broken and use these URLs publically