rit.edu
robots.txt

Robots Exclusion Standard data for rit.edu

Resource Scan

Scan Details

Site Domain	rit.edu
Base Domain	rit.edu
Scan Status	Ok
Last Scan	2024-06-07T09:48:01+00:00
Next Scan	2024-07-07T09:48:01+00:00

Last Scan

Scanned	2024-06-07T09:48:01+00:00
URL	https://rit.edu/robots.txt
Redirect	https://www.rit.edu/robots.txt
Redirect Domain	www.rit.edu
Redirect Base	rit.edu
Domain IPs	129.21.1.40, 2620:8d:8000:0:aba:ca:daba:217
Redirect IPs	129.21.1.40, 2620:8d:8000:0:aba:ca:daba:217
Response IP	129.21.1.40
Found	Yes
Hash	8c5a2919d31258824b5757bda0af4b5142d6384ef58b75f0f3f4f1ab5daf0ffb
SimHash	201d4335e7d2
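The scan shows that https://rit.edu/robots.txt redirects to https://www.rit.edu/robots.txt. RFC 9309 recommends that crawlers follow at least five consecutive redirects when fetching robots.txt, and that the rules found are applied in the context of the original authority, so the groups below also govern rit.edu. The following is a minimal Python sketch of that redirect loop. The injected `fetch` callable and its `(status, location, body)` return shape are hypothetical, chosen so the logic can run without network access.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch interface: returns (status_code, Location header or None, body).
Fetch = Callable[[str], Tuple[int, Optional[str], str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_REDIRECTS = 5  # RFC 9309 recommends following at least five consecutive redirects


def fetch_robots(url: str, fetch: Fetch) -> Tuple[str, int, str]:
    """Fetch a robots.txt URL, following redirects.

    Returns (final_url, status, body). The caller should still apply the
    rules to the host of the *original* URL (e.g. rit.edu), not the final one.
    """
    for _ in range(MAX_REDIRECTS + 1):  # initial request + up to 5 redirects
        status, location, body = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status, body
    raise RuntimeError(f"more than {MAX_REDIRECTS} redirects fetching robots.txt")
```

Raising on a redirect loop is just one choice made for the sketch; RFC 9309 also lets a crawler treat that case as the file being unavailable.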

Groups

semrushbot

Rule	Path
Disallow	/

semanticscholarbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

trendictionbot

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/commencement/book/
Disallow	/academicaffairs/commencement/book/
Disallow	/cos/satoshi-takahashi
Disallow	/cos/old
Disallow	/directory*
Disallow	/ntid/educationalmaterials
Disallow	/study/former-ceramics-bfa
Disallow	/study/former-fine-arts-studio-bfa
Disallow	/study/former-furniture-design-bfa
Disallow	/study/former-glass-bfa
Disallow	/study/former-metals-and-jewelry-design-bfa
Disallow	/cla/modernlanguages/
Disallow	/blog/
Disallow	/its/old
Disallow	/its/new
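The * group relies on prefix matching plus one wildcard (/directory*). Under RFC 9309, a path pattern matches any URL path that starts with it, `*` matches any run of characters, and a trailing `$` anchors the end. When several rules match, the longest pattern wins, with an allow rule preferred on a tie. The Python sketch below implements those rules directly. `rule_to_regex`, `is_allowed` and `STAR_RULES` are illustrative names, and the sketch assumes URL paths are already percent-encoded consistently. It is hand-written because Python's `urllib.robotparser` compares prefixes only and does not treat `*` inside a path as a wildcard.

```python
import re
from typing import List, Tuple

# (allow?, path pattern) pairs for the "*" group, copied from the table above.
STAR_RULES: List[Tuple[bool, str]] = [
    (False, "/commencement/book/"),
    (False, "/academicaffairs/commencement/book/"),
    (False, "/cos/satoshi-takahashi"),
    (False, "/cos/old"),
    (False, "/directory*"),
    (False, "/ntid/educationalmaterials"),
    (False, "/study/former-ceramics-bfa"),
    (False, "/study/former-fine-arts-studio-bfa"),
    (False, "/study/former-furniture-design-bfa"),
    (False, "/study/former-glass-bfa"),
    (False, "/study/former-metals-and-jewelry-design-bfa"),
    (False, "/cla/modernlanguages/"),
    (False, "/blog/"),
    (False, "/its/old"),
    (False, "/its/new"),
]


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal text and turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: List[Tuple[bool, str]], path: str) -> bool:
    """Longest matching pattern wins; allow wins a tie; no match means allowed."""
    best_len, best_allow = -1, True
    for allow, pattern in rules:
        if not pattern:  # an empty value places no restriction
            continue
        if rule_to_regex(pattern).match(path):  # re.match anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

For example, `is_allowed(STAR_RULES, "/directory/people")` and `is_allowed(STAR_RULES, "/cos/oldies")` both return False (the second because /cos/old is a prefix of it), while `is_allowed(STAR_RULES, "/blog")` returns True because the rule is /blog/ with a trailing slash. The trailing `*` in /directory* is redundant under prefix matching; /directory alone would block the same paths.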

rit storm crawler
googlebot
google

Rule	Path
Disallow	/ntid/educationalmaterials
Disallow	/controller/newsite
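Because Googlebot has its own group, it follows only the two rules above and ignores the * group entirely: RFC 9309 has a crawler obey the group that names it and fall back to * only when no group matches. As a result, /blog/, which is blocked for unnamed crawlers, remains crawlable for Googlebot. Python's standard `urllib.robotparser` applies the same fallback. The sketch below feeds it an abbreviated excerpt of the rules (one line of the * group plus the full Google group), trimmed for brevity.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the rit.edu rules: one line from the "*" group
# plus the full "rit storm crawler" / "googlebot" / "google" group.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /blog/

User-agent: rit storm crawler
User-agent: googlebot
User-agent: google
Disallow: /ntid/educationalmaterials
Disallow: /controller/newsite
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# Googlebot matches its own group, so the "*" rules (like /blog/) do not apply to it.
googlebot_blog = parser.can_fetch("Googlebot", "https://www.rit.edu/blog/")
googlebot_newsite = parser.can_fetch("Googlebot", "https://www.rit.edu/controller/newsite")
# A crawler named in no group falls back to the "*" rules.
other_blog = parser.can_fetch("ExampleBot", "https://www.rit.edu/blog/")

print(googlebot_blog, googlebot_newsite, other_blog)  # True False False
```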

siteimprovebot

Rule	Path
Disallow

siteimprovebot-crawler

Rule	Path
Disallow
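The siteimprovebot and siteimprovebot-crawler groups each contain a Disallow line with an empty path. An empty Disallow value restricts nothing, so these groups give the Siteimprove crawler full access, consistent with the "allow the Siteimprove crawler" comment recorded in the file. The sketch below checks this with Python's `urllib.robotparser`, which treats an empty Disallow value as allow-all; the excerpt is abbreviated.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /blog/

User-agent: siteimprovebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# The empty Disallow value means nothing is disallowed for siteimprovebot,
# and because it has its own group, the "*" rules do not apply to it either.
siteimprove_blog = parser.can_fetch("siteimprovebot", "https://www.rit.edu/blog/")
other_blog = parser.can_fetch("ExampleBot", "https://www.rit.edu/blog/")
print(siteimprove_blog, other_blog)  # True False
```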

ahc/1.0

Rule	Path
Disallow	/

ahc/2.0

Rule	Path
Disallow	/

ahc/2.1

Rule	Path
Disallow	/
Disallow	/~w-oce/
Disallow	/~pltw/
Disallow	/~w-cosold/

Comments

Add Robots Exclusion Commands for www below this line
Baiduspider
Yandex
allow the RIT Stormcrawler and Google
allow the Siteimprove crawler
Removal of /~w-* URLS from search indexes
We can't do this globally, since many sites are broken and use these URLs publically
