cornell.joinhandshake.com
robots.txt

Robots Exclusion Standard data for cornell.joinhandshake.com

Resource Scan

Scan Details

Site Domain	cornell.joinhandshake.com
Base Domain	joinhandshake.com
Scan Status	Ok
Last Scan	2025-08-26T00:19:29+00:00
Next Scan	2025-09-09T00:19:29+00:00

Last Scan

Scanned	2025-08-26T00:19:29+00:00
URL	https://cornell.joinhandshake.com/robots.txt
Domain IPs	104.18.42.156, 172.64.145.100, 2606:4700:4400::6812:2a9c, 2606:4700:4400::ac40:9164
Response IP	104.18.42.156
Found	Yes
Hash	629b73c3621dcab34581590b98256881d2e74f3b1bc251c25dd9811d732247de
SimHash	60f4890f8671

Groups

*

Rule	Path
Allow	/$
Allow	/login
Allow	/register
Allow	/employer_registrations/new
Allow	/career_fairs/*/student_preview
Allow	/career_fairs/*/employer_preview
Allow	/events/*/share_preview
Allow	/jobs/*/share_preview
Allow	/employers
Allow	/job_role_groups
Allow	/questions
Allow	/favicon-32x32.png
Allow	/favicon-16x16.png
Allow	/favicon.png
Allow	/favicon.ico
Allow	/profiles/*
Disallow	/profiles/*/posts
Disallow	/
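
The table above can be read as an allow-list: a handful of public paths are opened up and the final `Disallow /` closes everything else. A minimal sketch of how a crawler might evaluate these rules follows, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are hypothetical and are not part of the scan report or of any library; real crawlers may resolve conflicts differently.

```python
import re

# Rules copied from the "*" group in the table above, in file order.
RULES = [
    ("allow", "/$"),
    ("allow", "/login"),
    ("allow", "/register"),
    ("allow", "/employer_registrations/new"),
    ("allow", "/career_fairs/*/student_preview"),
    ("allow", "/career_fairs/*/employer_preview"),
    ("allow", "/events/*/share_preview"),
    ("allow", "/jobs/*/share_preview"),
    ("allow", "/employers"),
    ("allow", "/job_role_groups"),
    ("allow", "/questions"),
    ("allow", "/favicon-32x32.png"),
    ("allow", "/favicon-16x16.png"),
    ("allow", "/favicon.png"),
    ("allow", "/favicon.ico"),
    ("allow", "/profiles/*"),
    ("disallow", "/profiles/*/posts"),
    ("disallow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the path. Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be fetched under `rules`.

    Assumes longest-match precedence: the longest matching pattern
    decides, and an Allow rule wins over a Disallow of equal length.
    A path that matches no rule is treated as allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    for sample in ["/", "/login", "/jobs/123/share_preview",
                   "/jobs/123", "/profiles/42", "/profiles/42/posts"]:
        print(sample, is_allowed(sample))
```

Under this reading, `/profiles/42` is allowed by `/profiles/*`, while `/profiles/42/posts` is blocked because `/profiles/*/posts` is the longer, more specific match.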

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
A basic robots.txt file which allows specific pages and regex matches of pages but disallows
every other page from scraping, for all user agents.
