hbsp.harvard.edu
robots.txt

Robots Exclusion Standard data for hbsp.harvard.edu

Resource Scan

Scan Details

Site Domain hbsp.harvard.edu
Base Domain harvard.edu
Scan Status Ok
Last Scan 2024-10-16T14:49:58+00:00
Next Scan 2024-11-15T14:49:58+00:00

Last Scan

Scanned 2024-10-16T14:49:58+00:00
URL https://hbsp.harvard.edu/robots.txt
Domain IPs 13.33.88.41, 13.33.88.59, 13.33.88.78, 13.33.88.91
Response IP 13.33.88.91
Found Yes
Hash 5f0e4c9050c8294dda7054b2980398bc1b626ca46717e69a0fe9890aa65322b0
SimHash e180f842c0b7
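
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. The following is a minimal sketch, assuming the digest is SHA-256 over the raw response body; content_hash and matches_report are hypothetical helper names, not part of any scanner's API.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "5f0e4c9050c8294dda7054b2980398bc1b626ca46717e69a0fe9890aa65322b0"


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the report's Hash is SHA-256 over the raw bytes; the
    report itself does not state which algorithm it uses.
    """
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Check whether a fetched body has the same digest as the report."""
    return content_hash(body) == reported.lower()
```

If a freshly fetched copy of the file does not match, either the file has changed since the scan or the assumption about the algorithm or input bytes is wrong.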

Groups

*

Rule Path
Disallow /signin
Disallow /tu/*
Disallow /cbmp/pl/*
Disallow /coursepacks/*
Disallow /import/*
Disallow /cart/*

twitterbot

Rule Path
Disallow (empty)

gptbot

Rule Path
Disallow /
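
How the three groups above apply to a given crawler and path can be sketched as follows. This is a simplified illustration, assuming the matching behavior described in RFC 9309 (a crawler uses the group that names it, otherwise the * group; * in a path is a wildcard; the longest matching pattern wins; an empty Disallow blocks nothing). GROUPS, pattern_to_regex, and is_allowed are hypothetical names, and the user-agent lookup is an exact, case-insensitive match rather than the product-token matching a real crawler would perform.

```python
import re

# Groups transcribed from the scan report above: user agent -> (rule, path).
GROUPS = {
    "*": [
        ("disallow", "/signin"),
        ("disallow", "/tu/*"),
        ("disallow", "/cbmp/pl/*"),
        ("disallow", "/coursepacks/*"),
        ("disallow", "/import/*"),
        ("disallow", "/cart/*"),
    ],
    "twitterbot": [("disallow", "")],  # empty path: nothing is disallowed
    "gptbot": [("disallow", "/")],     # everything is disallowed
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(user_agent, path):
    """Return True if the path may be crawled by the given user agent."""
    # A group naming the crawler replaces the '*' group entirely.
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    best_len, allowed = -1, True
    for rule, pattern in rules:
        if not pattern:
            continue  # an empty path matches nothing
        if pattern_to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and rule == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), rule == "allow"
    return allowed
```

Under these assumptions, is_allowed("gptbot", "/") is False, is_allowed("twitterbot", "/signin") is True because the twitterbot group overrides the * group, and is_allowed("examplebot", "/cart/123") is False.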

Comments

  • robots.txt for https://hbsp.harvard.edu/
  • disallow all crawls from 80legs.com
  • disallow singleclick and signin paths
  • disallow coursepack and import paths
  • prevents crawling disallow from Twitterbot
  • prevents OpenAI crawling

Warnings

  • 2 invalid lines.