Robots Exclusion Standard data for gustavus.edu

Resource Scan

Scan Details

Site Domain gustavus.edu
Base Domain gustavus.edu
Scan Status Ok
Last Scan 2024-09-21T11:12:59+00:00
Next Scan 2024-10-05T11:12:59+00:00

Last Scan

Scanned 2024-09-21T11:12:59+00:00
URL https://gustavus.edu/robots.txt
Domain IPs 138.236.1.12, 2606:8fc0::1012
Response IP 138.236.1.12
Found Yes
Hash c219afb5a70a2eccce6599e3d51a2cc3c97b82ca58e510bb6fd1b9ca658e08e5
SimHash f69071ddeff7
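
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not say which algorithm or which bytes were hashed, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The helper names `sha256_hex` and `matches_reported_hash` are hypothetical, and the sample body is illustrative, not the real file.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_reported_hash(body: bytes, reported: str) -> bool:
    """Compare a freshly computed digest with a value copied from a report.

    Assumption: the report hashes the raw bytes with SHA-256.
    """
    return sha256_hex(body) == reported.strip().lower()


# Illustrative body only; the real robots.txt content is not reproduced here.
sample_body = b"User-agent: *\nDisallow: /files/\n"
sample_digest = sha256_hex(sample_body)
print(len(sample_digest), matches_reported_hash(sample_body, sample_digest))
```

If a digest computed this way from a fresh download differs from the reported one, either the file changed since the scan or the assumption about the algorithm is wrong.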

Groups

*

Rule Path
Disallow /news/yellow/
Disallow /files/
Disallow /news/summerscoop/
Disallow /computing/
Disallow /giving/honorroll/
Disallow /news/weekly/
Disallow /archive/
Disallow /gts/w/
Disallow /biology/systematics/w/
Disallow /general_catalog/18_19/
Disallow /general_catalog/17_18/
Disallow /general_catalog/16_17/
Disallow /general_catalog/15_16/
Disallow /general_catalog/14_15/
Disallow /general_catalog/13_14/
Disallow /general_catalog/12_13/
Disallow /general_catalog/11_12/
Disallow /general_catalog/10_11/
Disallow /general_catalog/09_10/
Disallow /general_catalog/08_09/
Disallow /general_catalog/07_08/
Disallow /general_catalog/06_07/
Disallow /general_catalog/05_06/
Disallow /general_catalog/04_05/
Disallow /general_catalog/03_04/
Disallow /general_catalog/02_03/
Disallow /general_catalog/01_02/
Disallow /general_catalog/00_01/
Disallow /plugins/
Disallow /flag-page/
Disallow /search/
Disallow /concert/
Disallow /social/
Disallow /campus/
Disallow /welcome/
Disallow /academics/overlays/
Disallow /academics/profiles/
Disallow /giving/donorprofiles/
Disallow /finearts/ensembles/
Disallow /feedback/
Disallow /account/aToZ
Disallow /calendar/exportItem
Disallow /oncampus/
Disallow /about/webcam
Disallow /covid/
Disallow /library/library_home.php

mediapartners-google*

Rule Path
Disallow /

israbot

Rule Path
Disallow

orthogaffe

Rule Path
Disallow

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
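
To see how a crawler might interpret groups like the ones listed above, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses a small excerpt of the rules rewritten in robots.txt syntax rather than fetching the live file. The agent name `ExampleBot` and the paths `/files/report.pdf` and `/admission/` are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the groups above, rewritten in robots.txt syntax.
# Only a few rules are included; this is not the complete file.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /files/
Disallow: /search/
Disallow: /library/library_home.php

User-agent: wget
Disallow: /

User-agent: israbot
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://gustavus.edu"

# Hypothetical agents and paths, used only to exercise the rules.
checks = [
    ("ExampleBot", "/files/report.pdf"),  # falls under the * group
    ("ExampleBot", "/admission/"),        # no matching rule in the excerpt
    ("wget", "/admission/"),              # wget group disallows everything
    ("israbot", "/files/report.pdf"),     # empty Disallow permits everything
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, base + path))
```

An empty Disallow value, as in the israbot and orthogaffe groups, places no restriction on that agent, while `Disallow /` blocks the whole site. Other parsers may resolve overlapping rules differently from the standard-library one, so treat the results as indicative.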

Comments

  • Disallow: /news/calendar/ical.cfm
  • Disallow: */secure/
  • advertising-related bots:
  • Wikipedia work bots:
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
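
The comments above ask clients to space out their requests, citing wget's --wait option. The same idea can be sketched in Python as a loop that pauses between fetches. The function `fetch_politely`, its parameters, and the stand-in fetcher are hypothetical; the fetch and sleep callables are injected so the sketch runs without network access.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests.

    No pause is made before the first request; every later request is
    preceded by a wait of `wait_seconds`.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(wait_seconds)
        results.append(fetch(url))
    return results


# Stand-in fetcher and sleeper so the example performs no real requests.
recorded_pauses: List[float] = []
pages = fetch_politely(
    ["https://gustavus.edu/a", "https://gustavus.edu/b", "https://gustavus.edu/c"],
    fetch=lambda url: "body of " + url,
    wait_seconds=2.0,
    sleep=recorded_pauses.append,
)
print(len(pages), recorded_pauses)
```

In real use, the default `time.sleep` would provide the delay and `fetch` would be an HTTP client call that first checks robots.txt.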