dot.ny.gov
robots.txt

Robots Exclusion Standard data for dot.ny.gov

Resource Scan

Scan Details

Site Domain dot.ny.gov
Base Domain ny.gov
Scan Status Ok
Last Scan 2024-05-30T19:34:49+00:00
Next Scan 2024-06-29T19:34:49+00:00

Last Scan

Scanned 2024-05-30T19:34:49+00:00
URL https://dot.ny.gov/robots.txt
Domain IPs 161.11.225.99
Response IP 161.11.225.99
Found Yes
Hash 92e503745c53b8658e7d96ed8e20a9524b50e92f074eb28c79de0041070f99e5
SimHash aa1759838f53
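
The Hash value above is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The scan does not name the algorithm, so SHA-256 is an assumption in the sketch below, as are the helper names sha256_hex and matches_recorded. It shows one way a fetched robots.txt body could be compared against the recorded value to detect a change between scans.

```python
import hashlib

# Hash value recorded by the scan (copied from the report above).
RECORDED_HASH = "92e503745c53b8658e7d96ed8e20a9524b50e92f074eb28c79de0041070f99e5"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex.

    Assumption: the scanner hashes the raw bytes of the response body.
    """
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body hashes to the recorded value (file unchanged)."""
    return sha256_hex(body) == recorded.lower()


# Placeholder bytes, not the real file served by dot.ny.gov.
placeholder_body = b"User-agent: *\nDisallow: /tmp\n"
unchanged = matches_recorded(placeholder_body)
```

If the digests differ, either the file changed since the scan or the scanner hashes something other than the raw body.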

Groups

nys-crawler

Rule Path
Disallow

wc3-checklink

Rule Path
Disallow

googlebot

Rule Path
Disallow

inktomi slurp

Rule Path
Disallow

msnbot

Rule Path
Disallow

askjeeves

Rule Path
Disallow

infoseek robot 1.0

Rule Path
Disallow

infoseek sidewinder

Rule Path
Disallow

*

Rule Path
Disallow /
Disallow /portal/page/portal/
Disallow /pls/
Disallow /portalHelp2/
Disallow /portal/pls/
Disallow /tmp
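
The groups above can be evaluated with Python's standard urllib.robotparser module. The sketch below is a reconstruction from the scan data, not the file as served: the robots.txt lines are rebuilt from the listed groups, the user-agent names are written as the scan reports them, and the crawler name examplebot is hypothetical. An empty Disallow places no restriction, so the eight named crawlers may fetch any path, while every other agent falls under the * group and is blocked by Disallow /.

```python
from urllib.robotparser import RobotFileParser

# Groups as listed in the scan; each has a single empty Disallow rule.
NAMED_AGENTS = [
    "nys-crawler", "wc3-checklink", "googlebot", "inktomi slurp",
    "msnbot", "askjeeves", "infoseek robot 1.0", "infoseek sidewinder",
]

# Paths disallowed for the wildcard (*) group.
WILDCARD_PATHS = [
    "/", "/portal/page/portal/", "/pls/",
    "/portalHelp2/", "/portal/pls/", "/tmp",
]


def build_robots_lines():
    """Rebuild robots.txt lines from the scan data (a reconstruction)."""
    lines = []
    for agent in NAMED_AGENTS:
        # An empty Disallow value means nothing is disallowed.
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    lines.append("User-agent: *")
    lines += [f"Disallow: {path}" for path in WILDCARD_PATHS]
    return lines


parser = RobotFileParser()
parser.parse(build_robots_lines())

# A named crawler is allowed everywhere, including paths under /portal/pls/.
print(parser.can_fetch("googlebot", "https://dot.ny.gov/portal/pls/x"))  # True
# An unlisted crawler (hypothetical name) falls back to the * group.
print(parser.can_fetch("examplebot", "https://dot.ny.gov/"))  # False
```

Because the * group already disallows /, its remaining five paths add no further restriction for unlisted crawlers.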

Comments

  • $Id: robots.txt
  • This is a file retrieved by webwalkers a.k.a. spiders that
  • conform to a defacto standard.
  • See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
  • Format is:
  • User-agent: <name of spider>
  • Disallow: <nothing> | <path>
  • -----------------------------------------------------------------------------
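
The format described in these comments (a User-agent line followed by Disallow lines) can be read with a few lines of Python. This is a minimal sketch, not the scanner's own logic: the function name parse_robots, the sample text, and the rule for what counts as an invalid line are all assumptions, and the scan does not say which line it flagged.

```python
def parse_robots(text):
    """Split a robots.txt body into groups and unrecognized lines.

    Returns (groups, invalid), where each group is a dict holding the
    user-agent names and the Disallow values that apply to them.
    """
    groups = []
    invalid = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep:
            # No "field: value" shape at all.
            invalid.append(raw)
        elif field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "disallow": []}
                groups.append(current)
            current["agents"].append(value)
            last_was_agent = True
        elif field == "disallow" and current is not None:
            # An empty value means "nothing disallowed".
            current["disallow"].append(value)
            last_was_agent = False
        else:
            # Unknown field, or a rule before any User-agent line.
            invalid.append(raw)
    return groups, invalid


# Made-up sample in the documented format, with one malformed line.
sample = """\
# $Id: robots.txt
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /tmp
this line has no field separator
"""

groups, invalid = parse_robots(sample)
print(groups)
print(invalid)
```

A stricter or looser parser would flag different lines, so the count of invalid lines depends on the rules a given scanner applies.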

Warnings

  • 1 invalid line.