dot.ny.gov
robots.txt

Robots Exclusion Standard data for dot.ny.gov

Resource Scan

Scan Details

Site Domain	dot.ny.gov
Base Domain	ny.gov
Scan Status	Ok
Last Scan	2024-05-30T19:34:49+00:00
Next Scan	2024-06-29T19:34:49+00:00

Last Scan

Scanned	2024-05-30T19:34:49+00:00
URL	https://dot.ny.gov/robots.txt
Domain IPs	161.11.225.99
Response IP	161.11.225.99
Found	Yes
Hash	92e503745c53b8658e7d96ed8e20a9524b50e92f074eb28c79de0041070f99e5
SimHash	aa1759838f53
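
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below assumes SHA-256 over the raw response body. The helper name `matches_recorded_hash` is hypothetical; it only shows how a locally saved copy of the file might be compared against the recorded value.

```python
import hashlib

# Hash value copied from the scan report above.
RECORDED_HASH = "92e503745c53b8658e7d96ed8e20a9524b50e92f074eb28c79de0041070f99e5"


def matches_recorded_hash(body: bytes, recorded_hex: str) -> bool:
    """Return True if the SHA-256 hex digest of `body` equals `recorded_hex`.

    Assumption: the scanner hashes the unmodified response bytes with SHA-256.
    """
    digest = hashlib.sha256(body).hexdigest()
    return digest == recorded_hex.strip().lower()
```

If a saved copy does not match, that may mean the file changed since the scan, or that the scanner hashes something other than the raw bytes.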

Groups

nys-crawler

Rule	Path
Disallow

wc3-checklink

Rule	Path
Disallow

googlebot

Rule	Path
Disallow

inktomi slurp

Rule	Path
Disallow

msnbot

Rule	Path
Disallow

askjeeves

Rule	Path
Disallow

infoseek robot 1.0

Rule	Path
Disallow

infoseek sidewinder

Rule	Path
Disallow

*

Rule	Path
Disallow	/
Disallow	/portal/page/portal/
Disallow	/pls/
Disallow	/portalHelp2/
Disallow	/portal/pls/
Disallow	/tmp
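
The groups listed above can be checked with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt text is reconstructed from the Groups tables (the report does not show the file verbatim, and one line was flagged invalid), and `examplebot` is a hypothetical user agent standing in for any crawler without a named group.

```python
from urllib.robotparser import RobotFileParser

# Assumption: reconstructed from the Groups tables in this report,
# not a verbatim copy of https://dot.ny.gov/robots.txt.
RECONSTRUCTED_ROBOTS_TXT = """\
User-agent: nys-crawler
Disallow:

User-agent: wc3-checklink
Disallow:

User-agent: googlebot
Disallow:

User-agent: inktomi slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: askjeeves
Disallow:

User-agent: infoseek robot 1.0
Disallow:

User-agent: infoseek sidewinder
Disallow:

User-agent: *
Disallow: /
Disallow: /portal/page/portal/
Disallow: /pls/
Disallow: /portalHelp2/
Disallow: /portal/pls/
Disallow: /tmp
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RECONSTRUCTED_ROBOTS_TXT)

# "examplebot" is hypothetical; it falls through to the "*" group.
for agent in ("googlebot", "msnbot", "examplebot"):
    for path in ("/", "/tmp", "/pls/"):
        allowed = parser.can_fetch(agent, "https://dot.ny.gov" + path)
        print(f"{agent:12} {path:8} allowed={allowed}")
```

Under this reconstruction, a group with an empty Disallow permits everything for that crawler, while the `*` group blocks every path. Because matching is by path prefix, `Disallow: /` already covers the five more specific paths listed after it.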

Comments

$Id: robots.txt
This is a file retrieved by webwalkers a.k.a. spiders that
conform to a defacto standard.
See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
Format is:
User-agent: <name of spider>
Disallow: <nothing> | <path>
-----------------------------------------------------------------------------
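
The format described in these comments (a `User-agent` line followed by one or more `Disallow` lines) can be read with a few lines of code. The following is a minimal sketch, not the scanner's parser: `parse_groups` is a hypothetical helper, it handles only the two fields named above, and its notion of an "invalid" line (no colon, or a field it does not recognise) is an assumption.

```python
def parse_groups(text: str) -> tuple[dict[str, list[str]], int]:
    """Split robots.txt text into {user-agent: [disallow paths]} plus an invalid-line count."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []  # user agents of the group being built
    in_rules = False         # True once the current group has a rule line
    invalid = 0

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep:
            invalid += 1
        elif field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow" and current:
            in_rules = True
            for agent in current:
                groups[agent].append(value)  # "" means nothing is disallowed
        else:
            invalid += 1
    return groups, invalid


sample = """\
# comment line
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
Disallow: /tmp
this line has no colon
"""
groups, invalid = parse_groups(sample)
print(groups)   # {'googlebot': [''], '*': ['/', '/tmp']}
print(invalid)  # 1
```

The sample text is made up for illustration; only the field names and the `<nothing> | <path>` convention come from the comments above.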

Warnings

1 invalid line.
