rankincounty.org
robots.txt

Robots Exclusion Standard data for rankincounty.org

Resource Scan

Scan Details

Site Domain	rankincounty.org
Base Domain	rankincounty.org
Scan Status	Ok
Last Scan	2024-06-03T05:19:22+00:00
Next Scan	2024-07-03T05:19:22+00:00

Last Scan

Scanned	2024-06-03T05:19:22+00:00
URL	https://rankincounty.org/robots.txt
Redirect	https://www.rankincounty.org/robots.txt
Redirect Domain	www.rankincounty.org
Redirect Base	rankincounty.org
Domain IPs	104.21.32.43, 172.67.182.204, 2606:4700:3033::ac43:b6cc, 2606:4700:3034::6815:202b
Redirect IPs	104.21.32.43, 172.67.182.204, 2606:4700:3033::ac43:b6cc, 2606:4700:3034::6815:202b
Response IP	172.67.182.204
Found	Yes
Hash	4026e46acd309a9d3371c9b88b39abdf5ffd3931cabd7ec1eccff8682f465a6c
SimHash	96985959c5f1

Groups

*

Rule	Path
Disallow	/egov/imgs
Disallow	/egov/help
Disallow	/egov/include
Disallow	/egov/apps/events
Disallow	/include
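
As a rough illustration of how a crawler might evaluate the `*` group above, the sketch below feeds the five Disallow rules to Python's standard `urllib.robotparser`. The robots.txt text is reconstructed from the table rather than copied from the live file, and the agent name `examplebot` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the table above; not a verbatim copy of the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /egov/imgs
Disallow: /egov/help
Disallow: /egov/include
Disallow: /egov/apps/events
Disallow: /include
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
BASE = "https://www.rankincounty.org"

# Hypothetical sample paths, chosen only to exercise the prefix rules.
for path in ("/egov/help/index.html", "/egov/apps/events/calendar", "/departments"):
    allowed = parser.can_fetch("examplebot", BASE + path)
    print(path, "allowed" if allowed else "blocked")
```

The standard-library parser matches rules by simple path prefix, so any URL whose path begins with one of the listed values is treated as blocked.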

Other Records

Field	Value
crawl-delay	300
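
A crawler that chooses to honor this record could read it with `RobotFileParser.crawl_delay`. The sketch below is only illustrative: it assumes the record belongs to the `*` group and is measured in seconds (the usual convention for this non-standard field), and the helper `delay_for`, its fallback value, and the agent name `examplebot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Minimal reconstructed fragment; assumes crawl-delay sits in the "*" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /include
Crawl-delay: 300
"""


def delay_for(agent: str, text: str, default: float = 1.0) -> float:
    """Return the crawl delay for `agent`, or `default` if none is declared."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    delay = parser.crawl_delay(agent)  # None when no delay applies
    return float(delay) if delay is not None else default


print(delay_for("examplebot", ROBOTS_TXT))
```

A polite client would then wait at least that long between successive requests to the site.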

funwebproducts

Rule	Path
Disallow	/

ubicrawler

Rule	Path
Disallow	/

doc

Rule	Path
Disallow	/

zao

Rule	Path
Disallow	/

sitecheck.internetseer.com

Rule	Path
Disallow	/

zealbot

Rule	Path
Disallow	/

msiecrawler

Rule	Path
Disallow	/

sitesnagger

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

fetch

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

webzip

Rule	Path
Disallow	/

linko

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

xenu

Rule	Path
Disallow	/

larbin

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

zyborg

Rule	Path
Disallow	/

download ninja

Rule	Path
Disallow	/

k2spider

Rule	Path
Disallow	/

npbot

Rule	Path
Disallow	/

webreaper

Rule	Path
Disallow	/
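
The named groups above all carry a single `Disallow /` rule, which excludes the matching agent from the whole site. The sketch below shows that effect with `urllib.robotparser`, using two of the listed agents and an abbreviated, reconstructed robots.txt. The version strings in the user-agent values and the agent name `examplebot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction: two of the named groups plus part of the "*" group.
ROBOTS_TXT = """\
User-agent: httrack
Disallow: /

User-agent: webreaper
Disallow: /

User-agent: *
Disallow: /include
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://www.rankincounty.org/departments"  # hypothetical page


def is_allowed(agent: str) -> bool:
    """Check whether `agent` may fetch URL under the rules above."""
    return parser.can_fetch(agent, URL)


# The parser matches the group name case-insensitively against the product
# token of the user-agent string (the part before the first "/").
for agent in ("HTTrack/3.0", "WebReaper/9.0", "examplebot/1.0"):
    print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Agents without a group of their own fall back to the `*` group, so they are only restricted by its path rules.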

Comments

This thing comes from a known spyware site, so shove off.
Crawlers that are kind enough to obey, but which we'd rather not have
unless they're feeding search engines.
Some bots are known to be trouble, particularly those designed to copy
entire sites. Please obey robots.txt.
Doesn't follow robots.txt anyway, but...
Hits many times per second, not acceptable
http://www.nameprotect.com/botinfo.html
A capture bot, downloads gazillions of pages with no public benefit
http://www.webreaper.net/

Warnings

`noindex` is not a known field.
