myhousingsearch.com
robots.txt

Robots Exclusion Standard data for myhousingsearch.com

Resource Scan

Scan Details

Site Domain myhousingsearch.com
Base Domain myhousingsearch.com
Scan Status Ok
Last Scan 2024-10-08T20:17:22+00:00
Next Scan 2024-11-07T20:17:22+00:00

Last Scan

Scanned 2024-10-08T20:17:22+00:00
URL https://myhousingsearch.com/robots.txt
Redirect https://www.myhousingsearch.com/robots.txt
Redirect Domain www.myhousingsearch.com
Redirect Base myhousingsearch.com
Domain IPs 66.129.73.135
Redirect IPs 66.129.73.135
Response IP 66.129.73.135
Found Yes
Hash 5baf8a94c4e189beb3b45de775e6aeb6d6e9017438053bbc1a5722eb47bce82c
SimHash 26f29e58cb59

Groups

*

Rule Path
Disallow /dbh/ViewUnit
Disallow /dbh/SearchHousingSubmit.html
Disallow /dbh/MFICalculator.html
Disallow /WebFile
Disallow /tenant/Search.html.ESP
Disallow /tenant/index.html.ESP
Disallow /alf/fl/ViewFacility.html
Disallow /sw/

Other Records

Field Value
crawl-delay 5
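The wildcard group and its crawl-delay can be reconstructed and sanity-checked with Python's stdlib robots.txt parser. This is a minimal sketch: the rule paths are taken verbatim from the scan above, while the agent name `SomeBot` and the example URLs are illustrative.

```python
# Sketch: verify the wildcard (*) group's rules with urllib.robotparser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /dbh/ViewUnit
Disallow: /dbh/SearchHousingSubmit.html
Disallow: /dbh/MFICalculator.html
Disallow: /WebFile
Disallow: /tenant/Search.html.ESP
Disallow: /tenant/index.html.ESP
Disallow: /alf/fl/ViewFacility.html
Disallow: /sw/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A path under a disallowed prefix is blocked for any compliant crawler:
print(rp.can_fetch("SomeBot", "https://www.myhousingsearch.com/sw/page"))  # False
# Unlisted paths remain fetchable:
print(rp.can_fetch("SomeBot", "https://www.myhousingsearch.com/"))         # True
# The crawl-delay record is exposed as well:
print(rp.crawl_delay("SomeBot"))                                           # 5
```

Note that `Disallow` rules are prefix matches, so `/sw/` blocks everything under that directory while leaving sibling paths untouched.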

slurp china

Rule Path
Disallow /

psbot

Rule Path
Disallow /

twiceler

Rule Path
Disallow /

speedy

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

irlbot

Rule Path
Disallow /

mqbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

becomebot

Rule Path
Disallow /

woriobot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

gigabot/2.0

Rule Path
Disallow /

gigabot/2.0att

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

voyager/1.0

Rule Path
Disallow /

voyager

Rule Path
Disallow /

voyager/1.0

Rule Path
Disallow /

voyager

Rule Path
Disallow /

cazoodlebot

Rule Path
Disallow /

squider

Rule Path
Disallow /

squider/0.01

Rule Path
Disallow /
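Each of the named groups above issues a blanket `Disallow: /`, denying that crawler the entire site. The effect can be sketched with the stdlib parser, using one of the listed agents (gigabot) alongside the wildcard group; `OtherBot` is an illustrative agent name not present in the file.

```python
# Sketch: a named bot is refused everywhere, while an unlisted bot
# falls back to the wildcard group's narrower rules.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /sw/

User-agent: gigabot
Disallow: /
""".splitlines())

print(rp.can_fetch("gigabot", "https://www.myhousingsearch.com/"))   # False
print(rp.can_fetch("OtherBot", "https://www.myhousingsearch.com/"))  # True
```

As the Comments below concede, a `Disallow: /` only deters crawlers that honor robots.txt; badly behaved bots must be blocked at the firewall instead.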

Comments

  • Production robots.txt. See #4832.
  • We have nothing of interest for Yahoo China search here.
  • Yahoo! Slurp China
  • We hate seeing psbot also.
  • We hate seeing twiceler / Cuill also.
  • And 'Speedy Spider'
  • And try to stop findlinks (http://wortschatz.uni-leipzig.de/nextlinks/findlinks_en.html), but it does not say its User-agent ...
  • http://irl.cs.tamu.edu/crawler/
  • Not sure about the user agent ... "MQBOT/Nutch-0.9-dev (MQBOT Nutch Crawler; http://vwbot.cs.uiuc.edu; mqbot@cs.uiuc.edu)"
  • Shopping robot ... http://www.become.com/site_owners.html
  • http://www.worio.com/S
  • Gigabot. Possibly many names. One annoying spider.
  • Heritrix
  • User-agent string in logs reports as "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.accelobot.com)"
  • Seems to be an open-source bot: http://crawler.archive.org/
  • But this instance is coming from c01.ba.accelovation.com -> c08.ba.accelovation.com, which are 72.20.99.41 -> 72.20.99.48. Perhaps we will just block it at the firewall. But cast a wide net.
  • 'voyager/1.0' coming from crawl*.cosmixcorp.com, a lovely linkfarm.
  • Why another robot?
  • This smells like a badly behaved bot (which means putting it here is more like venting than actually having an effect).