myhousingsearch.com
robots.txt

Robots Exclusion Standard data for myhousingsearch.com

Resource Scan

Scan Details

Site Domain myhousingsearch.com
Base Domain myhousingsearch.com
Scan Status Ok
Last Scan 2024-10-08T20:17:22+00:00
Next Scan 2024-11-07T20:17:22+00:00

Last Scan

Scanned 2024-10-08T20:17:22+00:00
URL https://myhousingsearch.com/robots.txt
Redirect https://www.myhousingsearch.com/robots.txt
Redirect Domain www.myhousingsearch.com
Redirect Base myhousingsearch.com
Domain IPs 66.129.73.135
Redirect IPs 66.129.73.135
Response IP 66.129.73.135
Found Yes
Hash 5baf8a94c4e189beb3b45de775e6aeb6d6e9017438053bbc1a5722eb47bce82c
SimHash 26f29e58cb59

Groups

*

Rule Path
Disallow /dbh/ViewUnit
Disallow /dbh/SearchHousingSubmit.html
Disallow /dbh/MFICalculator.html
Disallow /WebFile
Disallow /tenant/Search.html.ESP
Disallow /tenant/index.html.ESP
Disallow /alf/fl/ViewFacility.html
Disallow /sw/

Other Records

Field Value
crawl-delay 5
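The wildcard group and its crawl-delay can be reconstructed and sanity-checked with Python's stdlib robots.txt parser. This is a minimal sketch: the rule paths are taken verbatim from the scan above, while the agent name `SomeBot` and the example URLs are illustrative.

```python
# Sketch: verify the wildcard (*) group's rules with urllib.robotparser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /dbh/ViewUnit
Disallow: /dbh/SearchHousingSubmit.html
Disallow: /dbh/MFICalculator.html
Disallow: /WebFile
Disallow: /tenant/Search.html.ESP
Disallow: /tenant/index.html.ESP
Disallow: /alf/fl/ViewFacility.html
Disallow: /sw/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A path under a disallowed prefix is blocked for any compliant crawler:
print(rp.can_fetch("SomeBot", "https://www.myhousingsearch.com/sw/page"))  # False
# Unlisted paths remain fetchable:
print(rp.can_fetch("SomeBot", "https://www.myhousingsearch.com/"))         # True
# The crawl-delay record is exposed as well:
print(rp.crawl_delay("SomeBot"))                                           # 5
```

Note that `Disallow` rules are prefix matches, so `/sw/` blocks everything under that directory while leaving sibling paths untouched.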

slurp china

Rule Path
Disallow /

psbot

Rule Path
Disallow /

twiceler

Rule Path
Disallow /

speedy

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

irlbot

Rule Path
Disallow /

mqbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

becomebot

Rule Path
Disallow /

woriobot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

gigabot/2.0

Rule Path
Disallow /

gigabot/2.0att

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

voyager/1.0

Rule Path
Disallow /

voyager

Rule Path
Disallow /

voyager/1.0

Rule Path
Disallow /

voyager

Rule Path
Disallow /

cazoodlebot

Rule Path
Disallow /

squider

Rule Path
Disallow /

squider/0.01

Rule Path
Disallow /
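Each of the named groups above issues a blanket `Disallow: /`, denying that crawler the entire site. The effect can be sketched with the stdlib parser, using one of the listed agents (gigabot) alongside the wildcard group; `OtherBot` is an illustrative agent name not present in the file.

```python
# Sketch: a named bot is refused everywhere, while an unlisted bot
# falls back to the wildcard group's narrower rules.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /sw/

User-agent: gigabot
Disallow: /
""".splitlines())

print(rp.can_fetch("gigabot", "https://www.myhousingsearch.com/"))   # False
print(rp.can_fetch("OtherBot", "https://www.myhousingsearch.com/"))  # True
```

As the Comments below concede, a `Disallow: /` only deters crawlers that honor robots.txt; badly behaved bots must be blocked at the firewall instead.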

Comments

  • Production robots.txt. See #4832.
  • We have nothing of interest for Yahoo China search here.
  • Yahoo! Slurp China
  • We hate seeing psbot also.
  • We hate seeing twiceler / Cuill also.
  • And 'Speedy Spider'
  • And try to stop findlinks (http://wortschatz.uni-leipzig.de/nextlinks/findlinks_en.html), but it does not say its User-agent ...
  • http://irl.cs.tamu.edu/crawler/
  • Not sure about the user agent ... "MQBOT/Nutch-0.9-dev (MQBOT Nutch Crawler; http://vwbot.cs.uiuc.edu; mqbot@cs.uiuc.edu)"
  • Shopping robot ... http://www.become.com/site_owners.html
  • http://www.worio.com/S
  • Gigabot. Possibly many names. One annoying spider.
  • Heritrix
  • User-agent string in logs reports as "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.accelobot.com)"
  • Seems to be an open-source bot: http://crawler.archive.org/
  • But this instance is coming from c01.ba.accelovation.com -> c08.ba.accelovation.com, which are 72.20.99.41 -> 72.20.99.48. Perhaps we will just block it at the firewall. But cast a wide net.
  • 'voyager/1.0' coming from crawl*.cosmixcorp.com, a lovely linkfarm.
  • Why another robot?
  • This smells like a badly behaved bot (which means putting it here is more like venting than actually having an effect).