houza.com
robots.txt

Robots Exclusion Standard data for houza.com

Resource Scan

Scan Details

Site Domain houza.com
Base Domain houza.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-09-18T20:11:04+00:00
Next Scan 2024-12-17T20:11:04+00:00

Last Successful Scan

Scanned 2024-01-30T20:08:29+00:00
URL https://houza.com/robots.txt
Domain IPs 108.157.254.103, 108.157.254.28, 108.157.254.59, 108.157.254.83
Response IP 18.155.129.40
Found Yes
Hash 69d08a046312a29c94417dbab061f85c9f21eecc93e21a1340c63795fdb2573f
SimHash a9ba1b93ceb7

Groups

*

Rule Path
Disallow /en/agencies?*name=*
Disallow /search*
Disallow /en/search*
Disallow /ar/search*
Disallow /agent*
Disallow /en/agent*
Disallow /ar/agent*
Disallow /profile*
Disallow /en/profile*
Disallow /ar/profile*
Disallow *?*sort=*
Disallow /p/*
Disallow /en/p/*
Disallow /ar/p/*
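
The paths in this group rely on the `*` wildcard, so a plain prefix comparison is not enough to evaluate them. The following is a minimal sketch of how a crawler might test a URL path against these rules, assuming the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. The names `rule_to_regex` and `is_disallowed` are hypothetical helpers, not part of any library, and the sketch ignores `Allow` rules and longest-match precedence because this group contains only `Disallow` entries.

```python
import re
from typing import Iterable

# Disallow paths copied from the "*" group above.
STAR_GROUP_RULES = [
    "/en/agencies?*name=*",
    "/search*", "/en/search*", "/ar/search*",
    "/agent*", "/en/agent*", "/ar/agent*",
    "/profile*", "/en/profile*", "/ar/profile*",
    "*?*sort=*",
    "/p/*", "/en/p/*", "/ar/p/*",
]


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing
    '$' anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, rules: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)
```

With these rules, `is_disallowed("/en/agencies?name=foo", STAR_GROUP_RULES)` is true, while the bare `/en/agencies` page stays crawlable because no pattern matches it.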

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /
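
Each named group above blocks one crawler from the whole site, while every other crawler falls back to the `*` group. The sketch below illustrates that selection step under a simplifying assumption: a group applies when its name appears, case-insensitively, anywhere in the crawler's User-Agent string. Real crawlers match on their own product token, so treat the substring test, the function name `select_rules`, and the sample User-Agent strings as illustrative only. Only a subset of the groups is included to keep the example short.

```python
from typing import Dict, List

# A subset of the groups listed above: group name -> Disallow paths.
GROUPS: Dict[str, List[str]] = {
    "*": ["/search*", "/agent*", "/profile*", "/p/*"],
    "httrack": ["/"],
    "webcopier": ["/"],
    "screaming frog seo spider": ["/"],
}


def select_rules(user_agent: str, groups: Dict[str, List[str]]) -> List[str]:
    """Pick the Disallow list for a crawler.

    Assumption: a named group applies if its name is a case-insensitive
    substring of the User-Agent; otherwise the '*' group is used.
    """
    ua = user_agent.lower()
    for name, rules in groups.items():
        if name != "*" and name.lower() in ua:
            return rules
    return groups.get("*", [])
```

For example, a hypothetical User-Agent such as `Mozilla/4.5 (compatible; HTTrack 3.0x)` would receive `["/"]`, meaning the entire site is off limits, whereas an unlisted crawler receives the `*` group's rules.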

Other Records

Field Value
sitemap https://houza.com/sitemaps/sitemaps.xml.gz

Comments

  • Disallow name search param in agencies search pages
  • Disallow search variations of listing pages
  • Disallow agent listings pages for now
  • Disallow profile pages
  • Disallow additional parameters on listing pages that lead to same canonicals
  • Don't crawl short links
  • No bad bots allowed