houza.com
robots.txt

Robots Exclusion Standard data for houza.com


Resource Scan

Scan Details

Site Domain	houza.com
Base Domain	houza.com
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't connect to server.
Last Scan	2024-09-18T20:11:04+00:00
Next Scan	2024-12-17T20:11:04+00:00
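
The Last Scan and Next Scan timestamps are exactly 90 days apart. The check below uses only Python's standard `datetime` module. Whether the scanner always reschedules on a fixed 90-day cycle is an assumption, since the page does not describe its schedule.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601 with a UTC offset).
LAST_SCAN = "2024-09-18T20:11:04+00:00"
NEXT_SCAN = "2024-12-17T20:11:04+00:00"


def rescan_interval(last: str, nxt: str) -> timedelta:
    """Return the time between two ISO 8601 timestamps."""
    # fromisoformat() turns the "+00:00" suffix into a timezone-aware datetime.
    return datetime.fromisoformat(nxt) - datetime.fromisoformat(last)


interval = rescan_interval(LAST_SCAN, NEXT_SCAN)
print(interval.days)  # 90
```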

Last Successful Scan

Scanned	2024-01-30T20:08:29+00:00
URL	https://houza.com/robots.txt
Domain IPs	108.157.254.103, 108.157.254.28, 108.157.254.59, 108.157.254.83
Response IP	18.155.129.40
Found	Yes
Hash	69d08a046312a29c94417dbab061f85c9f21eecc93e21a1340c63795fdb2573f
SimHash	a9ba1b93ceb7
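
The Hash field is 64 hexadecimal characters, which is the length of a SHA-256 digest. The page does not say what was hashed. Assuming the hash is SHA-256 over the raw robots.txt response body, a later fetch could be compared against this snapshot along these lines. The function name is hypothetical.

```python
import hashlib

# Copied from the "Hash" field above. ASSUMPTION: SHA-256 over the raw,
# undecoded response body; the scan page does not document the algorithm.
RECORDED_HASH = "69d08a046312a29c94417dbab061f85c9f21eecc93e21a1340c63795fdb2573f"


def matches_snapshot(body: bytes, recorded_hex: str = RECORDED_HASH) -> bool:
    """Return True if body hashes to the recorded SHA-256 hex digest."""
    return hashlib.sha256(body).hexdigest() == recorded_hex.strip().lower()


# Usage (needs network access, so it is not run here):
#   from urllib.request import urlopen
#   body = urlopen("https://houza.com/robots.txt").read()
#   print(matches_snapshot(body))
```

A mismatch only shows that the bytes differ. A whitespace-only edit would also cause one.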

Groups

*

Rule	Path
Disallow	/en/agencies?*name=*
Disallow	/search*
Disallow	/en/search*
Disallow	/ar/search*
Disallow	/agent*
Disallow	/en/agent*
Disallow	/ar/agent*
Disallow	/profile*
Disallow	/en/profile*
Disallow	/ar/profile*
Disallow	*?*sort=*
Disallow	/p/*
Disallow	/en/p/*
Disallow	/ar/p/*
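
Most of these paths use the `*` wildcard. The original robots.txt convention did not include it, but RFC 9309 defines it together with a trailing `$` end anchor. Python's `urllib.robotparser` matches by plain prefix and treats `*` literally, so a wildcard-aware check has to translate each pattern itself. The minimal sketch below uses hypothetical helper names and handles only `*` and a trailing `$`. It matches from the start of the path plus query string. It skips percent-encoding normalization and Allow/Disallow precedence, which this group does not need because it contains only Disallow rules.

```python
import re

# Disallow patterns from the "*" group above.
DISALLOW = [
    "/en/agencies?*name=*",
    "/search*", "/en/search*", "/ar/search*",
    "/agent*", "/en/agent*", "/ar/agent*",
    "/profile*", "/en/profile*", "/ar/profile*",
    "*?*sort=*",
    "/p/*", "/en/p/*", "/ar/p/*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_disallowed(path_and_query: str) -> bool:
    """True if any Disallow pattern matches, starting at the beginning."""
    # re.match() anchors at the start of the string, like prefix matching.
    return any(rule.match(path_and_query) for rule in RULES)


print(is_disallowed("/en/search?q=villa"))   # True
print(is_disallowed("/en/rent?sort=price"))  # True
print(is_disallowed("/en/agencies"))         # False
```

Under this reading, `*?*sort=*` blocks any URL whose query string contains `sort=`. The trailing `*` on rules such as `/search*` is redundant, because matching is already by prefix.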

zealbot

Rule	Path
Disallow	/

msiecrawler

Rule	Path
Disallow	/

sitesnagger

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

fetch

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

webzip

Rule	Path
Disallow	/

linko

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

xenu

Rule	Path
Disallow	/

larbin

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

zyborg

Rule	Path
Disallow	/

download ninja

Rule	Path
Disallow	/

webreaper

Rule	Path
Disallow	/

screaming frog seo spider

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://houza.com/sitemaps/sitemaps.xml.gz

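
The sitemap record points to a gzip-compressed XML file. The minimal sketch below reads such a file with the standard library. It assumes the file follows the sitemaps.org schema, as either a `<urlset>` or a `<sitemapindex>`. The function name is hypothetical, and the sample document is invented for illustration; it is not houza.com data.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locations(gz_bytes: bytes) -> list[str]:
    """Decompress a .xml.gz sitemap and return every <loc> URL in it.

    <urlset> and <sitemapindex> files both keep their URLs in <loc>
    elements, so the same walk handles either kind.
    """
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]


# Hypothetical sample, invented for illustration (not houza.com data).
SAMPLE = gzip.compress(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>https://example.com/sitemap-1.xml.gz</loc></sitemap>"
    b"</sitemapindex>"
)
print(sitemap_locations(SAMPLE))  # ['https://example.com/sitemap-1.xml.gz']
```

For the real file, the bytes could come from `urllib.request.urlopen("https://houza.com/sitemaps/sitemaps.xml.gz").read()`. If the file is an index, each `<loc>` points to another sitemap that needs the same treatment.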

Comments

Disallow name search param in agencies searchpages
Disallow search variations of listing pages
Disallow agent listings pages for now
Disallow profile pages
Disallow additional parameters on listing pages that lead to same canonicals
Don't crawl short links
No bad bots allowed
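
The "No bad bots allowed" comment corresponds to the twenty per-crawler groups above, each of which disallows `/`. The sketch below rebuilds those groups from the scan data and checks them offline with Python's standard `urllib.robotparser`. It assumes the live file used plain `User-agent: <name>` and `Disallow: /` lines; the scanner may have lowercased the names. It also reduces the `*` group to a single `Disallow: /p/` rule, because `RobotFileParser` matches by prefix only and would treat the other wildcard patterns literally.

```python
from urllib.robotparser import RobotFileParser

# Crawler names from the per-crawler groups in the scan, each with "Disallow: /".
BAD_BOTS = [
    "zealbot", "msiecrawler", "sitesnagger", "webstripper", "webcopier",
    "fetch", "offline explorer", "teleport", "teleportpro", "webzip",
    "linko", "httrack", "microsoft.url.control", "xenu", "larbin",
    "libwww", "zyborg", "download ninja", "webreaper",
    "screaming frog seo spider",
]


def build_parser() -> RobotFileParser:
    """Rebuild a simplified version of the scanned robots.txt and parse it."""
    lines = []
    for bot in BAD_BOTS:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Simplified catch-all group: /p/ is the wildcard-free form of /p/*.
    lines += ["User-agent: *", "Disallow: /p/"]
    parser = RobotFileParser()
    parser.parse(lines)  # parse() takes the file as a list of lines
    return parser


parser = build_parser()
print(parser.can_fetch("HTTrack/3.49", "https://houza.com/en"))  # False
print(parser.can_fetch("Googlebot", "https://houza.com/en"))     # True
print(parser.can_fetch("Googlebot", "https://houza.com/p/abc"))  # False
```

`RobotFileParser` selects a group when the group's name appears anywhere in the crawler's product token (the part before the first "/"). The short `teleport` group, which comes first, therefore also claims `TeleportPro`. Parsers that follow RFC 9309 may match names differently.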