companies.gov.cy
robots.txt

Robots Exclusion Standard data for companies.gov.cy

Resource Scan

Scan Details

Site Domain	companies.gov.cy
Base Domain	companies.gov.cy
Scan Status	Ok
Last Scan	2024-05-05T19:56:11+00:00
Next Scan	2024-06-04T19:56:11+00:00

Last Scan

Scanned	2024-05-05T19:56:11+00:00
URL	https://companies.gov.cy/robots.txt
Domain IPs	192.124.249.170
Response IP	192.124.249.170
Found	Yes
Hash	12563d649398bc2e5b526de5103788dd47f62931e297470b555ce534b0331fcf
SimHash	a136141d6bf4
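
The 64 hexadecimal digits in the Hash field are consistent with a SHA-256 digest of the fetched file, although the page does not name the algorithm. Assuming that, change detection between scans could look roughly like the sketch below. `content_hash` and `has_changed` are hypothetical helper names, not functions of any tool referenced on this page.

```python
import hashlib

# Hash value recorded for the 2024-05-05 scan (copied from the Last Scan table).
RECORDED_HASH = "12563d649398bc2e5b526de5103788dd47f62931e297470b555ce534b0331fcf"


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a raw response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded_hash: str = RECORDED_HASH) -> bool:
    """Report whether a freshly fetched body differs from the recorded digest."""
    return content_hash(body) != recorded_hash.lower()
```

A caller would pass the bytes of a new fetch of https://companies.gov.cy/robots.txt to `has_changed`. Whether the recorded value was computed over the raw bytes or over a normalized form is not stated, so a mismatch alone does not prove the file changed.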

Groups

User-agent	Rule	Path
*	Disallow	/includes
*	Disallow	/modules
*	Disallow	/templates
*	Disallow	/tools
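
To illustrate how a crawler might apply this group, the sketch below rebuilds the four rules as robots.txt text and checks a few URLs with Python's standard `urllib.robotparser`. The user agent name `ExampleBot` and the sample paths are assumptions made for illustration; only the four Disallow paths come from the scan.

```python
from urllib.robotparser import RobotFileParser

# Rules reconstructed from the Groups table; the exact file layout is assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes
Disallow: /modules
Disallow: /templates
Disallow: /tools
"""


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Check a path on companies.gov.cy against the reconstructed rules."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "https://companies.gov.cy" + path)


print(is_allowed("/modules/search.php"))  # False: matches Disallow: /modules
print(is_allowed("/index.html"))          # True: no rule matches
```

Matching in `urllib.robotparser` is by path prefix, so a hypothetical path such as `/toolsmith` would also be blocked by `Disallow: /tools`.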

Other Records

Field	Value
sitemap	https://www.companies.gov.cy/_Google/sitemap.xml

Comments

/robots.txt file for http://website_url/
email webmaster@website_url
Before the website goes live use the below set of commands
This tells all the crawlers not to crawl the site.
User-agent: *
Disallow: /
After the website goes live use the below set of commands
This tells all the crawlers to crawl the whole site except the /admin part.
This will clear the visitation statistics as well.
Disallow: /layout
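
The Groups section lists only four Disallow rules, and neither `Disallow: /` nor `Disallow: /layout` from the comment text is among them, which suggests those lines sit behind `#` markers in the file. The sketch below shows that a parser skips such lines. The file layout, the user agent `ExampleBot`, and the sample URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: the template text sits behind '#' markers, so only the
# uncommented group at the bottom carries active rules.
ROBOTS_WITH_COMMENTS = """\
# Before the website goes live use the below set of commands
# User-agent: *
# Disallow: /
# Disallow: /layout
User-agent: *
Disallow: /includes
Disallow: /modules
Disallow: /templates
Disallow: /tools
"""

commented_parser = RobotFileParser()
commented_parser.parse(ROBOTS_WITH_COMMENTS.splitlines())

# Commented-out directives have no effect, so /layout stays crawlable.
layout_allowed = commented_parser.can_fetch(
    "ExampleBot", "https://companies.gov.cy/layout/main.css"
)
# Active directives still apply.
tools_allowed = commented_parser.can_fetch(
    "ExampleBot", "https://companies.gov.cy/tools/export"
)

print(layout_allowed)  # True
print(tools_allowed)   # False
```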
