companies.gov.cy
robots.txt

Robots Exclusion Standard data for companies.gov.cy

Resource Scan

Scan Details

Site Domain companies.gov.cy
Base Domain companies.gov.cy
Scan Status Ok
Last Scan 2024-05-05T19:56:11+00:00
Next Scan 2024-06-04T19:56:11+00:00

Last Scan

Scanned 2024-05-05T19:56:11+00:00
URL https://companies.gov.cy/robots.txt
Domain IPs 192.124.249.170
Response IP 192.124.249.170
Found Yes
Hash 12563d649398bc2e5b526de5103788dd47f62931e297470b555ce534b0331fcf
SimHash a136141d6bf4

Groups

*

Rule Path
Disallow /includes
Disallow /modules
Disallow /templates
Disallow /tools
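
As a rough illustration of how a crawler might interpret the group above, the sketch below rebuilds the four Disallow rules as robots.txt lines and checks a few paths with Python's standard urllib.robotparser module. The user-agent name ExampleBot, the helper is_allowed, and the sample paths are hypothetical and do not come from the scan.

```python
from urllib.robotparser import RobotFileParser

# Rules as listed for the "*" group in this scan.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /includes",
    "Disallow: /modules",
    "Disallow: /templates",
    "Disallow: /tools",
]


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the given path may be fetched under the rules above.

    `user_agent` is a hypothetical crawler name; any name falls under
    the "*" group because no other group is listed.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)  # parse from memory, no network request
    return parser.can_fetch(user_agent, "https://companies.gov.cy" + path)


if __name__ == "__main__":
    # Sample paths are assumptions chosen for illustration only.
    for path in ["/includes/header.php", "/tools", "/about"]:
        print(path, is_allowed(path))
```

The parser matches rules by path prefix, so anything beginning with one of the four listed paths is treated as disallowed, and everything else is allowed.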

Other Records

Field Value
sitemap https://www.companies.gov.cy/_Google/sitemap.xml

Comments

  • /robots.txt file for http://website_url/
  • email webmaster@website_url
  • Before the website goes live use the below set of commands
  • This tells all the crawlers not to crawl the site.
  • User-agent: *
  • Disallow: /
  • After the website goes live use the below set of commands
  • This tells all the crawlers to crawl the whole site except the /admin part.
  • This will clear the visitation statistics as well.
  • Disallow: /layout
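
The pre-launch configuration described in the comments above (User-agent: * followed by Disallow: /) can be checked in the same spirit. This is a minimal sketch using Python's standard urllib.robotparser. The helper name blocks_everything, the user-agent ExampleBot, and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted in the comments for the pre-launch phase.
PRE_LAUNCH_LINES = ["User-agent: *", "Disallow: /"]


def blocks_everything(lines, sample_paths, user_agent="ExampleBot"):
    """Return True if none of the sample paths may be fetched.

    `sample_paths` and `user_agent` are hypothetical inputs; this only
    tests the paths it is given, not every possible URL.
    """
    parser = RobotFileParser()
    parser.parse(lines)  # parse from memory, no network request
    base = "https://companies.gov.cy"
    return not any(parser.can_fetch(user_agent, base + p) for p in sample_paths)


if __name__ == "__main__":
    print(blocks_everything(PRE_LAUNCH_LINES, ["/", "/about", "/tools/x"]))
```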