cilip.org.uk
robots.txt

Robots Exclusion Standard data for cilip.org.uk

Resource Scan

Scan Details

Site Domain cilip.org.uk
Base Domain cilip.org.uk
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 5/25/2025, 11:56:50 AM
Next Scan 6/1/2025, 11:56:50 AM
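
For context on what a client error at the fetch stage means to a crawler, here is a minimal sketch of how a crawler following RFC 9309 might treat the HTTP status returned for robots.txt. The function name and the returned labels are hypothetical, and the report does not say which 4xx code the server sent, so the 404 in the usage note is only an illustration.

```python
def robots_fetch_policy(status_code):
    """Map the HTTP status of a robots.txt fetch to a crawl policy.

    Hypothetical helper; the labels are illustrative, not part of any library.
    The ranges follow the handling described in RFC 9309.
    """
    if 200 <= status_code < 300:
        return "parse"            # file fetched: parse and apply its rules
    if 300 <= status_code < 400:
        return "follow-redirect"  # follow the redirect to the new location
    if 400 <= status_code < 500:
        return "allow-all"        # "unavailable": crawler may access any resource
    if 500 <= status_code < 600:
        return "disallow-all"     # "unreachable": assume complete disallow
    return "unknown"


# Illustrative status only; the scan above does not record the exact code.
print(robots_fetch_policy(404))
```

Under this reading, a crawler that gets a client error may behave as if no rules were published, so the rules from the last successful scan would not necessarily be in force while the failure lasts.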

Last Successful Scan

Scanned 4/24/2025, 11:47:28 AM
URL https://www.cilip.org.uk/robots.txt
Domain IPs 35.169.50.49, 35.173.82.140, 35.174.132.21
Response IP 35.174.132.21
Found Yes
Hash 6e8105e816af790353e9e971d7eadc5d5fb9b46026b1e15e7ff61f9fb964517b
SimHash ec945d42c3d0

Groups

*

Rule Path
Disallow /global_inc/
Allow /global_inc/*.css
Allow /global_inc/*.js

*

Rule Path
Disallow /global_engine/ajax/

siteauditbot

Rule Path
Allow /

semrushbot-si

Rule Path
Allow /
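
The groups above can be checked against sample paths with a small matcher. This is a sketch only, assuming the longest-match rule and the `*` / `$` wildcards described in RFC 9309, and assuming the two `*` groups are merged into one rule set. The names `STAR_RULES`, `pattern_to_regex` and `is_allowed`, and the sample paths, are hypothetical. Python's built-in `urllib.robotparser` is not used because it applies the first matching rule and does not expand `*` in paths.

```python
import re

# Rules for the "*" user agent, copied from the two "*" groups and merged.
STAR_RULES = [
    ("disallow", "/global_inc/"),
    ("allow", "/global_inc/*.css"),
    ("allow", "/global_inc/*.js"),
    ("disallow", "/global_engine/ajax/"),
]


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Apply the longest matching rule; on a tie, allow wins. No match means allowed."""
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow


# Hypothetical sample paths, not taken from the site.
for sample in ["/global_inc/site.css", "/global_inc/img/logo.png",
               "/global_engine/ajax/search", "/about/"]:
    print(sample, is_allowed(sample, STAR_RULES))
```

The siteauditbot and semrushbot-si groups contain only `Allow /`, so under the same logic every path is allowed for those two agents.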

Other Records

Field Value
sitemap https://www.cilip.org.uk/autositemapindex.xml

Comments

  • When crawlers hit the engine dir they sometimes publish confusing links to site content in their search results, so we exclude these specific engines from crawling it.
  • Note: Certain crawlers do need access to this directory, so we do not want a blanket exclude statement here.

Warnings

  • 36 invalid lines.