cilip.org.uk
robots.txt

Robots Exclusion Standard data for cilip.org.uk

Resource Scan

Scan Details

Site Domain	cilip.org.uk
Base Domain	cilip.org.uk
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error.
Last Scan	5/25/2025, 11:56:50 AM
Next Scan	6/1/2025, 11:56:50 AM
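
For context on the failure above: the report only says that the server returned a client error, not which status code. The sketch below shows how a crawler following RFC 9309 might map the status of a robots.txt fetch to a crawl policy. The function name `robots_fetch_policy` and the returned labels are hypothetical, and nothing here describes how this particular scanner behaves.

```python
def robots_fetch_policy(status_code):
    """Map an HTTP status for /robots.txt to a crawl policy (RFC 9309, section 2.3.1)."""
    if 200 <= status_code < 300:
        return "parse"            # fetch succeeded: parse and obey the rules
    if 300 <= status_code < 400:
        return "follow_redirect"  # redirects should be followed (at least five hops)
    if 400 <= status_code < 500:
        return "allow_all"        # "unavailable": the crawler may access any resource
    if 500 <= status_code < 600:
        return "disallow_all"     # "unreachable": assume complete disallow
    return "unknown"


# 404 is a hypothetical example of a client error; the report does not name the code.
print(robots_fetch_policy(404))
```

Under this reading, a client error means a compliant crawler may treat the site as unrestricted until the file becomes reachable again.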

Last Successful Scan

Scanned	4/24/2025, 11:47:28 AM
URL	https://www.cilip.org.uk/robots.txt
Domain IPs	35.169.50.49, 35.173.82.140, 35.174.132.21
Response IP	35.174.132.21
Found	Yes
Hash	6e8105e816af790353e9e971d7eadc5d5fb9b46026b1e15e7ff61f9fb964517b
SimHash	ec945d42c3d0

Groups

*

Rule	Path
Disallow	/global_inc/
Allow	/global_inc/*.css
Allow	/global_inc/*.js

*

Rule	Path
Disallow	/global_engine/ajax/

siteauditbot

Rule	Path
Allow	/

semrushbot-si

Rule	Path
Allow	/

Other Records

Field	Value
sitemap	https://www.cilip.org.uk/autositemapindex.xml
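
As a rough illustration of how the sitemap record can be read programmatically, the sketch below uses Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later). The `ROBOTS_TXT` string is a partial reconstruction from this report, not the original file, which is an assumption of the example.

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction from the report; the real file contains more lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /global_engine/ajax/

Sitemap: https://www.cilip.org.uk/autositemapindex.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line is present, so fall back to [].
sitemaps = parser.site_maps() or []
print(sitemaps)
```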

Comments

When crawlers hit the engine dir they sometimes publish confusing links to site content
in their search results so we exclude these specific engines from crawling it.
Note: Certain crawlers do need access to this directory so we do not want a blanket
exclude statement here.

Warnings

36 invalid lines.
