blog.cuatrecasas.com
robots.txt

Robots Exclusion Standard data for blog.cuatrecasas.com


Resource Scan

Scan Details

Site Domain	blog.cuatrecasas.com
Base Domain	cuatrecasas.com
Scan Status	Ok
Last Scan	2025-09-24T07:06:49+00:00
Next Scan	2025-10-24T07:06:49+00:00

Last Scan

Scanned	2025-09-24T07:06:49+00:00
URL	https://blog.cuatrecasas.com/robots.txt
Domain IPs	13.39.11.93
Response IP	13.37.145.162
Found	Yes
Hash	63eef2614fc52a77eea0f1a85029fbe3e9c3d025f7c24fc4d0f4fda6a61c373d
SimHash	3c143150c544

Groups

*

Rule	Path
Disallow	/admin/
Disallow	/bundles/
Disallow	/bundles_old/
Disallow	/erecruiting/
Disallow	/images/
Allow	/images/cache/
Disallow	/img/
Disallow	/media_repository/
Allow	/summernote/
Allow	/resources/
Disallow	/web/
Allow	/web/assets/
Allow	/web/vendor/
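The `*` group pairs broad `Disallow` prefixes with narrower `Allow` exceptions, for example `/images/` with `/images/cache/` and `/web/` with `/web/assets/`. RFC 9309 resolves such overlaps in favor of the most specific (longest) matching path, with `Allow` winning a tie, so rule order does not matter. The following minimal Python sketch applies that precedence to the rules listed above. It assumes plain prefix matching (none of these paths use `*` or `$`), and the names `STAR_RULES` and `is_allowed` are hypothetical.

```python
# Rules of the "*" group as listed in the scan (plain path prefixes).
STAR_RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/bundles/"),
    ("disallow", "/bundles_old/"),
    ("disallow", "/erecruiting/"),
    ("disallow", "/images/"),
    ("allow", "/images/cache/"),
    ("disallow", "/img/"),
    ("disallow", "/media_repository/"),
    ("allow", "/summernote/"),
    ("allow", "/resources/"),
    ("disallow", "/web/"),
    ("allow", "/web/assets/"),
    ("allow", "/web/vendor/"),
]


def is_allowed(path: str, rules=STAR_RULES) -> bool:
    """Return True if `path` may be crawled under `rules`.

    RFC 9309 precedence: the matching rule with the longest path wins;
    on a tie, "allow" wins. A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, prefix in rules:
        if path.startswith(prefix):
            allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len = len(prefix)
                best_allow = allow
    return best_allow


for p in ["/images/logo.png", "/images/cache/logo.png", "/web/app.js",
          "/web/assets/app.css", "/some-blog-post"]:
    print(p, is_allowed(p))
```

Python's standard `urllib.robotparser` applies the first matching rule in file order instead, so if the file lists the rules in the order shown above, it would report `/images/cache/...` as blocked for `*`.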

twitterbot

Rule	Path
Allow	/images/
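Under RFC 9309 a crawler follows only the group whose `User-agent` line matches its own product token (case-insensitively), and falls back to `*` only when no such group exists. The `twitterbot` group therefore replaces, rather than extends, the `*` rules: Twitterbot may fetch `/images/`, and because its group contains no `Disallow` lines, nothing else is blocked for it either. A minimal sketch, with the `*` group abbreviated and the names `GROUPS`, `rules_for` and `is_allowed` hypothetical:

```python
# Abbreviated rule groups from the scan (the "*" group is trimmed
# to the rules this example needs).
GROUPS = {
    "*": [
        ("disallow", "/admin/"),
        ("disallow", "/images/"),
        ("allow", "/images/cache/"),
    ],
    "twitterbot": [("allow", "/images/")],
    "gptbot": [("disallow", "/")],
}


def rules_for(agent: str):
    """Pick the group matching the crawler's product token, else "*"."""
    return GROUPS.get(agent.lower(), GROUPS["*"])


def is_allowed(path: str, rules) -> bool:
    """Longest matching prefix wins; ties go to allow; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, prefix in rules:
        if path.startswith(prefix):
            allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, best_allow = len(prefix), allow
    return best_allow


print(is_allowed("/images/photo.jpg", rules_for("Twitterbot")))    # True
print(is_allowed("/images/photo.jpg", rules_for("SomeOtherBot")))  # False
print(is_allowed("/admin/", rules_for("Twitterbot")))              # True
```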

gptbot

Rule	Path
Disallow	/
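The `gptbot` group consists of a single `Disallow: /`, which blocks OpenAI's training crawler from the entire host (the file's own comment calls it a "Total restriction"). A one-rule group raises no precedence questions, so Python's standard `urllib.robotparser` is enough to confirm the effect. The excerpt below is a hypothetical reduction of the file to the `gptbot` group plus a trimmed `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: the gptbot group plus a trimmed "*" group.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin/

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

for agent in ("GPTBot", "SomeOtherBot"):
    # GPTBot matches its own group (Disallow: /); others fall back to "*".
    print(agent, parser.can_fetch(agent, "https://blog.cuatrecasas.com/"))
# GPTBot False
# SomeOtherBot True
```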

oai-searchbot
chatgpt-user
perplexitybot
bingbot
googlebot
google-extended

Product	Comment
oai-searchbot	ChatGPT Search (OpenAI)
chatgpt-user	ChatGPT (user-requested searches)
perplexitybot	Perplexity.ai (AI-powered search engine)
bingbot	Microsoft Bing (search/AI crawler)
googlebot	Google Search (search crawler)
google-extended	Google AI services (e.g. Gemini, formerly Bard)

Rule	Path
Disallow	/admin/
Disallow	/bundles/
Disallow	/bundles_old/
Disallow	/erecruiting/
Disallow	/images/
Allow	/images/cache/
Disallow	/img/
Disallow	/media_repository/
Allow	/summernote/
Allow	/resources/
Disallow	/web/
Allow	/web/assets/
Allow	/web/vendor/
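These six user agents share a single group, so each of them follows that group's rules instead of the `*` group; the rules repeat the general restrictions, matching the file comment "AI bots with access allowed under the same general restrictions". The sketch below, again using `urllib.robotparser`, parses a hypothetical excerpt of that group and checks one blocked and one allowed path for every listed agent. It omits the `Allow` exceptions nested under `Disallow` prefixes, because `robotparser` applies rules in file order rather than by longest match.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ["oai-searchbot", "chatgpt-user", "perplexitybot",
          "bingbot", "googlebot", "google-extended"]

# Hypothetical excerpt: one group shared by all six agents, trimmed to
# rules whose outcome does not depend on rule order.
ROBOTS_EXCERPT = "\n".join(
    [f"User-agent: {a}" for a in AGENTS]
    + ["Disallow: /admin/", "Disallow: /media_repository/", "Allow: /resources/"]
)

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

BASE = "https://blog.cuatrecasas.com"
for agent in AGENTS:
    blocked = not parser.can_fetch(agent, BASE + "/admin/login")
    allowed = parser.can_fetch(agent, BASE + "/resources/logo.svg")
    print(agent, blocked, allowed)  # each line ends "True True"
```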

Other Records

Field	Value
sitemap	https://www.cuatrecasas.com/sitemap.xml
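The `Sitemap` record sits outside any group and applies to the whole file; note that it points at the `www.cuatrecasas.com` host rather than the blog host. Python's `urllib.robotparser` (3.8 and later) collects such records through `site_maps()`. The trimmed group in this sketch is a hypothetical stand-in for the real rules.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Hypothetical minimal file: one trimmed group plus the Sitemap record.
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "",
    "Sitemap: https://www.cuatrecasas.com/sitemap.xml",
])
print(parser.site_maps())  # ['https://www.cuatrecasas.com/sitemap.xml']
```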

Comments

This file is to prevent the crawling and indexing of certain parts
of your site by web crawlers and spiders run by sites like Yahoo!
and Google. By telling these "robots" where not to go on your site,
you save bandwidth and server resources.
For more information about the robots.txt standard, see:
http://www.robotstxt.org/wc/robots.html
Sitemap
Directories
Files
Disallow: /calendario.php
Web_Service
Disallow: /*noticias/table
Certain social media sites are whitelisted to allow crawlers to access page markup when links to /images are shared.
--- Rules added for Artificial Intelligence (AI) bots ---
OpenAI (crawler for AI training) - Total restriction
AI bots with access allowed under the same general restrictions
End of rules for AI bots
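The comments include `Disallow: /*noticias/table`, which the scan reports only as comment text, not as an active rule in any group. If such a rule were active, RFC 9309 would treat `*` as matching any sequence of characters (and a trailing `$` as an end-of-path anchor). A minimal sketch of that matching, translating a pattern into a regular expression; `pattern_to_regex` and `matches` are hypothetical names:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an RFC 9309 path pattern into a compiled regex.

    "*" matches any character sequence; a trailing "$" anchors the end.
    Every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, mirroring prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


for path in ["/noticias/table", "/es/noticias/table/2024", "/es/noticias/list"]:
    print(path, matches("/*noticias/table", path))
```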
