blog.cuatrecasas.com
robots.txt

Robots Exclusion Standard data for blog.cuatrecasas.com

Resource Scan

Scan Details

Site Domain blog.cuatrecasas.com
Base Domain cuatrecasas.com
Scan Status Ok
Last Scan 2025-09-24T07:06:49+00:00
Next Scan 2025-10-24T07:06:49+00:00

Last Scan

Scanned 2025-09-24T07:06:49+00:00
URL https://blog.cuatrecasas.com/robots.txt
Domain IPs 13.39.11.93
Response IP 13.37.145.162
Found Yes
Hash 63eef2614fc52a77eea0f1a85029fbe3e9c3d025f7c24fc4d0f4fda6a61c373d
SimHash 3c143150c544

Groups

*

Rule Path
Disallow /admin/
Disallow /bundles/
Disallow /bundles_old/
Disallow /erecruiting/
Disallow /images/
Allow /images/cache/
Disallow /img/
Disallow /media_repository/
Allow /summernote/
Allow /resources/
Disallow /web/
Allow /web/assets/
Allow /web/vendor/
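
The Allow and Disallow rows above overlap (for example /images/ is disallowed while /images/cache/ is allowed). A minimal sketch of how a crawler following RFC 9309 would likely resolve such overlaps: the longest matching path wins, and Allow wins a tie. The names RULES and is_allowed are hypothetical, and the sketch assumes plain prefix matching only (no * or $ wildcards).

```python
# Rules of the "*" group as listed in the table above: (directive, path prefix).
RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/bundles/"),
    ("disallow", "/bundles_old/"),
    ("disallow", "/erecruiting/"),
    ("disallow", "/images/"),
    ("allow", "/images/cache/"),
    ("disallow", "/img/"),
    ("disallow", "/media_repository/"),
    ("allow", "/summernote/"),
    ("allow", "/resources/"),
    ("disallow", "/web/"),
    ("allow", "/web/assets/"),
    ("allow", "/web/vendor/"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumption: longest-prefix match as described in RFC 9309;
    wildcard characters are not interpreted in this sketch.
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for p in ("/images/logo.png", "/images/cache/logo.png", "/web/assets/app.js"):
        print(p, is_allowed(p))
```

Note that Python's standard urllib.robotparser applies rules in file order (first match) rather than by longest match, so it may give a different answer for paths such as /images/cache/.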

twitterbot

Rule Path
Allow /images/

gptbot

Rule Path
Disallow /

oai-searchbot
chatgpt-user
perplexitybot
bingbot
googlebot
google-extended

Product Comment
oai-searchbot ChatGPT Search (OpenAI)
chatgpt-user ChatGPT (user-requested searches)
perplexitybot Perplexity.ai (AI-powered search engine)
bingbot Microsoft Bing (search/AI crawler)
googlebot Google Search (search crawler)
google-extended Google AI services (e.g. Bard)
Rule Path
Disallow /admin/
Disallow /bundles/
Disallow /bundles_old/
Disallow /erecruiting/
Disallow /images/
Allow /images/cache/
Disallow /img/
Disallow /media_repository/
Allow /summernote/
Allow /resources/
Disallow /web/
Allow /web/assets/
Allow /web/vendor/
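
The groups above can be read as a lookup from a crawler's product token to one rule set. The following is a rough sketch, assuming the RFC 9309 behaviour that a crawler obeys only the group naming it (case-insensitively) and falls back to "*" when no group does. GENERAL, GROUPS and rules_for are hypothetical names introduced for illustration.

```python
# General restrictions, shared by the "*" group and the six named search/AI bots.
GENERAL = [
    ("disallow", "/admin/"),
    ("disallow", "/bundles/"),
    ("disallow", "/bundles_old/"),
    ("disallow", "/erecruiting/"),
    ("disallow", "/images/"),
    ("allow", "/images/cache/"),
    ("disallow", "/img/"),
    ("disallow", "/media_repository/"),
    ("allow", "/summernote/"),
    ("allow", "/resources/"),
    ("disallow", "/web/"),
    ("allow", "/web/assets/"),
    ("allow", "/web/vendor/"),
]

GROUPS = {
    "*": GENERAL,
    "twitterbot": [("allow", "/images/")],  # may fetch /images/ for link previews
    "gptbot": [("disallow", "/")],          # fully blocked
}

# These six tokens share one group with the general restrictions.
for token in ("oai-searchbot", "chatgpt-user", "perplexitybot",
              "bingbot", "googlebot", "google-extended"):
    GROUPS[token] = GENERAL


def rules_for(product_token):
    """Return the rule list for a crawler, falling back to the "*" group."""
    return GROUPS.get(product_token.lower(), GROUPS["*"])


if __name__ == "__main__":
    print(rules_for("GPTBot"))
    print(rules_for("Twitterbot"))
    print(rules_for("SomeOtherBot") is GENERAL)
```

One consequence of this reading: the twitterbot group contains only Allow /images/, so under these assumptions that crawler is not bound by the Disallow rows of the "*" group.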

Other Records

Field Value
sitemap https://www.cuatrecasas.com/sitemap.xml

Comments

  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/wc/robots.html
  • Sitemap
  • Directories
  • Files
  • Disallow: /calendario.php
  • Web_Service
  • Disallow: /*noticias/table
  • Certain social media sites are whitelisted to allow crawlers to access page markup when links to /images are shared.
  • --- Rules added for Artificial Intelligence (AI) bots ---
  • OpenAI (crawler for AI training) - Total restriction
  • AI bots with access allowed under the same general restrictions
  • End of rules for AI bots