Robots Exclusion Standard data for tooldata.io

Resource Scan

Scan Details

Site Domain tooldata.io
Base Domain tooldata.io
Scan Status Ok
Last Scan 2025-10-18T06:49:33+00:00
Next Scan 2025-11-01T06:49:33+00:00

Last Scan

Scanned 2025-10-18T06:49:33+00:00
URL https://tooldata.io/robots.txt
Domain IPs 209.151.153.205
Response IP 209.151.153.205
Found Yes
Hash 3c3088f906af1fbc57d588dd8ee6858f6b5f25d7d6a01a6a6b8a52c4d523b755
SimHash 098e100287b4
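To check whether the live file still matches this scan, you can hash a fresh copy and compare the result with the Hash value above. This is a minimal sketch. It assumes the scanner's Hash is a SHA-256 digest of the raw response bytes. The 64 hex characters fit SHA-256, but the report names neither the algorithm nor any normalization. The SimHash is not reproduced because its algorithm is not given.

```python
import hashlib

# Hash recorded by the scan above. ASSUMPTION: SHA-256 over the raw
# robots.txt bytes; the report does not state the algorithm used.
RECORDED_HASH = "3c3088f906af1fbc57d588dd8ee6858f6b5f25d7d6a01a6a6b8a52c4d523b755"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a robots.txt body."""
    return hashlib.sha256(body).hexdigest()


def unchanged_since_scan(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the fetched body hashes to the recorded value."""
    return sha256_hex(body) == recorded.lower()
```

In practice, `urllib.request.urlopen("https://tooldata.io/robots.txt").read()` returns the bytes to pass in. A mismatch means either the file changed or the scanner hashes something other than the raw body.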

Groups

googlebot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

bingbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

slurp

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

duckduckbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

gptbot

Rule Path
Allow /
Allow /blog/
Allow /social-listening
Allow /inteligencia-artificial
Allow /performance-marketing
Allow /estrategia-digital

Other Records

Field Value
crawl-delay 0.5

chatgpt-user

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

openai-searchbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

claudebot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

claude-web

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

google-extended

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

googleother

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

microsoft-copilot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

perplexitybot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 0.5

ai2bot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

ccbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

facebookbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

meta-externalagent

Rule Path
Allow /

Other Records

Field Value
crawl-delay 1

twitterbot

Rule Path
Allow /

facebookexternalhit

Rule Path
Allow /

linkedinbot

Rule Path
Allow /

whatsapp

Rule Path
Allow /

telegrambot

Rule Path
Allow /

slackbot

Rule Path
Allow /
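Each group above names a single user agent. Under RFC 9309, a crawler picks the group whose user-agent matches its product token case-insensitively and falls back to the `*` group only when no group matches. Groups are not merged with `*`. As a result, the social-media bots above (twitterbot through slackbot) have no crawl-delay at all, rather than inheriting the 2-second delay of the `*` group.

This is a minimal sketch. It assumes exact token equality after lowercasing, and it treats crawl-delay as a plain per-group number. Crawl-delay is not part of RFC 9309, and crawlers differ in whether and how they honor it.

```python
from typing import Optional

# Crawl-delay (seconds) per user-agent group, transcribed from this scan.
# None marks a group that has no crawl-delay record of its own.
CRAWL_DELAYS: dict[str, Optional[float]] = {
    "googlebot": 1, "bingbot": 1, "slurp": 1, "duckduckbot": 1,
    "gptbot": 0.5, "chatgpt-user": 0.5, "openai-searchbot": 0.5,
    "claudebot": 0.5, "claude-web": 0.5, "google-extended": 0.5,
    "googleother": 1, "microsoft-copilot": 0.5, "perplexitybot": 0.5,
    "ai2bot": 1, "ccbot": 1, "facebookbot": 1, "meta-externalagent": 1,
    "twitterbot": None, "facebookexternalhit": None, "linkedinbot": None,
    "whatsapp": None, "telegrambot": None, "slackbot": None,
    "ia_archiver": 2, "archive.org_bot": 2, "*": 2,
}


def select_group(product_token: str) -> str:
    """Case-insensitive exact match on the product token, else the '*' group."""
    token = product_token.lower()
    return token if token in CRAWL_DELAYS else "*"


def crawl_delay_for(product_token: str) -> Optional[float]:
    """Delay from the selected group only; it is never merged with '*'."""
    return CRAWL_DELAYS[select_group(product_token)]
```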

ia_archiver

Rule Path
Allow /

Other Records

Field Value
crawl-delay 2

archive.org_bot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 2

*

Rule Path
Allow /
Disallow /admin/
Disallow /private/
Disallow /.git/
Disallow /node_modules/
Disallow /src/
Disallow /*.json$
Disallow /*.config.*
Disallow /api/internal/
Disallow /temp/
Disallow /.env
Allow /ai-training-data/
Allow /knowledge-base/
Allow /structured-content/

Other Records

Field Value
crawl-delay 2
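The `*` group combines a blanket `Allow /` with more specific `Disallow` and `Allow` rules and two wildcard patterns. Which paths are blocked therefore depends on the precedence rule the crawler applies. Under RFC 9309, the rule with the longest matching path wins, and `Allow` wins a tie. A `*` matches any run of characters, and a trailing `$` anchors the end of the path.

The sketch below hand-codes those semantics for the rules listed above. It is a simplified illustration, not the parser any particular crawler uses: it ignores percent-encoding normalization and the implicit allowance of `/robots.txt`. Python's standard-library `urllib.robotparser` is not a drop-in substitute here. It stops at the first matching rule in file order and compares plain prefixes without expanding wildcards. If the file lists `Allow /` first, as the scan does, it would report every path as allowed.

```python
import re

# Rules of the "*" group as listed in the scan. Order does not matter
# under longest-match semantics.
STAR_GROUP = [
    ("allow", "/"),
    ("disallow", "/admin/"), ("disallow", "/private/"),
    ("disallow", "/.git/"), ("disallow", "/node_modules/"),
    ("disallow", "/src/"), ("disallow", "/*.json$"),
    ("disallow", "/*.config.*"), ("disallow", "/api/internal/"),
    ("disallow", "/temp/"), ("disallow", "/.env"),
    ("allow", "/ai-training-data/"), ("allow", "/knowledge-base/"),
    ("allow", "/structured-content/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a rule path: '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]] = STAR_GROUP) -> bool:
    """Longest matching rule path wins; allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed
```

For example, `/ai-training-data/set.json` is allowed. The 18-character `Allow /ai-training-data/` outranks the 8-character `Disallow /*.json$`. Meanwhile, `/data/export.json` and `/app.config.js` are blocked.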

Other Records

Field Value
sitemap https://tooldata.io/sitemap-index.xml
sitemap https://tooldata.io/sitemap.xml
sitemap https://tooldata.io/sitemap-ai.xml
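Sitemap records are not tied to any user-agent group, which is why the scan lists them separately from the groups. A consumer can collect them from the whole file. The following is a minimal sketch that assumes plain `Sitemap: <url>` lines. It matches the field name case-insensitively, strips `#` comments, and keeps file order.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect Sitemap values from robots.txt text, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")  # split at the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Splitting at the first colon keeps the `https://` scheme inside the value.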

Comments

  • Robots.txt optimized for LLMs and AI systems - Tooldata Martech
  • https://tooldata.io
  • Updated: 2025-09-27
  • === TRADITIONAL SEARCH ENGINE BOTS ===
  • === AI AND LARGE LANGUAGE MODEL BOTS ===
  • OpenAI (ChatGPT, GPT-4, etc.)
  • Anthropic (Claude)
  • Google AI (Gemini, Bard)
  • Microsoft AI (Copilot)
  • Perplexity AI
  • AI2 (AllenAI)
  • Common Crawl (used by many LLMs)
  • Meta AI
  • === SOCIAL MEDIA BOTS ===
  • === RESEARCH AND ACADEMIC BOTS ===
  • === GENERAL RESTRICTIONS ===
  • === SPECIAL CONFIGURATION FOR AI TRAINING ===
  • Special directory for structured training data
  • === SITEMAPS ===
  • === HOST CONFIGURATION ===
  • === ADDITIONAL INFORMATION FOR LLMs ===
  • Content-Language: es-ES, es-CL, es-CO, es-PE
  • Business-Type: B2B Marketing Technology
  • Geographic-Focus: Latin America (Chile, Colombia, Peru)
  • Expertise-Areas: AI, Social Listening, Performance Marketing, Digital Strategy
  • Update-Frequency: Weekly
  • Authority-Level: Expert
  • Content-Quality: High

Warnings

  • `host` is not a known field.
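The warning corresponds to the file's host configuration section. `Host` is not one of the fields RFC 9309 defines (user-agent, allow, disallow), so crawlers that follow the standard simply ignore it. A check like the scanner's can be sketched as follows. The set of recognized fields is an assumption: the three RFC 9309 fields plus the widely used `sitemap` and `crawl-delay` extensions. The scanner's actual list is not stated.

```python
# ASSUMPTION: the fields treated as known. RFC 9309 defines user-agent,
# allow and disallow; sitemap and crawl-delay are common extensions.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_field_warnings(robots_txt: str) -> list[str]:
    """Return one warning per line whose field name is not in KNOWN_FIELDS."""
    warnings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        field, sep, _ = line.partition(":")
        name = field.strip().lower()
        if sep and name not in KNOWN_FIELDS:
            warnings.append(f"line {number}: `{name}` is not a known field.")
    return warnings
```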