vlt.se
robots.txt

Robots Exclusion Standard data for vlt.se

Resource Scan

Scan Details

Site Domain vlt.se
Base Domain vlt.se
Scan Status Ok
Last Scan 2025-07-28T08:25:10+00:00
Next Scan 2025-08-04T08:25:10+00:00

Last Scan

Scanned 2025-07-28T08:25:10+00:00
URL https://vlt.se/robots.txt
Redirect https://www.vlt.se/robots.txt
Redirect Domain www.vlt.se
Redirect Base vlt.se
Domain IPs 34.149.169.35
Redirect IPs 151.101.37.91, 2a04:4e42:9::347
Response IP 146.75.117.91
Found Yes
Hash 6b6cc1a80e57d536ac2d24685b68fa551a8cf8828604020990a243f59d8f17f8
SimHash e01c137c6d75

Groups

*

Rule Path
Disallow /sok/
Disallow /kop/
Disallow /bn/id/*
Disallow /foljer
Disallow /api/*
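In standard robots.txt syntax, the wildcard group above would read roughly as follows (a reconstruction from the scan data, not the live file; note that `*` inside a rule path is a de-facto extension honored by major crawlers, not part of the original Robots Exclusion Standard):

```
User-agent: *
Disallow: /sok/
Disallow: /kop/
Disallow: /bn/id/*
Disallow: /foljer
Disallow: /api/*
```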

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

google-cloudvertexbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

duckassistbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

cohere-training-data-crawler

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

meta-externalfetcher

Rule Path
Disallow /

timpibot

Rule Path
Disallow /

webzio-extended

Rule Path
Disallow /

youbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /
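Rules like these can be checked programmatically. As a sketch, Python's standard `urllib.robotparser` can evaluate a hand-reconstructed excerpt of the rules reported above (not fetched from the live site; wildcard paths such as `/api/*` are omitted because the stdlib parser does not implement the `*` path extension):

```python
from urllib import robotparser

# Hand-reconstructed excerpt of the reported rules, not the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sok/
Disallow: /kop/
Disallow: /foljer

User-agent: GPTBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "https://www.vlt.se/"))           # False: blocked site-wide
print(rp.can_fetch("SomeBrowser", "https://www.vlt.se/sok/"))  # False: /sok/ disallowed for *
print(rp.can_fetch("SomeBrowser", "https://www.vlt.se/"))      # True: front page allowed for *
```

The per-bot groups all disallow `/`, so a well-behaved AI crawler matching any of them is excluded from the whole site, while ordinary agents fall through to the `*` group.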

Comments

  • Common Crawl robot; the resulting dataset is a primary training corpus for many LLMs.
  • ChatGPT robot, used to improve the ChatGPT LLM.
  • ChatGPT robot, may be used to improve the ChatGPT LLM.
  • Robot used to improve Bard and Vertex AI LLMs.
  • Associated with Google Vertex AI agents.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • FacebookBot crawls public web pages to improve language models for Facebook's speech recognition technology.
  • Another agent used by Anthropic, more specifically related to Claude.
  • Diffbot crawls the web so that others can train their LLMs.
  • Uses scraped data on-the-fly to create answers for DuckAssist.
  • Used by perplexity.ai. Generates text based on scraped material.
  • Cohere’s chatbot.
  • Cohere’s chatbot.
  • Used by Meta for use cases such as training AI models or improving products by indexing content directly.
  • Meta crawler that performs user-initiated fetches of individual links in support of some AI tools.
  • Used by Timpi to scrape data for training their Large Language Models.
  • Used by Webz.io to indicate that your site should not be included in datasets sold to those training AI models.
  • Crawler behind You.com’s AI search and browser assistant, indexing content for real-time answers.
  • Amazonbot is used to train Amazon services such as Alexa.
  • Bytespider is ByteDance's bot and may not respect robots.txt.
  • Robot used to improve Anthropic AI LLMs.
  • OpenAI's search bot.
  • Velen.io/Hunter.io claim to "build business datasets and machine learning models to better understand the web", though in practice it seems to focus on collecting email addresses for spam.