vlt.se
robots.txt

Robots Exclusion Standard data for vlt.se

Resource Scan

Scan Details

Site Domain vlt.se
Base Domain vlt.se
Scan Status Ok
Last Scan 2025-07-28T08:25:10+00:00
Next Scan 2025-08-04T08:25:10+00:00

Last Scan

Scanned 2025-07-28T08:25:10+00:00
URL https://vlt.se/robots.txt
Redirect https://www.vlt.se/robots.txt
Redirect Domain www.vlt.se
Redirect Base vlt.se
Domain IPs 34.149.169.35
Redirect IPs 151.101.37.91, 2a04:4e42:9::347
Response IP 146.75.117.91
Found Yes
Hash 6b6cc1a80e57d536ac2d24685b68fa551a8cf8828604020990a243f59d8f17f8
SimHash e01c137c6d75

Groups

*

Rule Path
Disallow /sok/
Disallow /kop/
Disallow /bn/id/*
Disallow /foljer
Disallow /api/*
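In standard robots.txt syntax, the wildcard group above would read roughly as follows (a reconstruction from the scan data, not the live file; note that `*` inside a rule path is a de-facto extension honored by major crawlers, not part of the original Robots Exclusion Standard):

```
User-agent: *
Disallow: /sok/
Disallow: /kop/
Disallow: /bn/id/*
Disallow: /foljer
Disallow: /api/*
```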

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

google-cloudvertexbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

duckassistbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

cohere-training-data-crawler

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

meta-externalfetcher

Rule Path
Disallow /

timpibot

Rule Path
Disallow /

webzio-extended

Rule Path
Disallow /

youbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /
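Rules like these can be checked programmatically. As a sketch, Python's standard `urllib.robotparser` can evaluate a hand-reconstructed excerpt of the rules reported above (not fetched from the live site; wildcard paths such as `/api/*` are omitted because the stdlib parser does not implement the `*` path extension):

```python
from urllib import robotparser

# Hand-reconstructed excerpt of the reported rules, not the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sok/
Disallow: /kop/
Disallow: /foljer

User-agent: GPTBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "https://www.vlt.se/"))           # False: blocked site-wide
print(rp.can_fetch("SomeBrowser", "https://www.vlt.se/sok/"))  # False: /sok/ disallowed for *
print(rp.can_fetch("SomeBrowser", "https://www.vlt.se/"))      # True: front page allowed for *
```

The per-bot groups all disallow `/`, so a well-behaved AI crawler matching any of them is excluded from the whole site, while ordinary agents fall through to the `*` group.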

Comments

  • Common Crawl robot; the resulting dataset is a primary training corpus for many LLMs.
  • ChatGPT robot, used to improve the ChatGPT LLM.
  • ChatGPT robot, may be used to improve the ChatGPT LLM.
  • Robot used to improve Bard and Vertex AI LLMs.
  • Associated with Google Vertex AI agents.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • FacebookBot crawls public web pages to improve language models for Facebook's speech recognition technology.
  • Another agent used by Anthropic, more specifically related to Claude.
  • Diffbot crawls the web so that others can train their LLMs.
  • Uses scraped data on-the-fly to create answers for DuckAssist.
  • Used by perplexity.ai. Generates text based on scraped material.
  • Cohere’s chatbot.
  • Cohere’s chatbot.
  • Used by Meta for use cases such as training AI models or improving products by indexing content directly.
  • Meta crawler that performs user-initiated fetches of individual links in support of some AI tools.
  • Used by Timpi to scrape data for training their Large Language Models.
  • Used by Webz.io to indicate that your site should not be included in datasets sold to those training AI models.
  • Crawler behind You.com’s AI search and browser assistant, indexing content for real-time answers.
  • Amazonbot is used to train Amazon services such as Alexa.
  • Bytespider is ByteDance's bot and may not respect robots.txt.
  • Robot used to improve Anthropic AI LLMs.
  • OpenAI's search bot.
  • Velen.io/Hunter.io claim to "build business datasets and machine learning models to better understand the web", though in practice it seems to focus on collecting email addresses for spam.