halsoliv.expressen.se
robots.txt

Robots Exclusion Standard data for halsoliv.expressen.se

Resource Scan

Scan Details

Site Domain halsoliv.expressen.se
Base Domain expressen.se
Scan Status Ok
Last Scan 2025-08-04T10:15:28+00:00
Next Scan 2025-08-11T10:15:28+00:00

Last Scan

Scanned 2025-08-04T10:15:28+00:00
URL https://halsoliv.expressen.se/robots.txt
Domain IPs 146.75.117.91, 2a04:4e42:8d::347
Response IP 151.101.37.91
Found Yes
Hash 1a8814c8c51ee1d8fe43c2e4c8f39c14c104e99780690ec8310afcc3712093a9
SimHash e81c117ced75
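
The Hash field above is 64 hex digits, which is consistent with a SHA-256 digest of the fetched file; the SimHash field, by contrast, is a locality-sensitive fingerprint used for near-duplicate detection between scans. A minimal Python sketch of reproducing the content hash, assuming the digest is taken over the raw response body:

```python
import hashlib
import urllib.request

# Fetch the same robots.txt the scan recorded and hash the raw bytes.
# If the file has not changed since 2025-08-04, the digest should
# match the Hash field above.
URL = "https://halsoliv.expressen.se/robots.txt"
with urllib.request.urlopen(URL) as resp:
    body = resp.read()

print(hashlib.sha256(body).hexdigest())
```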

Groups

User agent                     Rule      Path
ccbot                          Disallow  /
chatgpt-user                   Disallow  /
gptbot                         Disallow  /
google-extended                Disallow  /
google-cloudvertexbot          Disallow  /
omgilibot                      Disallow  /
omgili                         Disallow  /
facebookbot                    Disallow  /
claudebot                      Disallow  /
diffbot                        Disallow  /
duckassistbot                  Disallow  /
perplexitybot                  Disallow  /
cohere-ai                      Disallow  /
cohere-training-data-crawler   Disallow  /
meta-externalagent             Disallow  /
meta-externalfetcher           Disallow  /
timpibot                       Disallow  /
webzio-extended                Disallow  /
youbot                         Disallow  /
amazonbot                      Disallow  /
bytespider                     Disallow  /
anthropic-ai                   Disallow  /
oai-searchbot                  Disallow  /
velenpublicwebcrawler          Disallow  /
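
Every group above disallows the site root, so each named agent is barred from the entire site, while agents not listed are unaffected (the file has no catch-all * group). How a compliant crawler should interpret these rules can be checked with Python's standard urllib.robotparser; a short sketch:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://halsoliv.expressen.se/robots.txt")
rp.read()

# Agent names are matched case-insensitively, so GPTBot hits the
# "gptbot" group and is blocked site-wide by its Disallow: / rule.
print(rp.can_fetch("GPTBot", "https://halsoliv.expressen.se/"))       # False

# An agent with no matching group and no "*" fallback is allowed.
print(rp.can_fetch("UnlistedBot", "https://halsoliv.expressen.se/"))  # True
```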

Other Records

Field     Value
sitemap   https://halsoliv.expressen.se/sitemap.xml
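
To see what the advertised sitemap covers, it can be fetched and parsed with the standard library. A sketch assuming the file follows the standard sitemaps.org schema; if it is a sitemap index, the printed URLs are child sitemaps rather than pages:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol used by sitemap.xml files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen("https://halsoliv.expressen.se/sitemap.xml") as resp:
    root = ET.fromstring(resp.read())

# <loc> appears under <url> in a plain sitemap and under <sitemap>
# in a sitemap index; the descendant search matches both.
for loc in root.iterfind(".//sm:loc", NS):
    print(loc.text)
```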

Comments

  • ccbot: Common Crawl's robot; the resulting dataset is a primary training corpus for many LLMs.
  • chatgpt-user: ChatGPT robot, used to improve the ChatGPT LLM.
  • gptbot: ChatGPT robot; may be used to improve the ChatGPT LLM.
  • google-extended: Robot used to improve the Bard and Vertex AI LLMs.
  • google-cloudvertexbot: Associated with Google Vertex AI agents.
  • omgilibot: webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • omgili: webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • facebookbot: Crawls public web pages to improve LLMs for Facebook's speech recognition technology.
  • claudebot: Another Anthropic agent, more specifically related to Claude.
  • diffbot: Crawls the web in order for others to train their LLMs.
  • duckassistbot: Uses scraped data on the fly to create answers for DuckAssist.
  • perplexitybot: Used by perplexity.ai; generates text based on scraped material.
  • cohere-ai: Cohere's chatbot.
  • cohere-training-data-crawler: Cohere's crawler for gathering LLM training data.
  • meta-externalagent: Used for use cases such as training AI models or improving products by indexing content directly.
  • meta-externalfetcher: Performs user-initiated fetches of individual links in support of some AI tools.
  • timpibot: Used by Timpi to scrape data for training their LLMs.
  • webzio-extended: Used by Webz.io to indicate that your site should not be included in datasets sold to those using them to train AI models.
  • youbot: Crawler behind You.com's AI search and browser assistant, indexing content for real-time answers.
  • amazonbot: Used to train Amazon services such as Alexa.
  • bytespider: ByteDance's bot; may not respect robots.txt.
  • anthropic-ai: Robot used to improve Anthropic's LLMs.
  • oai-searchbot: OpenAI's search bot.
  • velenpublicwebcrawler: Velen.io/Hunter.io claim to "build business datasets and machine learning models to better understand the web", though the crawler seems to focus on collecting email addresses for spam.