levaochbo.expressen.se
robots.txt

Robots Exclusion Standard data for levaochbo.expressen.se

Resource Scan

Scan Details

Site Domain	levaochbo.expressen.se
Base Domain	expressen.se
Scan Status	Ok
Last Scan	2025-05-04T17:31:40+00:00
Next Scan	2025-05-11T17:31:40+00:00

Last Scan

Scanned	2025-05-04T17:31:40+00:00
URL	https://levaochbo.expressen.se/robots.txt
Domain IPs	146.75.117.91, 2a04:4e42:9::347
Response IP	151.101.37.91
Found	Yes
Hash	8a12905489b53d8029bd7a6c55924d2b3206b5b332e24d59d1e7b9321ad7ba65
SimHash	ea161150cd65
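
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the report does not name the algorithm. A minimal sketch for comparing a locally fetched copy of the file against the report, assuming the hash is SHA-256 over the raw response body (the helper names `sha256_hex` and `matches_report` are illustrative, not part of the scanner):

```python
import hashlib

# Hash value copied from the "Last Scan" table above.
REPORTED_HASH = "8a12905489b53d8029bd7a6c55924d2b3206b5b332e24d59d1e7b9321ad7ba65"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes) -> bool:
    """Assumption: the report hashes the unmodified response body with SHA-256."""
    return sha256_hex(body) == REPORTED_HASH
```

If a fetched copy does not match, that may only mean the file changed after the scan or that the scanner normalises the body before hashing.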

Groups

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

oai-searchbot

Rule	Path
Disallow	/

velenpublicwebcrawler

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://levaochbo.expressen.se/sitemap.xml
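
The groups and the sitemap record listed above can be checked with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt text is reconstructed from this report (the live file may use different casing, ordering, or comments), and `build_robots_txt` is an illustrative helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# User-agent groups as listed in the report; each one has "Disallow: /".
AGENTS = [
    "ccbot", "chatgpt-user", "gptbot", "google-extended",
    "omgilibot", "omgili", "facebookbot", "amazonbot",
    "bytespider", "anthropic-ai", "oai-searchbot",
    "velenpublicwebcrawler",
]
SITEMAP = "https://levaochbo.expressen.se/sitemap.xml"
SITE_ROOT = "https://levaochbo.expressen.se/"


def build_robots_txt(agents, sitemap):
    """Rebuild an equivalent robots.txt from the report (a reconstruction)."""
    lines = []
    for agent in agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines)


parser = RobotFileParser()
parser.parse(build_robots_txt(AGENTS, SITEMAP).splitlines())

# A listed crawler is blocked from the whole site.
blocked = parser.can_fetch("GPTBot", SITE_ROOT)

# A crawler with no matching group is allowed, since there is no "*" group.
allowed = parser.can_fetch("Googlebot", SITE_ROOT)

# The sitemap record is exposed separately (Python 3.8+).
sitemaps = parser.site_maps()
```

Because the report shows no wildcard (`*`) group, only the twelve named crawlers are restricted. Any other user agent falls through to the default, which is to allow.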

Comments

Common Crawl robot, the resulting dataset is the primary training corpus in every LLM.
ChatGPT robot, used to improve the ChatGPT LLM.
ChatGPT robot, may be used to improve the ChatGPT LLM.
Robot used to improve Bard and Vertex AI LLMs.
webz.io robot, the resulting dataset can and is purchased to train LLMs.
webz.io robot, the resulting dataset can and is purchased to train LLMs.
FacebookBot crawls public web pages to improve LLMs for Facebook's speech recognition technology.
Amazonbot is used to train Amazon services such as Alexa.
Bytespider is ByteDance's bot and may not respect robots.txt.
Robot used to improve Anthropic AI LLMs.
OpenAI search bot
Velen.io/Hunter.io "build business datasets and machine learning models to better understand the web" - seems to focus on collecting email addresses for spam though.
