levaochbo.expressen.se
robots.txt

Robots Exclusion Standard data for levaochbo.expressen.se

Resource Scan

Scan Details

Site Domain levaochbo.expressen.se
Base Domain expressen.se
Scan Status Ok
Last Scan 2025-05-04T17:31:40+00:00
Next Scan 2025-05-11T17:31:40+00:00

Last Scan

Scanned 2025-05-04T17:31:40+00:00
URL https://levaochbo.expressen.se/robots.txt
Domain IPs 146.75.117.91, 2a04:4e42:9::347
Response IP 151.101.37.91
Found Yes
Hash 8a12905489b53d8029bd7a6c55924d2b3206b5b332e24d59d1e7b9321ad7ba65
SimHash ea161150cd65
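The Hash field could presumably be used to tell whether robots.txt changed between scans. Below is a minimal Python sketch, assuming (the scan does not say so) that the value is a SHA-256 digest of the raw response body; its 64 hex digits are consistent with SHA-256, but the exact hashed input is unknown. The names `robots_digest`, `matches_recorded` and `fetch_robots` are hypothetical helpers, not part of any scanner's API.

```python
import hashlib
from urllib.request import urlopen

# Hash value recorded in the last scan above.
RECORDED_HASH = "8a12905489b53d8029bd7a6c55924d2b3206b5b332e24d59d1e7b9321ad7ba65"


def fetch_robots(url: str = "https://levaochbo.expressen.se/robots.txt") -> bytes:
    """Download the raw robots.txt body (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def robots_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes) -> bool:
    """True if the body hashes to the value recorded at the last scan."""
    return robots_digest(body) == RECORDED_HASH


# Hypothetical usage (performs a network request):
# changed = not matches_recorded(fetch_robots())
```

If the assumption about the hash input is wrong (for example, if the scanner normalizes line endings first), a mismatch would not necessarily mean the file changed.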

Groups

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /

Other Records

Field Value
sitemap https://levaochbo.expressen.se/sitemap.xml
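Every group above pairs one user-agent token with `Disallow /`, which blocks that agent from the whole site, and the scan lists no `*` group, so crawlers that are not listed are not restricted. The sketch below rebuilds an equivalent file from the scanned groups (not the verbatim file: comments and the original capitalization are not reproduced) and checks it with Python's standard `urllib.robotparser`. Note that `urllib.robotparser` uses its own simplified matching (a case-insensitive substring test on the user-agent text before any `/`), so individual crawlers may interpret the file differently. `BLOCKED_AGENTS` and `build_robots_txt` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the Groups section, in scan order; each has "Disallow: /".
BLOCKED_AGENTS = [
    "ccbot", "chatgpt-user", "gptbot", "google-extended", "omgilibot",
    "omgili", "facebookbot", "amazonbot", "bytespider", "anthropic-ai",
    "oai-searchbot", "velenpublicwebcrawler",
]

SITEMAP = "https://levaochbo.expressen.se/sitemap.xml"


def build_robots_txt(agents: list[str]) -> str:
    """Rebuild an equivalent (not verbatim) robots.txt from the scanned groups."""
    lines: list[str] = []
    for agent in agents:
        # One group per agent, separated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Sitemap records are independent of any user-agent group.
    lines.append(f"Sitemap: {SITEMAP}")
    return "\n".join(lines)


parser = RobotFileParser()
parser.parse(build_robots_txt(BLOCKED_AGENTS).splitlines())
```

With this reconstruction, `parser.can_fetch("GPTBot", "https://levaochbo.expressen.se/")` returns `False` for any path on the site, while an unlisted crawler such as Googlebot is allowed, and `parser.site_maps()` (Python 3.8+) returns the sitemap URL.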

Comments

  • Common Crawl robot; the resulting dataset is a primary training corpus for many LLMs.
  • ChatGPT robot, used to improve the ChatGPT LLM.
  • ChatGPT robot, may be used to improve the ChatGPT LLM.
  • Robot used to improve Bard (now Gemini) and Vertex AI LLMs.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • webz.io robot; the resulting dataset can be, and is, purchased to train LLMs.
  • FacebookBot crawls public web pages to improve LLMs for Facebook's speech recognition technology.
  • Amazonbot is used to train Amazon services such as Alexa.
  • Bytespider is ByteDance's bot and may not respect robots.txt.
  • Robot used to improve Anthropic AI LLMs.
  • OpenAI search robot.
  • Velen.io/Hunter.io robot, described as used to "build business datasets and machine learning models to better understand the web", though it seems to focus on collecting email addresses for spam.