malmolund.city.se
robots.txt

Robots Exclusion Standard data for malmolund.city.se

Resource Scan

Scan Details

Site Domain malmolund.city.se
Base Domain city.se
Scan Status Ok
Last Scan 2024-05-14T16:36:01+00:00
Next Scan 2024-05-21T16:36:01+00:00
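
The two timestamps above imply a weekly rescan. A minimal sketch of that arithmetic, assuming the interval is simply the difference between the two listed values (the variable names are illustrative and not part of the scan output):

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details listing.
last_scan = datetime.fromisoformat("2024-05-14T16:36:01+00:00")
next_scan = datetime.fromisoformat("2024-05-21T16:36:01+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)
```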

Last Scan

Scanned 2024-05-14T16:36:01+00:00
URL https://malmolund.city.se/robots.txt
Redirect https://www.sydsvenskan.se/robots.txt
Redirect Domain www.sydsvenskan.se
Redirect Base sydsvenskan.se
Domain IPs 34.160.47.224
Redirect IPs 146.75.117.91, 2a04:4e42:9::347
Response IP 146.75.117.91
Found Yes
Hash 4d4913fd24187cb0600eaae967188c26a2c9a990249732a648bd8fdb8bfb18c5
SimHash 625ff1444d74

Groups

*

Rule Path
Disallow /sok/
Disallow /kop/
Disallow /logga-in
Disallow /bn/id/*
Disallow /foljer
Disallow /api/*

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /
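
The groups above can be read as a lookup from user agent to disallowed path patterns. The following is a rough sketch of how a crawler might evaluate them, assuming a simplified matcher: the group is chosen by exact, case-insensitive agent name with `*` as the fallback, `*` in a path matches any run of characters, and a rule matches by prefix. The names `GROUPS`, `pattern_to_regex`, and `is_allowed` are hypothetical and do not come from the scanned site or any library.

```python
import re

# Groups as listed in the scan (user agent -> disallowed path patterns).
GROUPS = {
    "*": ["/sok/", "/kop/", "/logga-in", "/bn/id/*", "/foljer", "/api/*"],
    "ccbot": ["/"],
    "chatgpt-user": ["/"],
    "gptbot": ["/"],
    "google-extended": ["/"],
    "omgilibot": ["/"],
    "omgili": ["/"],
    "facebookbot": ["/"],
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(user_agent, path):
    """Return True if no Disallow rule in the selected group matches path."""
    # Simplification: exact agent name match, otherwise the '*' group.
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    # re.match anchors at the start, which gives prefix semantics.
    return not any(pattern_to_regex(rule).match(path) for rule in rules)


print(is_allowed("GPTBot", "/"))
print(is_allowed("SomeBrowser", "/api/v1/items"))
print(is_allowed("SomeBrowser", "/nyheter"))
```

Under these assumptions, the named AI crawlers are blocked from every path, while any other agent is blocked only from the search, purchase, login, follow, `/bn/id/` and `/api/` paths. Real crawlers may select groups and resolve conflicts differently, so this is an illustration rather than a reference implementation.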

Comments

  • Common Crawl robot, the resulting dataset is the primary training corpus in every LLM.
  • ChatGPT robot, used to improve the ChatGPT LLM.
  • ChatGPT robot, may be used to improve the ChatGPT LLM.
  • Robot used to improve Bard and Vertex AI LLMs.
  • webz.io robot, the resulting dataset can and is purchased to train LLMs.
  • webz.io robot, the resulting dataset can and is purchased to train LLMs.
  • FacebookBot crawls public web pages to improve LLMs for Facebook's speech recognition technology.