viivilla.se
robots.txt

Robots Exclusion Standard data for viivilla.se

Resource Scan

Scan Details

Site Domain viivilla.se
Base Domain viivilla.se
Scan Status Ok
Last Scan 2024-09-21T10:56:42+00:00
Next Scan 2024-09-28T10:56:42+00:00

Last Scan

Scanned 2024-09-21T10:56:42+00:00
URL https://viivilla.se/robots.txt
Domain IPs 151.101.1.91
Response IP 151.101.1.91
Found Yes
Hash 57ce2dfda798d22b27c5974a129b45270d98d4d49ae651b12b21f0b0544893d2
SimHash 6a3739648d76
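
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, although the scan does not name the algorithm. As a minimal sketch under that assumption, a content fingerprint of this kind could be computed and compared between scans as follows. The function names `sha256_hex` and `has_changed` are hypothetical and not part of the scanner.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Report whether a freshly fetched body differs from the stored hash.

    Assumption: the stored hash was produced by SHA-256 over the exact
    response bytes; the scan report does not confirm this.
    """
    return sha256_hex(body) != previous_hash.lower()
```

Comparing digests this way only detects byte-level changes; the separate SimHash field presumably serves to detect near-duplicates, and its algorithm is not reproduced here.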

Groups

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://viivilla.se/sitemap.xml
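
The groups and the sitemap record listed above can be checked with Python's standard `urllib.robotparser` module. The sketch below rebuilds an approximate robots.txt from the scan data and evaluates it. The exact layout, casing, and comment placement of the real file are not shown in the scan, so the rebuilt text is an assumption, and `build_robots_txt` and `is_allowed` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# User agents that the scan lists with "Disallow /".
BLOCKED_AGENTS = [
    "ccbot",
    "chatgpt-user",
    "gptbot",
    "google-extended",
    "omgilibot",
    "omgili",
    "facebookbot",
]


def build_robots_txt(agents, sitemap):
    """Rebuild an approximate robots.txt: one group per agent, then the sitemap."""
    lines = []
    for agent in agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines)


# Approximation of the scanned file, not a verbatim copy.
ROBOTS_TXT = build_robots_txt(BLOCKED_AGENTS, "https://viivilla.se/sitemap.xml")


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these rules, `is_allowed("GPTBot", "https://viivilla.se/")` is false, while an agent with no matching group, such as `Googlebot`, is allowed because the file defines no wildcard (`*`) group.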

Comments

  • Common Crawl robot, the resulting dataset is the primary training corpus in every LLM.
  • ChatGPT robot, used to improve the ChatGPT LLM.
  • ChatGPT robot, may be used to improve the ChatGPT LLM.
  • Robot used to improve Bard and Vertex AI LLMs.
  • webz.io robot, the resulting dataset can and is purchased to train LLMs.
  • webz.io robot, the resulting dataset can and is purchased to train LLMs.
  • FacebookBot crawls public web pages to improve LLMs for Facebook's speech recognition technology.