plusallt.se
robots.txt

Robots Exclusion Standard data for plusallt.se

Resource Scan

Scan Details

Site Domain	plusallt.se
Base Domain	plusallt.se
Scan Status	Ok
Last Scan	2024-09-24T18:49:09+00:00
Next Scan	2024-10-01T18:49:09+00:00

Last Scan

Scanned	2024-09-24T18:49:09+00:00
URL	https://plusallt.se/robots.txt
Redirect	https://www.plusallt.se/robots.txt
Redirect Domain	www.plusallt.se
Redirect Base	plusallt.se
Domain IPs	34.149.169.35
Redirect IPs	151.101.37.91, 2a04:4e42:8d::347
Response IP	146.75.117.91
Found	Yes
Hash	b206c5a402684d1a77747344a27669b43cf57667a4afe3bc301bc07e150f6e87
SimHash	6a74b9448d74

Groups

*

Rule	Path
Disallow	/bn/id/*

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/
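
The groups above can be read as a small lookup: a crawler uses the group whose name matches its user-agent token and falls back to the * group otherwise. The following is a minimal sketch of that lookup, not the scanner's or any crawler's actual implementation. The names GROUPS, pattern_to_regex and is_allowed are hypothetical, the wildcard handling ("*" matches any run of characters, a trailing "$" anchors the end) follows common robots.txt conventions, and the agent match is simplified to an exact, case-insensitive token comparison. The agent name "examplebot" and the example paths are made up for illustration.

```python
import re

# Groups as listed in the scan above: user-agent token -> Disallow patterns.
GROUPS = {
    "*": ["/bn/id/*"],
    "ccbot": ["/"],
    "chatgpt-user": ["/"],
    "gptbot": ["/"],
    "google-extended": ["/"],
    "omgilibot": ["/"],
    "omgili": ["/"],
    "facebookbot": ["/"],
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where a '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if no Disallow pattern in the agent's group matches the path.

    Simplification: the agent must equal a group name exactly (ignoring case);
    any other agent falls back to the '*' group.
    """
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    # re.match anchors at the start of the path, so patterns act as prefixes.
    return not any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for agent, path in [
        ("GPTBot", "/"),
        ("examplebot", "/bn/id/123"),
        ("examplebot", "/some/page"),
    ]:
        print(agent, path, "allowed" if is_allowed(agent, path) else "disallowed")
```

Note that a named group replaces the * group rather than adding to it, so under this reading gptbot is bound only by its own Disallow / rule.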

Other Records

Field	Value
sitemap	https://www.plusallt.se/sitemap.xml

Comments

Common Crawl robot, the resulting dataset is the primary training corpus in every LLM.
ChatGPT robot, used to improve the ChatGPT LLM.
ChatGPT robot, may be used to improve the ChatGPT LLM.
Robot used to improve Bard and Vertex AI LLMs.
webz.io robot, the resulting dataset can and is purchased to train LLMs.
webz.io robot, the resulting dataset can and is purchased to train LLMs.
FacebookBot crawls public web pages to improve LLMs for Facebook's speech recognition technology.
