hd.se
robots.txt

Robots Exclusion Standard data for hd.se

Resource Scan

Scan Details

Site Domain	hd.se
Base Domain	hd.se
Scan Status	Ok
Last Scan	2024-11-16T21:44:42+00:00
Next Scan	2024-11-23T21:44:42+00:00

Last Scan

Scanned	2024-11-16T21:44:42+00:00
URL	https://hd.se/robots.txt
Redirect	https://www.hd.se/robots.txt
Redirect Domain	www.hd.se
Redirect Base	hd.se
Domain IPs	34.149.169.35
Redirect IPs	146.75.117.91, 2a04:4e42:9::347
Response IP	151.101.37.91
Found	Yes
Hash	afa2233ca88fc05270e08692939fdb436014ed3ddbbffbce172a330225da035e
SimHash	6a5ff1604d74

Groups

*

Rule	Path
Disallow	/sok/
Disallow	/kop/
Disallow	/bn/id/*
Disallow	/foljer
Disallow	/api/*
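
The `*` group mixes plain prefixes (`/sok/`, `/foljer`) with wildcard patterns (`/bn/id/*`, `/api/*`). The sketch below shows one way a crawler might test a path against these rules, assuming the common convention that `*` matches any run of characters and that a rule matches from the start of the path. The names `DISALLOW`, `pattern_to_regex`, and `is_disallowed` are hypothetical and used only for illustration. Real crawlers may differ in detail.

```python
import re

# Disallow rules copied from the "*" group in the table above.
DISALLOW = ["/sok/", "/kop/", "/bn/id/*", "/foljer", "/api/*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: "*" matches any run of characters and a trailing "$"
    anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rules=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    # re.match anchors at the beginning, which gives prefix semantics.
    return any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for path in ["/sok/?q=test", "/sok", "/bn/id/123", "/foljer", "/nyheter/"]:
        print(path, is_disallowed(path))
```

Because the group contains no Allow rules, the sketch does not need to resolve conflicts between Allow and Disallow. Note that `/sok` (without the trailing slash) is not covered by the `/sok/` rule, while `/foljer` also covers longer paths that begin with the same characters.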

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/
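
The groups above can be checked with Python's standard-library `urllib.robotparser`. The sketch below rebuilds a robots.txt text from the tables on this page (a reconstruction, not the verbatim file served by hd.se) and asks which agents may fetch a given path. The names `STAR_RULES`, `BLOCKED_AGENTS`, `build_robots_txt`, and `allowed` are hypothetical. The standard-library parser matches rule paths by prefix and is not expected to expand `*` wildcards, so only the non-wildcard rules are exercised here.

```python
from urllib.robotparser import RobotFileParser

# Rules for the "*" group, as listed in the Groups section.
STAR_RULES = ["/sok/", "/kop/", "/bn/id/*", "/foljer", "/api/*"]

# Agents that are given a blanket "Disallow: /" on this page.
BLOCKED_AGENTS = [
    "ccbot", "chatgpt-user", "gptbot", "google-extended",
    "omgilibot", "omgili", "facebookbot",
]


def build_robots_txt() -> str:
    """Reconstruct a robots.txt text from the tables (an approximation)."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in STAR_RULES]
    for agent in BLOCKED_AGENTS:
        # A blank line separates one group from the next.
        lines += ["", f"User-agent: {agent}", "Disallow: /"]
    return "\n".join(lines) + "\n"


parser = RobotFileParser()
parser.parse(build_robots_txt().splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parser lets `agent` fetch `path` on www.hd.se."""
    return parser.can_fetch(agent, "https://www.hd.se" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "GPTBot", "CCBot", "FacebookBot"):
        print(agent, allowed(agent, "/"), allowed(agent, "/sok/"))
```

An agent without its own group, such as Googlebot in this example, falls back to the `*` group, so it is only kept out of the listed paths. The named AI crawlers match their own groups and are refused everywhere.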

Comments

ccbot: Common Crawl robot. The resulting dataset is a major training corpus for many LLMs.
chatgpt-user: ChatGPT robot, used to improve the ChatGPT LLM.
gptbot: ChatGPT robot, may be used to improve the ChatGPT LLM.
google-extended: Robot used to improve the Bard and Vertex AI LLMs.
omgilibot: webz.io robot. The resulting dataset can be, and is, purchased to train LLMs.
omgili: webz.io robot. The resulting dataset can be, and is, purchased to train LLMs.
facebookbot: FacebookBot crawls public web pages to improve language models for Facebook's speech recognition technology.
