heili.fi
robots.txt

Robots Exclusion Standard data for heili.fi

Resource Scan

Scan Details

Site Domain	heili.fi
Base Domain	heili.fi
Scan Status	Ok
Last Scan	2024-11-15T23:58:39+00:00
Next Scan	2024-11-22T23:58:39+00:00

Last Scan

Scanned	2024-11-15T23:58:39+00:00
URL	https://heili.fi/robots.txt
Redirect	https://www.heili.fi:443/robots.txt
Redirect Domain	www.heili.fi
Redirect Base	heili.fi
Domain IPs	54.246.245.212
Redirect IPs	216.137.52.6, 216.137.52.64, 216.137.52.74, 216.137.52.96
Response IP	18.165.122.104
Found	Yes
Hash	ed120954f9e69017ffc7fe305af94f7e2553eb50dce6b1ad077b5ede29681ed7
SimHash	6230f15d3510
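
The Hash value is 64 hexadecimal digits long, which is the length of a SHA-256 digest. The scan data does not name the algorithm or say what was hashed, so the sketch below assumes SHA-256 over the raw response body. The helper names `body_digest` and `matches_recorded` are hypothetical and only illustrate how a freshly fetched copy could be compared with the recorded value.

```python
import hashlib

# Value copied from the "Hash" row of the Last Scan table.
RECORDED_HASH = "ed120954f9e69017ffc7fe305af94f7e2553eb50dce6b1ad077b5ede29681ed7"


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body hashes to the recorded value.

    Assumption: the scanner hashes the raw bytes with SHA-256.
    """
    return body_digest(body) == recorded.lower()
```

If the assumption is wrong (for example, if the scanner normalizes line endings before hashing), the comparison will simply report a mismatch for an unchanged file.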

Groups

amazonbot

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

oai-searchbot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/
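
Each of the groups above carries a single `Disallow /` rule, which blocks the named agent from the whole site. As a rough illustration, the sketch below rebuilds two of those groups as robots.txt text and checks them with Python's standard `urllib.robotparser`. The fragment is a reconstruction from the scan tables, not the site's actual file, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from two of the groups listed above; the real file has more.
ROBOTS_FRAGMENT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /
"""


def is_allowed(agent: str, url: str, robots_text: str = ROBOTS_FRAGMENT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "ExampleBrowser"):
        print(agent, is_allowed(agent, "https://www.heili.fi/"))
```

Agent names are compared case-insensitively, so `GPTBot` matches the `gptbot` group. An agent with no matching group and no `*` group, such as the made-up `ExampleBrowser`, is allowed by default.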

googlebot

Rule	Path
Disallow	/kaupalliset/*.jpg$
Disallow	/kaupalliset/*.Jpg$
Disallow	/kaupalliset/*.jPg$
Disallow	/kaupalliset/*.jpG$
Disallow	/kaupalliset/*.jPG$
Disallow	/kaupalliset/*.JPg$
Disallow	/kaupalliset/*.JpG$
Disallow	/kaupalliset/*.JPG$
Disallow	/kaupalliset/*.png$
Disallow	/kaupalliset/*.Png$
Disallow	/kaupalliset/*.pNg$
Disallow	/kaupalliset/*.pnG$
Disallow	/kaupalliset/*.pNG$
Disallow	/kaupalliset/*.PNg$
Disallow	/kaupalliset/*.PnG$
Disallow	/kaupalliset/*.PNG$
Disallow	/kaupalliset/*.gif$
Disallow	/kaupalliset/*.Gif$
Disallow	/kaupalliset/*.gIf$
Disallow	/kaupalliset/*.giF$
Disallow	/kaupalliset/*.gIF$
Disallow	/kaupalliset/*.GIf$
Disallow	/kaupalliset/*.GiF$
Disallow	/kaupalliset/*.GIF$
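
The googlebot group spells out every upper/lower-case combination of three image extensions because robots.txt path matching is case-sensitive. The sketch below regenerates those 24 patterns and tests paths against them, treating `*` as "any sequence of characters" and a trailing `$` as "end of path", as RFC 9309 describes. The function names are hypothetical, and the matcher is a simplified illustration, not Googlebot's actual implementation.

```python
import re
from itertools import product


def case_variants(ext: str) -> list[str]:
    """All upper/lower-case spellings of an alphabetic extension (8 for 'jpg')."""
    return ["".join(chars) for chars in product(*((c.lower(), c.upper()) for c in ext))]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path.
    Without '$' the pattern is a prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path: str, rules: list[str]) -> bool:
    """True if any rule matches the path from its start."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Same 24 patterns as the table above: 3 extensions x 8 case variants each.
RULES = [
    f"/kaupalliset/*.{variant}$"
    for ext in ("jpg", "png", "gif")
    for variant in case_variants(ext)
]
```

Under these rules `/kaupalliset/banner.JPG` is blocked, while `/kaupalliset/banner.jpg?x=1` is not, because the `$` anchor requires the path to end with the extension.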

Other Records

Field	Value
sitemap	https://www.heili.fi/sitemap.xml

Comments

Scraping is not allowed for training AI language models, or selling to AI companies
Amazon: used to improve/enable Alexa to answer questions
Anthropic/Claude: provides no documentation whether these are effective
Anthropic/Claude
Anthropic/Claude
ByteDance LLMs, including Doubao
ChatGPT crawler
ChatGPT plugins
Cohere: associated with Cohere's chatbot
Common Crawl
Diffbot: collects data to train LLMs
Facebook: crawls to improve language models
Google: Bard and Vertex AI generative APIs
ImagesiftBot: associated with a company that produces models for image generation
Meta
Omgilibot/webz.io: sells data for training LLMs
OpenAI Search
Perplexity AI
SuSea
Disable indexing of native ad images
Sitemap
