iltamakasiini.fi
robots.txt

Robots Exclusion Standard data for iltamakasiini.fi

Resource Scan

Scan Details

Site Domain	iltamakasiini.fi
Base Domain	iltamakasiini.fi
Scan Status	Ok
Last Scan	2024-10-25T09:59:46+00:00
Next Scan	2024-11-01T09:59:46+00:00

Last Scan

Scanned	2024-10-25T09:59:46+00:00
URL	http://iltamakasiini.fi/robots.txt
Redirect	https://www.helsinginuutiset.fi/robots.txt
Redirect Domain	www.helsinginuutiset.fi
Redirect Base	helsinginuutiset.fi
Domain IPs	34.250.111.149
Redirect IPs	54.77.71.122, 99.81.148.8
Response IP	99.81.148.8
Found	Yes
Hash	8cc9d78295dc797c2246c1a2266c933bcb49020852afa2cd8c987f188c72793b
SimHash	e238f14d3530
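
The Hash value is 64 hexadecimal digits, which is the length of a SHA-256 digest. The scan does not say what was hashed. Assuming it is SHA-256 over the raw bytes of the fetched robots.txt, a saved copy could be compared against it with a sketch like this. Any change in line endings or encoding would change the digest.

```python
import hashlib

# Value from the scan above. Assumption: it is a SHA-256 digest of the raw
# robots.txt bytes; the scanner does not document what it hashes.
REPORTED_HASH = "8cc9d78295dc797c2246c1a2266c933bcb49020852afa2cd8c987f188c72793b"


def robots_digest(body: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a robots.txt body."""
    return hashlib.sha256(body).hexdigest()


def matches_reported(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """True if `body` hashes to the reported value (hex compared case-insensitively)."""
    return robots_digest(body) == reported.lower()
```

For a local file, read it in binary mode (`open("robots.txt", "rb")`) and pass the bytes to `matches_reported`.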

Groups

amazonbot

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/
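
Every group above consists only of `Disallow: /`, which matches every path, so a compliant crawler that selects one of these groups may fetch nothing. The scan shows no `*` group, so agents not named in the file are not restricted by it.

Python's standard `urllib.robotparser` can evaluate blanket groups like these. The file text below is a hypothetical partial reconstruction from the scan (three groups plus the Sitemap record), not the live file. Two quirks of this parser matter here:

- It selects a group by substring match on the product token (the part of the agent string before the first `/`), so pass a bare token such as `GPTBot`.
- It does not implement the `*` and `$` wildcards used in the googlebot group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical partial reconstruction of the scanned file; only three of the
# blanket-blocked groups are included, plus the Sitemap record.
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: ccbot
Disallow: /

Sitemap: https://www.helsinginuutiset.fi/sitemap.xml
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the stdlib parser whether `agent` (a bare product token) may fetch `path`."""
    return robots.can_fetch(agent, "https://www.helsinginuutiset.fi" + path)


print(allowed("GPTBot", "/any/article"))        # False: "Disallow: /" matches every path
print(allowed("SomeOtherBot", "/any/article"))  # True: no group names it and there is no "*" group
print(robots.site_maps())                       # ['https://www.helsinginuutiset.fi/sitemap.xml']
```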

googlebot

Rule	Path
Disallow	/kaupalliset/*.jpg$
Disallow	/kaupalliset/*.Jpg$
Disallow	/kaupalliset/*.jPg$
Disallow	/kaupalliset/*.jpG$
Disallow	/kaupalliset/*.jPG$
Disallow	/kaupalliset/*.JPg$
Disallow	/kaupalliset/*.JpG$
Disallow	/kaupalliset/*.JPG$
Disallow	/kaupalliset/*.png$
Disallow	/kaupalliset/*.Png$
Disallow	/kaupalliset/*.pNg$
Disallow	/kaupalliset/*.pnG$
Disallow	/kaupalliset/*.pNG$
Disallow	/kaupalliset/*.PNg$
Disallow	/kaupalliset/*.PnG$
Disallow	/kaupalliset/*.PNG$
Disallow	/kaupalliset/*.gif$
Disallow	/kaupalliset/*.Gif$
Disallow	/kaupalliset/*.gIf$
Disallow	/kaupalliset/*.giF$
Disallow	/kaupalliset/*.gIF$
Disallow	/kaupalliset/*.GIf$
Disallow	/kaupalliset/*.GiF$
Disallow	/kaupalliset/*.GIF$
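
The googlebot group blocks image files under `/kaupalliset/` (Finnish for "commercial"; the file's comments describe these as native ad images). It lists each extension (jpg, png, gif) in all 2^3 = 8 upper/lower-case spellings, giving 3 × 8 = 24 rules. This is needed because Google documents robots.txt path values as case-sensitive, and the format has no case-insensitive flag.

In these rules, `*` matches any sequence of characters and a trailing `$` anchors the match at the end of the URL, as in RFC 9309. The sketch below is a simplified matcher for that syntax: prefix matching, no percent-encoding normalisation, and hypothetical example paths. It is not Google's implementation.

```python
import itertools
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern: '*' = any run of chars, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Case-sensitive match anchored at the start of the path (re.match is a prefix match)."""
    return rule_to_regex(pattern).match(path) is not None


def case_variants(ext: str) -> list[str]:
    """Every upper/lower-case spelling of an extension, e.g. 'jpg' -> 8 spellings."""
    choices = [(c.lower(), c.upper()) for c in ext]
    return ["".join(combo) for combo in itertools.product(*choices)]


# Regenerates the 24 googlebot rules listed above.
RULES = [f"/kaupalliset/*.{v}$" for ext in ("jpg", "png", "gif") for v in case_variants(ext)]

print(len(RULES))                                                # 24
print(rule_matches("/kaupalliset/*.jpg$", "/kaupalliset/a.jpg"))  # True
print(rule_matches("/kaupalliset/*.jpg$", "/kaupalliset/a.JPG"))  # False: case differs
print(any(rule_matches(r, "/kaupalliset/a.JPG") for r in RULES))  # True: the JPG variant catches it
```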

Other Records

Field	Value
sitemap	https://www.helsinginuutiset.fi/sitemap.xml

Comments

Scraping is not allowed for training AI language models, or selling to AI companies
Amazon: used to improve/enable Alexa to answer questions
Anthropic/Claude: provides no documentation whether these are effective
Anthropic/Claude
Anthropic/Claude
ByteDance LLMs, including Doubao
ChatGPT crawler
ChatGPT plugins
Cohere: associated with Cohere's chatbot
Common Crawl
Diffbot: collects data to train LLMs
Facebook: crawls to improve language models
Google: Bard and Vertex AI generative APIs
ImagesiftBot: associated with a company that produces models for image generation
Meta
Omgilibot/webz.io: sells data for training LLMs
Perplexity AI
SuSea
Disable indexing of native ad images
Sitemap

Warnings

1 invalid line.
