sisasuomenlehti.fi
robots.txt

Robots Exclusion Standard data for sisasuomenlehti.fi

Resource Scan

Scan Details

Site Domain sisasuomenlehti.fi
Base Domain sisasuomenlehti.fi
Scan Status Ok
Last Scan 2024-11-12T23:45:05+00:00
Next Scan 2024-11-19T23:45:05+00:00

Last Scan

Scanned 2024-11-12T23:45:05+00:00
URL https://sisasuomenlehti.fi/robots.txt
Redirect https://www.sisasuomenlehti.fi:443/robots.txt
Redirect Domain www.sisasuomenlehti.fi
Redirect Base sisasuomenlehti.fi
Domain IPs 54.246.245.212
Redirect IPs 65.9.112.15, 65.9.112.17, 65.9.112.34, 65.9.112.35
Response IP 65.9.112.17
Found Yes
Hash e963e2baadd79376d1625d0ca45e08833e25f0ef2e55bbd4df735419d7e4dd2b
SimHash 6230f15d3510
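
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file. The report does not name the algorithm, so SHA-256 is an assumption here, and the helper names and sample body are hypothetical. A minimal sketch of how such a fingerprint could be used to detect a changed robots.txt between scans:

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the scanner's Hash field is SHA-256; the report does not say so.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return content_fingerprint(body) != previous_hash.lower()


# Hypothetical sample body, not the real file contents.
sample = b"User-agent: gptbot\nDisallow: /\n"
digest = content_fingerprint(sample)
```

An exact digest only signals that some byte changed. The separate SimHash field presumably serves to judge how similar two versions are, but this sketch does not attempt to reproduce it.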

Groups

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

youbot

Rule Path
Disallow /
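
Each of the groups above blocks its crawler from the whole site with a single "Disallow /" rule. As a rough illustration, the standard-library `urllib.robotparser` can evaluate rules of this simple form. The excerpt below is reconstructed from the groups listed in this report (only three of them, with the lowercase names as shown here), and `otherbot` is a hypothetical agent used for contrast.

```python
from urllib.robotparser import RobotFileParser

# Excerpt reconstructed from the groups in this report; not the full file.
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

HOME = "https://www.sisasuomenlehti.fi/"


def may_fetch(agent: str, url: str = HOME) -> bool:
    """Return True if the excerpt permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


blocked = [name for name in ("gptbot", "claudebot", "ccbot") if not may_fetch(name)]
```

An agent with no matching group and no `*` group, such as the hypothetical `otherbot`, is allowed by default. Note that `urllib.robotparser` does not interpret the `*` and `$` wildcards, so it is not suitable for checking the googlebot rules that follow.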

googlebot

Rule Path
Disallow /kaupalliset/*.jpg$
Disallow /kaupalliset/*.Jpg$
Disallow /kaupalliset/*.jPg$
Disallow /kaupalliset/*.jpG$
Disallow /kaupalliset/*.jPG$
Disallow /kaupalliset/*.JPg$
Disallow /kaupalliset/*.JpG$
Disallow /kaupalliset/*.JPG$
Disallow /kaupalliset/*.png$
Disallow /kaupalliset/*.Png$
Disallow /kaupalliset/*.pNg$
Disallow /kaupalliset/*.pnG$
Disallow /kaupalliset/*.pNG$
Disallow /kaupalliset/*.PNg$
Disallow /kaupalliset/*.PnG$
Disallow /kaupalliset/*.PNG$
Disallow /kaupalliset/*.gif$
Disallow /kaupalliset/*.Gif$
Disallow /kaupalliset/*.gIf$
Disallow /kaupalliset/*.giF$
Disallow /kaupalliset/*.gIF$
Disallow /kaupalliset/*.GIf$
Disallow /kaupalliset/*.GiF$
Disallow /kaupalliset/*.GIF$
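
The 24 googlebot rules above are every upper/lower-case variant of the extensions jpg, png and gif, presumably because robots.txt path matching is case-sensitive. The sketch below regenerates that list and applies a simplified form of wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end of the URL). It is an approximation for illustration, not Googlebot's actual matcher, and the function names are hypothetical.

```python
import re
from itertools import product


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule with * and trailing $ into a regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape literal segments and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def case_variants(ext: str) -> list[str]:
    """Return every upper/lower-case spelling of an extension."""
    return ["".join(chars) for chars in product(*((c.lower(), c.upper()) for c in ext))]


# Regenerate the rule list shown in the googlebot group.
RULES = [f"/kaupalliset/*.{v}$" for ext in ("jpg", "png", "gif") for v in case_variants(ext)]
PATTERNS = [rule_to_regex(rule) for rule in RULES]


def is_disallowed(path: str) -> bool:
    """Return True if any rule matches the path from its start."""
    return any(pattern.match(path) for pattern in PATTERNS)
```

Under these assumptions, `/kaupalliset/banner.JPG` is disallowed, while `/kaupalliset/ad.jpeg` and `/kaupalliset/ad.jpg?w=200` are not, because the `$` anchor requires the URL to end with the listed extension.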

Other Records

Field Value
sitemap https://www.sisasuomenlehti.fi/sitemap.xml

Comments

  • Scraping is not allowed for training AI language models, or selling to AI companies
  • Amazon: used to improve/enable Alexa to answer questions
  • Anthropic/Claude: provides no documentation whether these are effective
  • Anthropic/Claude
  • Anthropic/Claude
  • ByteDance LLMs, including Doubao
  • ChatGPT crawler
  • ChatGPT plugins
  • Cohere: associated with Cohere's chatbot
  • Common Crawl
  • Diffbot: collects data to train LLMs
  • Facebook: crawls to improve language models
  • Google: Bard and Vertex AI generative APIs
  • ImagesiftBot: associated with a company that produces models for image generation
  • Meta
  • Omgilibot/webz.io: sells data for training LLMs
  • OpenAI Search
  • Perplexity AI
  • SuSea
  • Disable indexing of native ad images
  • Sitemap