aarhus.lokalavisen.dk
robots.txt

Robots Exclusion Standard data for aarhus.lokalavisen.dk

Resource Scan

Scan Details

Site Domain aarhus.lokalavisen.dk
Base Domain lokalavisen.dk
Scan Status Ok
Last Scan 2024-05-08T20:16:25+00:00
Next Scan 2024-05-15T20:16:25+00:00

Last Scan

Scanned 2024-05-08T20:16:25+00:00
URL https://aarhus.lokalavisen.dk/robots.txt
Domain IPs 18.154.144.101, 18.154.144.40, 18.154.144.55, 18.154.144.68
Response IP 18.165.171.2
Found Yes
Hash 364215de554287f263082554cae20d646fd16416c712bff06882a453866a5082
SimHash 3033181487c7

Groups

*

Rule Path
Disallow /soeg/*

ccbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /
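
As a rough illustration of how the groups listed above could be evaluated, the following sketch encodes them as a dictionary and checks a path for a given crawler token. It is a simplified matcher written for this example, not the scanner's own logic: the names GROUPS, pattern_to_regex and is_allowed are hypothetical, the path /nyheder is made up, and only Disallow rules with the `*` wildcard and an optional trailing `$` are handled.

```python
import re

# Groups as reported in the scan (user-agent token -> Disallow paths).
GROUPS = {
    "*": ["/soeg/*"],
    "ccbot": ["/"],
    "gptbot": ["/"],
    "chatgpt-user": ["/"],
    "anthropic-ai": ["/"],
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(user_agent, path):
    """Return True if no Disallow rule in the selected group matches path.

    Assumption: the group is chosen by exact, case-insensitive token match,
    falling back to the '*' group when the token has no group of its own.
    """
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    return not any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for agent, path in [
        ("GPTBot", "/"),
        ("CCBot", "/nyheder"),
        ("Googlebot", "/soeg/?q=aarhus"),
        ("Googlebot", "/nyheder"),
    ]:
        print(agent, path, "allowed" if is_allowed(agent, path) else "disallowed")
```

Under these assumptions, the four named AI crawlers are blocked from every path, while any other crawler is only kept out of paths beginning with /soeg/. A production crawler would normally rely on a complete robots.txt parser rather than this reduced matcher.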

Other Records

Field Value
sitemap https://aarhus.lokalavisen.dk/sitemapindex.xml

Comments

  • AI crawler reference
  • The link below provides instructions to what kind of content can be used to train AI models on this website
  • https://aarhus.lokalavisen.dk/ai.txt
  • Common crawl
  • OpenAI (ChatGPT)
  • OpenAI (ChatGPT realtime search)
  • Anthropic