planetgong.co.uk
robots.txt

Robots Exclusion Standard data for planetgong.co.uk

Resource Scan

Scan Details

Site Domain planetgong.co.uk
Base Domain planetgong.co.uk
Scan Status Ok
Last Scan 2024-11-09T20:04:52+00:00
Next Scan 2024-11-16T20:04:52+00:00

Last Scan

Scanned 2024-11-09T20:04:52+00:00
URL https://planetgong.co.uk/robots.txt
Domain IPs 35.214.119.107
Response IP 35.214.119.107
Found Yes
Hash 95815de5aa1865ba499af4a5cbc39cf491363004e9c09d5536c61945f6075049
SimHash 147a5151ae22
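
The Hash value above is 64 hexadecimal digits long, which is the length of a SHA-256 digest. The report does not name the algorithm or say exactly which bytes are hashed, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The function name `body_digest` and the sample body are hypothetical; the real file's bytes are not reproduced in this report.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, used only to show the call shape.
sample = b"User-agent: *\nDisallow: /zzz/\n"
digest = body_digest(sample)
print(len(digest), digest)
```

Comparing the digest from one scan with the digest from the next is a cheap way to tell whether the file changed at all between scans.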

Groups

archive.org_bot
heritrix
ia_archiver
ia_archiver-web.archive.org
mastodon
uptimebot.org
uptimerobot

Rule Path
Allow /

chatgpt-user
duckassistbot
meta-externalfetcher
ai2bot
anthropic-ai
bytespider
ccbot
claudebot
claude-web
cohere-ai
dataprovider.com
dcrawl
diffbot
facebookbot
google-extended
gptbot
httrack
httrack 3.0
meta-externalagent
metainspector
newspaper
nutch
offlineexplorer
omgili
scrapy
simplescraper
timpibot
webzio-extended
perplexitybot
youbot
linkedinbot
mail.ru_bot
pinterestbot
twitterbot
whatsapp
aihitbot
anderspinkbot
webzio
aisearchbot
bot-pge.chlooe.com
bot.araturka.com
emailcollector
emailsiphon
emailwolf
facebot
megaindex.ru
omgilibot
pinterest
pr-cy.ru
qqdownload
slackbot-linkexpanding
tencenttraveler

Product Comment
chatgpt-user https://openai.com/chatgpt
ai2bot Ai2 https://allenai.org/
anthropic-ai Anthropic
bytespider ByteDance (owns TikTok)
ccbot Common Crawl https://commoncrawl.org/faq/
claudebot Anthropic
claude-web Anthropic
dataprovider.com 'summariser'
diffbot https://www.diffbot.com/products/crawl/
google-extended Bard, Gemini?, Vertex AI
gptbot GPT
meta-externalagent Meta
offlineexplorer MetaProducts
omgili Webz.io
timpibot Timpi
webzio-extended Webz.io
perplexitybot https://www.perplexity.ai/
youbot you.com/
anderspinkbot https://anderspink.com/
webzio Webz.io
bot-pge.chlooe.com 2024-03-16 non-secure site
bot.araturka.com 2024-03-16
facebot Meta + Facebook
qqdownload Tencent China
tencenttraveler China

Rule Path
Disallow /

*

Rule Path
Disallow /wp-login.php
Disallow */menus/
Disallow */styles/
Disallow /zzz/
Disallow */zzz/
Disallow *.php
Disallow *.re.shtml
Disallow /archives/lyrics/songs/*.txt
Allow /archives/lyrics/songs/a-z.shtml
Disallow /archives/tabs/tunes/*.txt
Allow /archives/tabs/tunes/a-z.shtml
Disallow /av/
Disallow /bazaar/a-list.shtml
Disallow /bazaar/brief.shtml
Disallow /bazaar/badges/
Disallow /bazaar/books/*.html
Disallow /bazaar/cd/*.html
Disallow /bazaar/dvd/*.html
Disallow /bazaar/postcards/
Disallow /bazaar/posters/
Disallow /bazaar/tape/*.html
Disallow /bazaar/threads/
Disallow /bazaar/vinyl/*.html
Disallow /bits/
Disallow /cgi-bin/
Disallow /digital/a-list.shtml
Disallow /digital/brief.html
Disallow /digital/linkloki/
Disallow /digital/logs/
Disallow /digital/music/*.html
Disallow /digital/posters/*.html
Disallow /digital/ringtones/*.html
Disallow /digital/words/*.html
Disallow /gigs/briefs/
Disallow /gigs/agenda.shtml
Disallow /gigs/gignet.shtml
Disallow /gigs/time-machine.shtml
Disallow /graphics/
Disallow /headers/
Disallow /images/
Disallow /news/a-list.html
Disallow /news/brief.shtml
Disallow /newsletter/
Disallow /outland/forum/
Disallow /outland/accessibility.shtml
Disallow /outland/cookies.html
Disallow /outland/cookies.shtml
Disallow /outland/privacy.shtml
Disallow /tail.html
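
Several rules in the `*` group use the `*` wildcard and pair a broad Disallow with a narrower Allow. The sketch below shows one way such rules could be evaluated, assuming the matching described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` and the probe paths are hypothetical; only the rules themselves come from the list above. Individual crawlers may evaluate rules differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text; each '*' becomes "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best_len, best_allow = -1, True
    for verb, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path):
            allow = verb.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# A subset of the '*' group rules listed above.
RULES = [
    ("Disallow", "/wp-login.php"),
    ("Disallow", "*/zzz/"),
    ("Disallow", "*.php"),
    ("Disallow", "/archives/lyrics/songs/*.txt"),
    ("Allow", "/archives/lyrics/songs/a-z.shtml"),
]

# Hypothetical probe paths, not taken from the site.
for probe in ("/archives/lyrics/songs/a-z.shtml",
              "/archives/lyrics/songs/example.txt",
              "/some/dir/zzz/page.html",
              "/gigs/"):
    print(probe, is_allowed(RULES, probe))
```

Python's standard `urllib.robotparser` does plain prefix matching and does not interpret `*` inside paths, which is why this sketch uses its own matcher.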

Other Records

Field Value
sitemap https://planetgong.co.uk/sitemap.xml

Comments

  • User-agent values are not case sensitive
  • URL paths are case sensitive
  • Noindex ?
  • allow wayback machine, mastodon, uptime robot
  • allow DDG ?
  • User-agent: DuckDuckBot
  • User-agent: DuckAssistBot
  • Allow: /
  • disallow (https://darkvisitors.com/agents/)
  • AI Assistants
  • AI Data Scrapers
  • User-agent: Applebot-Extended # Apple Intelligence
  • AI Search Crawlers
  • User-agent: Amazonbot (Alexa)
  • User-agent: Applebot (Siri)
  • Fetchers
  • User-agent: facebookexternalhit (Meta + Facebook previews)
  • Intelligence Gatherers
  • Uncategorised or unknown
  • User-agent: Download Ninja
  • User-agent: Exabot
  • User Agent: LLaMA Meta AI title?
  • User Agent: LLaMA 2 Meta AI title?
  • User-agent: YandexImages
  • User-agent: YandexMobileBot
  • User-agent: YandexRenderResourcesBot
  • User-agent: YandexVideo
  • Undocumented AI Agents (see above)
  • User-agent: anthropic-ai
  • User-agent: cohere-ai
  • User-agent: Claude-Web
  • go no-go zones
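
The first two notes in the list above (user-agent tokens match regardless of case, URL paths do not) can be checked with Python's standard `urllib.robotparser`. This is a sketch only: the rules are a two-group excerpt written in robots.txt syntax from the groups in this report, and `ExampleBot` and the probe URLs are hypothetical. Only plain path rules are used, because this parser does prefix matching and does not interpret `*` inside paths.

```python
from urllib.robotparser import RobotFileParser

# Two-group excerpt reconstructed from the groups listed in this report.
EXCERPT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /zzz/
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

base = "https://planetgong.co.uk"

# The same agent token in different cases should hit the same group.
gpt_lower = parser.can_fetch("gptbot", base + "/")
gpt_upper = parser.can_fetch("GPTBOT", base + "/")

# Hypothetical agent and paths: only the lowercase path matches the rule.
path_lower = parser.can_fetch("ExampleBot", base + "/zzz/page.html")
path_upper = parser.can_fetch("ExampleBot", base + "/ZZZ/page.html")

print(gpt_lower, gpt_upper, path_lower, path_upper)
```

The practical consequence is that a rule such as `Disallow /zzz/` does not cover a differently cased path on a server that serves both.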

Warnings

  • 4 invalid lines.
  • `iser-agent` is not a known field.