cookpad.jp
robots.txt

Robots Exclusion Standard data for cookpad.jp

Resource Scan

Scan Details

Site Domain cookpad.jp
Base Domain cookpad.jp
Scan Status Ok
Last Scan 2024-09-17T11:37:27+00:00
Next Scan 2024-09-24T11:37:27+00:00

Last Scan

Scanned 2024-09-17T11:37:27+00:00
URL https://cookpad.jp/robots.txt
Redirect https://cookpad.com/robots.txt
Redirect Domain cookpad.com
Redirect Base cookpad.com
Domain IPs 13.33.30.13, 13.33.30.24, 13.33.30.27, 13.33.30.40, 2600:9000:229f:1200:15:22c0:77c0:93a1, 2600:9000:229f:3a00:15:22c0:77c0:93a1, 2600:9000:229f:4800:15:22c0:77c0:93a1, 2600:9000:229f:600:15:22c0:77c0:93a1, 2600:9000:229f:8400:15:22c0:77c0:93a1, 2600:9000:229f:9000:15:22c0:77c0:93a1, 2600:9000:229f:bc00:15:22c0:77c0:93a1, 2600:9000:229f:fe00:15:22c0:77c0:93a1
Redirect IPs 151.101.1.55, 151.101.129.55, 151.101.193.55, 151.101.65.55, 2a04:4e42:200::311, 2a04:4e42:400::311, 2a04:4e42:600::311, 2a04:4e42::311
Response IP 151.101.193.55
Found Yes
Hash 5a9479eacb4788d2315b551762103f7695e45daad9da79f52602fb2a569b007b
SimHash f81991dbc726
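
The Hash value is 64 hexadecimal characters, consistent with a SHA-256 digest of the fetched robots.txt body. A minimal sketch of reproducing it, assuming the scanner hashes the raw response bytes after following the redirect:

```python
import hashlib
import urllib.request

# urllib follows the cookpad.jp -> cookpad.com redirect automatically,
# mirroring the Redirect fields recorded above.
with urllib.request.urlopen("https://cookpad.jp/robots.txt") as resp:
    final_url = resp.geturl()   # expected: https://cookpad.com/robots.txt
    body = resp.read()

# Assumption: the report's Hash is SHA-256 over the raw response body.
print(final_url)
print(hashlib.sha256(body).hexdigest())
```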

Groups

*

Rule Path
Disallow /user/confirm_premium_navi
Allow /

baiduspider

Rule Path
Allow /cn
Disallow /*?_pxhc=*
Disallow /cn/users
Disallow /
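
The `/*?_pxhc=*` path above uses the `*` wildcard from the de-facto robots.txt extensions (standardized in RFC 9309), which plain prefix matching cannot handle. A minimal sketch of that matching, assuming RFC 9309 semantics where `*` matches any run of characters and a trailing `$` anchors the end of the path:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*?_pxhc=*")
print(bool(rule.match("/recipe/123?_pxhc=1")))  # True: URL is disallowed
print(bool(rule.match("/recipe/123")))          # False: rule does not apply
```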

yandex

Rule Path
Allow /
Disallow /*/accounts/new

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

anthopic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

peer39_crawler

Rule Path
Disallow /

peer39_crawler/1.0

Rule Path
Disallow /

timpibot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /
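
Taken together, the groups above keep the site open to ordinary crawlers (the `*` group allows everything except one premium-navigation path) while disallowing all paths for the AI/LLM agents. A minimal sketch of querying a few of these groups with Python's standard-library urllib.robotparser, which implements plain prefix matching, so the wildcard rule in the baiduspider group is out of scope here:

```python
from urllib.robotparser import RobotFileParser

# A few of the groups above, reconstructed as robots.txt lines.
rules = """\
User-agent: *
Disallow: /user/confirm_premium_navi
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://cookpad.com/recipe/123"))       # False
print(rp.can_fetch("Mozilla/5.0", "https://cookpad.com/recipe/123"))  # True
print(rp.can_fetch("Mozilla/5.0",
                   "https://cookpad.com/user/confirm_premium_navi"))  # False
```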

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • See below for how Clean-param works for the Yandex crawler
  • https://yandex.ru/support/webmaster/robot-workings/clean-param.html?lang=en
  • OpenAI Crawler
  • OpenAI Plugin Bot
  • Block CCBot (used to create training datasets)
  • Anthropic AI bots
  • Enterprise LLM
  • Generates LLM datasets
  • Default UA for a data scraping tool
  • https://developers.facebook.com/docs/sharing/bot/
  • https://developers.facebook.com/docs/sharing/webmasters/web-crawlers
  • Claims to be reverse image search, but is part of a training dataset generator for https://hivemoderation.com
  • LLM Search
  • TikTok generative LLM scraper
  • Advertising tool / LLM
  • AI Data Scraper
  • https://darkvisitors.com/agents/timpibot
  • The following do not impact search results or functionality, but do tell the companies and bots in question not to add crawled content to LLM datasets.
  • https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended
  • https://support.apple.com/en-us/119829
  • https://developers.facebook.com/docs/sharing/bot

Warnings

  • `clean-param` is not a known field.
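
The warning reflects that Clean-param is a Yandex-specific extension rather than a field from the core standard; per the Yandex documentation linked in the comments, it names query parameters the crawler should ignore when collapsing duplicate URLs. A minimal sketch of the equivalent normalization, assuming the directive targets the `_pxhc` tracking parameter that also appears in the baiduspider rules:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_params(url, params):
    """Drop the given query parameters, mirroring what Yandex's
    Clean-param directive does when deduplicating URLs."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_params("https://cookpad.com/recipe/123?_pxhc=1&page=2",
                   {"_pxhc"}))
# -> https://cookpad.com/recipe/123?page=2
```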