cookpad.com
robots.txt

Robots Exclusion Standard data for cookpad.com


Resource Scan

Scan Details

Site Domain	cookpad.com
Base Domain	cookpad.com
Scan Status	Ok
Last Scan	2024-11-13T22:13:44+00:00
Next Scan	2024-11-20T22:13:44+00:00

Last Scan

Scanned	2024-11-13T22:13:44+00:00
URL	https://cookpad.com/robots.txt
Domain IPs	151.101.1.55, 151.101.129.55, 151.101.193.55, 151.101.65.55, 2a04:4e42:200::311, 2a04:4e42:400::311, 2a04:4e42:600::311, 2a04:4e42::311
Response IP	151.101.1.55
Found	Yes
Hash	5a9479eacb4788d2315b551762103f7695e45daad9da79f52602fb2a569b007b
SimHash	f81991dbc726

Groups

*

Rule	Path
Disallow	/user/confirm_premium_navi
Allow	/

baiduspider

Rule	Path
Allow	/cn
Disallow	/*?_pxhc=*
Disallow	/cn/users
Disallow	/
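Because the baiduspider group mixes Allow and Disallow rules, their interaction depends on precedence. Under RFC 9309 (the current Robots Exclusion Protocol standard), the matching rule with the longest pattern wins and Allow wins a tie; `*` matches any run of characters and a trailing `$` anchors the end. Whether Baiduspider applies exactly these semantics is an assumption here. A minimal sketch that evaluates the group's rules against a path (including any query string, without percent-decoding):

```python
import re

# Rules copied from the baiduspider group above. Under longest-match
# precedence their order in the file does not matter.
BAIDU_RULES = [
    ("allow", "/cn"),
    ("disallow", "/*?_pxhc=*"),
    ("disallow", "/cn/users"),
    ("disallow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the path; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` (including any query string) may be crawled.

    The matching rule with the longest pattern wins; Allow wins a tie.
    If no rule matches, the path is allowed.
    """
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty pattern is ignored
            continue
        if pattern_to_regex(pattern).match(path):  # re.match anchors at start
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


for p in ["/cn/recipes/123", "/cn/users/42", "/jp/recipes", "/cn/search?_pxhc=abc"]:
    print(p, is_allowed(p, BAIDU_RULES))
```

With these rules, `/cn/recipes/123` is allowed (Allow `/cn` is the longest match), while `/cn/users/42`, `/jp/recipes`, and `/cn/search?_pxhc=abc` are disallowed because a longer Disallow pattern, or only `Disallow /`, matches.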

yandex

Rule	Path
Allow	/
Disallow	/*/accounts/new

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

anthopic-ai

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

peer39_crawler

Rule	Path
Disallow	/

peer39_crawler/1.0

Rule	Path
Disallow	/

timpibot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/
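The facebookbot group appears twice in this file. RFC 9309 specifies that a crawler matches its product token against user-agent lines case-insensitively, merges every group that names it, and falls back to the `*` group only when no named group matches. A minimal sketch of that selection step, assuming the crawler already knows its product token (real crawlers derive it from their User-Agent string) and using a subset of the groups listed above; `ExampleBot` is a hypothetical token:

```python
# A subset of the groups listed above, as (user-agent token, rules) pairs.
# facebookbot appears twice, mirroring the two facebookbot groups in the file.
GROUPS = [
    ("*", [("disallow", "/user/confirm_premium_navi"), ("allow", "/")]),
    ("gptbot", [("disallow", "/")]),
    ("facebookbot", [("disallow", "/")]),
    ("meta-externalagent", [("disallow", "/")]),
    ("facebookbot", [("disallow", "/")]),
]


def select_rules(product_token, groups):
    """Return the combined rules that apply to a crawler's product token.

    Matching is case-insensitive; every group naming the token is merged,
    and the '*' group is used only when no named group matches.
    """
    token = product_token.lower()
    matched = [rules for agent, rules in groups if agent.lower() == token]
    if not matched:
        matched = [rules for agent, rules in groups if agent == "*"]
    combined = []
    for rules in matched:
        combined.extend(rules)
    return combined


print(select_rules("GPTBot", GROUPS))            # [('disallow', '/')]
print(len(select_rules("facebookbot", GROUPS)))  # 2
print(select_rules("ExampleBot", GROUPS))        # the '*' group's two rules
```

Since both facebookbot groups contain only `Disallow /`, merging them changes nothing in practice here; the duplicate is harmless.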

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
See below for how Clean-param works for Yandex crawler
https://yandex.ru/support/webmaster/robot-workings/clean-param.html?lang=en
OpenAI Crawler
OpenAI Plugin Bot
Block CCBot (used to create training datasets)
Anthropic AI bots
Enterprise LLM
Generates LLM datasets
Default UA for a data scraping tool
https://developers.facebook.com/docs/sharing/bot/
https://developers.facebook.com/docs/sharing/webmasters/web-crawlers
Claims to be reverse image search, but is part of
training dataset generator for https://hivemoderation.com
LLM Search
TikTok generative LLM scraper
Advertising tool / LLM
AI Data Scraper
https://darkvisitors.com/agents/timpibot
The following do not impact search results or functionality,
but do tell the companies and bots in question
not to add crawled content to LLM datasets.
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended
https://support.apple.com/en-us/119829
https://developers.facebook.com/docs/sharing/bot

Warnings

`clean-param` is not a known field.
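This warning arises because Clean-param is a Yandex-specific directive (described at the Yandex URL in the comments) rather than part of the core standard. A minimal sketch of how a scanner might flag such fields, assuming the recognized set is `user-agent`, `allow`, `disallow`, and `sitemap` (the scanner's actual list is not given here). The `SAMPLE` fragment and its Clean-param value are hypothetical:

```python
# Fields this sketch treats as recognized. This set is an assumption for
# illustration; it is not the scanner's actual list.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

# Hypothetical robots.txt fragment; the Clean-param value is made up.
SAMPLE = """\
User-agent: yandex
Allow: /
Disallow: /*/accounts/new
Clean-param: utm_source  # hypothetical value
"""


def unknown_fields(robots_txt):
    """Return each unrecognized field name once, in order of appearance."""
    found = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


print(unknown_fields(SAMPLE))  # ['clean-param']
```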
