cromwell-intl.com
robots.txt

Robots Exclusion Standard data for cromwell-intl.com

Resource Scan

Scan Details

Site Domain	cromwell-intl.com
Base Domain	cromwell-intl.com
Scan Status	Ok
Last Scan	2024-09-29T21:00:39+00:00
Next Scan	2024-10-06T21:00:39+00:00

Last Scan

Scanned	2024-09-29T21:00:39+00:00
URL	https://cromwell-intl.com/robots.txt
Domain IPs	35.203.182.32
Response IP	35.203.182.32
Found	Yes
Hash	1f12f87c3fea33664bbcd9682cb2ac992f32dcfbb72463b80ea2a9c3b62209fb
SimHash	1316d371cc72

Groups

*

Rule	Path
Disallow	/roots/
Disallow	/tcpip/class-a-nets.html
Disallow	/cybersecurity/attack-study/analysis-01.html
Disallow	/cybersecurity/attack-study/analysis-02.html
Disallow	/cybersecurity/attack-study/analysis-03.html
Disallow	/cybersecurity/attack-study/analysis-04.html
Disallow	/cybersecurity/attack-study/analysis-05.html
Disallow	/cybersecurity/attack-study/analysis-06.html
Disallow	/cybersecurity/attack-study/analysis-07.html
Disallow	/cybersecurity/attack-study/analysis-08.html
Disallow	/cybersecurity/attack-study/analysis-09.html
Disallow	/cybersecurity/attack-study/analysis-10.html
Disallow	/cybersecurity/attack-study/analysis-11.html
Disallow	/cybersecurity/attack-study/analysis-12.html
Disallow	/cybersecurity/attack-study/botnet-log-1-c193.html
Disallow	/cybersecurity/attack-study/botnet-log-1-i192.html
Disallow	/cybersecurity/attack-study/botnet-log-2-c193.html
Disallow	/cybersecurity/attack-study/botnet-log-2-i192.html
Disallow	/cybersecurity/attack-study/botnet-log.html
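
To illustrate how a crawler that matches only the `*` group would read these rules, here is a minimal sketch using Python's standard `urllib.robotparser`. It copies just three of the paths from the table above. The helper name `star_group_allows` and the agent name `ExampleBot` are hypothetical, chosen for illustration. Note that the standard-library parser does plain prefix matching, which may differ in detail from how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# A subset of the "*" group from the table above (not the full list).
STAR_GROUP_LINES = [
    "User-agent: *",
    "Disallow: /roots/",
    "Disallow: /tcpip/class-a-nets.html",
    "Disallow: /cybersecurity/attack-study/botnet-log.html",
]


def star_group_allows(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above.

    `ExampleBot` is a made-up agent name that matches no named group,
    so it falls through to the "*" group.
    """
    parser = RobotFileParser()
    parser.parse(STAR_GROUP_LINES)
    return parser.can_fetch(agent, "https://cromwell-intl.com" + path)


if __name__ == "__main__":
    for p in ("/tcpip/", "/roots/index.html",
              "/cybersecurity/attack-study/botnet-log.html"):
        print(p, star_group_allows(p))
```

Because `Disallow: /roots/` is a prefix rule, it covers everything beneath that directory, while the other entries each name a single page.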

anthropic-ai

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

img2dataset

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

magpie-crawler

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

Comments

Asking AI content scrapers to not scrape my content, from:
https://github.com/healsdata/ai-training-opt-out/blob/main/robots.txt
https://github.com/zcutlip/gen-ai-robots.txt/blob/main/robots.txt
However, it seems that blocking Google's AI scraper also excludes
a site from Google search results, or it soon will:
https://www.osnews.com/story/140536/google-to-websites-let-us-train-our-ai-on-your-content-or-well-remove-you-from-google-search/
ClaudeBot, Claude-Web, anthropic-ai = speculative blocks for Anthropic
CCBot = Common Crawl dataset, original source for GPT and others
img2dataset = the example user-agent token for img2dataset, although the default is *None*
GPTBot = OpenAI's web crawler
ChatGPT-User takes direct actions on behalf of ChatGPT users
Google-Extended = Google's Bard and Vertex AI generative APIs
(left commented out, so not an active group:)
User-agent: Google-Extended
Disallow: /
Omgilibot, Omgili = webz.io = they sell data for training LLMs.
FacebookBot = Meta's bot that crawls public web pages
Bytespider = ByteDance's bot gathering data for their LLMs, including Doubao.
magpie-crawler = Brandwatch, "AI to discover new trends"
Applebot-Extended = Apple's AI system
PerplexityBot = Perplexity AI
https://archive.is/22gCl (wired.com)
https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/
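
The effect of the named groups versus the `*` group can be sketched with Python's standard `urllib.robotparser`. The snippet below rebuilds only a small, abbreviated part of the file (two of the named groups and one `*` rule), so it is an illustration, not the site's full robots.txt. The helper name `can_crawl` is hypothetical. Since Google-Extended appears only in the comments and not among the scanned groups, the sketch treats it as having no group of its own.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction: two named groups plus one "*" rule.
ABBREVIATED_ROBOTS = [
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: ClaudeBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /roots/",
]

_parser = RobotFileParser()
_parser.parse(ABBREVIATED_ROBOTS)


def can_crawl(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return _parser.can_fetch(agent, "https://cromwell-intl.com" + path)


if __name__ == "__main__":
    # A named group with "Disallow: /" blocks the whole site for that agent.
    print("GPTBot /tcpip/:", can_crawl("GPTBot", "/tcpip/"))
    # An agent without its own group falls back to the "*" rules.
    print("Google-Extended /tcpip/:", can_crawl("Google-Extended", "/tcpip/"))
    print("Google-Extended /roots/:", can_crawl("Google-Extended", "/roots/"))
```

An agent with its own group is governed by that group alone, so the `*` rules matter only for agents that match no named group. Compliance remains voluntary on the crawler's side, which is the concern raised by the Perplexity links above.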
