scup.org
robots.txt

Robots Exclusion Standard data for scup.org

Resource Scan

Scan Details

Site Domain	scup.org
Base Domain	scup.org
Scan Status	Ok
Last Scan	2024-09-18T17:13:03+00:00
Next Scan	2024-10-18T17:13:03+00:00

Last Scan

Scanned	2024-09-18T17:13:03+00:00
URL	https://scup.org/robots.txt
Redirect	https://www.scup.org/robots.txt
Redirect Domain	www.scup.org
Redirect Base	scup.org
Domain IPs	23.185.0.2, 2620:12a:8000::2, 2620:12a:8001::2
Redirect IPs	23.185.0.2, 2620:12a:8000::2, 2620:12a:8001::2
Response IP	23.185.0.2
Found	Yes
Hash	0dcad598b6c597b7c1f4773b60c72967932d277ab58effe1ab9a455e85e8bd94
SimHash	1a5ed3c48662

Groups

*

Rule	Path
Disallow	/*.pdf$
Disallow	/*.zip$
Disallow	/*.mp3$
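
The three rules above rely on the `*` wildcard and the `$` end-of-path anchor. As a rough sketch of how such patterns could be evaluated, the snippet below translates each pattern into a regular expression. The helper names (`robots_pattern_to_regex`, `is_disallowed`) and the sample paths in the usage note are hypothetical and not part of the scan data; real crawlers may differ in details such as percent-encoding and rule precedence.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    match at the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# Patterns taken from the '*' group in the scan above.
DISALLOWED = ["/*.pdf$", "/*.zip$", "/*.mp3$"]
_COMPILED = [robots_pattern_to_regex(p) for p in DISALLOWED]

def is_disallowed(path: str) -> bool:
    # re.match anchors at the start of the path, as robots.txt rules do.
    return any(rx.match(path) for rx in _COMPILED)
```

Under this sketch, a hypothetical path such as `/docs/report.pdf` would be treated as disallowed, while `/docs/report.pdf?download=1` would not, because the `$` anchor requires the path to end in `.pdf`.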

ccbot

Rule	Path
Disallow	/

img2dataset

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

magpie-crawler

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/private/
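
To illustrate how the per-agent groups above might be checked programmatically, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_EXCERPT` text is a reconstruction from two of the groups listed in this scan, not the original file, and the URLs in the usage note are hypothetical. Note that `urllib.robotparser` does not interpret the `*` and `$` pattern syntax used in the `*` group, so that group is left out here.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan's group listing (an assumption, not the
# verbatim file): gptbot is blocked everywhere, applebot-extended only
# under /private/.
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: applebot-extended
Disallow: /private/
"""

def build_parser(robots_text: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_text.splitlines())
    return parser

PARSER = build_parser(ROBOTS_EXCERPT)

def may_fetch(user_agent: str, url: str) -> bool:
    return PARSER.can_fetch(user_agent, url)
```

With this excerpt, `may_fetch("gptbot", "https://www.scup.org/events")` should be false, while `may_fetch("applebot-extended", "https://www.scup.org/events")` should be true, since only `/private/` is disallowed for that agent.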

Other Records

Field	Value
sitemap	https://www.scup.org/sitemap_index.xml

Comments

START YOAST BLOCK
---------------------------
---------------------------
END YOAST BLOCK
The Common Crawl dataset. Original source for GPT and others.
The example for img2dataset, although the default is *None*
GPTBot is OpenAI's web crawler
ChatGPT-User takes direct actions on behalf of ChatGPT users
Google's Bard and Vertex AI generative APIs
Speculative blocks for Anthropic
webz.io - they sell data for training LLMs.
Meta's bot that crawls public web pages to improve language models
ByteDance's bot used to gather data for their LLMs, including Doubao.
Brandwatch - "AI to discover new trends"
Apple
