occi-wg.org
robots.txt

Robots Exclusion Standard data for occi-wg.org

Resource Scan

Scan Details

Site Domain	occi-wg.org
Base Domain	occi-wg.org
Scan Status	Ok
Last Scan	2025-08-08T03:30:29+00:00
Next Scan	2025-09-07T03:30:29+00:00

Last Scan

Scanned	2025-08-08T03:30:29+00:00
URL	https://occi-wg.org/robots.txt
Redirect	https://www.carlyscafe.com/robots.txt
Redirect Domain	www.carlyscafe.com
Redirect Base	carlyscafe.com
Domain IPs	104.18.18.15, 104.18.19.15, 2606:4700::6812:120f, 2606:4700::6812:130f
Redirect IPs	104.21.59.63, 172.67.216.159, 2606:4700:3034::6815:3b3f, 2606:4700:3036::ac43:d89f
Response IP	172.67.216.159
Found	Yes
Hash	9d54c43357613b3a0af9ba114c48ff631918752dfe2a534e2d33d5a6a970ca3c
SimHash	be1b78526b80
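
The Hash field above is 64 hexadecimal digits, which is consistent with a SHA-256 digest of the fetched robots.txt body. The scan does not name the algorithm, so treating it as SHA-256 is an assumption. Under that assumption, a minimal sketch for checking whether a freshly fetched copy still matches the recorded value could look like this (the function names `body_digest` and `matches_recorded_hash` are hypothetical):

```python
import hashlib

# Value copied from the Hash field of the scan.
RECORDED_HASH = "9d54c43357613b3a0af9ba114c48ff631918752dfe2a534e2d33d5a6a970ca3c"


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (algorithm is assumed)."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a body's digest with a recorded hex digest, ignoring letter case."""
    return body_digest(body) == recorded.lower()
```

A mismatch would only mean that the bytes differ or that the scanner hashes something other than the raw body; it says nothing about which rules changed.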

Groups

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

emailcollector

Rule	Path
Disallow	/

emailsiphon

Rule	Path
Disallow	/

webbandit

Rule	Path
Disallow	/

webzip

Rule	Path
Disallow	/

webreaper

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

web downloader

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

offline explorer pro

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

httrack website copier

Rule	Path
Disallow	/

offline commander

Rule	Path
Disallow	/

leech

Rule	Path
Disallow	/

websnake

Rule	Path
Disallow	/

blackwidow

Rule	Path
Disallow	/

http weazel

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/wp-admin/
Disallow	/wp-includes/
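
The groups listed above can be evaluated with Python's standard `urllib.robotparser`. The following is a minimal sketch, not the site's actual file: `ROBOTS_EXCERPT` rebuilds only the `gptbot` and `*` groups from the tables, and the agent name `examplebot` and the path `/about` are hypothetical values chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt based on two groups from the scan (not the verbatim file).
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# A named group with "Disallow: /" blocks that agent everywhere.
gptbot_root = parser.can_fetch("gptbot", "https://occi-wg.org/")

# Agents without their own group fall back to the "*" group.
other_admin = parser.can_fetch("examplebot", "https://occi-wg.org/wp-admin/")
other_page = parser.can_fetch("examplebot", "https://occi-wg.org/about")

print(gptbot_root, other_admin, other_page)
```

Parsing from a string keeps the check independent of the redirect recorded in the scan; pointing the parser at a live URL would instead evaluate whatever file the redirect target serves at that moment.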

Other Records

Field	Value
sitemap	/sitemap_index.php

Comments

NOTICE: The collection of content and other data on this
site through automated means, including any device, tool,
or process designed to data mine or scrape content, is
prohibited except (1) for the purpose of search engine indexing or
artificial intelligence retrieval augmented generation or (2) with express
written permission from this site’s operator.
To request permission to license our intellectual
property and/or other materials, please contact this
site’s operator directly.
BEGIN Cloudflare Managed content
END Cloudflare Managed Content
