spectai.net
robots.txt

Robots Exclusion Standard data for spectai.net

Resource Scan

Scan Details

Site Domain	spectai.net
Base Domain	spectai.net
Scan Status	Ok
Last Scan	2025-10-24T22:55:43+00:00
Next Scan	2025-10-25T22:55:43+00:00

Last Scan

Scanned	2025-10-24T22:55:43+00:00
URL	https://spectai.net/robots.txt
Domain IPs	104.21.39.63, 172.67.169.183, 2606:4700:3032::6815:273f, 2606:4700:3032::ac43:a9b7
Response IP	172.67.169.183
Found	Yes
Hash	09f5217fd2c639609375bc9dea11b69256fad1c1cf08f4d8b36d82d4a128413d
SimHash	66310b5145f0
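
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest, although the report does not name the algorithm or say which bytes were hashed. As a minimal sketch under that assumption, the helper below (the name `sha256_hex` and the placeholder body are illustrative, not taken from the report) shows how a fetched robots.txt body could be fingerprinted for comparison between scans:

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


# Placeholder body (assumption): in practice this would be the bytes
# returned by https://spectai.net/robots.txt.
body = b"User-agent: *\nAllow: /\n"
digest = sha256_hex(body)
print(digest)
```

If the scanner hashes something other than the raw body (for example, normalized text), the digest computed this way would not match the reported value.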

Groups

*

Rule	Path
Allow	/

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
adidxbot
applebot
applenewsbot
baiduspider
baiduspider-image
baiduspider-news
baiduspider-video
bingbot
bingpreview
bublupbot
ccbot
cliqzbot
coccoc
coccocbot-image
coccocbot-web
daumoa
dazoobot
deusu
duckduckbot
duckduckgo-favicons-bot
euripbot
exploratodo
facebot
feedly
findxbot
gooblog
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
haosouspider
ichiro
istellabot
jikespider
lycos
mail.ru
mediapartners-google
mojeekbot
msnbot
msnbot-media
orangebot
pinterest
plukkie
qwantify
rambler
seznambot
sosospider
slurp
sogou blog
sogou inst spider
sogou news spider
sogou orion spider
sogou spider2
sogou web spider
sputnikbot
teoma
twitterbot
wotbox
yacybot
yandex
yandexmobilebot
yeti
yioopbot
yoozbot
youdaobot

Rule	Path
Disallow

*

Rule	Path
Disallow	/
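
To see how groups like these resolve for a given crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is a reduced reconstruction from the groups listed in this report. The group order, the lowercase user-agent tokens, and the sample bot name `SomeOtherBot` are assumptions, since the report does not show the raw file.

```python
from urllib.robotparser import RobotFileParser

# Reduced reconstruction of the reported groups (assumed layout and casing).
# Note: no blank lines inside a group, matching the template's own warning.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /

User-agent: googlebot
User-agent: ccbot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://spectai.net/"
for agent in ("GPTBot", "CCBot", "Googlebot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, url))
```

With this particular parser, GPTBot and CCBot are refused, Googlebot is allowed (an empty Disallow permits everything), and an unlisted bot is allowed because the standard-library parser keeps only the first `*` group. Other crawlers may combine the two `*` groups differently, so the effect of the final `Disallow: /` depends on the parser in use.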

Comments

As a condition of accessing this website, you agree to abide by the following
content signals:
(a) If a content-signal = yes, you may collect content for the corresponding
use.
(b) If a content-signal = no, you may not collect content for the
corresponding use.
(c) If the website operator does not include a content signal for a
corresponding use, the website operator neither grants nor restricts
permission via content signal with respect to the corresponding use.
The content signals and their meanings are:
search: building a search index and providing search results (e.g., returning
hyperlinks and short excerpts from your website's contents). Search does not
include providing AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g., retrieval
augmented generation, grounding, or other real-time taking of content for
generative AI search answers).
ai-train: training or fine-tuning AI models.
ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
BEGIN Cloudflare Managed content
END Cloudflare Managed Content
ROBOTS.TXT
Alphabetically ordered whitelisting of legitimate web robots, which obey the
Robots Exclusion Standard (robots.txt). Each bot is shortly described in a
comment above the (list of) user-agent(s). Comment out or delete lines which
contain User-agents you do not wish to allow on your website.
Important: Blank lines are not allowed in the final robots.txt file!
Updates can be retrieved from: https://www.ditig.com/robots-txt-template
This document is licensed with a CC BY-NC-SA 4.0 license.
Last update: 2021-11-04
so.com chinese search engine
google.com landing page quality checks
google.com app resource fetcher
bing ads bot
apple.com search engine
baidu.com chinese search engine
bing.com international search engine
bublup.com suggestion/search engine
commoncrawl.org open repository of web crawl data
cliqz.com german in-product search engine
coccoc.com vietnamese search engine
daum.net korean search engine
dazoo.fr french search engine
deusu.de german search engine
duckduckgo.com international privacy search engine
eurip.com european search engine
exploratodo.com latin search engine
facebook.com social network
feedly.com feed fetcher
findx.com european search engine
goo.ne.jp japanese search engine
google.com international search engine
so.com chinese search engine
goo.ne.jp japanese search engine
istella.it italian search engine
jike.com / chinaso.com chinese search engine
lycos.com & hotbot.com international search engine
mail.ru russian search engine
google.com adsense bot
mojeek.com search engine
bing.com international search engine
orange.com international search engine
pinterest.com social network
botje.nl dutch search engine
qwant.com french search engine
rambler.ru russian search engine
seznam.cz czech search engine
soso.com chinese search engine
yahoo.com international search engine
sogou.com chinese search engine
sputnik.ru russian search engine
ask.com international search engine
twitter.com bot
wotbox.com international search engine
yacy.net p2p search software
yandex.com russian search engine
search.naver.com south korean search engine
yioop.com international search engine
yooz.ir iranian search engine
youdao.com chinese search engine
crawling rule(s) for above bots
disallow all other bots

Warnings

3 invalid lines.
`content-signal` is not a known field.
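
Because the scanner does not recognize `content-signal`, the signal values themselves do not appear in this report. As a rough sketch of how a consumer might apply rules (a) to (c) from the Comments section, the code below parses a hypothetical line. The comma-separated `name=yes|no` syntax, the example values, and the function names `parse_content_signal` and `permission` are all assumptions for illustration.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse e.g. 'Content-Signal: search=yes, ai-train=no' into a dict."""
    field, _, value = line.partition(":")
    if field.strip().lower() != "content-signal":
        raise ValueError("not a content-signal line")
    signals: dict[str, bool] = {}
    for item in value.split(","):
        name, sep, flag = item.partition("=")
        name, flag = name.strip().lower(), flag.strip().lower()
        # Keep only well-formed name=yes / name=no pairs.
        if sep and name and flag in ("yes", "no"):
            signals[name] = flag == "yes"
    return signals


def permission(signals: dict[str, bool], use: str) -> str:
    """Apply rules (a)-(c): yes -> allowed, no -> not allowed, absent -> unspecified."""
    if use not in signals:
        return "unspecified"
    return "allowed" if signals[use] else "not allowed"


# Hypothetical values; the actual line on spectai.net is not shown in the report.
signals = parse_content_signal("Content-Signal: search=yes, ai-train=no")
for use in ("search", "ai-input", "ai-train"):
    print(use, permission(signals, use))
```

Returning "unspecified" for a missing signal mirrors rule (c): the operator neither grants nor restricts that use through content signals.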
