spectai.net
robots.txt

Robots Exclusion Standard data for spectai.net

Resource Scan

Scan Details

Site Domain spectai.net
Base Domain spectai.net
Scan Status Ok
Last Scan 2025-10-24T22:55:43+00:00
Next Scan 2025-10-25T22:55:43+00:00

Last Scan

Scanned 2025-10-24T22:55:43+00:00
URL https://spectai.net/robots.txt
Domain IPs 104.21.39.63, 172.67.169.183, 2606:4700:3032::6815:273f, 2606:4700:3032::ac43:a9b7
Response IP 172.67.169.183
Found Yes
Hash 09f5217fd2c639609375bc9dea11b69256fad1c1cf08f4d8b36d82d4a128413d
SimHash 66310b5145f0
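
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below only assumes SHA-256 over the raw response body. The function names `fingerprint` and `has_changed` are hypothetical helpers, not part of the scanner.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt bytes.

    Assumption: the scanner hashes the unmodified response body with
    SHA-256. The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return fingerprint(body) != previous_hash.lower()


if __name__ == "__main__":
    sample = b"User-agent: *\nAllow: /\n"  # illustrative body, not the real file
    stored = fingerprint(sample)
    print(has_changed(stored, sample))            # same bytes
    print(has_changed(stored, sample + b"\n"))    # one extra byte
```

An exact digest changes on any byte difference, including whitespace. The separate SimHash field is presumably there for near-duplicate comparison.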

Groups

*

Rule Path
Allow /

amazonbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
adidxbot
applebot
applenewsbot
baiduspider
baiduspider-image
baiduspider-news
baiduspider-video
bingbot
bingpreview
bublupbot
ccbot
cliqzbot
coccoc
coccocbot-image
coccocbot-web
daumoa
dazoobot
deusu
duckduckbot
duckduckgo-favicons-bot
euripbot
exploratodo
facebot
feedly
findxbot
gooblog
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
haosouspider
ichiro
istellabot
jikespider
lycos
mail.ru
mediapartners-google
mojeekbot
msnbot
msnbot-media
orangebot
pinterest
plukkie
qwantify
rambler
seznambot
sosospider
slurp
sogou blog
sogou inst spider
sogou news spider
sogou orion spider
sogou spider2
sogou web spider
sputnikbot
teoma
twitterbot
wotbox
yacybot
yandex
yandexmobilebot
yeti
yioopbot
yoozbot
youdaobot

Rule Path
Disallow

*

Rule Path
Disallow /
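
The groups above can be checked against individual crawlers with Python's standard `urllib.robotparser`. The text in `ROBOTS_TXT` is an abbreviated reconstruction from the listed groups, not the original file, and `check_agents` is a hypothetical helper. Other parsers may resolve the two `*` groups differently; this sketch only shows what the standard-library parser does.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction of the groups in the report (assumption: the
# original file is not reproduced verbatim in the scan output).
ROBOTS_TXT = """\
User-agent: *
Allow: /
User-agent: gptbot
Disallow: /
User-agent: ccbot
Disallow: /
User-agent: googlebot
User-agent: bingbot
User-agent: ccbot
Disallow:
User-agent: *
Disallow: /
"""


def check_agents(text, agents, url="https://spectai.net/"):
    """Return {agent: bool} saying whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["googlebot", "gptbot", "ccbot", "somerandombot"]
    for agent, allowed in check_agents(ROBOTS_TXT, agents).items():
        print(agent, "allowed" if allowed else "blocked")
```

With this parser, an empty `Disallow` permits everything for the listed bots. `ccbot` is matched by its earlier `Disallow /` group first. Only the first `*` group is kept as the default, so an unlisted bot falls under `Allow /` rather than the trailing `Disallow /`.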

Comments

  • As a condition of accessing this website, you agree to abide by the following
  • content signals:
  • (a) If a content-signal = yes, you may collect content for the corresponding
  • use.
  • (b) If a content-signal = no, you may not collect content for the
  • corresponding use.
  • (c) If the website operator does not include a content signal for a
  • corresponding use, the website operator neither grants nor restricts
  • permission via content signal with respect to the corresponding use.
  • The content signals and their meanings are:
  • search: building a search index and providing search results (e.g., returning
  • hyperlinks and short excerpts from your website's contents). Search does not
  • include providing AI-generated search summaries.
  • ai-input: inputting content into one or more AI models (e.g., retrieval
  • augmented generation, grounding, or other real-time taking of content for
  • generative AI search answers).
  • ai-train: training or fine-tuning AI models.
  • ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
  • RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
  • AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
  • BEGIN Cloudflare Managed content
  • END Cloudflare Managed Content
  • ROBOTS.TXT
  • Alphabetically ordered whitelisting of legitimate web robots, which obey the
  • Robots Exclusion Standard (robots.txt). Each bot is briefly described in a
  • comment above the (list of) user-agent(s). Comment out or delete lines which
  • contain User-agents you do not wish to allow on your website.
  • Important: Blank lines are not allowed in the final robots.txt file!
  • Updates can be retrieved from: https://www.ditig.com/robots-txt-template
  • This document is licensed with a CC BY-NC-SA 4.0 license.
  • Last update: 2021-11-04
  • so.com chinese search engine
  • google.com landing page quality checks
  • google.com app resource fetcher
  • bing ads bot
  • apple.com search engine
  • baidu.com chinese search engine
  • bing.com international search engine
  • bublup.com suggestion/search engine
  • commoncrawl.org open repository of web crawl data
  • cliqz.com german in-product search engine
  • coccoc.com vietnamese search engine
  • daum.net korean search engine
  • dazoo.fr french search engine
  • deusu.de german search engine
  • duckduckgo.com international privacy search engine
  • eurip.com european search engine
  • exploratodo.com latin search engine
  • facebook.com social network
  • feedly.com feed fetcher
  • findx.com european search engine
  • goo.ne.jp japanese search engine
  • google.com international search engine
  • so.com chinese search engine
  • goo.ne.jp japanese search engine
  • istella.it italian search engine
  • jike.com / chinaso.com chinese search engine
  • lycos.com & hotbot.com international search engine
  • mail.ru russian search engine
  • google.com adsense bot
  • mojeek.com search engine
  • bing.com international search engine
  • orange.com international search engine
  • pinterest.com social network
  • botje.nl dutch search engine
  • qwant.com french search engine
  • rambler.ru russian search engine
  • seznam.cz czech search engine
  • soso.com chinese search engine
  • yahoo.com international search engine
  • sogou.com chinese search engine
  • sputnik.ru russian search engine
  • ask.com international search engine
  • twitter.com bot
  • wotbox.com international search engine
  • yacy.net p2p search software
  • yandex.com russian search engine
  • search.naver.com south korean search engine
  • yioop.com international search engine
  • yooz.ir iranian search engine
  • youdao.com chinese search engine
  • crawling rule(s) for above bots
  • disallow all other bots
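
The content-signal rules (a) to (c) in the comments above can be read out of a robots.txt file with a small parser. This is a minimal sketch: the report does not show the site's actual `content-signal` line, so the sample values are hypothetical, and the comma-separated `name=value` syntax is an assumption. `parse_content_signals` and `permission` are illustrative names.

```python
def parse_content_signals(robots_txt):
    """Collect content-signal values, e.g. {"search": "yes", "ai-train": "no"}.

    Assumed syntax: `Content-Signal: name=value, name=value`.
    """
    signals = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            key, eq, val = pair.partition("=")
            if eq:
                signals[key.strip().lower()] = val.strip().lower()
    return signals


def permission(signals, use):
    """Apply rules (a)-(c): 'yes', 'no', or 'unspecified' when absent."""
    return signals.get(use, "unspecified")


if __name__ == "__main__":
    # Hypothetical sample; not the values published by the scanned site.
    sample = "User-agent: *\nContent-Signal: search=yes, ai-train=no\nAllow: /\n"
    signals = parse_content_signals(sample)
    for use in ("search", "ai-input", "ai-train"):
        print(use, permission(signals, use))
```

A missing signal is reported as "unspecified" rather than as a denial, matching rule (c): the operator neither grants nor restricts that use.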

Warnings

  • 3 invalid lines.
  • `content-signal` is not a known field.