sciencephoto.com
robots.txt

Robots Exclusion Standard data for sciencephoto.com

Resource Scan

Scan Details

Site Domain sciencephoto.com
Base Domain sciencephoto.com
Scan Status Ok
Last Scan 2024-10-18T22:59:21+00:00
Next Scan 2024-11-17T22:59:21+00:00

Last Scan

Scanned 2024-10-18T22:59:21+00:00
URL https://sciencephoto.com/robots.txt
Redirect https://www.sciencephoto.com/robots.txt
Redirect Domain www.sciencephoto.com
Redirect Base sciencephoto.com
Domain IPs 144.76.242.34
Redirect IPs 144.76.242.34
Response IP 144.76.242.34
Found Yes
Hash 35b82f3e1a566354b4158d576144f4c826b8ba6d5a964b45f8d25f9ef2d4ed75
SimHash 139a4fd06ca5

Groups

*

Rule Path
Disallow /admin
Disallow /api
Disallow /cms
Disallow /education/basket/
Disallow /login
Disallow /media/*/download
Disallow /ping
Disallow /public/basket/
Disallow /public/login/
Disallow /sales
Disallow /sciencephoto/
Disallow /user
Disallow /_assets/

Other Records

Field Value
crawl-delay 1
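
As a rough illustration of how a crawler might apply the `*` group, the sketch below feeds a reconstructed excerpt of it to Python's standard `urllib.robotparser`. The excerpt text, the `examplebot` agent name, and the `is_allowed` helper are assumptions made for this example; the exact spelling of the live file is not shown in the scan. The `/media/*/download` rule is left out because the standard-library parser matches plain path prefixes and may not interpret `*` as a wildcard.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt of the "*" group (assumption: the live file may
# order or spell these records differently). Only prefix rules are used.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /login
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def is_allowed(path: str, agent: str = "examplebot") -> bool:
    """Return True if `agent` (a hypothetical bot name) may fetch `path`."""
    return parser.can_fetch(agent, "https://www.sciencephoto.com" + path)


# Seconds to wait between requests, as declared by the group (or None).
delay = parser.crawl_delay("examplebot")
```

Here `is_allowed("/admin/settings")` is refused because it starts with a disallowed prefix, while a path that matches no rule is allowed by default.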

turnitinbot

Rule Path
Disallow /category
Disallow /keyword
Disallow /login

adbeat_bot
adsbot
ahc
ahrefsbot
aihitbot
aiohttp
amazonadbot
amazonbot
anthropic-ai
applebot-extended
awariobot
awariorssbot
awariosmartbot
barkrowler
blexbot
brandverity
buck
ccbot
chatglm-spider
chatgpt-user
cincraw
cirrusexplorer
claudebot
claude-web
cohere-ai
criteobot/0.1
crystalsemantics
dataforseobot
daum
dataprovider
deepcrawl
diffbot
domcopbot
dotbot
duckassistbot
duckduckbot
ev-crawler
exabot
experiancrawluk
ezoicbot
facebookbot
genai
gptbot
go-http-client
google-extended
grapeshot
httrack
iaskbot
img2dataset
imagesiftbot
lcc
llm-jp-crawler
linespider
ltx71 - (http://ltx71.com/)
magellan
magpie-crawler
mail.ru_bot
mauibot
megaindex
meta-externalagent
metajobbot
mj12bot
neevabot
netpeakcheckerbot
oai-searchbot
omgili
omgilibot
owler
panscient.com
perplexitybot
petalbot
piplbot
proximic
rainbot
riddler
rogerbot
scrapy
screaming frog seo spider
semanticbot
semanticscholarbot
semrushbot
semrushbot-ba
semrushbot-coub
semrushbot-ct
semrushbot-si
semrushbot-swa
sentibot
serpstatbot
seekportbot
seokicks
siteauditbot
sitecheckerbotcrawler
splitsignalbot
stormcrawler
surdotlybot
the knowledge ai
timpibot
trendictionbot
velenpublicwebcrawler
webzio
wpbot
webprosbot
wellknownbot
wrtnbot
xovibot
yak
yaosoubot
yepbot
yeti
yisouspider
youbot
zoominfobot

Rule Path
Disallow /
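
The group above lists many user agents that share a single `Disallow /` rule. A minimal sketch of that grouping, using three of the listed agents and Python's `urllib.robotparser`, might look as follows. The excerpt layout, the `is_blocked` helper, and the `examplebot` name are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: consecutive User-agent lines form one group that
# shares the rules that follow them.
SHARED_GROUP = """\
User-agent: ccbot
User-agent: claudebot
User-agent: gptbot
Disallow: /
"""

shared = RobotFileParser()
shared.parse(SHARED_GROUP.splitlines())


def is_blocked(agent_token: str) -> bool:
    """Return True if the agent token is denied the site root."""
    return not shared.can_fetch(agent_token, "https://www.sciencephoto.com/")
```

In this excerpt `is_blocked("GPTBot")` is true (matching is case-insensitive), while an agent that is not named, such as the hypothetical `examplebot`, is not blocked because the excerpt contains no `*` group.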

bytespider

Rule Path
Disallow /

sogou web spider

Rule Path
Disallow /

linkchecker

Rule Path
Allow /
Disallow /education/basket/
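
The linkchecker group combines `Allow /` with a more specific `Disallow`. Under the longest-match rule described in RFC 9309, the most specific matching path wins, and an `Allow` wins a tie. The sketch below applies that rule to plain path prefixes only (no `*` or `$` handling); the function name and rule representation are hypothetical, and parsers that use first-match semantics may answer differently.

```python
def longest_match_allows(rules, path):
    """Decide access for `path` using longest-prefix matching.

    `rules` is a list of (kind, prefix) pairs, where kind is
    "allow" or "disallow". A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if path.startswith(prefix):
            n = len(prefix)
            # A longer prefix is more specific; on equal length, allow wins.
            if n > best_len or (n == best_len and kind == "allow"):
                best_len = n
                allowed = kind == "allow"
    return allowed


# Rules as listed for the linkchecker group.
LINKCHECKER_RULES = [
    ("allow", "/"),
    ("disallow", "/education/basket/"),
]
```

With these rules, a path under `/education/basket/` is disallowed even though `Allow /` is listed first, and every other path remains allowed.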

Other Records

Field Value
sitemap https://www.sciencephoto.com/sitemap.xml

Comments

  • config for _all_ crawlers
  • please keep in alphabetic order so it's easy to find things