opencitations.net
robots.txt

Robots Exclusion Standard data for opencitations.net

Resource Scan

Scan Details

Site Domain opencitations.net
Base Domain opencitations.net
Scan Status Ok
Last Scan 2024-09-21T06:02:13+00:00
Next Scan 2024-10-21T06:02:13+00:00

Last Scan

Scanned 2024-09-21T06:02:13+00:00
URL https://opencitations.net/robots.txt
Domain IPs 130.136.130.1
Response IP 130.136.130.1
Found Yes
Hash 273a529ead164147ee67d32b031654eff8aef43c0ddde7701fbe3644347a4d75
SimHash fc145583f2f9

Groups

crawler
spider
bot
yahoo! slurp
bubing
adsbot-google
adsbot-google-mobile-apps
adidxbot
applebot
applenewsbot
baiduspider
baiduspider-image
bingbot
bingpreview
ccbot
cliqzbot
coccoc
coccocbot-image
coccocbot-web
daumoa
dazoobot
deusu
duckduckbot
duckduckgo-favicons-bot
euripbot
exploratodo
facebot
feedly
findxbot
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
haosouspider
ichiro
istellabot
jikespider
lycos
mail.ru
mediapartners-google
mojeekbot
msnbot
msnbot-media
orangebot
pinterest
plukkie
qwantify
rambler
seznambot
sosospider
slurp
sogou blog
sogou inst spider
sogou news spider
sogou orion spider
sogou spider2
sogou web spider
sputnikbot
teoma
twitterbot
wotbox
yacybot
yandex
yandexmobilebot
yeti
yioopbot
yoozbot
youdaobot
ahrefsbot
dotbot
semanticscholarbot
blexbot
mb2345browser
liebaofast
mqqbrowser
ucbrowser
aspiegelbot
petalbot

Rule Path
Disallow /corpus/
Disallow /virtual/
Disallow /index/coci/

semrushbot
semrushbot-sa

Rule Path
Disallow /
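
The two rule groups above can be checked with a short Python sketch using the standard library's urllib.robotparser. The robots.txt text below is an abbreviated reconstruction, not the file as served: it assumes standard "User-agent:" and "Disallow:" syntax, and it lists only three of the user agents from the first group. The agent name "examplefetcher" is hypothetical and stands in for a crawler named in neither group.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction (assumption): only three user agents from the
# first group are listed; the Disallow paths are the ones the scan reports.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: bingbot
User-agent: ccbot
Disallow: /corpus/
Disallow: /virtual/
Disallow: /index/coci/

User-agent: semrushbot
User-agent: semrushbot-sa
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# First group: only the three listed prefixes are blocked.
googlebot_corpus = parser.can_fetch("googlebot", "/corpus/")
googlebot_index = parser.can_fetch("googlebot", "/index/")

# Second group: the whole site is blocked.
semrush_root = parser.can_fetch("semrushbot", "/")

# Hypothetical agent matched by neither group in this abbreviated file.
other_corpus = parser.can_fetch("examplefetcher", "/corpus/")

print(googlebot_corpus, googlebot_index, semrush_root, other_corpus)
```

Python's parser matches a group when the listed name is a substring of the requesting agent's name, so the generic entries in the full first group (crawler, spider, bot) would likely match many more agents than this abbreviated sketch suggests. Other robots.txt parsers may resolve such overlaps differently.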

Warnings

  • 3 invalid lines.