nicebooks.com
robots.txt

Robots Exclusion Standard data for nicebooks.com

Resource Scan

Scan Details

Site Domain nicebooks.com
Base Domain nicebooks.com
Scan Status Ok
Last Scan 2024-10-24T09:48:39+00:00
Next Scan 2024-11-23T09:48:39+00:00

Last Scan

Scanned 2024-10-24T09:48:39+00:00
URL https://nicebooks.com/robots.txt
Domain IPs 194.163.174.117
Response IP 194.163.174.117
Found Yes
Hash 4f26be2360136384e1b6155151f0870bbee75a81592c9996496027f2781559bf
SimHash 0e519d51d3a1

Groups

*
adsbot-google
adsbot-google-mobile
yadirectfetcher
yandexaccessibilitybot
yandexadnet
yandexadditional
yandexadditionalbot
yandexblogs
yandexbot
yandexcalendar
yandexdirect
yandexdirectdyn
yandexfavicons
yandexfordomain
yandeximageresizer
yandeximages
yandexmarket
yandexmedia
yandexmetrika
yandexmobilebot
yandexmobilescreenshotbot
yandexnews
yandexontodb
yandexontodbapi
yandexpagechecker
yandexpartner
yandexrca
yandexrenderresourcesbot
yandexscreenshotbot
yandexsearchshop
yandexsitelinks
yandexspravbot
yandextracker
yandexturbo
yandexuserproxy
yandexverticals
yandexvertis
yandexvideo
yandexvideoparser
yandexwebmaster

Rule Path
Disallow /auth/
Disallow /search$
Disallow /search?
Disallow /book/*/store/
Disallow /book/*/add
Disallow /book/*/set
Disallow /book/*/review
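
The paths above use two pattern characters: `*`, which matches any run of characters, and a trailing `$`, which anchors the end of the URL. The following is a minimal sketch of how a crawler might evaluate them, assuming the wildcard semantics described in RFC 9309. The helper names `robots_pattern_to_regex` and `is_disallowed` and the sample paths are hypothetical, chosen for illustration only. Python's standard `urllib.robotparser` treats rule paths as plain prefixes, so the patterns are translated to regular expressions by hand here.

```python
import re

# Disallow paths copied from the rule table above.
DISALLOWED = [
    "/auth/",
    "/search$",
    "/search?",
    "/book/*/store/",
    "/book/*/add",
    "/book/*/set",
    "/book/*/review",
]


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the end of the URL, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """Return True if any Disallow pattern matches the path (with query)."""
    return any(robots_pattern_to_regex(p).match(path) for p in DISALLOWED)


if __name__ == "__main__":
    # Hypothetical sample paths, not taken from the scanned site.
    for sample in ["/search", "/search?q=x", "/searching",
                   "/book/123/store/example", "/book/123"]:
        print(sample, is_disallowed(sample))
```

Under these assumptions, `/search$` blocks only the bare `/search` URL and `/search?` blocks any search URL with a query string, while a path such as `/searching` stays crawlable. Because this group has no Allow rules, no longest-match precedence logic is needed in the sketch.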

semvisubot

Rule Path
Disallow /

semrushbot
semrushbot-sa

Rule Path
Disallow /

geedoproductsearch

Rule Path
Disallow /
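
The three groups above block their user agents from the whole site with `Disallow /`. A small sketch of how that can be checked with Python's standard `urllib.robotparser` follows. The robots.txt text in the snippet is a reconstruction from the groups listed in this report, not a copy of the live file, and the URLs passed to `can_fetch` are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the fully blocked groups in this report (an assumption,
# not the verbatim file); the wildcard group is deliberately left out because
# urllib.robotparser does not interpret '*' or '$' inside paths.
ROBOTS_TXT = """\
User-agent: semvisubot
Disallow: /

User-agent: semrushbot
User-agent: semrushbot-sa
Disallow: /

User-agent: geedoproductsearch
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked(user_agent: str, url: str) -> bool:
    """Return True if the reconstructed rules forbid this agent from the URL."""
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the rules.
    print(blocked("SemrushBot", "https://nicebooks.com/book/1/go/"))
    print(blocked("GeedoProductSearch", "https://nicebooks.com/search?q=x"))
```

Agent names are compared case-insensitively by the parser, so `SemrushBot` matches the lowercase `semrushbot` entry. The check only shows what the rules say; whether a bot honours them is a separate matter.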

Comments

  • Regular bots
  • Some bots need to be explicitly named to respect these rules
  • Bad bots
  • SemrushBot does not obey the disallow rules above, and crawls book/*/go/
  • GeedoProductSearch does not obey the disallow rules above, and crawls /search?q=