smashwords.com
robots.txt

Robots Exclusion Standard data for smashwords.com

Resource Scan

Scan Details

Site Domain smashwords.com
Base Domain smashwords.com
Scan Status Ok
Last Scan 2024-09-18T19:25:40+00:00
Next Scan 2024-09-25T19:25:40+00:00

Last Scan

Scanned 2024-09-18T19:25:40+00:00
URL https://smashwords.com/robots.txt
Redirect https://www.smashwords.com/robots.txt
Redirect Domain www.smashwords.com
Redirect Base smashwords.com
Domain IPs 23.253.20.116
Redirect IPs 23.253.20.116
Response IP 23.253.20.116
Found Yes
Hash 18726fa4caa6d9d4b39e0c56ffa6e89c798858979df25c493c8da3fe2e42517b
SimHash 771b5955c4d5

Groups

baiduspider

Rule Path
Disallow /books/search?
Disallow /books/download/

bingbot

Rule Path
Disallow /books/search?
Disallow /books/download/

duckduckbot

Rule Path
Disallow /books/search?
Disallow /books/download/

googlebot

Rule Path
Disallow /books/search?
Disallow /books/download/

mediapartners-google

Rule Path
Disallow /books/search?
Disallow /books/download/

petalbot

Rule Path
Disallow /

slurp

Rule Path
Disallow /books/search?
Disallow /books/download/

yandex

Rule Path
Disallow /books/search?
Disallow /books/download/

turnitinbot

Rule Path
Disallow /

amazonbot
applebot
applebot-extended
bytespider
ccbot
chatgpt-user
claude-web
claudebot
diffbot
facebookbot
friendlycrawler
gptbot
google-extended
googleother
googleother-image
googleother-video
icc-crawler
imagesiftbot
meta-externalagent
meta-externalfetcher
oai-searchbot
perplexitybot
petalbot
scrapy
timpibot
velenpublicwebcrawler
webzio-extended
youbot
anthropic-ai
cohere-ai
facebookexternalhit
img2dataset
omgili
omgilibot

Rule Path
Disallow /

*

Rule Path
Disallow /books/search?
Disallow /books/download/
Disallow /books/tags/
Disallow /extreader/

Other Records

Field Value
crawl-delay 4
sitemap http://sitemaps.smashwords.com/prod/sitemap-index.xml
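
The groups and records above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`, not a copy of the live file. It assumes an abbreviated reconstruction of the rules (only a few of the listed user agents), and it assumes that `crawl-delay` belongs to the `*` group, which the report does not confirm. The helper names `build_parser` and `is_allowed`, the agent name `examplebot`, and paths such as `/books/view/1` are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction of the scanned groups (an assumption, not the
# literal file): only a few of the listed user agents are included, and
# crawl-delay is placed in the "*" group, which the report does not confirm.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /books/search?
Disallow: /books/download/

User-agent: gptbot
User-agent: ccbot
Disallow: /

User-agent: *
Disallow: /books/search?
Disallow: /books/download/
Disallow: /books/tags/
Disallow: /extreader/
Crawl-delay: 4

Sitemap: http://sitemaps.smashwords.com/prod/sitemap-index.xml
"""

BASE = "https://www.smashwords.com"


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch the given site-relative path."""
    return parser.can_fetch(user_agent, BASE + path)


if __name__ == "__main__":
    parser = build_parser()
    checks = [
        ("gptbot", "/books/view/1"),          # group with "Disallow: /"
        ("googlebot", "/books/download/1"),   # blocked by its own group
        ("googlebot", "/books/tags/fantasy"), # not listed in googlebot's group
        ("examplebot", "/books/tags/fantasy"),# falls back to the "*" group
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(parser, agent, path))
    print("crawl delay:", parser.crawl_delay("examplebot"))
    print("sitemaps:", parser.site_maps())
```

A crawler that matches a named group follows only that group, so under this reconstruction `googlebot` is not bound by the `/books/tags/` and `/extreader/` rules that appear only under `*`. `site_maps()` requires Python 3.8 or later.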

Comments

  • list pulled from https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt