routledge.com
robots.txt

Robots Exclusion Standard data for routledge.com

Resource Scan

Scan Details

Site Domain routledge.com
Base Domain routledge.com
Scan Status Ok
Last Scan 2024-09-29T20:31:58+00:00
Next Scan 2024-10-29T20:31:58+00:00

Last Scan

Scanned 2024-09-29T20:31:58+00:00
URL https://routledge.com/robots.txt
Redirect https://www.routledge.com/robots.txt
Redirect Domain www.routledge.com
Redirect Base routledge.com
Domain IPs 104.17.184.26, 104.17.185.26
Redirect IPs 104.17.184.26, 104.17.185.26
Response IP 104.17.185.26
Found Yes
Hash 4e9d2a936aed48c09d58ba18126ba70d6010496b29df9edb975035c222c4a211
SimHash 0d254730d4d1

Groups

*

Rule Path
Disallow /account
Disallow /wishlist
Disallow /cart
Disallow /c/
Disallow /cw/
Disallow /cdn-cgi/
Disallow /sitemap_*.xml
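
The rules in the `*` group can be checked against a URL path with a small matcher. The following is a minimal sketch, assuming the matching behaviour described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names `rule_regex`, `is_allowed`, and `DEFAULT_RULES` are illustrative and not part of the scan data.

```python
import re

# Rules of the "*" group as listed above (kind, path pattern).
DEFAULT_RULES = [
    ("disallow", "/account"),
    ("disallow", "/wishlist"),
    ("disallow", "/cart"),
    ("disallow", "/c/"),
    ("disallow", "/cw/"),
    ("disallow", "/cdn-cgi/"),
    ("disallow", "/sitemap_*.xml"),
]


def rule_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each "*" becomes ".*"; everything else is matched literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`."""
    best = None  # (pattern length, is_allow); longer wins, Allow wins ties
    for kind, pattern in rules:
        if rule_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path matched by no rule is allowed.
    return True if best is None else best[1]


print(is_allowed(DEFAULT_RULES, "/account/settings"))
print(is_allowed(DEFAULT_RULES, "/sitemap_index.xml"))
print(is_allowed(DEFAULT_RULES, "/products/some-book"))
```

Under these assumptions the first two paths are blocked and the third is allowed. The path `/products/some-book` is a made-up example, not a URL taken from the site.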

adsbot
adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
adidxbot
applebot
apis-google
baiduspider
bingbot
bingpreview
caliperbot
ccbot
contentking
deepcrawl
duckduckbot
facebot
facebookexternalhit/1.0
facebookexternalhit/1.1
googlebot
googlebot-image
googlebot-news
googlebot-video
linkedinbot
microsoftpreview
mediapartners-google
neevabot
petalbot
semrushbot
slurp
siteimprove
semanticscholarbot
storebot-google
twitterbot
yandexbot

Rule Path
Allow /sitemap_*.xml
Disallow /account
Disallow /wishlist
Disallow /cart
Disallow /c/
Disallow /cw/
Disallow /cdn-cgi/

ahrefsbot

Rule Path
Disallow /
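
Which of the three groups applies depends on the crawler's product token. The sketch below shows one way to pick a group, assuming a case-insensitive exact match on the token with the `*` group as the fallback. The agent list of the second group is abbreviated to three names for brevity, and `GROUPS` and `select_rules` are hypothetical names used only for illustration.

```python
# Groups as (user agents, rules); the second agent list is abbreviated.
GROUPS = [
    (["*"], [
        ("disallow", "/account"),
        ("disallow", "/wishlist"),
        ("disallow", "/cart"),
        ("disallow", "/c/"),
        ("disallow", "/cw/"),
        ("disallow", "/cdn-cgi/"),
        ("disallow", "/sitemap_*.xml"),
    ]),
    (["googlebot", "bingbot", "ccbot"], [
        ("allow", "/sitemap_*.xml"),
        ("disallow", "/account"),
        ("disallow", "/wishlist"),
        ("disallow", "/cart"),
        ("disallow", "/c/"),
        ("disallow", "/cw/"),
        ("disallow", "/cdn-cgi/"),
    ]),
    (["ahrefsbot"], [("disallow", "/")]),
]


def select_rules(groups, product_token):
    """Return the rules of the group naming the token, else the "*" group."""
    token = product_token.lower()
    fallback = []
    for agents, rules in groups:
        if token in agents:
            return rules
        if "*" in agents:
            fallback = rules
    return fallback


print(select_rules(GROUPS, "AhrefsBot"))
print(select_rules(GROUPS, "Googlebot")[0])
print(len(select_rules(GROUPS, "ExampleBot")))
```

Here `ExampleBot` is an invented crawler name. It is not listed in any group, so it falls back to the seven rules of the `*` group, while `AhrefsBot` gets the single `Disallow /` rule.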

Other Records

Field Value
sitemap https://www.routledge.com/sitemap_index.xml
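
The sitemap record can also be read programmatically. The sketch below uses Python's standard `urllib.robotparser` on a shortened robots.txt reconstructed from the scan data above. The raw file is not part of this report, so `ROBOTS_TXT` is an assumption about its layout, not a copy of it. Note that the standard-library parser does not interpret `*` wildcards in paths, so the `/sitemap_*.xml` rules are left out here.

```python
from urllib.robotparser import RobotFileParser

# Shortened reconstruction of the file; the exact original text is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account
Disallow: /cart

User-agent: ahrefsbot
Disallow: /

Sitemap: https://www.routledge.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)

# AhrefsBot is blocked from the whole site by its own group.
ahrefs_ok = parser.can_fetch("AhrefsBot", "https://www.routledge.com/")
print(ahrefs_ok)
```

`site_maps()` requires Python 3.8 or later.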

Comments

  • Disallow select URLs
  • Allow these Bots to crawl Sitemap
  • Disallow everything for AhrefsBot
  • Sitemaps-https