charanga.com
robots.txt

Robots Exclusion Standard data for charanga.com

Resource Scan

Scan Details

Site Domain charanga.com
Base Domain charanga.com
Scan Status Ok
Last Scan 2024-11-04T16:20:28+00:00
Next Scan 2024-11-18T16:20:28+00:00

Last Scan

Scanned 2024-11-04T16:20:28+00:00
URL https://charanga.com/robots.txt
Domain IPs 2a05:d018:1a71:f800:b07:e38a:c04e:13b8, 79.125.12.250
Response IP 79.125.12.250
Found Yes
Hash cb3b03d67c1b96da275ce685aa55cb8889bbc27a8b8ae4bc2e780095dccf47c9
SimHash ae1418d1b174
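
The Hash value above is 64 hexadecimal digits, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. As a minimal sketch under that assumption, a content hash like this lets a scanner detect whether robots.txt changed between scans. The function name content_hash is hypothetical and not part of the scanner.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched robots.txt body.

    Assumption: the report's "Hash" field is SHA-256 over the raw bytes.
    The source does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Two scans of identical content produce identical digests, so a changed
# digest signals that the file was edited between scans.
previous = content_hash(b"User-agent: *\nDisallow: /o\n")
current = content_hash(b"User-agent: *\nDisallow: /o\n")
changed = previous != current
```

An exact hash only reports that something changed. A similarity hash such as the SimHash field is presumably there to estimate how much changed, but that is also an inference from the field name.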

Groups

*

Rule Path
Disallow /hnyp0t
Disallow /o

red

Rule Path
Allow /

*

Rule Path
Disallow /resource_library/*.pdf$

*

Rule Path
Disallow /resource_library

bingpreview

Rule Path
Disallow /pupil_reports
Disallow /o

adsbot-google

Rule Path
Disallow /user/login

mj12bot

Rule Path
Disallow /

*

Rule Path
Disallow /~magnolia/
Disallow /music_service_admin/letters/pupil_information_printable
Disallow /site/?ID=
Disallow /site/?Id=
Disallow /site/?id=
Disallow /site/?s=
Disallow /site/video/
Disallow /school_quotes
Disallow /vip_session_accounts
Disallow /quotes
Disallow /musicalschoolfreetrial
Disallow /assets/record_usage
Disallow /admin
Disallow /music_service_admin
Disallow /training_events
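
Several of the paths above rely on the wildcard syntax that major crawlers support: * matches any run of characters and a trailing $ anchors the end of the URL path. The following is a rough sketch of how such a pattern could be evaluated. The helper names robots_pattern_to_regex and is_disallowed are hypothetical, and the sketch ignores Allow rules and longest-match precedence, so it is not a full robots.txt evaluator.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path.
    Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece so characters like '?' and '.' stay literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        body += r"\Z"
    return re.compile(body)


def is_disallowed(path: str, disallow_patterns) -> bool:
    """True if any Disallow pattern matches the start of the path."""
    return any(
        robots_pattern_to_regex(p).match(path) for p in disallow_patterns
    )


# Patterns copied from the groups listed above.
rules = ["/resource_library/*.pdf$", "/site/?id=", "/admin"]

pdf_blocked = is_disallowed("/resource_library/brochure.pdf", rules)
pdf_with_query = is_disallowed("/resource_library/brochure.pdf?v=2", rules)
search_blocked = is_disallowed("/site/?id=abc", rules)
```

Matching is case-sensitive, which is presumably why the file lists /site/?ID=, /site/?Id= and /site/?id= separately.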

Other Records

Field Value
sitemap https://charanga.com/site/sitemap_index.xml
sitemap https://www.charanga.cz/site/sitemap_index.xml
sitemap https://www.charanga.dk/site/sitemap_index.xml
sitemap https://www.charanga.com.au/site/sitemap_index.xml
sitemap https://www.charanga.co.za/site/sitemap_index.xml
sitemap https://www.charanga.hk/site/sitemap_index.xml
sitemap https://www.charanga.in/site/sitemap_index.xml
sitemap https://www.charanga.vn/site/sitemap_index.xml
sitemap https://www.banesmusiconline.co.uk/site/sitemapindex.xml
sitemap https://www.bradfordmusiconline.co.uk/site/sitemap_index.xml
sitemap https://www.essexmusichub.org.uk/site/sitemap_index.xml
sitemap https://www.lancashiremusichub.co.uk/site/sitemap_index.xml
sitemap https://www.norfolkmusichub.org.uk/site/sitemap_index.xml
sitemap https://www.richmondmusictrust.org.uk/site/sitemap_index.xml
sitemap https://www.wakefieldmusicservices.org/site/sitemap_index.xml
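
Sitemap records are independent of any user-agent group, so a consumer can collect them with a single pass over the file. A minimal sketch, assuming the raw robots.txt text is already in memory; the function name extract_sitemaps and the sample text are illustrative only.

```python
def extract_sitemaps(robots_txt: str) -> list:
    """Collect the values of all Sitemap records in a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only so
        # the colon inside "https://" stays in the value.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Illustrative excerpt built from two of the records listed above.
sample = """\
User-agent: mj12bot
Disallow: /

Sitemap: https://charanga.com/site/sitemap_index.xml
sitemap: https://www.charanga.cz/site/sitemap_index.xml
"""

sitemaps = extract_sitemaps(sample)
```

The field name is compared case-insensitively, since robots.txt field names are not case-sensitive.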

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
  • Also be aware that this robots.txt is only served for charanga.com and the
  • principal partner sites. We have a dynamically generated robots.txt for
  • assets1.charanga.com, assets2.charanga.com etc to prevent us getting duplicate
  • content penalties, but it might interfere with e.g. redbot.org testing
  • Googlebot is now ignoring Noindex directives:
  • https://developers.google.com/search/blog/2019/07/a-note-on-unsupported-rules-in-robotstxt
  • redbot.org is very useful for testing, but will only work if we
  • specifically allow it
  • Don't index any pdfs in the resource_library; there are brochures we do want
  • indexed, hence the /resource_library path
  • And really, we should just stop any crawling of the resource_library
  • Stop Bingpreview from invalidating links
  • The AdsBot-Google has gone mental and is following links that
  • deliver converting users to us. It seems to particularly like
  • the login page, so BAN THIS FILTH
  • Some SEO spider that's all over us
  • Google, Yahoo, MSN et al all seem to be trying to index this (non-existent) URL. God knows why
  • Tell them to sod off
  • It was public, now it's login-only. We need this here so we can purge
  • cached copies with Google Webmaster Tools
  • These have to be fully qualified URLs, and because this robots.txt is shared
  • let's add them all here...
  • Ours
  • Partners
  • Noindex for internal search, which is typically now just referer spam.
  • This should noindex the crawls like
  • /site/?ID=m4wfx391hhzna2z9rfgp7g91p3hg8063&s=%E6%88%90%E9%83%BD
  • which we're increasingly seeing
  • We've also got people trying to hotlink external URLs to our
  • video player, which doesn't work, but we don't want those as incoming
  • links
  • stop dumb robots submitting this with no values