charanga.com
robots.txt

Robots Exclusion Standard data for charanga.com

Resource Scan

Scan Details

Site Domain	charanga.com
Base Domain	charanga.com
Scan Status	Ok
Last Scan	2024-11-04T16:20:28+00:00
Next Scan	2024-11-18T16:20:28+00:00

Last Scan

Scanned	2024-11-04T16:20:28+00:00
URL	https://charanga.com/robots.txt
Domain IPs	2a05:d018:1a71:f800:b07:e38a:c04e:13b8, 79.125.12.250
Response IP	79.125.12.250
Found	Yes
Hash	cb3b03d67c1b96da275ce685aa55cb8889bbc27a8b8ae4bc2e780095dccf47c9
SimHash	ae1418d1b174

Groups

*

Rule	Path
Disallow	/hnyp0t
Disallow	/o

red

Rule	Path
Allow	/

*

Rule	Path
Disallow	/resource_library/*.pdf$
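The * and $ in this rule are pattern characters, not literal path text. The sketch below assumes the RFC 9309 semantics (* matches any run of characters, a trailing $ anchors the end of the path) and translates a rule into a Python regular expression. The helper names pattern_to_regex and matches are illustrative, not part of any crawler's API.

```python
import re


def pattern_to_regex(path_pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes RFC 9309 semantics: '*' matches any run of characters and a
    trailing '$' anchors the end; everything else is matched literally.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(path_pattern: str, url_path: str) -> bool:
    # re.match only anchors at the start, which gives the usual prefix semantics.
    return pattern_to_regex(path_pattern).match(url_path) is not None


rule = "/resource_library/*.pdf$"
print(matches(rule, "/resource_library/brochure.pdf"))       # True
print(matches(rule, "/resource_library/brochure.pdf?dl=1"))  # False: '$' anchors the end
print(matches(rule, "/resource_library/brochure.html"))      # False
```

Assuming the crawler includes the query string in the path it matches, the trailing $ means a download link such as /resource_library/brochure.pdf?dl=1 would not be caught by this rule on its own.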

*

Rule	Path
Disallow	/resource_library
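This file declares several separate User-agent: * groups. RFC 9309 says that when more than one group matches a crawler, their rules are combined into a single group. A minimal sketch of that merge, using the rules from the first three * groups above as data; merge_groups and is_blocked are hypothetical helpers, and is_blocked deliberately skips wildcard rules.

```python
from collections import defaultdict

# Rules from the first three "User-agent: *" groups in this file, in file order.
raw_groups = [
    ("*", [("Disallow", "/hnyp0t"), ("Disallow", "/o")]),
    ("*", [("Disallow", "/resource_library/*.pdf$")]),
    ("*", [("Disallow", "/resource_library")]),
]


def merge_groups(groups):
    """Combine the rules of every group naming the same user agent."""
    merged = defaultdict(list)
    for agent, rules in groups:
        # User-agent names are compared case-insensitively.
        merged[agent.lower()].extend(rules)
    return dict(merged)


def is_blocked(rules, path):
    """Prefix check over plain Disallow rules; wildcard rules are skipped here."""
    prefixes = [
        value for kind, value in rules
        if kind == "Disallow" and "*" not in value and not value.endswith("$")
    ]
    return any(path.startswith(prefix) for prefix in prefixes)


star_rules = merge_groups(raw_groups)["*"]
print(len(star_rules))                                           # 4
print(is_blocked(star_rules, "/resource_library/brochure.pdf"))  # True
print(is_blocked(star_rules, "/overview"))                       # True: '/o' is a prefix
print(is_blocked(star_rules, "/about"))                          # False
```

Once merged, the plain /resource_library prefix already covers everything the /resource_library/*.pdf$ rule does. Prefix matching also means /o blocks any path that begins with /o, such as the hypothetical /overview used here.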

bingpreview

Rule	Path
Disallow	/pupil_reports
Disallow	/o

adsbot-google

Rule	Path
Disallow	/user/login

mj12bot

Rule	Path
Disallow	/
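Python's standard urllib.robotparser module can show how a named group such as mj12bot takes precedence over the * group. The excerpt below re-serialises a few of the rules in this file; ExampleBot is a made-up user agent used only for comparison.

```python
from urllib.robotparser import RobotFileParser

# A few rules from this file, re-serialised as robots.txt text.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin

User-agent: red
Allow: /

User-agent: mj12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# "ExampleBot" is a made-up agent that only matches the "*" group.
print(parser.can_fetch("MJ12bot", "https://charanga.com/"))          # False
print(parser.can_fetch("ExampleBot", "https://charanga.com/admin"))  # False
print(parser.can_fetch("ExampleBot", "https://charanga.com/"))       # True
print(parser.can_fetch("red", "https://charanga.com/admin"))         # True
```

Treat this as a rough check only: urllib.robotparser applies the first matching rule in file order rather than the longest match, does not implement the * and $ wildcards, matches user agents by substring, and honours only the first User-agent: * group, so it does not model how Google reads the full file.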

*

Rule	Path
Disallow	/~magnolia/
Disallow	/music_service_admin/letters/pupil_information_printable
Disallow	/site/?ID=
Disallow	/site/?Id=
Disallow	/site/?id=
Disallow	/site/?s=
Disallow	/site/video/
Disallow	/school_quotes
Disallow	/vip_session_accounts
Disallow	/quotes
Disallow	/musicalschoolfreetrial
Disallow	/assets/record_usage
Disallow	/admin
Disallow	/music_service_admin
Disallow	/training_events
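Path matching in robots.txt is case-sensitive, which is why the internal-search rule is listed three times (ID, Id and id). A sketch of prefix matching against the path plus query string, using the referer-spam URL quoted in the comments below; path_and_query and blocked_by_search_rules are illustrative names.

```python
from urllib.parse import urlsplit

# The three case variants listed in the "*" group above.
SEARCH_RULES = ["/site/?ID=", "/site/?Id=", "/site/?id="]


def path_and_query(url: str) -> str:
    """Return the part of a URL that robots.txt rules are matched against."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def blocked_by_search_rules(url: str) -> bool:
    target = path_and_query(url)
    # str.startswith is case-sensitive, like robots.txt path matching.
    return any(target.startswith(rule) for rule in SEARCH_RULES)


spam = "https://charanga.com/site/?ID=m4wfx391hhzna2z9rfgp7g91p3hg8063&s=%E6%88%90%E9%83%BD"
print(blocked_by_search_rules(spam))                                 # True
print(blocked_by_search_rules("https://charanga.com/site/?iD=abc"))  # False: no rule for "iD"
```

A casing that is not listed, such as iD, slips through, so each variant a crawler is likely to request has to be spelled out separately.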

Other Records

Field	Value
sitemap	https://charanga.com/site/sitemap_index.xml
sitemap	https://www.charanga.cz/site/sitemap_index.xml
sitemap	https://www.charanga.dk/site/sitemap_index.xml
sitemap	https://www.charanga.com.au/site/sitemap_index.xml
sitemap	https://www.charanga.co.za/site/sitemap_index.xml
sitemap	https://www.charanga.hk/site/sitemap_index.xml
sitemap	https://www.charanga.in/site/sitemap_index.xml
sitemap	https://www.charanga.vn/site/sitemap_index.xml
sitemap	https://www.banesmusiconline.co.uk/site/sitemapindex.xml
sitemap	https://www.bradfordmusiconline.co.uk/site/sitemap_index.xml
sitemap	https://www.essexmusichub.org.uk/site/sitemap_index.xml
sitemap	https://www.lancashiremusichub.co.uk/site/sitemap_index.xml
sitemap	https://www.norfolkmusichub.org.uk/site/sitemap_index.xml
sitemap	https://www.richmondmusictrust.org.uk/site/sitemap_index.xml
sitemap	https://www.wakefieldmusicservices.org/site/sitemap_index.xml
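The Sitemap records can be read back with urllib.robotparser, whose site_maps() method (Python 3.8 and later) returns the listed URLs, or None if there are none. The sketch below parses two of the records above and checks that each is fully qualified, as the comments note the protocol requires.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Two of the Sitemap records listed above, as they would appear in robots.txt.
lines = [
    "Sitemap: https://charanga.com/site/sitemap_index.xml",
    "Sitemap: https://www.charanga.cz/site/sitemap_index.xml",
    "User-agent: *",
    "Disallow: /admin",
]

parser = RobotFileParser()
parser.parse(lines)
sitemaps = parser.site_maps() or []  # site_maps() returns None when there are none

for url in sitemaps:
    parts = urlsplit(url)
    # Sitemap URLs must be absolute, so both scheme and host should be present.
    print(url, bool(parts.scheme and parts.netloc))
```

Sitemap lines are independent of any User-agent group, so their position in the file does not matter.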

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
To ban all spiders from the entire site, uncomment the next two lines:
User-Agent: *
Disallow: /
Also be aware that this robots.txt is only served for charanga.com and the
principal partner sites. We have a dynamically generated robots.txt for
assets1.charanga.com, assets2.charanga.com etc to prevent us getting duplicate
content penalties, but it might interfere with e.g. redbot.org testing
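A minimal sketch of serving a different robots.txt per host. It assumes the asset hosts follow the assetsN.charanga.com pattern named above and that they should block all crawling; both the pattern and the served body are assumptions, since the generated file itself is not part of this scan, and robots_for_host is a hypothetical name.

```python
import re

# Hypothetical pattern for the asset hosts (assets1.charanga.com, assets2.charanga.com, ...).
ASSET_HOST = re.compile(r"^assets\d+\.charanga\.com$", re.IGNORECASE)

# Assumed body for asset hosts: block everything so duplicate content is not indexed.
ASSET_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_for_host(host: str, main_robots: str) -> str:
    """Pick which robots.txt body to serve for a request's Host header."""
    hostname = host.split(":", 1)[0]  # drop any port (assumes a hostname, not an IPv6 literal)
    if ASSET_HOST.match(hostname):
        return ASSET_ROBOTS
    return main_robots


main = "User-agent: *\nDisallow: /admin\n"
print(robots_for_host("assets2.charanga.com", main) == ASSET_ROBOTS)  # True
print(robots_for_host("charanga.com", main) == main)                  # True
```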
Googlebot is now ignoring Noindex directives:
https://developers.google.com/search/blog/2019/07/a-note-on-unsupported-rules-in-robotstxt
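Google's documentation lists only user-agent, allow, disallow and sitemap as supported robots.txt fields, so a quick lint can flag lines that Googlebot will ignore. The sample input below is hypothetical, and unsupported_fields is an illustrative helper.

```python
# Fields Google documents as supported in robots.txt; anything else
# (noindex, nofollow, crawl-delay, ...) is ignored by Googlebot.
SUPPORTED_BY_GOOGLE = {"user-agent", "allow", "disallow", "sitemap"}


def unsupported_fields(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line number, field) pairs that Googlebot would ignore."""
    found = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in SUPPORTED_BY_GOOGLE:
            found.append((lineno, field))
    return found


sample = "User-agent: *\nNoindex: /site/?s=\nDisallow: /admin\n"
print(unsupported_fields(sample))  # [(2, 'noindex')]
```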
redbot.org is very useful for testing, but will only work if we
specifically allow it
Don't index any PDFs in the resource_library; there are brochures we do want
indexed, hence the /resource_library path
And really, we should just stop any crawling of the resource_library
Stop Bingpreview from invalidating links
The AdsBot-Google has gone mental and is following links that
deliver converting users to us. It seems to particularly like
the login page, so BAN THIS FILTH
Some SEO spider that's all over us
Google, Yahoo, MSN et al. all seem to be trying to index this (non-existent) URL. God knows why.
Tell them to sod off
It was public, now it's login-only. We need this here so we can purge
cached copies with Google Webmaster Tools
These have to be fully qualified URLs, and because this robots.txt is shared
let's add them all here...
Ours
Partners
Noindex for internal search, which is typically now just referer spam.
This should noindex the crawls like
/site/?ID=m4wfx391hhzna2z9rfgp7g91p3hg8063&s=%E6%88%90%E9%83%BD
which we're increasingly seeing
We've also got people trying to hotlink external URLs to our
video player, which doesn't work, but we don't want those as incoming
links
stop dumb robots submitting this with no values
