angrymum.fr
robots.txt

Robots Exclusion Standard data for angrymum.fr

Resource Scan

Scan Details

Site Domain angrymum.fr
Base Domain angrymum.fr
Scan Status Ok
Last Scan 2024-09-30T13:49:45+00:00
Next Scan 2024-10-07T13:49:45+00:00

Last Scan

Scanned 2024-09-30T13:49:45+00:00
URL https://angrymum.fr/robots.txt
Domain IPs 109.234.166.24
Response IP 109.234.166.24
Found Yes
Hash 305daff988bb2880e5b778e6e379e25d7b854f2487d7eda670b650a17e2d1851
SimHash 795c51c2c7e5

Groups

googlebot
googlebot-image
googlebot-video

Rule Path
Disallow /*.php$
Disallow /*.inc$
Disallow /*.gz$
Disallow /*.swf$
Disallow /*.wmv$
Disallow /*.cgi$
Disallow /*.xhtml$
Disallow /wp-admin/
Disallow /wp-includes/
Disallow /trackback/
Disallow /wp-login.php
Disallow /wp-register.php
Disallow /wp-content/plugins/link-juice-optimizer/public/js/link-juice-optimizer.js
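
The paths in this group use two special characters: `*` (any run of characters) and a trailing `$` (end of the URL path). A rule without `$` acts as a prefix match. The following is a minimal sketch of that matching, assuming the wildcard semantics described in RFC 9309; the helper names `robots_pattern_to_regex` and `path_matches` are hypothetical and are not part of the scanned site or of any crawler.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    end of the path. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the rule pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?x=1"),
        ("/wp-admin/", "/wp-admin/options.php"),
    ]:
        print(rule, path, path_matches(rule, path))
```

Under these assumptions, `/*.php$` covers `/index.php` but not `/index.php?x=1`, because the `$` anchor requires the path to end in `.php`.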

google-read-aloud

Rule Path
Allow /*

feedfetcher-google

Rule Path
Allow /*
Allow /feed/*

google-read-aloud
googlebot-news
google-speakr
duplexweb-google
googleweblight
google-producer
google-extended

Rule Path
Allow /*

storebot-google
adsbot-google-mobile
adsbot-google
mediapartners-google
googleother-image
googleother-video

Rule Path
Allow /*

bingbot
yahoo! slurp
teoma
baiduspider
yandex
applebot
exabot
ia_archiver
qwantify
wikiwix
duckduckbot
pinterest
coccocbot
coccocbot-web
yeti
sogou web spider
sogou
seekbot
seekport
seekport crawler
linguee
deusu
turnitinbot

Rule Path
Allow /*
Disallow /*.php$
Disallow /*.inc$
Disallow /*.gz$
Disallow /*.swf$
Disallow /*.wmv$
Disallow /*.cgi$
Disallow /*.xhtml$
Disallow /wp-content/plugins/link-juice-optimizer/public/js/link-juice-optimizer.js
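
This group combines a broad `Allow /*` with narrower `Disallow` rules. RFC 9309 resolves such overlaps by taking the longest matching pattern, with `Allow` preferred on a tie. The sketch below illustrates that precedence on a subset of the rules above. The names `RULES`, `_matches`, and `is_allowed` are hypothetical, and individual crawlers may resolve conflicts differently.

```python
import re

# Subset of the rules listed for this group, as (verdict, pattern) pairs.
RULES = [
    ("allow", "/*"),
    ("disallow", "/*.php$"),
    ("disallow", "/*.gz$"),
]


def _matches(pattern: str, path: str) -> bool:
    """Match a path against a pattern using '*' and trailing '$'."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match("^" + regex + ("$" if anchored else ""), path) is not None


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; a path with no match is allowed."""
    best_verdict, best_pattern = "allow", ""
    for verdict, pattern in rules:
        if not _matches(pattern, path):
            continue
        longer = len(pattern) > len(best_pattern)
        tie_goes_to_allow = (
            len(pattern) == len(best_pattern) and verdict == "allow"
        )
        if longer or tie_goes_to_allow:
            best_verdict, best_pattern = verdict, pattern
    return best_verdict == "allow"


if __name__ == "__main__":
    for path in ["/blog/post/", "/xmlrpc.php", "/backup.tar.gz"]:
        print(path, is_allowed(RULES, path))
```

With this precedence, `/xmlrpc.php` is blocked even though `Allow /*` also matches it, because `/*.php$` is the longer pattern.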

*

Rule Path
Disallow /*.php$
Disallow /*.inc$
Disallow /*.gz$
Disallow /*.swf$
Disallow /*.wmv$
Disallow /*.cgi$
Disallow /*.xhtml$
Disallow /wp-content/plugins/link-juice-optimizer/public/js/link-juice-optimizer.js

mj12bot
orthogaffe
ubicrawler
doc
zao
sitecheck.internetseer.com
zealbot
msiecrawler
sitesnagger
webstripper
webcopier
fetch
offline explorer
teleport
teleportpro
webzip
linko
httrack
microsoft.url.control
xenu
larbin
libwww
zyborg
download ninja
fast
wget
npbot
webreaper
mojeekbot
cliqzbot
istellabot
psbot
coccocbot-image
spbot
proximic
bizinformation
blexbot
riddler
ltx71
magpie-crawler
grapeshot
grapeshotcrawler
gigablastopensource
bubing
linkdexbot
linkdexbot/2.2
seokicks
seokicks-robot
panscient.com
webdatastats
zoominfobot
ccbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.lamusiqueducorps.fr/sitemap_index.xml

Comments

  • Block indexing of sensitive files
  • Allow Google voice search (?)
  • Allow Google RSS feed search (?)
  • Allow Google content search (?), News
  • Allow Google StoreBot for shops
  • Allow Google mobile ads and Google Ads, Media Partners
  • Allow Bing, Yahoo, Ask, Baidu, Yandex, Apple, Exalead, Alexa, Qwant, Wikipedia, DuckDuckBot
  • For all other robots
  • Block indexing of sensitive files
  • Spam bots, bad bots, and overly greedy robots
  • Tell the spider where our sitemap is

Warnings

  • 28 invalid lines.
  • `allos` is not a known field.
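
Warnings of this kind can be reproduced with a simple line-by-line check. The sketch below is one possible approach, not the scanner's actual implementation: `KNOWN_FIELDS`, `lint_robots`, and the `SAMPLE` text are all hypothetical, and the sample is an illustration, not the real content of the scanned file.

```python
# Field names accepted by this sketch (an assumption; real scanners
# may recognize a different set).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text: str):
    """Return (line_number, message) pairs for problematic lines."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments and surrounding whitespace before checking.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, sep, _value = line.partition(":")
        name = field.strip().lower()
        if not sep:
            # No 'field: value' separator at all.
            warnings.append((number, "invalid line"))
        elif name not in KNOWN_FIELDS:
            warnings.append((number, f"`{name}` is not a known field"))
    return warnings


# Hypothetical input containing a misspelled field and a stray line.
SAMPLE = """\
User-agent: feedfetcher-google
Allow: /*
Allos: /feed/*
@@@@ stray art
"""

if __name__ == "__main__":
    for number, message in lint_robots(SAMPLE):
        print(number, message)
```

A misspelled field name such as `Allos` is ignored by crawlers that follow the standard, so the rule it was meant to express has no effect.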