lanouvellerepublique.fr
robots.txt

Robots Exclusion Standard data for lanouvellerepublique.fr

Resource Scan

Scan Details

Site Domain lanouvellerepublique.fr
Base Domain lanouvellerepublique.fr
Scan Status Ok
Last Scan 2024-05-29T19:38:20+00:00
Next Scan 2024-06-28T19:38:20+00:00

Last Scan

Scanned 2024-05-29T19:38:20+00:00
URL https://lanouvellerepublique.fr/robots.txt
Redirect https://www.lanouvellerepublique.fr:443/robots.txt
Redirect Domain www.lanouvellerepublique.fr
Redirect Base lanouvellerepublique.fr
Domain IPs 52.50.95.211
Redirect IPs 18.239.199.118, 18.239.199.55, 18.239.199.58, 18.239.199.79, 2600:9000:2024:1600:1d:d466:e0c0:93a1, 2600:9000:2024:4600:1d:d466:e0c0:93a1, 2600:9000:2024:4a00:1d:d466:e0c0:93a1, 2600:9000:2024:5400:1d:d466:e0c0:93a1, 2600:9000:2024:9800:1d:d466:e0c0:93a1, 2600:9000:2024:ca00:1d:d466:e0c0:93a1, 2600:9000:2024:d400:1d:d466:e0c0:93a1, 2600:9000:2024:dc00:1d:d466:e0c0:93a1
Response IP 108.157.52.42
Found Yes
Hash c91e032bf1ce69342234d246cdcfa61359535e2c964ea8a8b858d03675245cac
SimHash a8529503c5f5

Groups

duckduckbot
mediapartners-google
googlebot
googlebot-image
googlebot-mobile
googleproducer
googlebot-video
adsbot-google
googlebot_nauxeo
qwantify
qwant-news
voilabot
msnbot
slurp
bingbot
twitterbot
facebookexternalhit
applebot
bingbot
facebot
grapeshot
flipboard
flipboardproxy
weborama-fetcher
feedfetcher-google

Rule Path
Disallow /recherche
Disallow /backoffice
Disallow /mon-compte
Disallow /kiosque
Disallow /autour-de-moi
Disallow /contributeur
Disallow /annonces
Disallow /fr/
Disallow /api/
Allow /annonces/avis-de-deces/
Allow /api/v1/showcase
Allow /api/v1/rss/5c5d4592a7f67291298b456a
Allow /api/v1/rss/5c5d46dfa32027d4478b4567
Allow /api/v1/rss/5c5d41ce08cd953b7e8b4574
Allow /api/v1/rss/5c5d429800655ad45a8b4571
Allow /api/v1/rss/5c5d4612a7f672692a8b4575
Allow /api/v1/rss/5c5d4679e91a9623078b458d
Allow /api/v1/rss/592bf255489a4555008b4568

googlebot-news

Rule Path
Disallow /recherche
Disallow /backoffice
Disallow /mon-compte
Disallow /kiosque
Disallow /autour-de-moi
Disallow /contributeur
Disallow /annonces
Disallow /fr/
Disallow /api/
Disallow /annonces/avis-de-deces/
Allow /api/v1/showcase
Allow /api/v1/rss/5c5d4592a7f67291298b456a
Allow /api/v1/rss/5c5d46dfa32027d4478b4567
Allow /api/v1/rss/5c5d41ce08cd953b7e8b4574
Allow /api/v1/rss/5c5d429800655ad45a8b4571
Allow /api/v1/rss/5c5d4612a7f672692a8b4575
Allow /api/v1/rss/5c5d4679e91a9623078b458d

*

Rule Path
Disallow /
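How the three groups above combine can be sketched in Python. This is a minimal illustration, assuming the longest-match precedence described in RFC 9309 (the longest matching path wins, and Allow wins a tie) and exact matching on the user-agent token. The names GROUPS, rules_for, and is_allowed are hypothetical, the rule lists are abbreviated from the tables above, and wildcard characters (`*`, `$`) are not handled because none appear in these rules.

```python
# Abbreviated copy of the rule groups listed in this scan (assumption: only
# a few representative rules are kept to keep the sketch short).
SHARED = [
    ("disallow", "/recherche"),
    ("disallow", "/annonces"),
    ("disallow", "/api/"),
    ("allow", "/api/v1/showcase"),
]

GROUPS = {
    # Stands in for every agent in the first, shared group.
    "googlebot": SHARED + [("allow", "/annonces/avis-de-deces/")],
    # googlebot-news disallows the obituary notices instead of allowing them.
    "googlebot-news": SHARED + [("disallow", "/annonces/avis-de-deces/")],
    # Every other crawler is blocked from the whole site.
    "*": [("disallow", "/")],
}


def rules_for(agent):
    """Return the rules for a named agent, falling back to the '*' group."""
    return GROUPS.get(agent.lower(), GROUPS["*"])


def is_allowed(agent, path):
    """Longest matching prefix decides; with no match the path is allowed."""
    best_len = -1
    allowed = True
    for kind, prefix in rules_for(agent):
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for agent, path in [
        ("googlebot", "/annonces/avis-de-deces/exemple"),
        ("googlebot-news", "/annonces/avis-de-deces/exemple"),
        ("somebot", "/"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Under these assumptions the only difference between the two named groups is the treatment of /annonces/avis-de-deces/, and any agent not listed falls through to the blanket Disallow / group.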

Other Records

Field Value
sitemap https://www.lanouvellerepublique.fr/sitemap.xml

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/wc/robots.html
  • For syntax checking, see:
  • http://www.sxw.org.uk/computing/robots/check.html
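
The comment about the file being honored only at the root of the host can be illustrated with a short Python sketch. The helper name robots_url is hypothetical; it only rebuilds the URL a crawler would be expected to request for a given page, using the example.com addresses from the comment above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://example.com/site/page.html"))
```

A file placed at /site/robots.txt is never produced by this derivation, which is why it would be ignored.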