Robots Exclusion Standard data for cci-paris-idf.fr

Resource Scan

Scan Details

Site Domain cci-paris-idf.fr
Base Domain cci-paris-idf.fr
Scan Status Ok
Last Scan 2024-06-08T10:34:09+00:00
Next Scan 2024-07-08T10:34:09+00:00
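The Last Scan and Next Scan timestamps are exactly 30 days apart. The page does not say whether the scanner always uses a fixed 30-day cycle, so the sketch below only computes the gap from the two values shown. The constant and function names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details above (ISO 8601 with a UTC offset).
LAST_SCAN = "2024-06-08T10:34:09+00:00"
NEXT_SCAN = "2024-07-08T10:34:09+00:00"

def scan_interval(last: str, upcoming: str) -> timedelta:
    """Return the gap between two offset-aware ISO 8601 timestamps."""
    # datetime.fromisoformat accepts the "+00:00" offset form (Python 3.7+).
    return datetime.fromisoformat(upcoming) - datetime.fromisoformat(last)

interval = scan_interval(LAST_SCAN, NEXT_SCAN)
print(interval)  # 30 days, 0:00:00
```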

Last Scan

Scanned 2024-06-08T10:34:09+00:00
URL https://www.cci-paris-idf.fr/robots.txt
Domain IPs 2600:9000:271a:2a00:c:dfc4:4800:93a1, 2600:9000:271a:4a00:c:dfc4:4800:93a1, 2600:9000:271a:6200:c:dfc4:4800:93a1, 2600:9000:271a:6600:c:dfc4:4800:93a1, 2600:9000:271a:6e00:c:dfc4:4800:93a1, 2600:9000:271a:800:c:dfc4:4800:93a1, 2600:9000:271a:d200:c:dfc4:4800:93a1, 2600:9000:271a:f000:c:dfc4:4800:93a1, 3.165.82.35, 3.165.82.73, 3.165.82.79, 3.165.82.98
Response IP 3.165.82.79
Found Yes
Hash eee54748d94eb96207dea482baa6676ce482daf93e12faefea0b507777390171
SimHash b106bf0b0740
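The fields above can be partly re-checked offline. The sketch below splits the Domain IPs into IPv4 (A record) and IPv6 (AAAA record) addresses and shows how a SHA-256 digest of a fetched file would be computed. It assumes the Hash field is a SHA-256 digest of the raw robots.txt body. Its 64 hex digits (256 bits) are consistent with SHA-256, but the page does not name the algorithm, and the live file may have changed since the scan, so a mismatch proves nothing. The SimHash algorithm and parameters are not stated either, so it is not reproduced. The network calls are left as comments.

```python
import hashlib
import ipaddress
import socket

# Values copied from the Last Scan fields above.
SCANNED_HASH = "eee54748d94eb96207dea482baa6676ce482daf93e12faefea0b507777390171"
RESPONSE_IP = "3.165.82.79"
DOMAIN_IPS = [
    "2600:9000:271a:2a00:c:dfc4:4800:93a1",
    "2600:9000:271a:4a00:c:dfc4:4800:93a1",
    "2600:9000:271a:6200:c:dfc4:4800:93a1",
    "2600:9000:271a:6600:c:dfc4:4800:93a1",
    "2600:9000:271a:6e00:c:dfc4:4800:93a1",
    "2600:9000:271a:800:c:dfc4:4800:93a1",
    "2600:9000:271a:d200:c:dfc4:4800:93a1",
    "2600:9000:271a:f000:c:dfc4:4800:93a1",
    "3.165.82.35", "3.165.82.73", "3.165.82.79", "3.165.82.98",
]

def body_sha256(body: bytes) -> str:
    """Hex SHA-256 digest of raw bytes (assumed to be the scanner's Hash algorithm)."""
    return hashlib.sha256(body).hexdigest()

def resolve_ips(host: str) -> list[str]:
    """Current A and AAAA addresses for host; requires network access."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})

def split_by_family(addresses: list[str]) -> tuple[list[str], list[str]]:
    """Partition address strings into (IPv4, IPv6) lists, preserving order."""
    v4: list[str] = []
    v6: list[str] = []
    for addr in addresses:
        target = v4 if ipaddress.ip_address(addr).version == 4 else v6
        target.append(addr)
    return v4, v6

v4, v6 = split_by_family(DOMAIN_IPS)
print(len(v4), len(v6))       # 4 8
print(RESPONSE_IP in v4)      # True
print(len(SCANNED_HASH) * 4)  # 256

# Network checks (not run here; results depend on the live site):
# from urllib.request import urlopen
# with urlopen("https://www.cci-paris-idf.fr/robots.txt") as resp:
#     print(body_sha256(resp.read()) == SCANNED_HASH)
# print(resolve_ips("www.cci-paris-idf.fr"))
```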

Groups

User-agent *

Rule Path
Allow /core/*.css$
Allow /core/*.css?
Allow /core/*.js$
Allow /core/*.js?
Allow /core/*.gif
Allow /core/*.jpg
Allow /core/*.jpeg
Allow /core/*.png
Allow /core/*.svg
Allow /profiles/*.css$
Allow /profiles/*.css?
Allow /profiles/*.js$
Allow /profiles/*.js?
Allow /profiles/*.gif
Allow /profiles/*.jpg
Allow /profiles/*.jpeg
Allow /profiles/*.png
Allow /profiles/*.svg
Disallow /core/
Disallow /profiles/
Disallow /README.txt
Disallow /web.config
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips
Disallow /node/add/
Disallow /search/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /user/logout/
Disallow /index.php/admin/
Disallow /index.php/comment/reply/
Disallow /index.php/filter/tips
Disallow /index.php/node/add/
Disallow /index.php/search/
Disallow /index.php/user/password/
Disallow /index.php/user/register/
Disallow /index.php/user/login/
Disallow /index.php/user/logout/
Disallow /en/node/
Disallow /*search_field%3D
Disallow /*?field_niveau_entree
Disallow /*?field_salon
Disallow /*combine%3D
Disallow /*search-content
Disallow /fr/domaine
Disallow /fr/accueil-cloned
Disallow /resultats-corporate-0
Disallow /etudes/organisation/crocis/resultats-de-votre-recherche-crocis?page
Disallow /etudes/organisation/*/*page%3D
Disallow /etudes/actualites/*-etudes*page%3D
Disallow /etudes/rechercher-etudes*?
Disallow /formation/ecoles/*/*page%3D
Disallow /formation/actualites/*/*page%3D
Disallow /formation/24-ecoles/ecoles/*/*page%3D
Disallow /informations-territoriales/*/*page%3D
Disallow /etudes/salons-paris/
Disallow /dga-aie*/*page%3D
Disallow /dec-departement-*/*page%3D
Disallow /cfi-centre*/*page%3D
Disallow /faculte-des-metiers*/*page%3D
Disallow /escp-europe/*page%3D
Disallow /faculte-des-metiers*/*page%3D
Disallow /*/wysiwyg
Disallow /taxonomy/
Disallow /email/
Disallow /cci-region/contenus/partager
Disallow /cci-region/contacts-sites/les-marques-et-les-sites-internet-corporate?id
Disallow /cci-region/contacts-sites/*?
Disallow /thematique
Disallow /*?order=
Disallow /*sort%3D
Disallow /*?lettre1
Disallow /comment/
Disallow /contenus/*?domaine=
Disallow /print
Disallow /newsletter
Disallow /*undefined
Disallow /*/salons/2001
Disallow /*/salons/2002
Disallow /*/salons/2003
Disallow /*/salons/2004
Disallow /*/salons/2005
Disallow /*/salons/2006
Disallow /*/salons/2007
Disallow /*/salons/2008
Disallow /*/salons/2009
Disallow /*/salons/2010
Disallow /*/salons/2011
Disallow /*/salons/2012
Disallow /*/salons/2013
Disallow /*/salons/2014
Disallow /*/salons/2015
Disallow /*/salons/2016
Disallow /*/salons/2017
Disallow /*/salons/2018
Disallow /*/salons/2019
Disallow /*/salons/201*
Disallow /*/salons/200*
Disallow /*/*-bo/
Disallow /*/termes-bo/
Disallow /*/lieux-bo/
Disallow /*/contacts-bo/
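The group mixes broad Disallow rules (for example /core/) with narrower Allow rules that use the * and $ wildcards (for example /core/*.css$), so a crawler must decide which rule wins for a given path. The sketch below follows the RFC 9309 precedence: the most specific matching rule applies, and an Allow wins a tie with a Disallow. It assumes "most specific" means the longest rule pattern. It does not normalize percent-encoding, so %3D in a rule only matches a literal %3D in the path. Python's urllib.robotparser does plain prefix matching and does not expand * or $ inside paths, which is why the matching is hand-rolled here. RULES is a subset of the group above; the function names are illustrative.

```python
import re

# A subset of the rules above, as (directive, pattern) pairs.
RULES = [
    ("allow", "/core/*.css$"),
    ("allow", "/core/*.css?"),
    ("allow", "/core/*.js$"),
    ("disallow", "/core/"),
    ("disallow", "/admin/"),
    ("disallow", "/*?order="),
    ("disallow", "/*/salons/201*"),
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' matches any run of characters and a trailing '$' pins the end of
    the path. Every other character, including '?', is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; on a tie, allow beats disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

print(is_allowed("/core/misc/drupal.css"))  # True: /core/*.css$ outranks /core/
print(is_allowed("/core/install.php"))      # False: only /core/ matches
```

With these rules /fr/salons/2015 is blocked by /*/salons/201*. A path such as /fr/salons/2024 matches no rule and stays allowed, because the group only blocks years starting with 200 or 201.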

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/robotstxt.html
  • CSS, JS, Images
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
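The comments note that the file is ignored unless it sits at the root of the host. The sketch below derives that root location from any page URL by keeping the scheme and authority (host and any port) and replacing the path. Under RFC 9309 each scheme, host and port combination has its own file, so a different host would be governed by its own robots.txt. The function name is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus optional port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.cci-paris-idf.fr/fr/formation?x=1#top"))
# https://www.cci-paris-idf.fr/robots.txt
```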