neuf.logic-immo.com
robots.txt

Robots Exclusion Standard data for neuf.logic-immo.com

Resource Scan

Scan Details

Site Domain neuf.logic-immo.com
Base Domain logic-immo.com
Scan Status Ok
Last Scan 2024-11-03T09:10:57+00:00
Next Scan 2024-12-03T09:10:57+00:00

Last Scan

Scanned 2024-11-03T09:10:57+00:00
URL https://neuf.logic-immo.com/robots.txt
Domain IPs 13.227.254.11, 13.227.254.34, 13.227.254.36, 13.227.254.68
Response IP 13.227.254.11
Found Yes
Hash 7bc144e8b461168b0b36ee76d23b003d8b0b8c1fa1c694af9e9e8929165d04f8
SimHash be2e3183c6c4
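
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, although the scan report does not say which algorithm it uses. Assuming SHA-256 over the raw response body, a minimal sketch of how such a fingerprint could be computed and used to detect changes between scans follows. The function names `body_fingerprint` and `has_changed` are hypothetical and chosen for illustration only.

```python
import hashlib

def body_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the report's Hash field is a SHA-256 digest; the
    source does not confirm the algorithm.
    """
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return body_fingerprint(body) != previous_hash.lower()

if __name__ == "__main__":
    sample = b"User-agent: *\nDisallow: /z/\n"
    stored = body_fingerprint(sample)
    print(len(stored))                           # digest length in hex characters
    print(has_changed(stored, sample))           # same bytes, so unchanged
    print(has_changed(stored, sample + b"\n"))   # one extra byte changes the digest
```

An exact hash changes on any byte difference, which is presumably why the report also carries a SimHash for approximate comparison.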

Groups

*

Rule Path
Disallow /z/
Allow /z/produits/static/css/
Allow /z/produits/static/js/
Allow /z/*.jpg
Allow /z/*.gif
Allow /z/*.png
Disallow /noindex/
Disallow /recherche%2Calerte%2Ccreation.htm
Disallow /cgi/
Disallow /prerecherche.htm
Disallow /cartographie.htm
Disallow /r%2Cgo
Disallow /carto%2Ccarte.htm
Disallow /cartepop.htm
Disallow /prj%2Caddalerte.htm
Disallow /residence_print.htm
Disallow /creation.htm
Disallow /alerte_email.htm
Disallow /form_nous_contacter.htm
Disallow /affiliation%2Ccollecte_newsletter.htm
Disallow /affiliation%2Ctemplate_affiliation.htm
Disallow /detail%2Cincl_coord_annonceur.htm
Disallow /recherche%2Cframe_300_250.htm
Disallow /recherche%2Cframe_300_600.htm
Disallow /recherche%2Cframe_300_600.htm
Disallow /recherche%2Cframe_300_600_2.htm
Disallow /recherche%2Cframe_300_encart.htm
Disallow /recherche%2Cframe_728_90.htm
Disallow /*/detail%2Cincl_coord_annonceur.htm
Disallow /*/residence_print.htm
Disallow /*/documentation_programme_is.htm
Disallow /rss%2Crecherche.xml
Disallow /recherche%2Cframe_300
Disallow /listing%2Cadvanced_search.htm
Disallow /recherche-avancee.htm
Disallow /*/new_detail%2Cajax%2Call_lots.htm
Disallow /*/new_detail%2Cajax%2Cpoi_data.htm
Disallow /*/interceptor%2Cpages.json
Disallow /*/dem_doc.htm
Disallow *tri%3D*
Disallow /*/annonces?*
Disallow /recherche*
Disallow /annuaire/recherche*
Disallow *?bp=*
Disallow *?ann_neufpg=*
Disallow *?ci=*
Disallow *?bd=*
Disallow *?cp=*
Disallow *?cmp=*
Disallow *?vedette=*
Disallow *?ali=*
Disallow *?idannonce=*
Disallow *?div=*
Disallow *?idpays=*
Disallow *?p=*
Disallow *?annuaireLabel=*
Disallow /*?xtor=
Disallow /?
Disallow /carte-mobile-*
Disallow /om/
Disallow /investir/
Disallow /vendor/
Disallow /content/
Disallow /fonts/
Disallow /actu-immo-neuf/*
Disallow /actus_neuf.php*
Disallow /affluence
Disallow /confirmation-contact*
Disallow /contact?prgId*
Disallow /contact-simulation*
Disallow /programs-sticky*
Disallow /programsmarkers*
Disallow /getNumFound*
Disallow /alerte-mail-popin*
Disallow /program-navigation
Disallow /ajaxPrograms
Disallow /newspromotor
Disallow /?pushcontact=*
Disallow /habiter/maison-1-piece-neuve%2C*
Disallow /habiter/maison-neuf-*
Disallow /*studio*
Disallow /*-pieces-*
Disallow /*-T1*
Disallow /*-T2*
Disallow /*-T3*
Disallow /*-T4*
Disallow /*-T5*
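
The `*` group mixes Disallow and Allow rules with `*` wildcards, so the result for a given URL depends on which rule takes precedence. A minimal sketch follows, assuming the longest-match convention described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are hypothetical, only a handful of the rules above are included, and the sketch does not normalize percent-encoding such as `%2C`.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty path matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = kind.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# A subset of the '*' group listed above.
RULES = [
    ("Disallow", "/z/"),
    ("Allow", "/z/produits/static/css/"),
    ("Allow", "/z/*.jpg"),
    ("Disallow", "/recherche*"),
    ("Disallow", "*?p=*"),
    ("Disallow", "/*-T2*"),
]

if __name__ == "__main__":
    for path in ["/z/photo.jpg", "/z/script.php", "/programmes?p=2"]:
        print(path, is_allowed(path, RULES))
```

Under these assumptions `/z/photo.jpg` is crawlable because `/z/*.jpg` is longer than `/z/`, while `/z/script.php` matches only the shorter Disallow rule and is blocked.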

mediapartners-google
mj12bot

Rule Path
Disallow (empty path; nothing is blocked for these agents)

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
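
Each group above applies to one named crawler, and the `*` group is the fallback for every other agent. The sketch below shows one way a crawler could pick its group, assuming case-insensitive substring matching of the group name against the User-Agent string and preferring the longest name when several match (so `teleportpro` is chosen over `teleport`). The function name `select_group` and the trimmed `GROUPS` table are hypothetical; real crawlers may match product tokens more strictly, which matters for short names such as `doc` or `fetch`.

```python
# A trimmed copy of the groups listed above: name -> list of (rule, path).
GROUPS = {
    "*": [("Disallow", "/z/"), ("Disallow", "/cgi/")],
    "mj12bot": [("Disallow", "")],   # empty path: nothing is blocked
    "wget": [("Disallow", "/")],
    "teleport": [("Disallow", "/")],
    "teleportpro": [("Disallow", "/")],
}

def select_group(user_agent: str, groups: dict) -> list:
    """Return the rules of the most specific group matching user_agent.

    Assumption: names are matched as case-insensitive substrings of the
    User-Agent string; '*' is used only when no named group matches.
    """
    ua = user_agent.lower()
    matches = [name for name in groups if name != "*" and name.lower() in ua]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

if __name__ == "__main__":
    print(select_group("Wget/1.21.4", GROUPS))
    print(select_group("Googlebot/2.1", GROUPS))
```

A named group replaces the `*` group rather than adding to it, so `mj12bot` with its empty Disallow is not bound by any of the `*` rules.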

Comments

  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/