selogerneuf.com
robots.txt

Robots Exclusion Standard data for selogerneuf.com

Resource Scan

Scan Details

Site Domain selogerneuf.com
Base Domain selogerneuf.com
Scan Status Ok
Last Scan 2026-01-21T02:01:40+00:00
Next Scan 2026-02-20T02:01:40+00:00

Last Scan

Scanned 2026-01-21T02:01:40+00:00
URL https://selogerneuf.com/robots.txt
Domain IPs 52.84.45.116, 52.84.45.31, 52.84.45.7, 52.84.45.8
Response IP 3.169.71.95
Found Yes
Hash 9bafdcdd62f7d63dcad49027a919bd62310f2bba7daac1415cc82482712e4618
SimHash be3c718bcec4

Groups

*

Rule Path
Disallow /z/
Allow /z/produits/assets/css/
Allow /z/produits/assets/js/
Allow /z/*.jpg
Allow /z/*.gif
Allow /z/*.png
Disallow /noindex/
Disallow /recherche%2Calerte%2Ccreation.htm
Disallow /cgi/
Disallow /prerecherche.htm
Disallow /cartographie.htm
Disallow /r%2Cgo
Disallow /carto%2Ccarte.htm
Disallow /cartepop.htm
Disallow /prj%2Caddalerte.htm
Disallow /residence_print.htm
Disallow /creation.htm
Disallow /alerte_email.htm
Disallow /form_nous_contacter.htm
Disallow /affiliation%2Ccollecte_newsletter.htm
Disallow /affiliation%2Ctemplate_affiliation.htm
Disallow /detail%2Cincl_coord_annonceur.htm
Disallow /recherche%2Cframe_300_250.htm
Disallow /recherche%2Cframe_300_600.htm
Disallow /recherche%2Cframe_300_600.htm
Disallow /recherche%2Cframe_300_600_2.htm
Disallow /recherche%2Cframe_300_encart.htm
Disallow /recherche%2Cframe_728_90.htm
Disallow /*/detail%2Cincl_coord_annonceur.htm
Disallow /*/residence_print.htm
Disallow /*/documentation_programme_is.htm
Disallow /rss%2Crecherche.xml
Disallow /recherche%2Cframe_300
Disallow /listing%2Cadvanced_search.htm
Disallow /recherche-avancee.htm
Disallow /*/new_detail%2Cajax%2Call_lots.htm
Disallow /*/new_detail%2Cajax%2Cpoi_data.htm
Disallow /*/interceptor%2Cpages.json
Disallow /*/dem_doc.htm
Disallow *tri%3D*
Disallow /*/annonces?*
Disallow /recherche*
Disallow /annuaire/recherche*
Disallow *?bp=*
Disallow *?ann_neufpg=*
Disallow *?ci=*
Disallow *?bd=*
Disallow *?cp=*
Disallow *?cmp=*
Disallow *?vedette=*
Disallow *?ali=*
Disallow *?idannonce=*
Disallow *?div=*
Disallow *?idpays=*
Disallow *?p=*
Disallow *?annuaireLabel=*
Disallow *clickserve.dartsearch.net*
Disallow /gtmlocal
Disallow /lib/seo/reportLinks.js
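
The group above mixes Allow and Disallow rules with `*` wildcards, so the outcome for a given path depends on which rule takes precedence. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical helpers, only a subset of the rules is included, and percent-encoding is compared literally rather than normalized. Individual crawlers may evaluate these rules differently.

```python
import re

# A subset of the "*" group rules listed above, in file order.
RULES = [
    ("disallow", "/z/"),
    ("allow", "/z/produits/assets/css/"),
    ("allow", "/z/*.jpg"),
    ("disallow", "/recherche*"),
    ("disallow", "*?p=*"),
    ("disallow", "/gtmlocal"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored_end = pattern.endswith("$")  # "$" pins the pattern to the end of the path
    if anchored_end:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path (including any query string) may be crawled."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; at equal length, allow (True) beats disallow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ["/z/photo.jpg", "/z/page.htm", "/recherche-avancee.htm",
                 "/programmes?p=2", "/programmes/paris"]:
        print(path, is_allowed(path))
```

Under these assumptions, `/z/photo.jpg` is crawlable because `/z/*.jpg` is longer than `/z/`, while other paths under `/z/` stay blocked.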

mj12bot

Rule Path
Disallow

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
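
The mj12bot group has a Disallow rule with no path, which by convention blocks nothing, whereas the other named groups use `Disallow /` and block the whole site. A short sketch with Python's standard `urllib.robotparser` illustrates the difference. The `ROBOTS_EXCERPT` text is a reconstruction from the tables above, not the original file, and `allowed` is a hypothetical helper. Note that this parser does plain prefix matching and does not interpret `*` wildcards inside paths, so it is only used here for wildcard-free rules.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (assumption: the original file's exact casing
# and layout are not shown in this report).
ROBOTS_EXCERPT = """\
User-agent: mj12bot
Disallow:

User-agent: wget
Disallow: /

User-agent: *
Disallow: /cgi/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(agent, path):
    """Check one path on selogerneuf.com for the given user agent."""
    return parser.can_fetch(agent, "https://selogerneuf.com" + path)


if __name__ == "__main__":
    print("mj12bot /cgi/x:", allowed("mj12bot", "/cgi/x"))  # own group, empty Disallow
    print("wget /:", allowed("wget", "/"))                  # own group, Disallow /
    print("somebot /cgi/x:", allowed("somebot", "/cgi/x"))  # falls back to "*"
```

A crawler that matches a named group uses only that group, which is why mj12bot is not subject to the `*` rules in this sketch.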

Other Records

Field Value
sitemap https://www.selogerneuf.com/sitemaps/index.xml
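
The sitemap record can be read programmatically as well. This is a small sketch, assuming Python 3.8 or later, where `RobotFileParser.site_maps()` is available. The input lines are a reconstructed excerpt rather than the original file.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /cgi/",
    "Sitemap: https://www.selogerneuf.com/sitemaps/index.xml",
])

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()

if __name__ == "__main__":
    print(sitemaps)
```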

Comments

  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/