From: "Piotr Dałek" <branch@predictor.org.pl>
To: Lars Marowsky-Bree <lmb@suse.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: ECONNREFUSED implies OSD definitely failed
Date: Fri, 29 Apr 2016 09:46:39 +0200
Message-ID: <20160429074639.GG26146@predictor>
In-Reply-To: <20160428143251.GA1541@suse.de>

On Thu, Apr 28, 2016 at 04:32:51PM +0200, Lars Marowsky-Bree wrote:
> On 2016-04-22T12:24:52, Sage Weil <sweil@redhat.com> wrote:
> 
> > Piotr has a PR at
> > 
> > 	https://github.com/ceph/ceph/pull/8558
> > 
> > that changes the messenger and OSD logic so that if we get an ECONNREFUSED 
> > trying to talk to another OSD we can definitively conclude that the OSD is 
> > down/failed, without waiting for the normal heartbeat timeout.
> > 
> > I think this is true in normal networking environments.  My only concern 
> > is that there might be cases where the OSD isn't actually down and some 
> > transient network issue could cause ECONNREFUSED.  Like... some 
> > firewally magic networky thing.  If a transient ECONNREFUSED was possible, 
> > it could cause some ugly flapping.
> > 
> > Can anyone think of something that might cause this?  Even if it is 
> > something obscure, it means we should have a config option to disable this 
> > new behavior (we probably should anyway).
> 
> Exactly this - the system reconfiguring its network interfaces and
> firewall rules (in a suboptimal fashion; it should drop, not reject, but
> ...).

I'm not convinced that we should care about this. I think the probability
of a (re)connect attempt landing in the middle of a firewall
reconfiguration is quite low.
 
> Or a duplicate IP address (with a node that isn't running ceph-osd).
> Again, not supposed to happen.

That will cause a lot of other things to fail, and having ceph-osd
marked down sooner gives a better chance of getting someone's attention.
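
To make the idea under discussion concrete, here is a rough sketch in
Python. It is not the messenger code from the PR. The function name
probe_peer and the trust_refused flag are made up for illustration;
trust_refused stands in for the config option suggested above. A refused
connection is treated as proof the peer is down only when that flag is
on; every other failure (timeout, unreachable network, ...) stays
inconclusive and is left to the normal heartbeat timeout.

```python
import socket


def probe_peer(host, port, timeout=1.0, trust_refused=True):
    """Classify a peer as 'up', 'down' or 'unknown' from one TCP connect.

    Hypothetical helper for illustration only; not part of Ceph.
    """
    try:
        # create_connection() raises OSError subclasses on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # ECONNREFUSED: something answered and actively refused the
        # connection. Usually nothing is listening on the port, but a
        # firewall REJECT rule can produce the same error.
        return "down" if trust_refused else "unknown"
    except OSError:
        # Timeout, host/network unreachable, etc.: no conclusion.
        return "unknown"
```

With trust_refused=False the sketch behaves like the current logic: a
refused connection tells us nothing by itself, so flapping caused by a
transient reject rule is avoided at the cost of slower failure detection.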

-- 
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl

