From: Lars Marowsky-Bree <lmb@suse.com>
To: ceph-devel@vger.kernel.org
Subject: Re: ECONNREFUSED implies OSD definitely failed
Date: Thu, 28 Apr 2016 16:32:51 +0200
Message-ID: <20160428143251.GA1541@suse.de>
In-Reply-To: <alpine.DEB.2.11.1604221221000.8831@cpach.fuggernut.com>

On 2016-04-22T12:24:52, Sage Weil <sweil@redhat.com> wrote:

> Piotr has a PR at
> 
> 	https://github.com/ceph/ceph/pull/8558
> 
> that changes the messenger and OSD logic so that if we get an ECONNREFUSED 
> trying to talk to another OSD we can definitively conclude that the OSD is 
> down/failed, without waiting for the normal heartbeat timeout.
> 
> I think this is true in normal networking environments.  My only concern 
> is that there might be cases where the OSD isn't actually down and some 
> transient network issue could cause ECONNREFUSED.  Like... some 
> firewally magic networky thing.  If a transient ECONNREFUSED was possible, 
> it could cause some ugly flapping.
> 
> Can anyone think of something that might cause this?  Even if it is 
> something obscure, it means we should have a config option to disable this 
> new behavior (we probably should anyway).

Exactly this - the system reconfiguring its network interfaces and
firewall rules (in a suboptimal fashion; it should drop, not reject, but
...).

Or a duplicate IP address (with a node that isn't running ceph-osd).
Again, not supposed to happen.
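
To make the ambiguity concrete, here is a minimal sketch (plain Python
sockets, not the messenger code from the PR). The function name
probe_peer and the option fast_fail_on_refused are made up for
illustration; they are not Ceph names. From the caller's side, a
firewall REJECT rule or a foreign host on a duplicated IP looks the same
as a dead daemon: connect() fails with ECONNREFUSED either way, so the
only safe knob is whether to trust that signal at all.

```python
import socket


def probe_peer(host, port, timeout=1.0, fast_fail_on_refused=True):
    """Classify a TCP peer as 'up', 'down', or 'unknown'.

    fast_fail_on_refused is a hypothetical option: when True,
    ECONNREFUSED is treated as proof that the peer is down; when False,
    the caller is expected to fall back to its normal heartbeat timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # ECONNREFUSED: something answered with a reset. That may be the
        # peer's kernel (no listener), a firewall rejecting instead of
        # dropping, or another host that owns the same IP address.
        return "down" if fast_fail_on_refused else "unknown"
    except OSError:
        # Timeouts, unreachable networks, etc. say nothing definite.
        return "unknown"
```

With the option disabled, a refused connection is reported as "unknown"
and failure detection stays with the heartbeat timeout, as it does today.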



-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


