* ECONNREFUSED implies OSD definitely failed
@ 2016-04-22 16:24 Sage Weil
From: Sage Weil @ 2016-04-22 16:24 UTC
  To: ceph-devel

Piotr has a PR at

	https://github.com/ceph/ceph/pull/8558

that changes the messenger and OSD logic so that if we get an ECONNREFUSED 
trying to talk to another OSD we can definitively conclude that the OSD is 
down/failed, without waiting for the normal heartbeat timeout.

I think this is true in normal networking environments.  My only concern 
is that there might be cases where the OSD isn't actually down and some 
transient network issue could cause ECONNREFUSED.  Like... some 
firewally magic networky thing.  If a transient ECONNREFUSED were possible,
it could cause some ugly flapping.

Can anyone think of something that might cause this?  Even if it is 
something obscure, it means we should have a config option to disable this 
new behavior (we probably should anyway).

sage
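
For illustration only, the idea discussed in this message can be sketched as a small probe that treats a refused connection as a fast failure signal, guarded by a switch to turn that behavior off. This is not the code from the PR: the function name `probe_peer`, the `fast_fail_on_refused` flag, and the "up"/"down"/"unknown" return values are hypothetical, and the sketch assumes a plain TCP connect rather than the Ceph messenger.

```python
import socket


def probe_peer(host, port, timeout=1.0, fast_fail_on_refused=True):
    """Classify a peer as "up", "down", or "unknown" from one TCP connect.

    Hypothetical helper, not Ceph code. "unknown" means the caller should
    fall back to its normal heartbeat timeout before declaring a failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # ECONNREFUSED: the host answered, but nothing is listening on the port.
        # With the flag off, this is treated like any other inconclusive error.
        return "down" if fast_fail_on_refused else "unknown"
    except OSError:
        # Timeouts, unreachable hosts, and so on are not conclusive on their own.
        return "unknown"
```

With `fast_fail_on_refused=False` a refused connection is no longer conclusive, which mirrors the config option proposed above for environments where a transient ECONNREFUSED might occur.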
