* [Ocfs2-devel] Fencing in OCFS2
@ 2006-05-17 14:20 Sum Sha
  2006-05-18 14:46 ` Sum Sha
  2006-05-25 21:35 ` Daniel Phillips
  0 siblings, 2 replies; 9+ messages in thread
From: Sum Sha @ 2006-05-17 14:20 UTC (permalink / raw)
  To: ocfs2-devel

Hi,
Just wanted to understand how OCFS2 fencing works. Sorry if this has
already been discussed...
(1)
--------------
A node has quorum when:
        * it sees an odd number of heartbeating nodes and has network
          connectivity to more than half of them.
                or
        * it sees an even number of heartbeating nodes and has network
          connectivity to at least half of them *and* has connectivity to
          the heartbeating node with the lowest node number.
--------------
Now, think about a case where there are 5 nodes in an OCFS2 cluster.
Suppose a split-brain divides it into two subclusters, one of 3 nodes
and one of 2 nodes. In this case the algorithm works fine and the
3-node subcluster wins the race. But consider a serial split-brain
that leaves three subclusters of 2, 2 and 1 nodes after two splits at
the same time. In this case the algorithm fails and all subclusters
panic, because in each subcluster no node has connectivity to more
than 2 nodes (5/2, rounded down), while every node still sees disk
heartbeats from all 5 nodes.
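
To make the two scenarios concrete, here is a minimal Python sketch of
the rule as quoted above. It is only my reading of the FAQ text, not the
OCFS2 code: the names has_quorum and surviving_subclusters are made up
for illustration, and I assume a node counts itself both as heartbeating
and as connected.

```python
def has_quorum(heartbeating, connected):
    """Apply the quoted quorum rule for one node.

    heartbeating: node numbers seen heartbeating on disk (assumed to include self)
    connected:    node numbers reachable over the network (assumed to include self)
    """
    if not heartbeating:
        return False
    reachable = heartbeating & connected
    total = len(heartbeating)
    if total % 2 == 1:
        # Odd count: need connectivity to strictly more than half.
        return 2 * len(reachable) > total
    # Even count: at least half, plus the lowest-numbered heartbeating node.
    return 2 * len(reachable) >= total and min(heartbeating) in reachable


def surviving_subclusters(all_nodes, partitions):
    """Return the partitions whose nodes keep quorum.

    Every node still sees disk heartbeats from all_nodes, but can only
    reach the members of its own partition over the network.
    """
    return [sorted(p) for p in partitions if has_quorum(all_nodes, p)]


nodes = {0, 1, 2, 3, 4}
print(surviving_subclusters(nodes, [{0, 1, 2}, {3, 4}]))    # [[0, 1, 2]]
print(surviving_subclusters(nodes, [{0, 1}, {2, 3}, {4}]))  # []
```

Under these assumptions the 3/2 split leaves one surviving subcluster,
while the 2/2/1 split leaves none.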

This may be the case with any cluster configuration, if there are
serial split-brains. Has the algorithm been designed for handling
serial split-brains? If yes, then how?
Is there anything else which is to be considered?

(2) In the ocfs2_faq I read that the quorum process may take up to
28 seconds to stabilize.
--------------
Q05     How long does the quorum process take?
A05     First a node will realize that it doesn't have connectivity with
        another node.  This can happen immediately if the connection is closed
        but can take a maximum of 10 seconds of idle time.  Then the node
        must wait long enough to give heartbeating a chance to declare the
        node dead.  It does this by waiting two iterations longer than
        the number of iterations needed to consider a node dead (see Q03 in
        the Heartbeat section of this FAQ).  The current default of 7
        iterations of 2 seconds results in waiting for 9 iterations or 18
        seconds.  By default, then, a maximum of 28 seconds can pass from the
        time a network fault occurs until a node fences itself.
--------------
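
The arithmetic behind the 28 seconds, written out as a small Python
sketch. The constant and function names are mine, chosen for
illustration; only the values (10 seconds, 7 iterations, 2 seconds,
2 extra iterations) come from the FAQ answer above.

```python
IDLE_TIMEOUT_S = 10      # maximum network idle time before the loss is noticed
HB_INTERVAL_S = 2        # length of one disk heartbeat iteration
HB_DEAD_ITERATIONS = 7   # missed iterations before a node is considered dead
EXTRA_ITERATIONS = 2     # additional iterations the quorum process waits


def max_fence_delay(extra_iterations=EXTRA_ITERATIONS):
    """Worst-case seconds from a network fault until a node fences itself."""
    wait_iterations = HB_DEAD_ITERATIONS + extra_iterations
    return IDLE_TIMEOUT_S + wait_iterations * HB_INTERVAL_S


print(max_fence_delay())   # 28: 10 + (7 + 2) * 2
print(max_fence_delay(0))  # 24: what it would be without the 2 extra iterations
```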

I don't understand why we give heartbeating 2 extra iterations to
declare a node dead in the case of split-brain. As I see it, if we are
already missing disk heartbeats for a node, then its missed heartbeat
counter has already started and we would declare that node dead after
7 iterations. Where do these extra 2 iterations come in?

What I mean is this: after the 10-second TCP idle timeout for a node,
we assume that we will start missing disk heartbeats for that node,
and we allow 9 iterations of such missed heartbeats. But how do you
tell the other thread, which is already counting missed heartbeats
(because we are missing disk heartbeats), that it needs to wait for 2
more iterations before declaring the node dead? If that thread is not
told, it will declare the other node dead after only 7 iterations. So
how do the extra 2 iterations come into the picture?

Thanks.
Sumsha.
