From: Daniel Phillips <phillips@google.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Fencing in OCFS2
Date: Thu, 25 May 2006 14:35:43 -0700
Message-ID: <4476232F.2070700@google.com>
In-Reply-To: <c04850370605170720g2ce8d0cayd5328712104fcb21@mail.gmail.com>

Sum Sha wrote:
> --------------
> Q05     How long does the quorum process take?
> A05     First a node will realize that it doesn't have connectivity with
>         another node.  This can happen immediately if the connection is closed
>         but can take a maximum of 10 seconds of idle time.  Then the node
>         must wait long enough to give heartbeating a chance to declare the
>         node dead.  It does this by waiting two iterations longer than
>         the number of iterations needed to consider a node dead (see Q03 in
>         the Heartbeat section of this FAQ).  The current default of 7
>         iterations of 2 seconds results in waiting for 9 iterations or 18
>         seconds.  By default, then, a maximum of 28 seconds can pass from the
>         time a network fault occurs until a node fences itself.
> --------------
> 
> I don't understand why are we giving heartbeating extra 2 iterations
> to declare a node dead in case of split brain? What I think is, if we
> are already missing disk heartbeat for a node, then it's missed
> heartbeat counter has already been started and we would declare that
> node dead after 7 iterations. How do we include these extra 2
> iterations?

While working on the fencing harness RFC I realized why that extra wait
is necessary.  Heartbeat will continue pinging a node for some number of
periods even while it receives no responses from that node.  The trouble
is, the remote node may be receiving the pings and answering them, but
the answers are getting lost somewhere along the route back.  So the
remote node does not yet know it is incommunicado.  Then heartbeat
gives up and stops pinging.  It is only at this point that the
remote node is sure to start its watchdog count.

Given:

   A = number of missed answers before heartbeat stops pinging
   B = number of missed pings before watchdog triggers
   H = heartbeat period
   L = maximum network latency within some confidence factor
   W = maximum latency between watchdog trigger and shutdown

the time to declare a node dead and be sure it has shut down is:

   H(A + B) + 2L + W

so with:

   A = 2
   B = 2
   H = 2 seconds
   L = .5 seconds
   W = 10 seconds

we have:

   8 + 1 + 10 = 19 seconds
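
For anyone who wants to plug in their own numbers, here is a minimal
sketch of that arithmetic.  It evaluates H*(A + B) + 2*L + W, which is
what the terms 8 + 1 + 10 above correspond to.  The function name is
made up for illustration and nothing here is taken from the OCFS2 code.

```python
def time_to_declare_dead(a, b, h, l, w):
    """Seconds from a network fault until the node is known to be down.

    a: missed answers before heartbeat stops pinging
    b: missed pings before the watchdog triggers
    h: heartbeat period in seconds
    l: maximum network latency in seconds (counted once in each direction)
    w: maximum latency between watchdog trigger and shutdown, in seconds
    """
    heartbeat_wait = h * (a + b)  # 2 * (2 + 2) = 8 with the example values
    round_trip = 2 * l            # 2 * 0.5 = 1
    return heartbeat_wait + round_trip + w


example = time_to_declare_dead(a=2, b=2, h=2, l=0.5, w=10)
print(example)  # 19.0
```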

Network latency includes the maximum time to notice a ping and respond to
it, and the time required for heartbeat to notice the answer.  There is no
need to incorporate a safety factor because allowing more than one missed
ping is already a safety factor.

Did I miss anything in my bookkeeping?  I did not check to see if OCFS2's
heartbeat obeys this formula.

Unfortunately, it is difficult to establish dependable bounds for network
latency, so heartbeating is really a game of probabilities.  We should set
the safety factor high enough so that false positives do not cost more
downtime than would be saved by shorter timeouts.

Now, if we use a storage-side fencing method instead of a watchdog we can
set B and W to zero, giving 5 seconds using the example above.  This is
nearly four times better and shows why we need a proper fencing harness
sooner rather than later.
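
As a rough check of that comparison, the sketch below evaluates the
same expression, H*(A + B) + 2*L + W, once with the watchdog values and
once with B and W set to zero.  The helper name is hypothetical and the
inputs are only the example values from this message, not measurements.

```python
def fencing_delay(a, b, h, l, w):
    """H*(A + B) + 2*L + W, in seconds, using the symbols defined above."""
    return h * (a + b) + 2 * l + w


# Watchdog-based fencing with the example values.
watchdog_delay = fencing_delay(a=2, b=2, h=2, l=0.5, w=10)

# Storage-side fencing: no watchdog count and no shutdown latency.
storage_delay = fencing_delay(a=2, b=0, h=2, l=0.5, w=0)

print(watchdog_delay, storage_delay)   # 19.0 5.0
print(watchdog_delay / storage_delay)  # ratio between the two delays
```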

Regards,

Daniel
