cluster-devel.redhat.com archive mirror
From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] waiting in init.d/cman
Date: Wed, 5 Aug 2009 11:12:39 -0500
Message-ID: <20090805161239.GA17292@redhat.com>

Back in the busy days of cluster3 development, I spent a little time looking
at the issue of waiting for quorum (and other waiting/timeouts) during
init.d/cman startup.

I wanted to clean up cluster2's somewhat arbitrary approach and have explicit,
intentional behavior around what each init.d/cman step would wait for and what
it wouldn't.  Strangely, it was fence_tool join where all sorts of odd
waits/timeouts had been wedged at various times.

In untangling and fixing, I'm not sure I got it quite right.  Current behavior
is that init.d/cman runs through and completes successfully very quickly
without waiting for quorum.  This seems nice, because it can be annoying to
have init.d/cman block.  In general it works too; it just delays the
wait for quorum until some cluster-using service starts later (clvmd,
rgmanager, gfs mount).

But, I think it may be best for init.d/cman to wait explicitly for quorum.  It
would be clearer what's happening (what's delaying startup), which was one of
the cluster2 problems.  So, roughly, init.d/cman would do:

- cman_tool join, print "Joining cluster"
- qdiskd (if configured), print "Starting qdiskd"
- wait for quorum, print "Waiting for quorum"

Any reasons to not do this or do it differently?

Related to this is the broader issue of waiting and timeouts in init.d/cman.
It would be nice to not have timeouts... I think the main reason for them is
that cman is started before the ssh service, so people could never log in if
cman was stuck (we talked about this a while back and I guess decided we
couldn't move cman later in the startup).

Here's the startup with each wait/timeout mentioned (steps 3 and 4 apply only
if qdisk is configured).

1. cman_tool join -w -t 120
2. WAIT/120s for join to complete, in cman_tool from the -w -t 120 options
3. qdiskd
4. WAIT/20s for cman to recognize qdisk (?), in init script loop
5. WAIT/??s for quorum, new step probably via cman_tool wait -q -t ??
6. start other daemons
7. fence_tool join -w 20
8. WAIT/20s for fence domain join to complete, in fence_tool from -w 20 option
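
The sequence above could be sketched roughly as the shell function below.
This is only an illustration, not the actual init.d/cman code: the names
run, cman_start_sketch, DRY_RUN and QUORUM_TIMEOUT are hypothetical, and the
default of 60 is a placeholder for the still-open "??" in step 5.  The
command lines themselves are the ones listed in steps 1-8.

```shell
#!/bin/sh
# Sketch only: mirrors steps 1-8 above.  QUORUM_TIMEOUT and DRY_RUN are
# hypothetical names, not settings from the real init.d/cman.

QUORUM_TIMEOUT="${QUORUM_TIMEOUT:-60}"   # placeholder; the "??" is undecided
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cman_start_sketch() {
    use_qdisk="$1"     # "yes" if qdisk is configured

    echo "Joining cluster"
    run cman_tool join -w -t 120 || return 1                 # steps 1-2

    if [ "$use_qdisk" = "yes" ]; then
        echo "Starting qdiskd"
        run qdiskd || return 1                               # step 3
        # step 4: the existing init script loop (up to 20s) would go here
    fi

    echo "Waiting for quorum"
    run cman_tool wait -q -t "$QUORUM_TIMEOUT" || return 1   # step 5

    # step 6: start other daemons (omitted in this sketch)

    run fence_tool join -w 20 || return 1                    # steps 7-8
}
```

With DRY_RUN left at 1, "cman_start_sketch yes" only prints the messages and
the commands it would run, which makes the ordering easy to review.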

step 2: there's been some doubt about what join -w actually gives us; at a
minimum -w may be useful here to catch delayed startup errors from corosync
and to be sure it's started up enough that qdiskd can use it in step 3.
Otherwise, the wait in step 5 seems to obviate the need for waiting at all in
step 2.

step 5: this is the only wait that people will typically notice during normal
operation.  Any suggestions on a timeout here?  And if it expires should
init.d/cman exit with a failure?  (I believe that's what other timeouts
cause.)
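
To make the "timeout expires, then fail" question concrete, here is a
minimal sketch of a polling wait of the kind an init script loop might use.
The helper name wait_with_timeout is hypothetical and is not taken from
init.d/cman; it only shows how an expired wait could map to a non-zero
status that the script then turns into a failure exit.

```shell
#!/bin/sh
# wait_with_timeout SECONDS CMD [ARGS...]
# Polls CMD about once per second until it succeeds or SECONDS have elapsed.
# Returns 0 if CMD succeeded, 1 if the timeout expired first.
# Hypothetical helper for illustration only.
wait_with_timeout() {
    remaining="$1"
    shift
    while ! "$@"; do
        if [ "$remaining" -le 0 ]; then
            return 1          # timed out; caller decides whether to exit
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller that treats expiry as a startup failure would write something like
"wait_with_timeout 20 some_check || exit 1", where some_check stands in for
whatever test the step needs; a caller that only wants to warn could print a
message and continue instead.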

Dave




Thread overview: 6+ messages
2009-08-05 16:12 David Teigland [this message]
2009-08-05 17:25 ` [Cluster-devel] waiting in init.d/cman Fabio M. Di Nitto
2009-08-05 18:20   ` David Teigland
2009-08-05 19:32     ` Fabio M. Di Nitto
2009-08-05 19:43       ` Bob Peterson
2009-08-06 16:05       ` David Teigland
