From: Joel Becker <Joel.Becker@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] ocfs2_controld.cman
Date: Wed, 8 Apr 2009 15:22:37 -0700
Message-ID: <20090408222237.GC8561@mail.oracle.com>
In-Reply-To: <20090408213317.GC11662@redhat.com>

On Wed, Apr 08, 2009 at 04:33:17PM -0500, David Teigland wrote:
> If I start ocfs2_controld.cman in parallel on a few nodes, only one of them
> starts up, the others exit with one of these errors:
> 
> call_section_read at 370: Reading from section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
> call_section_read at 387: Checkpoint "ocfs2:controld" does not have a section named "daemon_protocol"
> 
> call_section_read at 370: Reading from section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
> call_section_read at 397: Unable to read section "daemon_protocol" from checkpoint "ocfs2:controld": Object does not exist
> 
> It does work ok if I remove those two checks.

	These checks are required; without them you end up with
unsynchronized daemons, which is crap.
	I've changed the daemon to wait indefinitely for the section
instead of giving up, and lmb has been testing that.  See the
controld-fixes branch of ocfs2-tools.git.  That should fix these
problems.

> Another thing I noticed while looking in the code is that it assumes a single
> node will become the first member of a cpg on its own when a bunch of nodes
> join at once: daemon_joined(daemon_group.cg_member_count == 1);
> 
> This isn't a correct assumption.  It's possible that two or more nodes joining
> at once will become initial members together.  (I realize that it's a very
> convenient assumption to make after using it in previous pre-cpg programs, and
> it may take a fair amount of work to do without.)

	Well, this is going to be fun.  I have to figure out which
daemon is the "first", and now it's just racy.  I could swear that
someone told me cpg would guarantee I see the joins in order, not at
the same time.

Joel

-- 

"Three o'clock is always too late or too early for anything you
 want to do."
        - Jean-Paul Sartre

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
