cluster-devel.redhat.com archive mirror
From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination
Date: Wed, 21 Dec 2011 10:40:48 -0500
Message-ID: <20111221154048.GA32631@redhat.com>
In-Reply-To: <1324464321.2718.13.camel@menhir>

On Wed, Dec 21, 2011 at 10:45:21AM +0000, Steven Whitehouse wrote:
> I don't think I understand what's going on in that case. What I thought
> should be happening was this:
> 
>  - Try to get mounter lock in EX
>    - If successful, then we are the first mounter so recover all
>      journals
>    - Write info into LVB
>    - Drop mounter lock to PR so other nodes can mount
> 
>  - If failed to get mounter lock in EX, then wait for lock in PR state
>    - This will block until the EX lock is dropped to PR
>    - Read info from LVB
> 
> So a node with the mounter lock in EX knows that it is always the first
> mounter and will recover all journals before demoting the mounter lock
> to PR. A node with the mounter lock in PR may only recover its own
> journal (at mount time).

I previously used one lock similar to that, but had to change it a bit.
I had to split it across two separate locks, called control_lock and
mounted_lock.  There need to be two because of two conflicting requirements.

The control_lock lvb is used to communicate the generation number and jid
bits.  Writing the lvb requires an EX lock, and EX prevents others from
continually holding a PR lock.  Without mounted nodes continually holding
a PR lock we can't use EX to indicate first mounter.
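
As a rough illustration of what "generation number and jid bits" in an lvb
could look like, here is a userspace sketch.  The 32-byte size, the offset
of the bitmap, the little-endian encoding and all of the names are
assumptions made for the example, not the layout used by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LVB_SIZE    32   /* assumed lvb size in bytes */
#define SKETCH_JID_OFFSET   8   /* assumed: jid bitmap starts at this byte */
#define SKETCH_MAX_JIDS ((SKETCH_LVB_SIZE - SKETCH_JID_OFFSET) * 8)

/* Store the generation number in the first four bytes, little-endian. */
static void lvb_set_generation(uint8_t *lvb, uint32_t gen)
{
	for (int i = 0; i < 4; i++)
		lvb[i] = (uint8_t)(gen >> (8 * i));
}

/* Read the generation number back from the first four bytes. */
static uint32_t lvb_get_generation(const uint8_t *lvb)
{
	uint32_t gen = 0;

	for (int i = 0; i < 4; i++)
		gen |= (uint32_t)lvb[i] << (8 * i);
	return gen;
}

/* Mark journal "jid" as needing recovery. */
static void lvb_set_jid(uint8_t *lvb, unsigned int jid)
{
	assert(jid < SKETCH_MAX_JIDS);
	lvb[SKETCH_JID_OFFSET + jid / 8] |= (uint8_t)(1u << (jid % 8));
}

/* Mark journal "jid" as recovered. */
static void lvb_clear_jid(uint8_t *lvb, unsigned int jid)
{
	assert(jid < SKETCH_MAX_JIDS);
	lvb[SKETCH_JID_OFFSET + jid / 8] &= (uint8_t)~(1u << (jid % 8));
}

/* Return 1 if any journal is still marked as needing recovery. */
static int lvb_any_jid_set(const uint8_t *lvb)
{
	for (int i = SKETCH_JID_OFFSET; i < SKETCH_LVB_SIZE; i++)
		if (lvb[i])
			return 1;
	return 0;
}
```

In the real thing these bytes would only be written while holding
control_lock in EX, which is exactly why that lock cannot also be held
in PR all the time.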

So, the mounted_lock (no lvb) is used to indicate the first mounter.
Here all mounted nodes continually hold a PR lock, and a mounting node
attempts to get an EX lock, so any node that gets the EX lock is the first
mounter.
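
The mounted_lock rule can be modelled in a few lines of userspace C.
This is only a toy: struct sim_lock and the sim_* helpers are made up
for the example and stand in for non-blocking dlm requests, and demoting
EX to PR right after the first-mounter work is an assumption about the
ordering.

```c
#include <assert.h>

/* Toy model of one lock resource: PR is shared, EX is exclusive. */
struct sim_lock {
	int pr_holders;
	int ex_held;
};

/* Non-blocking EX request: succeeds only if nobody holds the lock. */
static int sim_try_ex(struct sim_lock *lk)
{
	if (lk->pr_holders || lk->ex_held)
		return -1;
	lk->ex_held = 1;
	return 0;
}

/* Non-blocking PR request: succeeds unless someone holds EX. */
static int sim_try_pr(struct sim_lock *lk)
{
	if (lk->ex_held)
		return -1;
	lk->pr_holders++;
	return 0;
}

/* Convert our EX hold down to PR. */
static void sim_ex_to_pr(struct sim_lock *lk)
{
	assert(lk->ex_held);
	lk->ex_held = 0;
	lk->pr_holders++;
}

/* Drop a PR hold (unmount). */
static void sim_unlock_pr(struct sim_lock *lk)
{
	assert(lk->pr_holders > 0);
	lk->pr_holders--;
}

/*
 * Returns 1 if the caller is the first mounter, 0 if other nodes are
 * already mounted, -1 if neither request was granted and the caller
 * should retry.
 */
static int sim_mount(struct sim_lock *mounted_lock)
{
	if (sim_try_ex(mounted_lock) == 0) {
		/* First mounter: all journals would be recovered here. */
		sim_ex_to_pr(mounted_lock);
		return 1;
	}
	if (sim_try_pr(mounted_lock) == 0)
		return 0;
	return -1;
}
```

Because every mounted node keeps its PR hold, the EX request can only be
granted when no node is mounted, whatever the control_lock lvb contains.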

(I previously used control_lock with "zero lvb" to indicate first mounter,
but there are some fairly common cases where the lvb may not be zero when
we need a first mounter.)

Now back to the reason why we need to retry lock requests and can't just
block.  It's not related to the first mounter case.  When a node mounts,
it needs to wait for other (previously mounted) nodes to update the
control_lock lvb with the latest generation number, and then it also needs
to wait for any bits set in the lvb to be cleared, i.e. it needs to wait
for any unrecovered journals to be recovered before it finishes mounting.

To do this, it needs to wait in a loop reading the control_lock lvb.  The
question is whether we want to add some sort of delay to that loop or not,
and how.  msleep_interruptible(), schedule_timeout_interruptible(),
something else?
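
For concreteness, one shape such a loop could take is sketched below in
userspace C, with nanosleep() standing in for whichever kernel sleep
helper is chosen.  The snapshot struct, the read callback, the retry
limit and the fake reader are all assumptions for the example; none of
this is the patch's code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* What one read of the control_lock lvb yields in this sketch. */
struct lvb_snapshot {
	uint32_t generation;
	uint32_t jid_bits;	/* one bit per unrecovered journal */
};

typedef void (*read_lvb_fn)(void *ctx, struct lvb_snapshot *out);

/*
 * Poll the lvb until it carries at least min_generation and no jid
 * bits remain set, sleeping delay_ms between reads.  Returns the
 * number of sleeps taken, or -1 if max_tries reads were not enough.
 */
static int wait_for_recovery(read_lvb_fn read_lvb, void *ctx,
			     uint32_t min_generation, long delay_ms,
			     int max_tries)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < max_tries; i++) {
		struct lvb_snapshot s;

		read_lvb(ctx, &s);
		if (s.generation >= min_generation && s.jid_bits == 0)
			return i;
		nanosleep(&ts, NULL);	/* the delay under discussion */
	}
	return -1;
}

/* Fake reader for trying the loop out: dirty for polls_left reads. */
struct fake_ctx {
	int polls_left;
	uint32_t generation;
};

static void fake_read_lvb(void *ctx, struct lvb_snapshot *out)
{
	struct fake_ctx *f = ctx;

	out->generation = f->generation;
	out->jid_bits = f->polls_left > 0 ? 0x2 : 0;
	if (f->polls_left > 0)
		f->polls_left--;
}
```

With the fake reader set to report a dirty lvb three times, the loop
sleeps three times and then returns; an in-kernel version would also
have to decide what to do when the sleep is interrupted by a signal.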

Dave



