[Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators
Date: Mon, 25 Jun 2018 14:48:39 +0100	[thread overview]
Message-ID: <36cde40b-722d-8b42-d68b-1824bc4a5ded@redhat.com> (raw)
In-Reply-To: <1041963515.45201495.1529844592291.JavaMail.zimbra@redhat.com>

Hi,


On 24/06/18 13:49, Bob Peterson wrote:
> ----- Original Message -----
>> Bob,
>>
>> On 22 June 2018 at 00:05, Bob Peterson <rpeterso@redhat.com> wrote:
>>> Hi,
>>>
>>> Before this patch, spectator mounts would try to acquire the dlm's
>>> control lock and mounted lock as part of its normal recovery sequence.
>>> It's not necessary because spectators don't ever do journal recovery.
>>> And if they acquire those locks (at all) it will prevent another
>>> first-mounter from acquiring the lock in EX mode, which means it
>>> also cannot do journal recovery because it doesn't think it's the
>>> first node to mount the file system.
>>>
>>> This patch checks if the mounter is a spectator, and if so, avoids
>>> grabbing the control lock and mounted lock. This allows a secondary
>>> mounter who is really the first rw mounter, to do journal recovery:
>>> since the spectator doesn't acquire those locks, it can grab them
>>> in EX mode, and therefore consider itself to be the first mounter.
>> if I understand things correctly, spectator mounts were so far holding
>> the mounted_lock in PR mode which did prevent other nodes from
>> starting recovery. With this change, spectators can now end up reading
>> from the filesystem during recovery. That doesn't seem safe,
>> especially since recovery won't take the usual glocks, and so there
>> would be no synchronization with spectators anymore. I'm wondering if
>> spectators could pause and relinquish the mounted_lock when another
>> node tries to acquire it in EX mode, and resume after re-acquiring the
>> lock. Or something similar.
>>
>> Thanks,
>> Andreas
>>
> Hi Andreas,
>
> With the patch, I suppose _new_ spectators can now mount the file
> system before recovery and see an inconsistent view. However, what's
> the difference between that and an already-mounted spectator, who can
> already see an unrecovered file system before the journal is replayed?
> Iow, if it's already mounted as spectator, but before the rw mounting
> node has done its recovery? Or even been fenced for that matter.
> It turns out, there is none.
>
> I did a little experiment using only stock kernels:
>
> 1. Node 1 mounts rw
> 2. Node 2 mounts -o spectator
> 3. Node 1 writes to the file system
> 4. Node 1 gets fenced while it's writing
> 5. Node 2 does ls -lR on the file system before the rw node reboots
>
> Guess what happens? Boom! It sees an inconsistent view of the file
> system. This is true, in theory, with all kernels today.
>
> [43241.051707] GFS2: fsid=dm-12.s: fatal: invalid metadata block
> [43241.051707] GFS2: fsid=dm-12.s:   bh = 360123 (magic number)
> [43241.051707] GFS2: fsid=dm-12.s:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428
> [43241.072890] GFS2: fsid=dm-12.s: about to withdraw this file system
> [43241.079112] GFS2: fsid=dm-12.s: withdrawn
That is clearly wrong. There is no reason that a spectator that has 
mounted the fs prior to some failure event should ever fail in this way. 
They should be blocked by the DLM locks until recovery has completed.

Spectators that try to mount a filesystem which needs recovery should 
either be blocked until recovery has taken place, or those mounts should 
fail if there is no way to perform recovery (e.g. if there are no 
non-spectator mounts)

This is something that we should make sure is included in testing to 
avoid regressions in this area. It sounds though as if we may need to 
take a wider look at this to make sure that all the corner cases are 
covered,

Steve.

     prev parent reply	other threads:[~2018-06-25 13:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <2061107895.44805743.1529618725855.JavaMail.zimbra@redhat.com>
2018-06-21 22:05 ` [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators Bob Peterson
2018-06-22 20:03   ` Andreas Gruenbacher
2018-06-24 12:49     ` Bob Peterson
2018-06-25 13:48       ` Steven Whitehouse [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36cde40b-722d-8b42-d68b-1824bc4a5ded@redhat.com \
    --to=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).