* [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators
[not found] <2061107895.44805743.1529618725855.JavaMail.zimbra@redhat.com>
@ 2018-06-21 22:05 ` Bob Peterson
2018-06-22 20:03 ` Andreas Gruenbacher
0 siblings, 1 reply; 4+ messages in thread
From: Bob Peterson @ 2018-06-21 22:05 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Before this patch, spectator mounts would try to acquire the dlm's
control lock and mounted lock as part of its normal recovery sequence.
It's not necessary because spectators don't ever do journal recovery.
And if they acquire those locks (at all) it will prevent another
first-mounter from acquiring the lock in EX mode, which means it
also cannot do journal recovery because it doesn't think it's the
first node to mount the file system.
This patch checks if the mounter is a spectator, and if so, avoids
grabbing the control lock and mounted lock. This allows a secondary
mounter who is really the first rw mounter, to do journal recovery:
since the spectator doesn't acquire those locks, it can grab them
in EX mode, and therefore consider itself to be the first mounter.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/gfs2/lock_dlm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 006c6164f759..59c7f2e8f077 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -807,6 +807,8 @@ static int control_mount(struct gfs2_sbd *sdp)
msleep_interruptible(500);
+ if (sdp->sd_args.ar_spectator)
+ goto locks_done;
/*
* Acquire control_lock in EX and mounted_lock in either EX or PR.
* control_lock lvb keeps track of any pending journal recoveries.
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators
2018-06-21 22:05 ` [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators Bob Peterson
@ 2018-06-22 20:03 ` Andreas Gruenbacher
2018-06-24 12:49 ` Bob Peterson
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Gruenbacher @ 2018-06-22 20:03 UTC (permalink / raw)
To: cluster-devel.redhat.com
Bob,
On 22 June 2018 at 00:05, Bob Peterson <rpeterso@redhat.com> wrote:
> Hi,
>
> Before this patch, spectator mounts would try to acquire the dlm's
> control lock and mounted lock as part of its normal recovery sequence.
> It's not necessary because spectators don't ever do journal recovery.
> And if they acquire those locks (at all) it will prevent another
> first-mounter from acquiring the lock in EX mode, which means it
> also cannot do journal recovery because it doesn't think it's the
> first node to mount the file system.
>
> This patch checks if the mounter is a spectator, and if so, avoids
> grabbing the control lock and mounted lock. This allows a secondary
> mounter who is really the first rw mounter, to do journal recovery:
> since the spectator doesn't acquire those locks, it can grab them
> in EX mode, and therefore consider itself to be the first mounter.
if I understand things correctly, spectator mounts were so far holding
the mounted_lock in PR mode which did prevent other nodes from
starting recovery. With this change, spectators can now end up reading
from the filesystem during recovery. That doesn't seem safe,
especially since recovery won't take the usual glocks, and so there
would be no synchronization with spectators anymore. I'm wondering if
spectators could pause and relinquish the mounted_lock when another
node tries to acquire it in EX mode, and resume after re-acquiring the
lock. Or something similar.
Thanks,
Andreas
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators
2018-06-22 20:03 ` Andreas Gruenbacher
@ 2018-06-24 12:49 ` Bob Peterson
2018-06-25 13:48 ` Steven Whitehouse
0 siblings, 1 reply; 4+ messages in thread
From: Bob Peterson @ 2018-06-24 12:49 UTC (permalink / raw)
To: cluster-devel.redhat.com
----- Original Message -----
> Bob,
>
> On 22 June 2018 at 00:05, Bob Peterson <rpeterso@redhat.com> wrote:
> > Hi,
> >
> > Before this patch, spectator mounts would try to acquire the dlm's
> > control lock and mounted lock as part of its normal recovery sequence.
> > It's not necessary because spectators don't ever do journal recovery.
> > And if they acquire those locks (at all) it will prevent another
> > first-mounter from acquiring the lock in EX mode, which means it
> > also cannot do journal recovery because it doesn't think it's the
> > first node to mount the file system.
> >
> > This patch checks if the mounter is a spectator, and if so, avoids
> > grabbing the control lock and mounted lock. This allows a secondary
> > mounter who is really the first rw mounter, to do journal recovery:
> > since the spectator doesn't acquire those locks, it can grab them
> > in EX mode, and therefore consider itself to be the first mounter.
>
> if I understand things correctly, spectator mounts were so far holding
> the mounted_lock in PR mode which did prevent other nodes from
> starting recovery. With this change, spectators can now end up reading
> from the filesystem during recovery. That doesn't seem safe,
> especially since recovery won't take the usual glocks, and so there
> would be no synchronization with spectators anymore. I'm wondering if
> spectators could pause and relinquish the mounted_lock when another
> node tries to acquire it in EX mode, and resume after re-acquiring the
> lock. Or something similar.
>
> Thanks,
> Andreas
>
Hi Andreas,
With the patch, I suppose _new_ spectators can now mount the file
system before recovery and see an inconsistent view. However, what's
the difference between that and an already-mounted spectator, who can
already see an unrecovered file system before the journal is replayed?
Iow, if it's already mounted as spectator, but before the rw mounting
node has done its recovery? Or even been fenced for that matter.
It turns out, there is none.
I did a little experiment using only stock kernels:
1. Node 1 mounts rw
2. Node 2 mounts -o spectator
3. Node 1 writes to the file system
4. Node 1 gets fenced while it's writing
5. Node 2 does ls -lR on the file system before the rw node reboots
Guess what happens? Boom! It sees an inconsistent view of the file
system. This is true, in theory, with all kernels today.
[43241.051707] GFS2: fsid=dm-12.s: fatal: invalid metadata block
[43241.051707] GFS2: fsid=dm-12.s: bh = 360123 (magic number)
[43241.051707] GFS2: fsid=dm-12.s: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428
[43241.072890] GFS2: fsid=dm-12.s: about to withdraw this file system
[43241.079112] GFS2: fsid=dm-12.s: withdrawn
So there's a larger issue here. We should prevent spectators from
seeing an inconsistent view of the file system, regardless of
the status of the mounted lock.
The concept of cluster quorum was supposed to prevent inconsistent
views because there's always someone around to replay the journal.
But when we introduced spectator mounts, we nullified that,
unintentionally.
In the old pre-spectator way of thinking, when a cluster loses
quorum, the dlm locking is temporarily halted until quorum is
regained, thus preventing inconsistent views until quorum is
regained and the journal is replayed. This can even be done by a
read-only mounter, and before the node rejoins the cluster.
With spectators, the there might not be anyone capable of replaying
the journal, regardless of quorum. You could have a 16 node cluster
with 15 spectators, all of whom instantly see an inconsistent view
of the file system metadata as soon as the writer disappears from
the cluster.
Bottom line: My patch makes things a little better because now at
least the rw node can do recovery and replay the journal, whereas
before it couldn't. It's true that we used to prevent other nodes
from mounting -o spectator at that point, but our existing nodes
who mount as spectator already have the same (worse) problem.
We have a larger problem that is not addressed by my patch, which
might be difficult to resolve.
Bob Peterson
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators
2018-06-24 12:49 ` Bob Peterson
@ 2018-06-25 13:48 ` Steven Whitehouse
0 siblings, 0 replies; 4+ messages in thread
From: Steven Whitehouse @ 2018-06-25 13:48 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
On 24/06/18 13:49, Bob Peterson wrote:
> ----- Original Message -----
>> Bob,
>>
>> On 22 June 2018 at 00:05, Bob Peterson <rpeterso@redhat.com> wrote:
>>> Hi,
>>>
>>> Before this patch, spectator mounts would try to acquire the dlm's
>>> control lock and mounted lock as part of its normal recovery sequence.
>>> It's not necessary because spectators don't ever do journal recovery.
>>> And if they acquire those locks (at all) it will prevent another
>>> first-mounter from acquiring the lock in EX mode, which means it
>>> also cannot do journal recovery because it doesn't think it's the
>>> first node to mount the file system.
>>>
>>> This patch checks if the mounter is a spectator, and if so, avoids
>>> grabbing the control lock and mounted lock. This allows a secondary
>>> mounter who is really the first rw mounter, to do journal recovery:
>>> since the spectator doesn't acquire those locks, it can grab them
>>> in EX mode, and therefore consider itself to be the first mounter.
>> if I understand things correctly, spectator mounts were so far holding
>> the mounted_lock in PR mode which did prevent other nodes from
>> starting recovery. With this change, spectators can now end up reading
>> from the filesystem during recovery. That doesn't seem safe,
>> especially since recovery won't take the usual glocks, and so there
>> would be no synchronization with spectators anymore. I'm wondering if
>> spectators could pause and relinquish the mounted_lock when another
>> node tries to acquire it in EX mode, and resume after re-acquiring the
>> lock. Or something similar.
>>
>> Thanks,
>> Andreas
>>
> Hi Andreas,
>
> With the patch, I suppose _new_ spectators can now mount the file
> system before recovery and see an inconsistent view. However, what's
> the difference between that and an already-mounted spectator, who can
> already see an unrecovered file system before the journal is replayed?
> Iow, if it's already mounted as spectator, but before the rw mounting
> node has done its recovery? Or even been fenced for that matter.
> It turns out, there is none.
>
> I did a little experiment using only stock kernels:
>
> 1. Node 1 mounts rw
> 2. Node 2 mounts -o spectator
> 3. Node 1 writes to the file system
> 4. Node 1 gets fenced while it's writing
> 5. Node 2 does ls -lR on the file system before the rw node reboots
>
> Guess what happens? Boom! It sees an inconsistent view of the file
> system. This is true, in theory, with all kernels today.
>
> [43241.051707] GFS2: fsid=dm-12.s: fatal: invalid metadata block
> [43241.051707] GFS2: fsid=dm-12.s: bh = 360123 (magic number)
> [43241.051707] GFS2: fsid=dm-12.s: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428
> [43241.072890] GFS2: fsid=dm-12.s: about to withdraw this file system
> [43241.079112] GFS2: fsid=dm-12.s: withdrawn
That is clearly wrong. There is no reason that a spectator that has
mounted the fs prior to some failure event should ever fail in this way.
They should be blocked by the DLM locks until recovery has completed.
Spectators that try to mount a filesystem which needs recovery should
either be blocked until recovery has taken place, or those mounts should
fail if there is no way to perform recovery (e.g. if there are no
non-spectator mounts)
This is something that we should make sure is included in testing to
avoid regressions in this area. It sounds though as if we may need to
take a wider look at this to make sure that all the corner cases are
covered,
Steve.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-06-25 13:48 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2061107895.44805743.1529618725855.JavaMail.zimbra@redhat.com>
2018-06-21 22:05 ` [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators Bob Peterson
2018-06-22 20:03 ` Andreas Gruenbacher
2018-06-24 12:49 ` Bob Peterson
2018-06-25 13:48 ` Steven Whitehouse
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).