From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Whitehouse Date: Mon, 25 Jun 2018 14:48:39 +0100 Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Skip taking journal recovery locks for spectators In-Reply-To: <1041963515.45201495.1529844592291.JavaMail.zimbra@redhat.com> References: <2061107895.44805743.1529618725855.JavaMail.zimbra@redhat.com> <602032757.44805751.1529618745642.JavaMail.zimbra@redhat.com> <1041963515.45201495.1529844592291.JavaMail.zimbra@redhat.com> Message-ID: <36cde40b-722d-8b42-d68b-1824bc4a5ded@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, On 24/06/18 13:49, Bob Peterson wrote: > ----- Original Message ----- >> Bob, >> >> On 22 June 2018 at 00:05, Bob Peterson wrote: >>> Hi, >>> >>> Before this patch, spectator mounts would try to acquire the dlm's >>> control lock and mounted lock as part of its normal recovery sequence. >>> It's not necessary because spectators don't ever do journal recovery. >>> And if they acquire those locks (at all) it will prevent another >>> first-mounter from acquiring the lock in EX mode, which means it >>> also cannot do journal recovery because it doesn't think it's the >>> first node to mount the file system. >>> >>> This patch checks if the mounter is a spectator, and if so, avoids >>> grabbing the control lock and mounted lock. This allows a secondary >>> mounter who is really the first rw mounter, to do journal recovery: >>> since the spectator doesn't acquire those locks, it can grab them >>> in EX mode, and therefore consider itself to be the first mounter. >> if I understand things correctly, spectator mounts were so far holding >> the mounted_lock in PR mode which did prevent other nodes from >> starting recovery. With this change, spectators can now end up reading >> from the filesystem during recovery. That doesn't seem safe, >> especially since recovery won't take the usual glocks, and so there >> would be no synchronization with spectators anymore. I'm wondering if >> spectators could pause and relinquish the mounted_lock when another >> node tries to acquire it in EX mode, and resume after re-acquiring the >> lock. Or something similar. >> >> Thanks, >> Andreas >> > Hi Andreas, > > With the patch, I suppose _new_ spectators can now mount the file > system before recovery and see an inconsistent view. However, what's > the difference between that and an already-mounted spectator, who can > already see an unrecovered file system before the journal is replayed? > Iow, if it's already mounted as spectator, but before the rw mounting > node has done its recovery? Or even been fenced for that matter. > It turns out, there is none. > > I did a little experiment using only stock kernels: > > 1. Node 1 mounts rw > 2. Node 2 mounts -o spectator > 3. Node 1 writes to the file system > 4. Node 1 gets fenced while it's writing > 5. Node 2 does ls -lR on the file system before the rw node reboots > > Guess what happens? Boom! It sees an inconsistent view of the file > system. This is true, in theory, with all kernels today. > > [43241.051707] GFS2: fsid=dm-12.s: fatal: invalid metadata block > [43241.051707] GFS2: fsid=dm-12.s: bh = 360123 (magic number) > [43241.051707] GFS2: fsid=dm-12.s: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428 > [43241.072890] GFS2: fsid=dm-12.s: about to withdraw this file system > [43241.079112] GFS2: fsid=dm-12.s: withdrawn That is clearly wrong. There is no reason that a spectator that has mounted the fs prior to some failure event should ever fail in this way. They should be blocked by the DLM locks until recovery has completed. Spectators that try to mount a filesystem which needs recovery should either be blocked until recovery has taken place, or those mounts should fail if there is no way to perform recovery (e.g. if there are no non-spectator mounts) This is something that we should make sure is included in testing to avoid regressions in this area. It sounds though as if we may need to take a wider look at this to make sure that all the corner cases are covered, Steve.