From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2: fix GL_SKIP node_scope problems
Date: Wed, 29 Sep 2021 08:27:50 -0500
Message-ID: <6e1f6910-bc37-ae93-0362-dee45c579d14@redhat.com>
In-Reply-To: <20210929132103.192481-1-rpeterso@redhat.com>
On 9/29/21 8:21 AM, Bob Peterson wrote:
> Before this patch, when a glock was locked, the very first holder on the
> queue would unlock the lockref and call the go_lock glops function (if
> one existed), unless GL_SKIP was specified. When we introduced the new
> node-scope concept, we allowed multiple holders to lock glocks in EX mode
> and share the lock.
>
> But node-scope introduced a new problem: if the first holder has GL_SKIP
> and the next one does NOT, since it is not the first holder on the queue,
> the go_lock op was not called. Eventually the GL_SKIP holder may call the
> go_lock sub-function (e.g. gfs2_rgrp_bh_get) but there was still a
> window of time in which another non-GL_SKIP holder assumes the go_lock
> function had been called by the first holder. In the case of rgrp glocks,
> this led to a NULL pointer dereference on the buffer_heads.
>
> This patch tries to fix the problem by introducing two new glock flags:
>
> GLF_GO_LOCK_NEEDED, which keeps track of when the go_lock function needs
> to be called to "fill in" or "read" the object before it is referenced.
>
> GLF_GO_LOCK_IN_PROG, which indicates that a process is currently
> reading in the object. Whenever a function needs to
> reference the object, it checks the GLF_GO_LOCK_NEEDED flag, and if
> set, it sets GLF_GO_LOCK_IN_PROG and calls the glops "go_lock" function.
>
> As before, the gl_lockref spin_lock is unlocked during the IO operation,
> which may take a relatively long amount of time to complete. While
> unlocked, if another process determines go_lock is still needed, it sees
> the GLF_GO_LOCK_IN_PROG flag is set, and waits for the go_lock glop
> operation to be completed. Once GLF_GO_LOCK_IN_PROG is cleared, it needs
> to check GLF_GO_LOCK_NEEDED again because the other process's go_lock
> operation may not have been successful.
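The wait-and-recheck protocol described above can be illustrated with a small userspace sketch. This is not the patch's code: struct sketch_glock, sketch_go_lock and the two read helpers are hypothetical names, a pthread mutex stands in for the gl_lockref spin_lock, and a condition variable is an assumed substitute for whatever wait mechanism the kernel code actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a glock (not struct gfs2_glock). */
struct sketch_glock {
    pthread_mutex_t lock;    /* plays the role of the gl_lockref spin_lock */
    pthread_cond_t done;     /* signalled whenever in_prog is cleared */
    bool go_lock_needed;     /* models GLF_GO_LOCK_NEEDED */
    bool go_lock_in_prog;    /* models GLF_GO_LOCK_IN_PROG */
    int (*go_lock)(struct sketch_glock *gl); /* models the go_lock glop */
    int reads;               /* number of go_lock calls, for illustration */
};

/* Illustrative "object read" helpers: one succeeds, one fails. */
static int sketch_read_ok(struct sketch_glock *gl) { gl->reads++; return 0; }
static int sketch_read_fail(struct sketch_glock *gl) { gl->reads++; return -1; }

static void sketch_glock_init(struct sketch_glock *gl,
                              int (*fn)(struct sketch_glock *))
{
    pthread_mutex_init(&gl->lock, NULL);
    pthread_cond_init(&gl->done, NULL);
    gl->go_lock_needed = true;   /* object has not been read in yet */
    gl->go_lock_in_prog = false;
    gl->go_lock = fn;
    gl->reads = 0;
}

/* Read the object in if needed; returns 0 on success or the go_lock error. */
static int sketch_go_lock(struct sketch_glock *gl)
{
    int ret = 0;

    assert(gl->go_lock != NULL);
    pthread_mutex_lock(&gl->lock);
    while (gl->go_lock_needed) {
        if (gl->go_lock_in_prog) {
            /* Another thread is reading: wait, then re-check NEEDED,
             * because that thread's read may have failed. */
            pthread_cond_wait(&gl->done, &gl->lock);
            continue;
        }
        gl->go_lock_in_prog = true;
        pthread_mutex_unlock(&gl->lock);  /* lock is dropped during the IO */
        ret = gl->go_lock(gl);
        pthread_mutex_lock(&gl->lock);
        if (ret == 0)
            gl->go_lock_needed = false;
        gl->go_lock_in_prog = false;
        pthread_cond_broadcast(&gl->done);
        if (ret)
            break;  /* NEEDED stays set so a later caller can retry */
    }
    pthread_mutex_unlock(&gl->lock);
    return ret;
}
```

In this sketch a failed read leaves the "needed" flag set, so a waiter that wakes up retries the read itself rather than assuming the object is valid.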
>
> To facilitate this change, the go_lock section of function do_promote
> was extracted to its own new function, gfs2_go_lock. We do this
> because GL_SKIP callers often read in the object later.
> Before this patch, those GL_SKIP callers (like gfs2_inode_lookup and
> update_rgrp_lvb) called directly into the object-read functions
> (gfs2_inode_refresh and gfs2_rgrp_bh_get respectively), but that never
> cleared the new GLF_GO_LOCK_NEEDED flag. This patch changes those
> functions so they call into the new gfs2_go_lock directly, which takes
> care of all that.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
> fs/gfs2/glock.c | 136 +++++++++++++++++++++++++++++++++--------------
> fs/gfs2/glock.h | 1 +
> fs/gfs2/glops.c | 21 ++++----
> fs/gfs2/incore.h | 3 +-
> fs/gfs2/inode.c | 4 +-
> fs/gfs2/rgrp.c | 12 ++---
> fs/gfs2/super.c | 6 ++-
> 7 files changed, 121 insertions(+), 62 deletions(-)
(snip)
> @@ -2153,6 +2209,8 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
>  		*p++ = 'P';
>  	if (test_bit(GLF_FREEING, gflags))
>  		*p++ = 'x';
> +	if (test_bit(GLF_GO_LOCK_NEEDED, gflags))
> +		*p++ = 'g';
>  	*p = 0;
>  	return buf;
>  }
Hi,
As soon as I sent this patch out I realized I forgot to add the second
new GLF bit, GLF_GO_LOCK_IN_PROG, to gflags2str. So the final version
will include it.
Also, this version passed 500 iterations of the failing test case and
a full run of xfstests.
Regards,
Bob Peterson