cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2: fix GL_SKIP node_scope problems
Date: Wed, 29 Sep 2021 08:27:50 -0500
Message-ID: <6e1f6910-bc37-ae93-0362-dee45c579d14@redhat.com>
In-Reply-To: <20210929132103.192481-1-rpeterso@redhat.com>

On 9/29/21 8:21 AM, Bob Peterson wrote:
> Before this patch, when a glock was locked, the very first holder on the
> queue would unlock the lockref and call the go_lock glops function (if
> one existed), unless GL_SKIP was specified. When we introduced the new
> node-scope concept, we allowed multiple holders to lock glocks in EX mode
> and share the lock.
> 
> But node-scope introduced a new problem: if the first holder has GL_SKIP
> and the next one does NOT, since it is not the first holder on the queue,
> the go_lock op was not called. Eventually the GL_SKIP holder may call the
> go_lock sub-function (e.g. gfs2_rgrp_bh_get) but there was still a
> window of time in which another non-GL_SKIP holder assumes the go_lock
> function had been called by the first holder. In the case of rgrp glocks,
> this led to a NULL pointer dereference on the buffer_heads.
> 
> This patch tries to fix the problem by introducing two new glock flags:
> 
> GLF_GO_LOCK_NEEDED, which keeps track of when the go_lock function needs
> to be called to "fill in" or "read" the object before it is referenced.
> 
> GLF_GO_LOCK_IN_PROG, which indicates that a process is currently
> reading in the object. Whenever a function needs to
> reference the object, it checks the GLF_GO_LOCK_NEEDED flag, and if
> set, it sets GLF_GO_LOCK_IN_PROG and calls the glops "go_lock" function.
> 
> As before, the gl_lockref spin_lock is unlocked during the IO operation,
> which may take a relatively long amount of time to complete. While
> unlocked, if another process determines go_lock is still needed, it sees
> the GLF_GO_LOCK_IN_PROG flag is set, and waits for the go_lock glop
> operation to be completed. Once GLF_GO_LOCK_IN_PROG is cleared, it needs
> to check GLF_GO_LOCK_NEEDED again because the other process's go_lock
> operation may not have been successful.
> 
> To facilitate this change, the go_lock section of function do_promote
> was extracted to its own new function, gfs2_go_lock. We do this
> because GL_SKIP callers often read in the object later.
> Before this patch, those GL_SKIP callers (like gfs2_inode_lookup and
> update_rgrp_lvb) called directly into the object-read functions
> (gfs2_inode_refresh and gfs2_rgrp_bh_get respectively), but that never
> cleared the new GLF_GO_LOCK_NEEDED flag. This patch changes those
> functions so they call into the new gfs2_go_lock directly, which takes
> care of all that.
> 
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/gfs2/glock.c  | 136 +++++++++++++++++++++++++++++++++--------------
>   fs/gfs2/glock.h  |   1 +
>   fs/gfs2/glops.c  |  21 ++++----
>   fs/gfs2/incore.h |   3 +-
>   fs/gfs2/inode.c  |   4 +-
>   fs/gfs2/rgrp.c   |  12 ++---
>   fs/gfs2/super.c  |   6 ++-
>   7 files changed, 121 insertions(+), 62 deletions(-)
(snip)
> @@ -2153,6 +2209,8 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
>   		*p++ = 'P';
>   	if (test_bit(GLF_FREEING, gflags))
>   		*p++ = 'x';
> +	if (test_bit(GLF_GO_LOCK_NEEDED, gflags))
> +		*p++ = 'g';
>   	*p = 0;
>   	return buf;
>   }
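
For readers following the flag protocol described in the quoted commit
message, here is a minimal user-space sketch of it. It is not the patch
itself: a pthread mutex and condition variable stand in for the
gl_lockref spin_lock and the kernel's wait mechanism, and every name
prefixed with sketch_ (the struct, its fields, and the helpers) is
hypothetical. Only the two flag names and the idea of a go_lock hook
come from the description above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a glock; not the kernel struct. */
struct sketch_glock {
	pthread_mutex_t lock;   /* plays the role of the gl_lockref spin_lock */
	pthread_cond_t done;    /* signalled whenever IN_PROG is cleared */
	bool go_lock_needed;    /* models GLF_GO_LOCK_NEEDED */
	bool go_lock_in_prog;   /* models GLF_GO_LOCK_IN_PROG */
	int (*go_lock)(struct sketch_glock *gl); /* models the glops hook */
	int calls;              /* how many times the hook has run */
};

void sketch_glock_init(struct sketch_glock *gl,
		       int (*go_lock)(struct sketch_glock *))
{
	pthread_mutex_init(&gl->lock, NULL);
	pthread_cond_init(&gl->done, NULL);
	gl->go_lock_needed = true;   /* object has not been read in yet */
	gl->go_lock_in_prog = false;
	gl->go_lock = go_lock;
	gl->calls = 0;
}

/* Example hook (assumption): fails on the first call, succeeds afterwards. */
int sketch_fail_once(struct sketch_glock *gl)
{
	return gl->calls++ == 0 ? -1 : 0;
}

/* Read in the object if needed; returns 0 on success or the hook's error. */
int sketch_go_lock(struct sketch_glock *gl)
{
	int ret = 0;

	pthread_mutex_lock(&gl->lock);
	while (gl->go_lock_needed) {
		if (gl->go_lock_in_prog) {
			/* Another thread is reading; wait, then recheck NEEDED,
			 * because that thread's read may have failed. */
			pthread_cond_wait(&gl->done, &gl->lock);
			continue;
		}
		gl->go_lock_in_prog = true;
		pthread_mutex_unlock(&gl->lock);  /* lock is dropped during I/O */
		ret = gl->go_lock ? gl->go_lock(gl) : 0;
		pthread_mutex_lock(&gl->lock);
		if (ret == 0)
			gl->go_lock_needed = false;
		gl->go_lock_in_prog = false;
		pthread_cond_broadcast(&gl->done);
		if (ret)
			break;  /* NEEDED stays set so a later caller can retry */
	}
	pthread_mutex_unlock(&gl->lock);
	return ret;
}
```

With sketch_fail_once as the hook, the first sketch_go_lock() call
returns the error and leaves the NEEDED flag set, the second call
retries and clears it, and later calls return without running the hook.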

Hi,

As soon as I sent this patch out I realized I forgot to add the second
new GLF bit, GLF_GO_LOCK_IN_PROG, to gflags2str. So the final version
will include it.
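
A rough user-space sketch of what the gflags2str addition might look
like follows. The 'x' and 'g' characters are taken from the hunk quoted
above; the 'G' character for GLF_GO_LOCK_IN_PROG is only a placeholder
(the final patch may choose a different letter), and the sk_ names and
bit numbers are hypothetical stand-ins for the definitions in
fs/gfs2/incore.h.

```c
#include <stddef.h>

/* Hypothetical bit numbers; the real ones are defined in fs/gfs2/incore.h. */
enum {
	SK_GLF_FREEING,
	SK_GLF_GO_LOCK_NEEDED,
	SK_GLF_GO_LOCK_IN_PROG,
};

/* Minimal stand-in for the kernel's test_bit() on a single word. */
int sk_test_bit(int nr, const unsigned long *flags)
{
	return (int)((*flags >> nr) & 1UL);
}

/* Append one character per set flag, mirroring the tail of gflags2str. */
const char *sk_gflags2str(char *buf, const unsigned long *gflags)
{
	char *p = buf;

	if (sk_test_bit(SK_GLF_FREEING, gflags))
		*p++ = 'x';
	if (sk_test_bit(SK_GLF_GO_LOCK_NEEDED, gflags))
		*p++ = 'g';
	if (sk_test_bit(SK_GLF_GO_LOCK_IN_PROG, gflags))
		*p++ = 'G';	/* placeholder character, an assumption */
	*p = 0;
	return buf;
}
```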

Also, this version passed 500 iterations of the failing test case and
a full run of xfstests.

Regards,

Bob Peterson



