cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed
Date: Thu, 16 Sep 2021 14:10:03 -0500	[thread overview]
Message-ID: <20210916191003.105866-7-rpeterso@redhat.com> (raw)
In-Reply-To: <20210916191003.105866-1-rpeterso@redhat.com>

Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_lock glops function (if
one exists), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock, but node-scope introduced a new problem: if the
first holder has GL_SKIP and the next one does NOT, since it is not the
first holder on the queue, the go_lock op was not called. Eventually the
GL_SKIP holder may call the go_lock sub-function (e.g. gfs2_rgrp_bh_get)
but there's still a race in which another non-GL_SKIP holder assumes the
go_lock function was called by the first holder. In the case of rgrp
glocks, this leads to a NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing a new go_lock_needed
glops function: Now ALL callers who do not specify GL_SKIP should call
the go_lock_needed glops function to see if it should still be called.
This allows any holder (secondary, teriary, etc) to call the go_lock
function when needed.

However, this introduces a new race: Several node-scope EX holders could
all decide the lock needs go_lock, and call the go_lock function to read
in the buffers and operate on them. This can lead to situations in which
one process can call go_lock then create a reservation (rd_reserved+=)
but another process can do the same, then hit the gfs2_rgrp_bh_get
BUG_ON(rgd->rd_reserved) for the first holder's reservation.

To prevent this race, we hold the rgrp_lock_local during the rgrp_go_lock
function. The first caller will get the local lock, submit the IO
request and wait for it to complete. The second caller will wait for the
rgrp_local_lock, then gfs2_rgrp_bh_get will decide it no longer needs
to do the read, and continue on without penalty.

fixes: 06e908cd9ead ("gfs2: Allow node-wide exclusive glock sharing")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c  | 30 +++++++++++++++---------------
 fs/gfs2/glops.c  | 16 +++++++++++++---
 fs/gfs2/incore.h |  1 +
 fs/gfs2/rgrp.c   | 22 +++++++++++++++++++++-
 fs/gfs2/rgrp.h   |  1 +
 5 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4fcf340603e7..6dfd33dc206b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -403,21 +403,21 @@ __acquires(&gl->gl_lockref.lock)
 			continue;
 		if (may_grant(gl, gh)) {
 			first = gfs2_first_holder(gh);
-			if (first && glops->go_lock) {
-				if (!(gh->gh_flags & GL_SKIP)) {
-					spin_unlock(&gl->gl_lockref.lock);
-					/* FIXME: eliminate this eventually */
-					ret = glops->go_lock(gh);
-					spin_lock(&gl->gl_lockref.lock);
-					if (ret) {
-						if (ret == 1)
-							return 2;
-						gh->gh_error = ret;
-						list_del_init(&gh->gh_list);
-						trace_gfs2_glock_queue(gh, 0);
-						gfs2_holder_wake(gh);
-						goto restart;
-					}
+			if (!(gh->gh_flags & GL_SKIP) &&
+			    glops->go_lock_needed &&
+			    glops->go_lock_needed(gh)) {
+				spin_unlock(&gl->gl_lockref.lock);
+				/* FIXME: eliminate this eventually */
+				ret = glops->go_lock(gh);
+				spin_lock(&gl->gl_lockref.lock);
+				if (ret) {
+					if (ret == 1)
+						return 2;
+					gh->gh_error = ret;
+					list_del_init(&gh->gh_list);
+					trace_gfs2_glock_queue(gh, 0);
+					gfs2_holder_wake(gh);
+					goto restart;
 				}
 			}
 			set_bit(HIF_HOLDER, &gh->gh_iflags);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 4b19f513570f..e0fa8d7f96d3 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -481,6 +481,17 @@ int gfs2_inode_refresh(struct gfs2_inode *ip)
 	return error;
 }
 
+static bool inode_go_lock_needed(struct gfs2_holder *gh)
+{
+	struct gfs2_glock *gl = gh->gh_gl;
+
+	if (!gl->gl_object)
+		return false;
+	if (!gfs2_first_holder(gh))
+		return false;
+	return !(gh->gh_flags & GL_SKIP);
+}
+
 /**
  * inode_go_lock - operation done after an inode lock is locked by a process
  * @gh: The glock holder
@@ -495,9 +506,6 @@ static int inode_go_lock(struct gfs2_holder *gh)
 	struct gfs2_inode *ip = gl->gl_object;
 	int error = 0;
 
-	if (!ip)
-		return 0;
-
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
 		error = gfs2_inode_refresh(ip);
 		if (error)
@@ -740,6 +748,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_sync = inode_go_sync,
 	.go_inval = inode_go_inval,
 	.go_demote_ok = inode_go_demote_ok,
+	.go_lock_needed = inode_go_lock_needed,
 	.go_lock = inode_go_lock,
 	.go_dump = inode_go_dump,
 	.go_type = LM_TYPE_INODE,
@@ -750,6 +759,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
 	.go_sync = rgrp_go_sync,
 	.go_inval = rgrp_go_inval,
+	.go_lock_needed = gfs2_rgrp_go_lock_needed,
 	.go_lock = gfs2_rgrp_go_lock,
 	.go_dump = gfs2_rgrp_go_dump,
 	.go_type = LM_TYPE_RGRP,
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0fe49770166e..dc5c9dccb060 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -225,6 +225,7 @@ struct gfs2_glock_operations {
 			const char *fs_id_buf);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
 	void (*go_free)(struct gfs2_glock *gl);
+	bool (*go_lock_needed)(struct gfs2_holder *gh);
 	const int go_subclass;
 	const int go_type;
 	const unsigned long go_flags;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 96b2fbed6bf1..9848c5f4fbc4 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1288,11 +1288,31 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	return 0;
 }
 
+bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh)
+{
+	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
+
+	if (gh->gh_flags & GL_SKIP)
+		return false;
+
+	if (rgd->rd_bits[0].bi_bh)
+		return false;
+	return true;
+}
+
 int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
 {
+	int ret;
+
 	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
 
-	return gfs2_rgrp_bh_get(rgd);
+	if (gfs2_glock_is_held_excl(rgd->rd_gl))
+		rgrp_lock_local(rgd);
+	ret = gfs2_rgrp_bh_get(rgd);
+	if (gfs2_glock_is_held_excl(rgd->rd_gl))
+		rgrp_unlock_local(rgd);
+
+	return ret;
 }
 
 /**
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index a6855fd796e0..4b62ba5d8e20 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -31,6 +31,7 @@ extern struct gfs2_rgrpd *gfs2_rgrpd_get_next(struct gfs2_rgrpd *rgd);
 extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
 extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
+extern bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh);
 extern int gfs2_rgrp_go_lock(struct gfs2_holder *gh);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
 
-- 
2.31.1



  parent reply	other threads:[~2021-09-16 19:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 1/6] gfs2: remove redundant check in gfs2_rgrp_go_lock Bob Peterson
2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 2/6] gfs2: Add GL_SKIP holder flag to dump_holder Bob Peterson
2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 3/6] gfs2: move GL_SKIP check from glops to do_promote Bob Peterson
2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 4/6] gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug Bob Peterson
2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 5/6] gfs2: simplify do_promote and fix promote trace Bob Peterson
2021-09-16 19:10 ` Bob Peterson [this message]
2021-09-22 11:57   ` [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed Andreas Gruenbacher
2021-09-22 12:47     ` Bob Peterson
2021-09-22 13:54       ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210916191003.105866-7-rpeterso@redhat.com \
    --to=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).