* [Cluster-devel] [PATCH] gfs2: conversion deadlock do_promote bypass
@ 2023-07-26 17:01 Bob Peterson
2023-08-09 12:13 ` Andreas Gruenbacher
From: Bob Peterson @ 2023-07-26 17:01 UTC
To: cluster-devel.redhat.com
I know the description is vague or hard to grasp, but it's hard to be
succinct for this problem.
In this case the failing scenario is this:
1. We have a glock in the SH state
2. A process requests an asynchronous lock on the glock in EX mode
(rename).
3. Before the lock is granted, more processes (read / ls) request the
glock in SH again.
4. gfs2 sends a request to DLM for the lock in EX because that holder is
at the top of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
response with state==SH and LM_OUT_CANCELED.
6. finish_xmote gets called to process the response from dlm. It detects
the glock is not in the requested mode, and demote is not in progress
so it goes through this codepath:

    if (unlikely(state != gl->gl_target)) {
        if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) {
            if (ret & LM_OUT_CANCELED) {

At this point, finish_xmote cannot grant the canceled EX holder, but
the glock is still in SH mode.
7. Before this patch, finish_xmote moves the holder to the end of the
queue and finds the next holder, which is for SH. Then it does:

    gl->gl_target = gh->gh_state;
    goto retry;

The retry calls do_xmote, which detects the requested state (SH) is equal
to the current state, and does:

    GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);

To do_xmote, it is invalid to transition a glock to the existing state.
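
The queue handling in steps 6 and 7 can be sketched in isolation. This is a simplified model, not the gfs2 code: `struct waiter_queue`, `move_head_to_tail` and `next_target` are hypothetical names, the queue is a plain array rather than the `gl_holders` list, and every entry is assumed to be a waiter.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the lock modes named in the description. */
enum lock_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

/* Hypothetical waiter queue: index 0 is the head of the queue. */
struct waiter_queue {
	enum lock_state want[8];
	size_t len;
};

/* Models list_move_tail() applied to the head entry: rotate it to the end. */
static void move_head_to_tail(struct waiter_queue *q)
{
	if (q->len < 2)
		return;
	enum lock_state head = q->want[0];
	for (size_t i = 1; i < q->len; i++)
		q->want[i - 1] = q->want[i];
	q->want[q->len - 1] = head;
}

/* Models find_first_waiter() followed by gl->gl_target = gh->gh_state. */
static enum lock_state next_target(const struct waiter_queue *q)
{
	return q->want[0];
}
```

With a queue of EX, SH, SH and a glock currently in SH, rotating the canceled EX waiter to the tail makes `next_target` return SH, the state the glock already holds. That is the combination do_xmote rejects.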
This patch adds a check for the next holder wanting the state the glock
is already in, and if that's the case, it doesn't need to call do_xmote
at all. It can simply call do_promote and promote the holders needing
the glock in the existing state. The patch adds a goto promote which
jumps to the end of finish_xmote where it calls do_promote.
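
The decision the patch adds can be modeled on its own as a rough sketch. `struct model_glock`, `after_cancel` and `xmote_is_valid` are hypothetical names invented for illustration; only the comparison of the current state against the new target mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the lock modes named in the description. */
enum lock_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

enum next_step { STEP_RETRY_XMOTE, STEP_PROMOTE };

struct model_glock {
	enum lock_state state;  /* models gl->gl_state */
	enum lock_state target; /* models gl->gl_target */
};

/*
 * Models the LM_OUT_CANCELED branch after the next waiter has been
 * picked. With patched == false it always retries, as before the patch.
 */
static enum next_step after_cancel(struct model_glock *gl,
				   enum lock_state next_waiter_state,
				   bool patched)
{
	gl->target = next_waiter_state;
	if (patched && gl->state == gl->target)
		return STEP_PROMOTE; /* nothing to ask dlm for */
	return STEP_RETRY_XMOTE;
}

/* Models the precondition do_xmote enforces with GLOCK_BUG_ON(). */
static bool xmote_is_valid(const struct model_glock *gl)
{
	return gl->state != gl->target;
}
```

For a glock in SH whose next waiter also wants SH, the unpatched path returns `STEP_RETRY_XMOTE` while `xmote_is_valid` is false. The patched path returns `STEP_PROMOTE` instead and never reaches the state transition.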
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/gfs2/glock.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1438e7465e30..d1e1fd786417 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -595,6 +595,8 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
 					list_move_tail(&gh->gh_list, &gl->gl_holders);
 				gh = find_first_waiter(gl);
 				gl->gl_target = gh->gh_state;
+				if (gl->gl_state == gl->gl_target)
+					goto promote;
 				goto retry;
 			}
 			/* Some error or failed "try lock" - report it */
@@ -640,6 +642,7 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
 				goto out;
 			}
 		}
+promote:
 		do_promote(gl);
 	}
 out:
--
2.41.0
* Re: [Cluster-devel] [PATCH] gfs2: conversion deadlock do_promote bypass
From: Andreas Gruenbacher @ 2023-08-09 12:13 UTC
To: Bob Peterson; +Cc: cluster-devel
Hi Bob,
On Wed, Jul 26, 2023 at 8:36 PM Bob Peterson <rpeterso@redhat.com> wrote:
> I know the description is vague or hard to grasp, but it's hard to be
> succinct for this problem.
as discussed off-list, this one needed a bit more work. I've just
posted the updated version.
Thanks,
Andreas