From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bob Peterson Date: Wed, 26 Jul 2023 12:01:08 -0500 Subject: [Cluster-devel] [PATCH] gfs2: conversion deadlock do_promote bypass Message-ID: <20230726170108.389793-1-rpeterso@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit I know the description is vague or hard to grasp, but it's hard to be succinct for this problem. In this case the failing scenario is this: 1. We have a glock in the SH state 2. A process requests an asychronous lock of the glock in EX mode. (rename) 3. Before the lock is granted, more processes (read / ls) request the glock in SH again. 4. gfs2 sends a request to DLM for the lock in EX because that holder is at the top of the queue. 5. Somehow the dlm request gets canceled, so dlm sends us back a response with state==SH and LM_OUT_CANCELED. 6. finish_xmote gets called to process the response from dlm. It detects the glock is not in the requested mode, and demote is not in progress so it goes through this codepath: if (unlikely(state != gl->gl_target)) { if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) { if (ret & LM_OUT_CANCELED) { At this point, finish_xmote cannot grant the canceled EX holder, but the glock is still in SH mode. 7. Before this patch, finish_xmote moves the holder to the end of the queue and finds the next holder, which is for SH. Then it does: gl->gl_target = gh->gh_state; goto retry; The retry calls do_xmote, which detects the requested state (SH) is equal to the current state, and does: GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target); To do_xmote, it is invalid to transition a glock to the existing state. This patch adds a check for the next holder wanting the state the glock is already in, and if that's the case, it doesn't need to call do_xmote at all. It can simply call do_promote and promote the holders needing the glock in the existing state. The patch adds a goto promote which jumps to the end of finish_xmote where it calls do_promote. Signed-off-by: Bob Peterson --- fs/gfs2/glock.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 1438e7465e30..d1e1fd786417 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -595,6 +595,8 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret) list_move_tail(&gh->gh_list, &gl->gl_holders); gh = find_first_waiter(gl); gl->gl_target = gh->gh_state; + if (gl->gl_state == gl->gl_target) + goto promote; goto retry; } /* Some error or failed "try lock" - report it */ @@ -640,6 +642,7 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret) goto out; } } +promote: do_promote(gl); } out: -- 2.41.0