public inbox for gfs2@lists.linux.dev
* [PATCH vv6.19-rc6 1/7] dlm: fix recovery pending middle conversion
@ 2026-01-20 15:35 Alexander Aring
From: Alexander Aring @ 2026-01-20 15:35 UTC
  To: teigland; +Cc: aahringo, gfs2

A workload performing PR <-> CW conversions while recovery is triggered
can end up in a so-called "conversion deadlock", which is signaled by
"dlm: WARN: pending deadlock 1e node 0 2 1bf21" in the kernel log.

Under normal circumstances such conversion deadlocks are resolved
immediately, but in this case recovery created a scenario that was not.
In this scenario two locks end up on the convertqueue, both converting
PR -> CW. Normally one of the conversions is rejected with -EDEADLK,
because CW cannot be granted while the other lock still holds PR; the
rejected lock then has to convert to a compatible lock mode instead.
When recovery creates this situation on the convertqueue, however, the
conversion is not resolved in the way the user expects.

The situation is created by recovery when a pending middle conversion
is recovered, which is signaled by:

receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e

In this case recovery stops waiting for the pending reply and forces
the lock onto the convertqueue without checking whether another
incompatible conversion, such as PR -> CW, is in progress. That was the
case here, as the "WARN: pending deadlock ..." message mentioned above
shows.

This state is difficult to reproduce because it requires a pending
PR -> CW conversion; however, we automated a test scenario that randomly
fences during PR -> CW conversions and were able to hit it.

This patch no longer forces the pending middle conversion onto the
convertqueue. Instead, it is handled like every other pending message
and resent later to the new lock master. Because the resend goes through
the existing convert path, such a conversion is rejected immediately if
an incompatible mode is detected on the convertqueue.

So far, long runs of the automated random middle-conversion test have
not hit the "WARN: pending deadlock ..." situation again.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index be938fdf17d96..c01a291db401b 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -5014,25 +5014,8 @@ void dlm_receive_buffer(const union dlm_packet *p, int nodeid)
 static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb,
 				   struct dlm_message *ms_local)
 {
-	if (middle_conversion(lkb)) {
-		log_rinfo(ls, "%s %x middle convert in progress", __func__,
-			 lkb->lkb_id);
-
-		/* We sent this lock to the new master. The new master will
-		 * tell us when it's granted.  We no longer need a reply, so
-		 * use a fake reply to put the lkb into the right state.
-		 */
-		hold_lkb(lkb);
-		memset(ms_local, 0, sizeof(struct dlm_message));
-		ms_local->m_type = cpu_to_le32(DLM_MSG_CONVERT_REPLY);
-		ms_local->m_result = cpu_to_le32(to_dlm_errno(-EINPROGRESS));
-		ms_local->m_header.h_nodeid = cpu_to_le32(lkb->lkb_nodeid);
-		_receive_convert_reply(lkb, ms_local, true);
-		unhold_lkb(lkb);
-
-	} else if (lkb->lkb_rqmode >= lkb->lkb_grmode) {
+	if (middle_conversion(lkb) || lkb->lkb_rqmode >= lkb->lkb_grmode)
 		set_bit(DLM_IFL_RESEND_BIT, &lkb->lkb_iflags);
-	}
 
 	/* lkb->lkb_rqmode < lkb->lkb_grmode shouldn't happen since down
 	   conversions are async; there's no reply from the remote master */
-- 
2.43.0


Thread overview: 7+ messages
2026-01-20 15:35 [PATCH vv6.19-rc6 1/7] dlm: fix recovery pending middle conversion Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 2/7] dlm: validate length in dlm_search_rsb_tree Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 3/7] fs/dlm: use list_add_tail() instead of open-coding list insertion Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 4/7] dlm: Constify struct configfs_item_operations and configfs_group_operations Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 5/7] fs/dlm/dir: remove unuse variable count_match Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 6/7] dlm: use bool for coniditonal expressions Alexander Aring
2026-01-20 15:35 ` [PATCH vv6.19-rc6 7/7] dlm: use coniditon expression instead return scalars Alexander Aring
