public inbox for gfs2@lists.linux.dev
 help / color / mirror / Atom feed
From: Alexander Aring <aahringo@redhat.com>
To: teigland@redhat.com
Cc: gfs2@lists.linux.dev, aahringo@redhat.com
Subject: [PATCH dlm/next 4/5] dlm: fix middle conversion cases in DLM recovery
Date: Wed, 13 Nov 2024 11:46:42 -0500	[thread overview]
Message-ID: <20241113164643.464055-4-aahringo@redhat.com> (raw)
In-Reply-To: <20241113164643.464055-1-aahringo@redhat.com>

We saw some issue after recovery and a following
dlm_lock(QUECVT | CONVERT) returned with -EINVAL. After investigation it
came out the lkb grmode was in an invalid state "-1" although the lkb was
in granted state.

I was able to reproduce the issue with the following DLM lkb/rsb state
manipulation:

steps to reproduce (3 nodes cluster):

Recovery conversion case "recover_convert_waiter()":

Node 1:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl
$ echo "1337 foobaaar 0x0 0x0 3 3 2 2 2 3" >
/sys/kernel/debug/dlm/test_locks
$ echo "1337 2 2" > /sys/kernel/debug/dlm/test_waiters
Node 2:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl
Node 3:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl

`dlm_tool lockdebug test` before recovery dumps:
Node 1:
---
Resource len  8  "foobaaar"
Master:2         flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Convert
00001337 PR (CW) Master:   2 00000000  wait 02

Expecting reply
nodeid  2 msg convert lkid 00001337 resource "foobaaar"
---

Do a fence of node 2.

`dlm_tool lockdebug test` after recovery dumps:

---
Resource len  8  "foobaaar"
Master           flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Granted
00001337 IV
---

The lkb ends up in an invalid granted state with the lock mode invalid.
A following lock conversion with QUECVT will always return -EINVAL.

This invalid case is only appears when we have a waiter for a PR <-> CW
lock conversion that waits for an answer from the master lock node but
the master was being fenced. As this case is mainly handled in
"recover_convert_waiter()".

There are more cases to check but this is so far the only one I can
confirm that there are problems with and should be fixed.

If the above case occurs it leaves the lkb in a invalid state of
"grmode == rqmode" on the convertqueue that "recover_conversion()"
handles.

After this patch the above invalid state does not appear anymore. It
ends in the following lock state after fencing node 2 and recovery is
being triggered:

`dlm_tool lockdebug test` after recovery dumps:
Node 1:
---
Resource len  8  "foobaaar"
Master           flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Granted
00001337 CW
---

That successful indicates the new master is node 1 and it got the CW
lock state granted as requested before recovery.

There is however a second case handled when a new master is elected and
"receive_rcom_lock_args()" recovers lock requests from non master nodes
to the new master. The condition in "receive_rcom_lock_args()" of
"rl->rl_wait_type == cpu_to_le16(DLM_MSG_CONVERT)" however can't be true
as "recover_convert_waiter()" would call "_receive_convert_reply()" and
remove the current waiter of the lkb if there is one. That always would
have ended in "wait_type == 0". In case of a new master the new master
will never receive the lkb with "wait_type != 0".

I tried to find problems in the current handling of
"receive_rcom_lock_args()" case and I couldn't find any. However there is
a condition that seems never to be worked correctly.

This solution was co-developed with the original DLM author David
Teigland that changes this case as it covers potential all other non yet
easy to reproduce cases for PR <-> CW that are on granted/convertqueue.

Co-developed: David Teigland <teigland@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c    | 20 ++++++++++++--------
 fs/dlm/recover.c | 40 +++++++++++++++++++++++++++-------------
 2 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 4219e0025fa2..1e809f591d1a 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4999,16 +4999,19 @@ static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb,
 				   struct dlm_message *ms_local)
 {
 	if (middle_conversion(lkb)) {
+		log_rinfo(ls, "%s %x middle convert in progress", __func__,
+			 lkb->lkb_id);
+
+		/* We sent this lock to the new master. The new master will
+		 * tell us when it's granted.  We no longer need a reply, so
+		 * use a fake reply to put the lkb into the right state.
+		 */
 		hold_lkb(lkb);
 		memset(ms_local, 0, sizeof(struct dlm_message));
 		ms_local->m_type = cpu_to_le32(DLM_MSG_CONVERT_REPLY);
 		ms_local->m_result = cpu_to_le32(to_dlm_errno(-EINPROGRESS));
 		ms_local->m_header.h_nodeid = cpu_to_le32(lkb->lkb_nodeid);
 		_receive_convert_reply(lkb, ms_local, true);
-
-		/* Same special case as in receive_rcom_lock_args() */
-		lkb->lkb_grmode = DLM_LOCK_IV;
-		rsb_set_flag(lkb->lkb_resource, RSB_RECOVER_CONVERT);
 		unhold_lkb(lkb);
 
 	} else if (lkb->lkb_rqmode >= lkb->lkb_grmode) {
@@ -5555,10 +5558,11 @@ static int receive_rcom_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	   The real granted mode of these converting locks cannot be determined
 	   until all locks have been rebuilt on the rsb (recover_conversion) */
 
-	if (rl->rl_wait_type == cpu_to_le16(DLM_MSG_CONVERT) &&
-	    middle_conversion(lkb)) {
-		rl->rl_status = DLM_LKSTS_CONVERT;
-		lkb->lkb_grmode = DLM_LOCK_IV;
+	if (rl->rl_status == DLM_LKSTS_CONVERT && middle_conversion(lkb)) {
+		/* We may need to adjust grmode depending on other granted locks. */
+		log_limit(ls, "%s %x middle convert gr %d rq %d remote %d %x",
+			  __func__, lkb->lkb_id, lkb->lkb_grmode,
+			  lkb->lkb_rqmode, lkb->lkb_nodeid, lkb->lkb_remid);
 		rsb_set_flag(r, RSB_RECOVER_CONVERT);
 	}
 
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 2e1169c81c6e..5a7b4b2016c9 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -811,33 +811,47 @@ static void recover_lvb(struct dlm_rsb *r)
 }
 
 /* All master rsb's flagged RECOVER_CONVERT need to be looked at.  The locks
-   converting PR->CW or CW->PR need to have their lkb_grmode set. */
+ * converting PR->CW or CW->PR may need to have their lkb_grmode changed.
+ */
 
 static void recover_conversion(struct dlm_rsb *r)
 {
 	struct dlm_ls *ls = r->res_ls;
+	uint32_t other_lkid = 0;
+	int other_grmode = -1;
 	struct dlm_lkb *lkb;
-	int grmode = -1;
 
 	list_for_each_entry(lkb, &r->res_grantqueue, lkb_statequeue) {
 		if (lkb->lkb_grmode == DLM_LOCK_PR ||
 		    lkb->lkb_grmode == DLM_LOCK_CW) {
-			grmode = lkb->lkb_grmode;
+			other_grmode = lkb->lkb_grmode;
+			other_lkid = lkb->lkb_id;
 			break;
 		}
 	}
 
+	if (other_grmode == -1) {
+		log_limit(ls, "%s %x gr %d rq %d, remote %d %x, no granted PR/CW found",
+			  __func__, lkb->lkb_id, lkb->lkb_grmode,
+			  lkb->lkb_rqmode, lkb->lkb_nodeid,
+			  lkb->lkb_remid);
+		return;
+	}
+
 	list_for_each_entry(lkb, &r->res_convertqueue, lkb_statequeue) {
-		if (lkb->lkb_grmode != DLM_LOCK_IV)
-			continue;
-		if (grmode == -1) {
-			log_debug(ls, "recover_conversion %x set gr to rq %d",
-				  lkb->lkb_id, lkb->lkb_rqmode);
-			lkb->lkb_grmode = lkb->lkb_rqmode;
-		} else {
-			log_debug(ls, "recover_conversion %x set gr %d",
-				  lkb->lkb_id, grmode);
-			lkb->lkb_grmode = grmode;
+		/* Lock recovery created incompatible granted modes, so
+		 * change the granted mode of the converting lock to
+		 * NL. The rqmode of the converting lock should be CW,
+		 * which means the converting lock should be granted at
+		 * the end of recovery.
+		 */
+		if (((lkb->lkb_grmode == DLM_LOCK_PR) && (other_grmode == DLM_LOCK_CW)) ||
+		    ((lkb->lkb_grmode == DLM_LOCK_CW) && (other_grmode == DLM_LOCK_PR))) {
+			log_limit(ls, "%s %x gr %d rq %d, remote %d %x, other_lkid %u, other gr %d, set gr=NL",
+				  __func__, lkb->lkb_id, lkb->lkb_grmode,
+				  lkb->lkb_rqmode, lkb->lkb_nodeid,
+				  lkb->lkb_remid, other_lkid, other_grmode);
+			lkb->lkb_grmode = DLM_LOCK_NL;
 		}
 	}
 }
-- 
2.43.0


  parent reply	other threads:[~2024-11-13 16:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-13 16:46 [PATCH dlm/next 1/5] dlm: new max length for debugfs entry name Alexander Aring
2024-11-13 16:46 ` [PATCH dlm/next 2/5] dlm: introduce lockspace control debugfs entry Alexander Aring
2024-11-13 16:46 ` [PATCH dlm/next 3/5] dlm: extend debugfs to manipulate more DLM states Alexander Aring
2024-11-13 16:46 ` Alexander Aring [this message]
2024-11-13 16:46 ` [PATCH dlm/next 5/5] dlm: more debug info on invalid lock requests Alexander Aring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241113164643.464055-4-aahringo@redhat.com \
    --to=aahringo@redhat.com \
    --cc=gfs2@lists.linux.dev \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox