public inbox for gfs2@lists.linux.dev
* [PATCH dlm/next 1/3] dlm: fix recover_conversion() if grmode is unknown
@ 2024-11-04 22:04 Alexander Aring
  2024-11-04 22:04 ` [PATCH dlm/next 2/3] dlm: add grmode sanity checks and debug info Alexander Aring
  2024-11-04 22:04 ` [PATCH dlm/next 3/3] dlm: log_limit() recover_conversion() handling Alexander Aring
  0 siblings, 2 replies; 3+ messages in thread
From: Alexander Aring @ 2024-11-04 22:04 UTC
  To: teigland; +Cc: gfs2, aahringo

A pending PR -> CW conversion can be interrupted by a fence. If the
fence removes the master node of the lock from the lockspace, recovery
tries to resolve the lock dependencies. A new master node is elected,
and the original granted mode of the PR -> CW conversion can no longer
be determined. In this case recovery sets lkb_grmode to lkb_rqmode, but
this leaves the lkb in an invalid conversion state. Later, recovery
grants the recovered lock state, but it cannot move the lkb off the
conversion queue and onto the granted queue because the grmode equals
the rqmode. In the end the lkb is left in an invalid state, with grmode
set to DLM_LOCK_IV, and cannot handle future conversions.

To avoid this case we need to set the grmode to something different from
the rqmode. In this particular case we only run into PR <-> CW
conversions. If the rqmode is PR the grmode should be CW, and vice
versa, to signal a valid conversion on the conversion queue.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/recover.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 2e1169c81c6e..7f748b21f1fb 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -831,11 +831,31 @@ static void recover_conversion(struct dlm_rsb *r)
 		if (lkb->lkb_grmode != DLM_LOCK_IV)
 			continue;
 		if (grmode == -1) {
-			log_debug(ls, "recover_conversion %x set gr to rq %d",
+			/* The rqmode was never lost, but the grmode was.
+			 * The lkb is on the convert queue and can only be
+			 * granted later by dlm_recover_grant() if lkb_grmode
+			 * differs from lkb_rqmode. The real grmode is
+			 * unknown, but as the rqmode is either PR or CW we
+			 * set grmode to the other of the two: being on the
+			 * convert queue indicates contention, i.e. the
+			 * lock mode was incompatible.
+			 */
+			switch (lkb->lkb_rqmode) {
+			case DLM_LOCK_PR:
+				lkb->lkb_grmode = DLM_LOCK_CW;
+				break;
+			case DLM_LOCK_CW:
+				lkb->lkb_grmode = DLM_LOCK_PR;
+				break;
+			default:
+				WARN_ON(1);
+				break;
+			}
+
+			log_debug(ls, "%s %x set gr to rq %d", __func__,
 				  lkb->lkb_id, lkb->lkb_rqmode);
-			lkb->lkb_grmode = lkb->lkb_rqmode;
 		} else {
-			log_debug(ls, "recover_conversion %x set gr %d",
+			log_debug(ls, "%s %x set gr %d", __func__,
 				  lkb->lkb_id, grmode);
 			lkb->lkb_grmode = grmode;
 		}
-- 
2.43.0
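
The mode selection in patch 1/3 can be tried outside the kernel. The
following is only a minimal user-space sketch: fallback_grmode() is a
hypothetical helper that does not exist in fs/dlm, and the mode values
are assumed to match the kernel's dlmconstants.h. It mirrors the switch
statement added to recover_conversion(), returning DLM_LOCK_IV where the
patch calls WARN_ON(1) and leaves lkb_grmode unchanged.

```c
/* Lock mode values, assumed to match the kernel's dlmconstants.h. */
enum {
	DLM_LOCK_IV = -1,	/* invalid */
	DLM_LOCK_NL = 0,	/* null */
	DLM_LOCK_CR = 1,	/* concurrent read */
	DLM_LOCK_CW = 2,	/* concurrent write */
	DLM_LOCK_PR = 3,	/* protected read */
	DLM_LOCK_PW = 4,	/* protected write */
	DLM_LOCK_EX = 5,	/* exclusive */
};

/* Hypothetical helper, not part of fs/dlm: choose a stand-in granted
 * mode for a PR <-> CW conversion whose real granted mode was lost.
 * Any other requested mode yields DLM_LOCK_IV; the patch warns in that
 * case and leaves lkb_grmode as it was.
 */
static int fallback_grmode(int rqmode)
{
	switch (rqmode) {
	case DLM_LOCK_PR:
		return DLM_LOCK_CW;
	case DLM_LOCK_CW:
		return DLM_LOCK_PR;
	default:
		return DLM_LOCK_IV;
	}
}
```

For PR and CW the result always differs from the requested mode, which
is the property the commit message asks for so that the lkb represents
a valid conversion on the conversion queue.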



* [PATCH dlm/next 2/3] dlm: add grmode sanity checks and debug info
From: Alexander Aring @ 2024-11-04 22:04 UTC
  To: teigland; +Cc: gfs2, aahringo

We had issues earlier when looking for invalid cases where DLM
conversions were called on an lkb in an invalid state, or where recovery
handling set lkb states. Cover those cases and warn the user when we run
into them. While at it, add more debugging information to the kernel log
for those cases, showing whether rqmode or grmode is in some invalid
state.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c    | 15 +++++++++++----
 fs/dlm/recover.c |  2 ++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 966a926c301b..b883bb94e3e8 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2830,6 +2830,11 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 		if (test_bit(DLM_IFL_MSTCPY_BIT, &lkb->lkb_iflags))
 			goto out;
 
+		/* sanity check to do a conversion on a invalid lkb state */
+		if (lkb->lkb_grmode == DLM_LOCK_IV ||
+		    lkb->lkb_status != DLM_LKSTS_GRANTED)
+			goto out;
+
 		if (args->flags & DLM_LKF_QUECVT &&
 		    !__quecvt_compat_matrix[lkb->lkb_grmode+1][args->mode+1])
 			goto out;
@@ -2852,14 +2857,16 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	case -EINVAL:
 		/* annoy the user because dlm usage is wrong */
 		WARN_ON(1);
-		log_error(ls, "%s %d %x %x %x %d %d", __func__,
+		log_error(ls, "%s %d %x %x %x %d %d %d %d", __func__,
 			  rv, lkb->lkb_id, dlm_iflags_val(lkb), args->flags,
-			  lkb->lkb_status, lkb->lkb_wait_type);
+			  lkb->lkb_status, lkb->lkb_wait_type,
+			  lkb->lkb_rqmode, lkb->lkb_grmode);
 		break;
 	default:
-		log_debug(ls, "%s %d %x %x %x %d %d", __func__,
+		log_debug(ls, "%s %d %x %x %x %d %d %d %d", __func__,
 			  rv, lkb->lkb_id, dlm_iflags_val(lkb), args->flags,
-			  lkb->lkb_status, lkb->lkb_wait_type);
+			  lkb->lkb_status, lkb->lkb_wait_type,
+			  lkb->lkb_rqmode, lkb->lkb_grmode);
 		break;
 	}
 
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 7f748b21f1fb..011153fcb84f 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -855,6 +855,8 @@ static void recover_conversion(struct dlm_rsb *r)
 			log_debug(ls, "%s %x set gr to rq %d", __func__,
 				  lkb->lkb_id, lkb->lkb_rqmode);
 		} else {
+			WARN_ON(grmode == lkb->lkb_rqmode);
+
 			log_debug(ls, "%s %x set gr %d", __func__,
 				  lkb->lkb_id, grmode);
 			lkb->lkb_grmode = grmode;
-- 
2.43.0
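
The sanity check that patch 2/3 adds to validate_lock_args() can be
modelled in user space. This is a sketch under stated assumptions:
struct lkb_model and conversion_allowed() are hypothetical names, the
constant values are assumed to match the kernel headers, and only the
two fields the new check reads are modelled.

```c
#include <stdbool.h>

/* Values assumed to match the kernel's DLM headers. */
#define DLM_LOCK_IV		(-1)
#define DLM_LOCK_PR		3
#define DLM_LKSTS_WAITING	1
#define DLM_LKSTS_GRANTED	2
#define DLM_LKSTS_CONVERT	3

/* Hypothetical stand-in for the two struct dlm_lkb fields the new
 * check looks at.
 */
struct lkb_model {
	int grmode;	/* models lkb_grmode */
	int status;	/* models lkb_status */
};

/* Mirrors the added condition: a conversion request is only accepted
 * for an lkb that has a valid granted mode and sits on the granted
 * queue. In the patch a failing check jumps to the "out" label.
 */
static bool conversion_allowed(const struct lkb_model *lkb)
{
	if (lkb->grmode == DLM_LOCK_IV ||
	    lkb->status != DLM_LKSTS_GRANTED)
		return false;

	return true;
}
```

An lkb stuck on the convert queue, or one left with grmode set to
DLM_LOCK_IV as described in patch 1/3, is rejected by this check.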



* [PATCH dlm/next 3/3] dlm: log_limit() recover_conversion() handling
From: Alexander Aring @ 2024-11-04 22:04 UTC
  To: teigland; +Cc: gfs2, aahringo

Make it easier for us to notice when middle_conversions() are handled by
DLM recovery by logging those cases with log_limit() instead of
log_debug().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/recover.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 011153fcb84f..7e89f362e50c 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -852,12 +852,12 @@ static void recover_conversion(struct dlm_rsb *r)
 				break;
 			}
 
-			log_debug(ls, "%s %x set gr to rq %d", __func__,
+			log_limit(ls, "%s %x set gr to rq %d", __func__,
 				  lkb->lkb_id, lkb->lkb_rqmode);
 		} else {
 			WARN_ON(grmode == lkb->lkb_rqmode);
 
-			log_debug(ls, "%s %x set gr %d", __func__,
+			log_limit(ls, "%s %x set gr %d", __func__,
 				  lkb->lkb_id, grmode);
 			lkb->lkb_grmode = grmode;
 		}
-- 
2.43.0


