From: Alexander Aring <aahringo@redhat.com>
To: teigland@redhat.com
Cc: gfs2@lists.linux.dev, aahringo@redhat.com
Subject: [PATCH dlm/next 4/5] dlm: fix middle conversion cases in DLM recovery
Date: Wed, 13 Nov 2024 11:46:42 -0500 [thread overview]
Message-ID: <20241113164643.464055-4-aahringo@redhat.com> (raw)
In-Reply-To: <20241113164643.464055-1-aahringo@redhat.com>
We saw some issue after recovery and a following
dlm_lock(QUECVT | CONVERT) returned with -EINVAL. After investigation it
came out the lkb grmode was in an invalid state "-1" although the lkb was
in granted state.
I was able to reproduce the issue with the following DLM lkb/rsb state
manipulation:
steps to reproduce (3 nodes cluster):
Recovery conversion case "recover_convert_waiter()":
Node 1:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl
$ echo "1337 foobaaar 0x0 0x0 3 3 2 2 2 3" >
/sys/kernel/debug/dlm/test_locks
$ echo "1337 2 2" > /sys/kernel/debug/dlm/test_waiters
Node 2:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl
Node 3:
$ echo "test $CLNAME" > /sys/kernel/debug/dlm/__ctrl
`dlm_tool lockdebug test` before recovery dumps:
Node 1:
---
Resource len 8 "foobaaar"
Master:2 flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Convert
00001337 PR (CW) Master: 2 00000000 wait 02
Expecting reply
nodeid 2 msg convert lkid 00001337 resource "foobaaar"
---
Do a fence of node 2.
`dlm_tool lockdebug test` after recovery dumps:
---
Resource len 8 "foobaaar"
Master flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Granted
00001337 IV
---
The lkb ends up in an invalid granted state with the lock mode invalid.
A following lock conversion with QUECVT will always return -EINVAL.
This invalid case is only appears when we have a waiter for a PR <-> CW
lock conversion that waits for an answer from the master lock node but
the master was being fenced. As this case is mainly handled in
"recover_convert_waiter()".
There are more cases to check but this is so far the only one I can
confirm that there are problems with and should be fixed.
If the above case occurs it leaves the lkb in a invalid state of
"grmode == rqmode" on the convertqueue that "recover_conversion()"
handles.
After this patch the above invalid state does not appear anymore. It
ends in the following lock state after fencing node 2 and recovery is
being triggered:
`dlm_tool lockdebug test` after recovery dumps:
Node 1:
---
Resource len 8 "foobaaar"
Master flags 000000c8 first_lkid 0 root 0 recover 0 locks 0
Granted
00001337 CW
---
That successful indicates the new master is node 1 and it got the CW
lock state granted as requested before recovery.
There is however a second case handled when a new master is elected and
"receive_rcom_lock_args()" recovers lock requests from non master nodes
to the new master. The condition in "receive_rcom_lock_args()" of
"rl->rl_wait_type == cpu_to_le16(DLM_MSG_CONVERT)" however can't be true
as "recover_convert_waiter()" would call "_receive_convert_reply()" and
remove the current waiter of the lkb if there is one. That always would
have ended in "wait_type == 0". In case of a new master the new master
will never receive the lkb with "wait_type != 0".
I tried to find problems in the current handling of
"receive_rcom_lock_args()" case and I couldn't find any. However there is
a condition that seems never to be worked correctly.
This solution was co-developed with the original DLM author David
Teigland that changes this case as it covers potential all other non yet
easy to reproduce cases for PR <-> CW that are on granted/convertqueue.
Co-developed: David Teigland <teigland@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lock.c | 20 ++++++++++++--------
fs/dlm/recover.c | 40 +++++++++++++++++++++++++++-------------
2 files changed, 39 insertions(+), 21 deletions(-)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 4219e0025fa2..1e809f591d1a 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4999,16 +4999,19 @@ static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb,
struct dlm_message *ms_local)
{
if (middle_conversion(lkb)) {
+ log_rinfo(ls, "%s %x middle convert in progress", __func__,
+ lkb->lkb_id);
+
+ /* We sent this lock to the new master. The new master will
+ * tell us when it's granted. We no longer need a reply, so
+ * use a fake reply to put the lkb into the right state.
+ */
hold_lkb(lkb);
memset(ms_local, 0, sizeof(struct dlm_message));
ms_local->m_type = cpu_to_le32(DLM_MSG_CONVERT_REPLY);
ms_local->m_result = cpu_to_le32(to_dlm_errno(-EINPROGRESS));
ms_local->m_header.h_nodeid = cpu_to_le32(lkb->lkb_nodeid);
_receive_convert_reply(lkb, ms_local, true);
-
- /* Same special case as in receive_rcom_lock_args() */
- lkb->lkb_grmode = DLM_LOCK_IV;
- rsb_set_flag(lkb->lkb_resource, RSB_RECOVER_CONVERT);
unhold_lkb(lkb);
} else if (lkb->lkb_rqmode >= lkb->lkb_grmode) {
@@ -5555,10 +5558,11 @@ static int receive_rcom_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
The real granted mode of these converting locks cannot be determined
until all locks have been rebuilt on the rsb (recover_conversion) */
- if (rl->rl_wait_type == cpu_to_le16(DLM_MSG_CONVERT) &&
- middle_conversion(lkb)) {
- rl->rl_status = DLM_LKSTS_CONVERT;
- lkb->lkb_grmode = DLM_LOCK_IV;
+ if (rl->rl_status == DLM_LKSTS_CONVERT && middle_conversion(lkb)) {
+ /* We may need to adjust grmode depending on other granted locks. */
+ log_limit(ls, "%s %x middle convert gr %d rq %d remote %d %x",
+ __func__, lkb->lkb_id, lkb->lkb_grmode,
+ lkb->lkb_rqmode, lkb->lkb_nodeid, lkb->lkb_remid);
rsb_set_flag(r, RSB_RECOVER_CONVERT);
}
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 2e1169c81c6e..5a7b4b2016c9 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -811,33 +811,47 @@ static void recover_lvb(struct dlm_rsb *r)
}
/* All master rsb's flagged RECOVER_CONVERT need to be looked at. The locks
- converting PR->CW or CW->PR need to have their lkb_grmode set. */
+ * converting PR->CW or CW->PR may need to have their lkb_grmode changed.
+ */
static void recover_conversion(struct dlm_rsb *r)
{
struct dlm_ls *ls = r->res_ls;
+ uint32_t other_lkid = 0;
+ int other_grmode = -1;
struct dlm_lkb *lkb;
- int grmode = -1;
list_for_each_entry(lkb, &r->res_grantqueue, lkb_statequeue) {
if (lkb->lkb_grmode == DLM_LOCK_PR ||
lkb->lkb_grmode == DLM_LOCK_CW) {
- grmode = lkb->lkb_grmode;
+ other_grmode = lkb->lkb_grmode;
+ other_lkid = lkb->lkb_id;
break;
}
}
+ if (other_grmode == -1) {
+ log_limit(ls, "%s %x gr %d rq %d, remote %d %x, no granted PR/CW found",
+ __func__, lkb->lkb_id, lkb->lkb_grmode,
+ lkb->lkb_rqmode, lkb->lkb_nodeid,
+ lkb->lkb_remid);
+ return;
+ }
+
list_for_each_entry(lkb, &r->res_convertqueue, lkb_statequeue) {
- if (lkb->lkb_grmode != DLM_LOCK_IV)
- continue;
- if (grmode == -1) {
- log_debug(ls, "recover_conversion %x set gr to rq %d",
- lkb->lkb_id, lkb->lkb_rqmode);
- lkb->lkb_grmode = lkb->lkb_rqmode;
- } else {
- log_debug(ls, "recover_conversion %x set gr %d",
- lkb->lkb_id, grmode);
- lkb->lkb_grmode = grmode;
+ /* Lock recovery created incompatible granted modes, so
+ * change the granted mode of the converting lock to
+ * NL. The rqmode of the converting lock should be CW,
+ * which means the converting lock should be granted at
+ * the end of recovery.
+ */
+ if (((lkb->lkb_grmode == DLM_LOCK_PR) && (other_grmode == DLM_LOCK_CW)) ||
+ ((lkb->lkb_grmode == DLM_LOCK_CW) && (other_grmode == DLM_LOCK_PR))) {
+ log_limit(ls, "%s %x gr %d rq %d, remote %d %x, other_lkid %u, other gr %d, set gr=NL",
+ __func__, lkb->lkb_id, lkb->lkb_grmode,
+ lkb->lkb_rqmode, lkb->lkb_nodeid,
+ lkb->lkb_remid, other_lkid, other_grmode);
+ lkb->lkb_grmode = DLM_LOCK_NL;
}
}
}
--
2.43.0
next prev parent reply other threads:[~2024-11-13 16:46 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-13 16:46 [PATCH dlm/next 1/5] dlm: new max length for debugfs entry name Alexander Aring
2024-11-13 16:46 ` [PATCH dlm/next 2/5] dlm: introduce lockspace control debugfs entry Alexander Aring
2024-11-13 16:46 ` [PATCH dlm/next 3/5] dlm: extend debugfs to manipulate more DLM states Alexander Aring
2024-11-13 16:46 ` Alexander Aring [this message]
2024-11-13 16:46 ` [PATCH dlm/next 5/5] dlm: more debug info on invalid lock requests Alexander Aring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241113164643.464055-4-aahringo@redhat.com \
--to=aahringo@redhat.com \
--cc=gfs2@lists.linux.dev \
--cc=teigland@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox