cluster-devel.redhat.com archive mirror
From: Alexander Aring <aahringo@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC PATCH gfs2/for-next] fs: dlm: do retry workaround on other dlm API calls
Date: Tue, 26 Jul 2022 14:38:01 -0400
Message-ID: <20220726183801.1005291-1-aahringo@redhat.com>

This patch retries a dlm API call when it returns -EBUSY, unless the call
is a dlm_unlock() with either DLM_LKF_CANCEL or DLM_LKF_FORCEUNLOCK. Any
other API call can return -EBUSY, and for now the DLM user has to handle
that by retrying the call.
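
The rule above can be written down as a small predicate. This is only a
userspace sketch for illustration: the sketch_* names and the flag values
are hypothetical stand-ins, and real code would use the DLM_LKF_*
constants from the kernel headers instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for DLM_LKF_CANCEL and DLM_LKF_FORCEUNLOCK.
 * The values are illustrative only; take the real ones from the
 * kernel headers. */
#define SKETCH_LKF_CANCEL	0x1u
#define SKETCH_LKF_FORCEUNLOCK	0x2u

/* Which kind of API call returned the error (hypothetical type). */
enum sketch_call {
	SKETCH_CALL_LOCK,
	SKETCH_CALL_UNLOCK,
};

/* Return true if the caller should retry the call it just made. */
static bool sketch_should_retry(enum sketch_call call, uint32_t flags,
				int error)
{
	/* Only -EBUSY is treated as "try again". */
	if (error != -EBUSY)
		return false;

	/* An unlock with cancel or force-unlock is the excluded case. */
	if (call == SKETCH_CALL_UNLOCK &&
	    (flags & (SKETCH_LKF_CANCEL | SKETCH_LKF_FORCEUNLOCK)))
		return false;

	return true;
}
```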

The reason -EBUSY can occur in all other cases is the
dlm_recover_waiters_post() function in DLM. This function runs when dlm
recovery recovers lock states. In some cases it triggers new lock
requests, as in:

if (oc || ou) {
	...
} else {
	switch (mstype) {
	case DLM_MSG_LOOKUP:
	case DLM_MSG_REQUEST:
		_request_lock(r, lkb);
		if (is_master(r))
			confirm_master(r, 0);
		break;
	case DLM_MSG_CONVERT:
		_convert_lock(r, lkb);
		break;
	default:
		err = 1;
	}
}

The problem lies in what dlm recovery does afterwards. Those requests are
not synchronized, and there can still be pending messages in flight.
There is a race between the handling of those messages, dlm unlocking the
dlm API to accept new requests, and the dlm user triggering new requests
immediately afterwards. If the DLM user triggers new requests, they can
clash with the requests above that DLM triggered internally, because the
pending messages are still around. This behaviour is unpredictable, and
the user only learns about it when new requests return -EBUSY. For now we
need to add retry handling everywhere so that lock requests are retried
if the above race happens.
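
As a sketch of the retry idea, here is a userspace helper that repeats a
request while it returns -EBUSY. Note the assumptions: all sketch_* names
are hypothetical, the backoff is a caller-supplied callback standing in
for a sleep, and the loop is bounded by max_tries. That bound is a
variation for illustration, not what this patch does; the patch loops
without a limit.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request callback: returns 0 or a negative errno. */
typedef int (*sketch_request_fn)(void *ctx);

/* Hypothetical backoff callback, standing in for a short sleep. */
typedef void (*sketch_backoff_fn)(void);

/*
 * Call fn(ctx) until it returns something other than -EBUSY or until
 * max_tries attempts were made. Returns the last result of fn, or
 * -EINVAL if max_tries is not positive.
 */
static int sketch_retry_on_ebusy(sketch_request_fn fn, void *ctx,
				 int max_tries, sketch_backoff_fn backoff)
{
	int error = -EINVAL;
	int i;

	for (i = 0; i < max_tries; i++) {
		error = fn(ctx);
		if (error != -EBUSY)
			return error;
		/* Give pending messages time to drain before retrying. */
		if (backoff != NULL)
			backoff();
	}
	return error;
}

/* Fake request for demonstration: busy for the first busy_left calls. */
struct sketch_fake {
	int busy_left;
	int calls;
};

static int sketch_fake_request(void *ctx)
{
	struct sketch_fake *f = ctx;

	f->calls++;
	if (f->busy_left > 0) {
		f->busy_left--;
		return -EBUSY;
	}
	return 0;
}
```

With a struct sketch_fake whose busy_left is 2, the helper makes three
calls and returns 0. If busy_left exceeds max_tries, it gives up and
returns -EBUSY to the caller.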

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/gfs2/lock_dlm.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 2559a79cf14b..5c2eba142e9e 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -519,8 +519,15 @@ static int sync_unlock(struct gfs2_sbd *sdp, struct dlm_lksb *lksb, char *name)
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 	int error;
 
+again:
 	error = dlm_unlock(ls->ls_dlm, lksb->sb_lkid, 0, lksb, ls);
-	if (error) {
+	switch (error) {
+	case 0:
+		break;
+	case -EBUSY:
+		msleep(20);
+		goto again;
+	default:
 		fs_err(sdp, "%s lkid %x error %d\n",
 		       name, lksb->sb_lkid, error);
 		return error;
@@ -546,10 +553,17 @@ static int sync_lock(struct gfs2_sbd *sdp, int mode, uint32_t flags,
 	memset(strname, 0, GDLM_STRNAME_BYTES);
 	snprintf(strname, GDLM_STRNAME_BYTES, "%8x%16x", LM_TYPE_NONDISK, num);
 
+again:
 	error = dlm_lock(ls->ls_dlm, mode, lksb, flags,
 			 strname, GDLM_STRNAME_BYTES - 1,
 			 0, sync_wait_cb, ls, NULL);
-	if (error) {
+	switch (error) {
+	case 0:
+		break;
+	case -EBUSY:
+		msleep(20);
+		goto again;
+	default:
 		fs_err(sdp, "%s lkid %x flags %x mode %d error %d\n",
 		       name, lksb->sb_lkid, flags, mode, error);
 		return error;
-- 
2.31.1

