From: Alexander Aring <aahringo@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH v6.4-rc2 2/5] fs: dlm: fix cleanup pending ops when interrupted
Date: Fri, 19 May 2023 11:21:25 -0400 [thread overview]
Message-ID: <20230519152128.65272-2-aahringo@redhat.com> (raw)
In-Reply-To: <20230519152128.65272-1-aahringo@redhat.com>
This patch mainly reverts what commit b92a4e3f86b1 ("fs: dlm: change posix
lock sigint handling") introduced. Except two things, checking if
op->done got true under ops_lock after it got interrupted and changing
"no op" messages to debug printout.
There is currently problems with cleaning up pending operations. The
main idea of commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint
handling") was to wait for a reply and if it was interrupted then the
cleanup routine e.g. list_del(), do_unlock_close() will be executed.
This requires that for every dlm op request a answer in dev_write()
comes back. The cleanup routine do_unlock_close() is not operating in
the dlm user space software on a per request basis and will cleanup
everything else what matches certain plock op fields which concludes
that we don't get anymore for all request a result back. This will
have some leftovers inside the dlm plock recv_list which will never
being deleted.
It was confirmed with a new debugfs entry to look if some plock lists
have still entries left when there is no posix lock activity, checked
by dlm_tool plocks $LS, ongoing anymore. In the specific testcase on
a gfs2 mountpoint the following command was executed:
stress-ng --fcntl 32
and the stress-ng program was killed after certain time.
Due the fact that do_unlock_close() cleans more than just a specific
operation and the dlm operation is already removed by list_del(). This
list_del() can either be operating on send_list or recv_list. If it hits
recv_list it still can be that answers coming back for an ongoing
operation and do_unlock_close() is not synchronized with the list_del().
This will end in "no op ..." log_print(), to not confuse the user about
such issues which seems to be there by design we move this logging
information to pr_debug() as those are expected log messages.
Cc: stable at vger.kernel.org
Fixes: b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/plock.c | 25 ++++++-------------------
1 file changed, 6 insertions(+), 19 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ff364901f22b..fea2157fac5b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -30,8 +30,6 @@ struct plock_async_data {
struct plock_op {
struct list_head list;
int done;
- /* if lock op got interrupted while waiting dlm_controld reply */
- bool sigint;
struct dlm_plock_info info;
/* if set indicates async handling */
struct plock_async_data *data;
@@ -167,12 +165,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
spin_unlock(&ops_lock);
goto do_lock_wait;
}
-
- op->sigint = true;
+ list_del(&op->list);
spin_unlock(&ops_lock);
+
log_debug(ls, "%s: wait interrupted %x %llx pid %d",
__func__, ls->ls_global_id,
(unsigned long long)number, op->info.pid);
+ do_unlock_close(&op->info);
+ dlm_release_plock_op(op);
goto out;
}
@@ -434,19 +434,6 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
if (iter->info.fsid == info.fsid &&
iter->info.number == info.number &&
iter->info.owner == info.owner) {
- if (iter->sigint) {
- list_del(&iter->list);
- spin_unlock(&ops_lock);
-
- pr_debug("%s: sigint cleanup %x %llx pid %d",
- __func__, iter->info.fsid,
- (unsigned long long)iter->info.number,
- iter->info.pid);
- do_unlock_close(&iter->info);
- memcpy(&iter->info, &info, sizeof(info));
- dlm_release_plock_op(iter);
- return count;
- }
list_del_init(&iter->list);
memcpy(&iter->info, &info, sizeof(info));
if (iter->data)
@@ -465,8 +452,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
else
wake_up(&recv_wq);
} else
- log_print("%s: no op %x %llx", __func__,
- info.fsid, (unsigned long long)info.number);
+ pr_debug("%s: no op %x %llx", __func__,
+ info.fsid, (unsigned long long)info.number);
return count;
}
--
2.31.1
next prev parent reply other threads:[~2023-05-19 15:21 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-19 15:21 [Cluster-devel] [PATCH v6.4-rc2 1/5] fs: dlm: change local pids to be positive pids Alexander Aring
2023-05-19 15:21 ` Alexander Aring [this message]
2023-05-19 15:21 ` [Cluster-devel] [PATCH v6.4-rc2 3/5] fs: dlm: switch posix lock to killable only Alexander Aring
2023-05-19 15:21 ` [Cluster-devel] [PATCH v6.4-rc2 4/5] fs: dlm: make F_SETLK not killable Alexander Aring
2023-05-19 15:21 ` [Cluster-devel] [PATCH v6.4-rc2 5/5] fs: dlm: avoid F_SETLKW plock op lookup collisions Alexander Aring
2023-05-19 21:20 ` Alexander Aring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230519152128.65272-2-aahringo@redhat.com \
--to=aahringo@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).