* [Cluster-devel] [PATCH RESEND v5.15-rc7 00/18] fs: dlm: cleanups, trace and debugfs
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
Hi,
this patch series contains some cleanups, adds tracepoints, and adds more
debugfs functionality to put DLM into a specific state, which can be
combined with other tooling to fuzz and test the DLM stack. This is a
resend of some pending patches. I still have some patches that try to
change the ls_waiters mutex handling and how DLM acts on critical
recovery errors (whether or not there is a dlm_new_lockspace() waiter).
- Alex
Alexander Aring (18):
fs: dlm: remove obsolete INBUF define
fs: dlm: fix small lockspace typo
fs: dlm: debug improvements print nodeid
fs: dlm: remove check SCTP is loaded message
fs: dlm: move version conversion to compile time
fs: dlm: use dlm_recovery_stopped instead of test_bit
fs: dlm: use dlm_recovery_stopped in condition
fs: dlm: make dlm_callback_resume quite
fs: dlm: initial support for tracepoints
fs: dlm: trace socket handling
fs: dlm: requestqueue busy wait to event based wait
fs: dlm: ls_count busy wait to event based wait
fs: dlm: let handle callback data as void
fs: dlm: add debugfs rawmsg send functionality
fs: dlm: allow create lkb with specific id range
fs: dlm: add lkb debugfs functionality
fs: dlm: add lkb waiters debugfs functionality
fs: dlm: filter user dlm messages for kernel locks
fs/dlm/ast.c | 12 +-
fs/dlm/debug_fs.c | 96 +++++++++++++-
fs/dlm/dir.c | 3 +-
fs/dlm/dlm_internal.h | 11 +-
fs/dlm/lock.c | 90 ++++++++++++-
fs/dlm/lock.h | 4 +
fs/dlm/lockspace.c | 37 +++---
fs/dlm/lowcomms.c | 25 ++--
fs/dlm/lowcomms.h | 4 +-
fs/dlm/main.c | 3 +
fs/dlm/member.c | 3 +-
fs/dlm/midcomms.c | 62 ++++++++-
fs/dlm/midcomms.h | 2 +
fs/dlm/rcom.c | 2 +-
fs/dlm/recoverd.c | 3 +-
fs/dlm/requestqueue.c | 17 ++-
include/trace/events/dlm.h | 260 +++++++++++++++++++++++++++++++++++++
17 files changed, 568 insertions(+), 66 deletions(-)
create mode 100644 include/trace/events/dlm.h
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 01/18] fs: dlm: remove obsolete INBUF define
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch removes an obsolete define for the length of a temporary
buffer that is no longer used. The define has not been needed since
commit 4798cbbfbd00 ("fs: dlm: rework receive handling").
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/dlm_internal.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 5f57538b5d45..44a5c67b5213 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -41,12 +41,6 @@
#include <linux/dlm.h>
#include "config.h"
-/* Size of the temp buffer midcomms allocates on the stack.
- We try to make this large enough so most messages fit.
- FIXME: should sctp make this unnecessary? */
-
-#define DLM_INBUF_LEN 148
-
struct dlm_ls;
struct dlm_lkb;
struct dlm_rsb;
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 02/18] fs: dlm: fix small lockspace typo
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch fixes a typo from locksapce to lockspace.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lockspace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 10eddfa6c3d7..b90566502a81 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -868,7 +868,7 @@ static int release_lockspace(struct dlm_ls *ls, int force)
* until this returns.
*
* Force has 4 possible values:
- * 0 - don't destroy locksapce if it has any LKBs
+ * 0 - don't destroy lockspace if it has any LKBs
* 1 - destroy lockspace if it has remote LKBs but not if it has local LKBs
* 2 - destroy lockspace regardless of LKBs
* 3 - destroy lockspace as part of a forced shutdown
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 03/18] fs: dlm: debug improvements print nodeid
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch improves the debug output of the midcomms layer by also
printing the nodeid that the users counter belongs to.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/midcomms.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 7ae39ec8d9b0..008078f06813 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -1231,7 +1231,7 @@ void dlm_midcomms_add_member(int nodeid)
}
node->users++;
- pr_debug("users inc count %d\n", node->users);
+ pr_debug("node %d users inc count %d\n", nodeid, node->users);
spin_unlock(&node->state_lock);
srcu_read_unlock(&nodes_srcu, idx);
@@ -1254,7 +1254,7 @@ void dlm_midcomms_remove_member(int nodeid)
spin_lock(&node->state_lock);
node->users--;
- pr_debug("users dec count %d\n", node->users);
+ pr_debug("node %d users dec count %d\n", nodeid, node->users);
/* hitting users count to zero means the
* other side is running dlm_midcomms_stop()
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 04/18] fs: dlm: remove check SCTP is loaded message
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try to
load the sctp module before creating an sctp kernel socket, so a socket
creation failure now more likely has other causes. This patch removes
the hint about loading the sctp module from the error message and prints
the error code instead.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 8f715c620e1f..bee3757eb4c7 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1775,7 +1775,7 @@ static int dlm_listen_for_all(void)
result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
SOCK_STREAM, dlm_proto_ops->proto, &sock);
if (result < 0) {
- log_print("Can't create comms socket, check SCTP is loaded");
+ log_print("Can't create comms socket: %d", result);
goto out;
}
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 05/18] fs: dlm: move version conversion to compile time
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
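This patch moves the little-endian version conversion from runtime to
compile time: instead of converting the received h_version with
le32_to_cpu(), each case label converts its version constant with
cpu_to_le32(), which can be evaluated at compile time.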
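The idea can be illustrated in userspace: switch on the raw little-endian
field as received and convert only the constants, so the conversion is
folded by the compiler. This is a sketch under assumptions: the DEMO_*
names are hypothetical stand-ins for DLM_VERSION_3_1/3_2 and for the
kernel's cpu_to_le32(), the version values are assumed, and byte order is
detected with the GCC/Clang predefined __BYTE_ORDER__ macro.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values, standing in for DLM_VERSION_3_1 / DLM_VERSION_3_2. */
#define DEMO_VERSION_3_1 0x00030001u
#define DEMO_VERSION_3_2 0x00030002u

/* Hypothetical constant-expression CPU-to-little-endian conversion, so it
 * is legal in case labels (an inline function would not be). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define DEMO_CPU_TO_LE32(x) ((uint32_t)(x))
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_CPU_TO_LE32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000ffu) << 24) | \
	(((uint32_t)(x) & 0x0000ff00u) << 8) | \
	(((uint32_t)(x) & 0x00ff0000u) >> 8) | \
	(((uint32_t)(x) & 0xff000000u) >> 24)))
#else
#error "unknown byte order"
#endif

/* Dispatch on the raw wire value; only the constants are converted. */
static int demo_dispatch(uint32_t h_version_le)
{
	switch (h_version_le) {
	case DEMO_CPU_TO_LE32(DEMO_VERSION_3_1):
		return 31;
	case DEMO_CPU_TO_LE32(DEMO_VERSION_3_2):
		return 32;
	default:
		return -1;
	}
}

/* Reinterpret four received bytes as the raw (little-endian) header field. */
static uint32_t demo_wire_field(const unsigned char bytes[4])
{
	uint32_t raw;

	memcpy(&raw, bytes, sizeof(raw));
	return raw;
}
```

Because the wire bytes are reinterpreted without conversion, the same
dispatch works on both little- and big-endian hosts.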
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/midcomms.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 008078f06813..76bdc3a9dc61 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -909,11 +909,11 @@ int dlm_process_incoming_buffer(int nodeid, unsigned char *buf, int len)
if (msglen > len)
break;
- switch (le32_to_cpu(hd->h_version)) {
- case DLM_VERSION_3_1:
+ switch (hd->h_version) {
+ case cpu_to_le32(DLM_VERSION_3_1):
dlm_midcomms_receive_buffer_3_1((union dlm_packet *)ptr, nodeid);
break;
- case DLM_VERSION_3_2:
+ case cpu_to_le32(DLM_VERSION_3_2):
dlm_midcomms_receive_buffer_3_2((union dlm_packet *)ptr, nodeid);
break;
default:
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 06/18] fs: dlm: use dlm_recovery_stopped instead of test_bit
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch changes the code to use dlm_recovery_stopped(), which is the
DLM helper for checking whether the LSFL_RECOVER_STOP flag is set in
ls_flags. The check is an atomic operation, but the value is still
fetched while ls_recover_lock is held, as before. Further investigation
may be needed into whether the value can change afterwards and whether
that has any side effects.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/rcom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 6cba86470278..5821b777a1a7 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -601,7 +601,7 @@ void dlm_receive_rcom(struct dlm_ls *ls, struct dlm_rcom *rc, int nodeid)
spin_lock(&ls->ls_recover_lock);
status = ls->ls_recover_status;
- stop = test_bit(LSFL_RECOVER_STOP, &ls->ls_flags);
+ stop = dlm_recovery_stopped(ls);
seq = ls->ls_recover_seq;
spin_unlock(&ls->ls_recover_lock);
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 07/18] fs: dlm: use dlm_recovery_stopped in condition
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch evaluates dlm_recovery_stopped() directly in the condition of
the if branch instead of first storing its result in error. As it is an
atomic bit test, it should be evaluated in the condition itself.
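A userspace sketch of the resulting pattern, assuming C11 atomics as a
stand-in for the kernel's test_bit(): the helper is an atomic read of one
flag bit, and its boolean result is used directly in the condition rather
than being parked in error and then overwritten with -EINTR. The names
demo_ls, demo_recovery_stopped() and DEMO_RECOVER_STOP are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for LSFL_RECOVER_STOP. */
#define DEMO_RECOVER_STOP 0

struct demo_ls {
	atomic_ulong flags; /* stands in for ls->ls_flags */
};

/* Analog of a test_bit()-based helper: atomically read one bit. */
static bool demo_recovery_stopped(struct demo_ls *ls)
{
	return (atomic_load(&ls->flags) >> DEMO_RECOVER_STOP) & 1UL;
}

/* Evaluate the helper in the condition; error only ever holds an errno. */
static int demo_recover_step(struct demo_ls *ls)
{
	int error = 0;

	if (demo_recovery_stopped(ls)) {
		error = -EINTR;
		goto out;
	}
	/* ... recovery work would run here ... */
out:
	return error;
}
```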
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/dir.c | 3 +--
fs/dlm/member.c | 3 +--
fs/dlm/recoverd.c | 3 +--
3 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/dlm/dir.c b/fs/dlm/dir.c
index 45ebbe602bbf..b6692f81ec83 100644
--- a/fs/dlm/dir.c
+++ b/fs/dlm/dir.c
@@ -84,8 +84,7 @@ int dlm_recover_directory(struct dlm_ls *ls)
for (;;) {
int left;
- error = dlm_recovery_stopped(ls);
- if (error) {
+ if (dlm_recovery_stopped(ls)) {
error = -EINTR;
goto out_free;
}
diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 731d489aa323..61f906e705db 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -442,8 +442,7 @@ static int ping_members(struct dlm_ls *ls)
int error = 0;
list_for_each_entry(memb, &ls->ls_nodes, list) {
- error = dlm_recovery_stopped(ls);
- if (error) {
+ if (dlm_recovery_stopped(ls)) {
error = -EINTR;
break;
}
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 97d052cea5a9..a55dfce705dd 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -124,8 +124,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
dlm_recover_waiters_pre(ls);
- error = dlm_recovery_stopped(ls);
- if (error) {
+ if (dlm_recovery_stopped(ls)) {
error = -EINTR;
goto fail;
}
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 08/18] fs: dlm: make dlm_callback_resume quite
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch makes the dlm_callback_resume info message less noisy by
accumulating the count over all callback queue batches into a single
message instead of printing one message per batch of 25 (MAX_CB_QUEUE).
This message has lately become quite noisy in combination with gfs2.
Before:
[241767.849302] dlm: bin: dlm_callback_resume 25
[241767.854846] dlm: bin: dlm_callback_resume 25
[241767.860373] dlm: bin: dlm_callback_resume 25
...
[241767.865920] dlm: bin: dlm_callback_resume 25
[241767.871352] dlm: bin: dlm_callback_resume 25
[241767.876733] dlm: bin: dlm_callback_resume 25
After the patch:
[ 385.485728] dlm: gfs2: dlm_callback_resume 175
If the sum is zero, nothing is printed.
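The accumulate-then-log-once structure can be modelled in plain C. This
is a hypothetical sketch, not the kernel function: demo_callback_resume()
drains a plain counter of pending items instead of the ls_cb_delay list,
DEMO_MAX_CB_QUEUE mirrors the batch size of 25 seen in the log above, and
log_lines only counts how many messages would be emitted.

```c
#include <stdio.h>

#define DEMO_MAX_CB_QUEUE 25 /* batch size; the patch loops on MAX_CB_QUEUE */

/* Drain `pending` items in batches, accumulate the total, log once. */
static int demo_callback_resume(int pending, int *log_lines)
{
	int count, sum = 0;

	*log_lines = 0;
	do {
		count = 0;
		/* Each iteration stands in for queueing one delayed callback. */
		while (pending > 0 && count < DEMO_MAX_CB_QUEUE) {
			pending--;
			count++;
		}
		sum += count;
		/* A full batch means more may be pending; the kernel calls
		 * cond_resched() before starting the next batch. */
	} while (count == DEMO_MAX_CB_QUEUE);

	/* One accumulated message instead of one per batch. */
	if (sum) {
		printf("dlm_callback_resume %d\n", sum);
		(*log_lines)++;
	}
	return sum;
}
```

With 175 pending callbacks this yields a single "dlm_callback_resume 175"
line, matching the "After the patch" output.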
Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/ast.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 283c7b94edda..6600930497cc 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -295,7 +295,7 @@ void dlm_callback_suspend(struct dlm_ls *ls)
void dlm_callback_resume(struct dlm_ls *ls)
{
struct dlm_lkb *lkb, *safe;
- int count = 0;
+ int count = 0, sum = 0;
clear_bit(LSFL_CB_DELAY, &ls->ls_flags);
@@ -313,12 +313,14 @@ void dlm_callback_resume(struct dlm_ls *ls)
}
mutex_unlock(&ls->ls_cb_mutex);
- if (count)
- log_rinfo(ls, "dlm_callback_resume %d", count);
+ sum += count;
if (count == MAX_CB_QUEUE) {
count = 0;
cond_resched();
goto more;
}
+
+ if (sum)
+ log_rinfo(ls, "%s %d", __func__, sum);
}
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 09/18] fs: dlm: initial support for tracepoints
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds initial support for dlm tracepoints. It introduces
tracepoints for the main dlm functionality, dlm_lock()/dlm_unlock(), and
for their completion ast() and blocking bast() callbacks.
The lock/unlock functionality has a start and an end tracepoint because
of a race: with only an end tracepoint, the completion/blocking callbacks
could occur before it. When using eBPF tracing with its hash map lookups,
this could mean that an entry has not been inserted yet. Instead, use the
start tracepoint for the hash insert and check again in the end
tracepoint whether a dlm internal error occurred, in which case no ast
callback will follow. Furthermore, for locks with a local master the
callbacks may occur immediately, so this split is required.
I did not make everything accessible yet, although it seems eBPF can be
used to access many internal data structures if it is aware of the
struct definitions of the running kernel instance. We can still change
that. If you do eBPF experiments, e.g. measuring the time between a lock
request and its callback, you can simply use the local lkb_id field as
the hash key, combined with the lockspace id if you have multiple
lockspaces. Otherwise you can simply use trace-cmd, e.g.
`trace-cmd record -e dlm` followed by `trace-cmd report`.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/ast.c | 4 +
fs/dlm/lock.c | 10 ++
fs/dlm/main.c | 3 +
include/trace/events/dlm.h | 220 +++++++++++++++++++++++++++++++++++++
4 files changed, 237 insertions(+)
create mode 100644 include/trace/events/dlm.h
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 6600930497cc..27bae7d4a477 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -9,6 +9,8 @@
*******************************************************************************
******************************************************************************/
+#include <trace/events/dlm.h>
+
#include "dlm_internal.h"
#include "lock.h"
#include "user.h"
@@ -254,10 +256,12 @@ void dlm_callback_work(struct work_struct *work)
continue;
} else if (callbacks[i].flags & DLM_CB_BAST) {
bastfn(lkb->lkb_astparam, callbacks[i].mode);
+ trace_dlm_bast(ls, lkb, callbacks[i].mode);
} else if (callbacks[i].flags & DLM_CB_CAST) {
lkb->lkb_lksb->sb_status = callbacks[i].sb_status;
lkb->lkb_lksb->sb_flags = callbacks[i].sb_flags;
castfn(lkb->lkb_astparam);
+ trace_dlm_ast(ls, lkb, lkb->lkb_lksb);
}
}
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c502c065d007..feb2e94f5879 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -53,6 +53,8 @@
R: do_xxxx()
L: receive_xxxx_reply() <- R: send_xxxx_reply()
*/
+#include <trace/events/dlm.h>
+
#include <linux/types.h>
#include <linux/rbtree.h>
#include <linux/slab.h>
@@ -3437,6 +3439,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
if (error)
goto out;
+ trace_dlm_lock_start(ls, lkb, mode, flags);
+
error = set_lock_args(mode, lksb, flags, namelen, 0, ast,
astarg, bast, &args);
if (error)
@@ -3450,6 +3454,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
if (error == -EINPROGRESS)
error = 0;
out_put:
+ trace_dlm_lock_end(ls, lkb, mode, flags, error);
+
if (convert || error)
__put_lkb(ls, lkb);
if (error == -EAGAIN || error == -EDEADLK)
@@ -3481,6 +3487,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
if (error)
goto out;
+ trace_dlm_unlock_start(ls, lkb, flags);
+
error = set_unlock_args(flags, astarg, &args);
if (error)
goto out_put;
@@ -3495,6 +3503,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
if (error == -EBUSY && (flags & (DLM_LKF_CANCEL | DLM_LKF_FORCEUNLOCK)))
error = 0;
out_put:
+ trace_dlm_unlock_end(ls, lkb, flags, error);
+
dlm_put_lkb(lkb);
out:
dlm_unlock_recovery(ls);
diff --git a/fs/dlm/main.c b/fs/dlm/main.c
index afc66a1346d3..1c5be4b70ac1 100644
--- a/fs/dlm/main.c
+++ b/fs/dlm/main.c
@@ -19,6 +19,9 @@
#include "config.h"
#include "lowcomms.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/dlm.h>
+
static int __init init_dlm(void)
{
int error;
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
new file mode 100644
index 000000000000..c97b4c163c3e
--- /dev/null
+++ b/include/trace/events/dlm.h
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dlm
+
+#if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DLM_H
+
+#include <linux/dlm.h>
+#include <linux/dlmconstants.h>
+#include <linux/tracepoint.h>
+
+#include "../../../fs/dlm/dlm_internal.h"
+
+#define show_lock_flags(flags) __print_flags(flags, "|", \
+ { DLM_LKF_NOQUEUE, "NOQUEUE" }, \
+ { DLM_LKF_CANCEL, "CANCEL" }, \
+ { DLM_LKF_CONVERT, "CONVERT" }, \
+ { DLM_LKF_VALBLK, "VALBLK" }, \
+ { DLM_LKF_QUECVT, "QUECVT" }, \
+ { DLM_LKF_IVVALBLK, "IVVALBLK" }, \
+ { DLM_LKF_CONVDEADLK, "CONVDEADLK" }, \
+ { DLM_LKF_PERSISTENT, "PERSISTENT" }, \
+ { DLM_LKF_NODLCKWT, "NODLCKWT" }, \
+ { DLM_LKF_NODLCKBLK, "NODLCKBLK" }, \
+ { DLM_LKF_EXPEDITE, "EXPEDITE" }, \
+ { DLM_LKF_NOQUEUEBAST, "NOQUEUEBAST" }, \
+ { DLM_LKF_HEADQUE, "HEADQUE" }, \
+ { DLM_LKF_NOORDER, "NOORDER" }, \
+ { DLM_LKF_ORPHAN, "ORPHAN" }, \
+ { DLM_LKF_ALTPR, "ALTPR" }, \
+ { DLM_LKF_ALTCW, "ALTCW" }, \
+ { DLM_LKF_FORCEUNLOCK, "FORCEUNLOCK" }, \
+ { DLM_LKF_TIMEOUT, "TIMEOUT" })
+
+#define show_lock_mode(mode) __print_symbolic(mode, \
+ { DLM_LOCK_IV, "IV"}, \
+ { DLM_LOCK_NL, "NL"}, \
+ { DLM_LOCK_CR, "CR"}, \
+ { DLM_LOCK_CW, "CW"}, \
+ { DLM_LOCK_PR, "PR"}, \
+ { DLM_LOCK_PW, "PW"}, \
+ { DLM_LOCK_EX, "EX"})
+
+#define show_dlm_sb_flags(flags) __print_flags(flags, "|", \
+ { DLM_SBF_DEMOTED, "DEMOTED" }, \
+ { DLM_SBF_VALNOTVALID, "VALNOTVALID" }, \
+ { DLM_SBF_ALTMODE, "ALTMODE" })
+
+/* note: we begin tracing dlm_lock_start() only if ls and lkb are found */
+TRACE_EVENT(dlm_lock_start,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, int mode,
+ __u32 flags),
+
+ TP_ARGS(ls, lkb, mode, flags),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(int, mode)
+ __field(__u32, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->mode = mode;
+ __entry->flags = flags;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x mode=%s flags=%s",
+ __entry->ls_id, __entry->lkb_id,
+ show_lock_mode(__entry->mode),
+ show_lock_flags(__entry->flags))
+
+);
+
+TRACE_EVENT(dlm_lock_end,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, int mode, __u32 flags,
+ int error),
+
+ TP_ARGS(ls, lkb, mode, flags, error),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(int, mode)
+ __field(__u32, flags)
+ __field(int, error)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->mode = mode;
+ __entry->flags = flags;
+
+ /* return value will be zeroed in those cases by dlm_lock()
+ * we do it here again to not introduce more overhead if
+ * trace isn't running and error reflects the return value.
+ */
+ if (error == -EAGAIN || error == -EDEADLK)
+ __entry->error = 0;
+ else
+ __entry->error = error;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x mode=%s flags=%s error=%d",
+ __entry->ls_id, __entry->lkb_id,
+ show_lock_mode(__entry->mode),
+ show_lock_flags(__entry->flags), __entry->error)
+
+);
+
+TRACE_EVENT(dlm_bast,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, int mode),
+
+ TP_ARGS(ls, lkb, mode),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(int, mode)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->mode = mode;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x mode=%s", __entry->ls_id,
+ __entry->lkb_id, show_lock_mode(__entry->mode))
+
+);
+
+TRACE_EVENT(dlm_ast,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, struct dlm_lksb *lksb),
+
+ TP_ARGS(ls, lkb, lksb),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(u8, sb_flags)
+ __field(int, sb_status)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->sb_flags = lksb->sb_flags;
+ __entry->sb_status = lksb->sb_status;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x sb_flags=%s sb_status=%d",
+ __entry->ls_id, __entry->lkb_id,
+ show_dlm_sb_flags(__entry->sb_flags), __entry->sb_status)
+
+);
+
+/* note: we begin tracing dlm_unlock_start() only if ls and lkb are found */
+TRACE_EVENT(dlm_unlock_start,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, __u32 flags),
+
+ TP_ARGS(ls, lkb, flags),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(__u32, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->flags = flags;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x flags=%s",
+ __entry->ls_id, __entry->lkb_id,
+ show_lock_flags(__entry->flags))
+
+);
+
+TRACE_EVENT(dlm_unlock_end,
+
+ TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, __u32 flags,
+ int error),
+
+ TP_ARGS(ls, lkb, flags, error),
+
+ TP_STRUCT__entry(
+ __field(__u32, ls_id)
+ __field(__u32, lkb_id)
+ __field(__u32, flags)
+ __field(int, error)
+ ),
+
+ TP_fast_assign(
+ __entry->ls_id = ls->ls_global_id;
+ __entry->lkb_id = lkb->lkb_id;
+ __entry->flags = flags;
+ __entry->error = error;
+ ),
+
+ TP_printk("ls_id=%u lkb_id=%x flags=%s error=%d",
+ __entry->ls_id, __entry->lkb_id,
+ show_lock_flags(__entry->flags), __entry->error)
+
+);
+
+#endif /* if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 10/18] fs: dlm: trace socket handling
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds tracepoints for dlm socket receive and send
functionality. They can be used to track how much data was sent to or
received from a specific nodeid.
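A consumer of the dlm_send and dlm_recv events could aggregate traffic
per node roughly as follows. This is a hypothetical sketch: the demo_*
handlers stand in for whatever reads the trace records, DEMO_MAX_NODES is
an assumed bound on node ids, and only positive ret values are counted as
bytes, since 0 and negative errno values such as -EAGAIN carry no payload.

```c
#include <stdint.h>

#define DEMO_MAX_NODES 16 /* hypothetical bound on tracked node ids */

struct demo_node_stats {
	uint64_t sent;
	uint64_t received;
};

static struct demo_node_stats demo_stats[DEMO_MAX_NODES];

/* ret is a byte count when positive; 0 and -errno are skipped. */
static void demo_account(int nodeid, int ret, int is_send)
{
	if (ret <= 0 || nodeid < 0 || nodeid >= DEMO_MAX_NODES)
		return;
	if (is_send)
		demo_stats[nodeid].sent += (uint64_t)ret;
	else
		demo_stats[nodeid].received += (uint64_t)ret;
}

/* Handlers for the two events; both carry (nodeid, ret). */
static void demo_on_dlm_send(int nodeid, int ret)
{
	demo_account(nodeid, ret, 1);
}

static void demo_on_dlm_recv(int nodeid, int ret)
{
	demo_account(nodeid, ret, 0);
}
```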
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 4 ++++
include/trace/events/dlm.h | 40 ++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index bee3757eb4c7..6d6dcf0d5ba9 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -53,6 +53,8 @@
#include <net/sctp/sctp.h>
#include <net/ipv6.h>
+#include <trace/events/dlm.h>
+
#include "dlm_internal.h"
#include "lowcomms.h"
#include "midcomms.h"
@@ -925,6 +927,7 @@ static int receive_from_sock(struct connection *con)
msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
ret = kernel_recvmsg(con->sock, &msg, &iov, 1, iov.iov_len,
msg.msg_flags);
+ trace_dlm_recv(con->nodeid, ret);
if (ret == -EAGAIN)
break;
else if (ret <= 0)
@@ -1411,6 +1414,7 @@ static void send_to_sock(struct connection *con)
ret = kernel_sendpage(con->sock, e->page, offset, len,
msg_flags);
+ trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
if (ret == -EAGAIN &&
test_bit(SOCKWQ_ASYNC_NOSPACE, &con->sock->flags) &&
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index c97b4c163c3e..32088c603244 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -214,6 +214,46 @@ TRACE_EVENT(dlm_unlock_end,
);
+TRACE_EVENT(dlm_send,
+
+ TP_PROTO(int nodeid, int ret),
+
+ TP_ARGS(nodeid, ret),
+
+ TP_STRUCT__entry(
+ __field(int, nodeid)
+ __field(int, ret)
+ ),
+
+ TP_fast_assign(
+ __entry->nodeid = nodeid;
+ __entry->ret = ret;
+ ),
+
+ TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
+TRACE_EVENT(dlm_recv,
+
+ TP_PROTO(int nodeid, int ret),
+
+ TP_ARGS(nodeid, ret),
+
+ TP_STRUCT__entry(
+ __field(int, nodeid)
+ __field(int, ret)
+ ),
+
+ TP_fast_assign(
+ __entry->nodeid = nodeid;
+ __entry->ret = ret;
+ ),
+
+ TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
#endif /* if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ) */
/* This part must be outside protection */
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 11/18] fs: dlm: requestqueue busy wait to event based wait
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch replaces the busy-wait loop in dlm_wait_requestqueue() with an
atomic counter and wait_event(), which waits until the requestqueue is
empty. It also slightly reduces how often ls_requestqueue_mutex is taken.
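The counter-plus-wakeup scheme has a close userspace analog, sketched
here with POSIX threads and C11 atomics under stated assumptions: the
demo_* names are hypothetical, a condition variable stands in for
ls_requestqueue_wait, and atomic_fetch_sub() returning 1 plays the role
of atomic_dec_and_test(). Unlike the kernel's wait_event(), which handles
this internally, the userspace version takes a mutex around the
sleep/wake handshake so a wakeup cannot be lost between the waiter's
check and its sleep.

```c
#include <pthread.h>
#include <stdatomic.h>

struct demo_queue {
	atomic_int cnt;       /* like ls_requestqueue_cnt */
	pthread_mutex_t lock; /* guards only the sleep/wake handshake */
	pthread_cond_t wait;  /* like ls_requestqueue_wait */
};

static void demo_init(struct demo_queue *q)
{
	atomic_init(&q->cnt, 0);
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->wait, NULL);
}

/* Like atomic_inc() before list_add_tail(). */
static void demo_add(struct demo_queue *q)
{
	atomic_fetch_add(&q->cnt, 1);
}

/* Like atomic_dec_and_test() + wake_up(): only the last remover wakes. */
static void demo_remove(struct demo_queue *q)
{
	if (atomic_fetch_sub(&q->cnt, 1) == 1) {
		pthread_mutex_lock(&q->lock);
		pthread_cond_broadcast(&q->wait);
		pthread_mutex_unlock(&q->lock);
	}
}

/* Like wait_event(..., cnt == 0): sleep instead of polling. */
static void demo_wait_empty(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	while (atomic_load(&q->cnt) != 0)
		pthread_cond_wait(&q->wait, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

/* Single consumer draining the queue, like dlm_process_requestqueue(). */
static void *demo_worker(void *arg)
{
	struct demo_queue *q = arg;

	while (atomic_load(&q->cnt) > 0)
		demo_remove(q);
	return NULL;
}
```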
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/dlm_internal.h | 2 ++
fs/dlm/lockspace.c | 2 ++
fs/dlm/requestqueue.c | 15 +++++++--------
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 44a5c67b5213..fd1c7a8c4485 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -626,6 +626,8 @@ struct dlm_ls {
struct rw_semaphore ls_in_recovery; /* block local requests */
struct rw_semaphore ls_recv_active; /* block dlm_recv */
struct list_head ls_requestqueue;/* queue remote requests */
+ atomic_t ls_requestqueue_cnt;
+ wait_queue_head_t ls_requestqueue_wait;
struct mutex ls_requestqueue_mutex;
struct dlm_rcom *ls_recover_buf;
int ls_recover_nodeid; /* for debugging */
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index b90566502a81..4e4181304ca1 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -564,6 +564,8 @@ static int new_lockspace(const char *name, const char *cluster,
init_rwsem(&ls->ls_in_recovery);
init_rwsem(&ls->ls_recv_active);
INIT_LIST_HEAD(&ls->ls_requestqueue);
+ atomic_set(&ls->ls_requestqueue_cnt, 0);
+ init_waitqueue_head(&ls->ls_requestqueue_wait);
mutex_init(&ls->ls_requestqueue_mutex);
mutex_init(&ls->ls_clear_proc_locks);
diff --git a/fs/dlm/requestqueue.c b/fs/dlm/requestqueue.c
index e89e0ff8bfa3..d0cf68570dcf 100644
--- a/fs/dlm/requestqueue.c
+++ b/fs/dlm/requestqueue.c
@@ -44,6 +44,7 @@ void dlm_add_requestqueue(struct dlm_ls *ls, int nodeid, struct dlm_message *ms)
e->nodeid = nodeid;
memcpy(&e->request, ms, ms->m_header.h_length);
+ atomic_inc(&ls->ls_requestqueue_cnt);
mutex_lock(&ls->ls_requestqueue_mutex);
list_add_tail(&e->list, &ls->ls_requestqueue);
mutex_unlock(&ls->ls_requestqueue_mutex);
@@ -89,6 +90,8 @@ int dlm_process_requestqueue(struct dlm_ls *ls)
mutex_lock(&ls->ls_requestqueue_mutex);
list_del(&e->list);
+ if (atomic_dec_and_test(&ls->ls_requestqueue_cnt))
+ wake_up(&ls->ls_requestqueue_wait);
kfree(e);
if (dlm_locking_stopped(ls)) {
@@ -115,14 +118,8 @@ int dlm_process_requestqueue(struct dlm_ls *ls)
void dlm_wait_requestqueue(struct dlm_ls *ls)
{
- for (;;) {
- mutex_lock(&ls->ls_requestqueue_mutex);
- if (list_empty(&ls->ls_requestqueue))
- break;
- mutex_unlock(&ls->ls_requestqueue_mutex);
- schedule();
- }
- mutex_unlock(&ls->ls_requestqueue_mutex);
+ wait_event(ls->ls_requestqueue_wait,
+ atomic_read(&ls->ls_requestqueue_cnt) == 0);
}
static int purge_request(struct dlm_ls *ls, struct dlm_message *ms, int nodeid)
@@ -161,6 +158,8 @@ void dlm_purge_requestqueue(struct dlm_ls *ls)
if (purge_request(ls, ms, e->nodeid)) {
list_del(&e->list);
+ if (atomic_dec_and_test(&ls->ls_requestqueue_cnt))
+ wake_up(&ls->ls_requestqueue_wait);
kfree(e);
}
}
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 12/18] fs: dlm: ls_count busy wait to event based wait
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch changes the ls_count busy wait to use an atomic counter and
wait_event() to wait until ls_count reaches zero. This slightly reduces
how often lslist_lock is taken. When removing a lockspace, the wait must
be retried, because a lockspace lookup can take a new reference between
wait_event() returning and taking the lock that protects deletion of the
lockspace list entry.
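The re-check in remove_lockspace() can be modelled in user space as a sketch. It assumes POSIX threads; struct obj, obj_get(), obj_put() and obj_remove() are hypothetical stand-ins for struct dlm_ls, the dlm_find_lockspace_*() lookups, dlm_put_lockspace() and remove_lockspace(), and a condition variable stands in for the kernel wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER; /* like lslist_lock */

struct obj {
	bool listed;             /* stands in for membership in lslist */
	atomic_int count;        /* like atomic_t ls_count */
	pthread_mutex_t wlock;   /* orders the sleep/wakeup handshake */
	pthread_cond_t zero;     /* like wait_queue_head_t ls_count_wait */
};

static void obj_init(struct obj *o)
{
	o->listed = true;
	atomic_init(&o->count, 0);
	pthread_mutex_init(&o->wlock, NULL);
	pthread_cond_init(&o->zero, NULL);
}

/* Lookup: takes a reference only while the object is still listed. */
static bool obj_get(struct obj *o)
{
	bool found;

	pthread_mutex_lock(&list_lock);
	found = o->listed;
	if (found)
		atomic_fetch_add(&o->count, 1);
	pthread_mutex_unlock(&list_lock);
	return found;
}

/* Like dlm_put_lockspace(): wake the remover when the last ref is dropped. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->count, 1) == 1) {
		pthread_mutex_lock(&o->wlock);
		pthread_cond_broadcast(&o->zero);
		pthread_mutex_unlock(&o->wlock);
	}
}

static void obj_remove(struct obj *o)
{
retry:
	/* Sleep until the count reaches zero (the wait_event() step). */
	pthread_mutex_lock(&o->wlock);
	while (atomic_load(&o->count) != 0)
		pthread_cond_wait(&o->zero, &o->wlock);
	pthread_mutex_unlock(&o->wlock);

	/* A lookup may have taken a new reference in the meantime,
	 * so re-check under the list lock before unlisting. */
	pthread_mutex_lock(&list_lock);
	if (atomic_load(&o->count) != 0) {
		pthread_mutex_unlock(&list_lock);
		goto retry;
	}
	o->listed = false;       /* like list_del(&ls->ls_list) */
	pthread_mutex_unlock(&list_lock);
}
```

Because obj_get() refuses new references once the object is unlisted, a zero count observed under list_lock stays zero from then on, which is why a single re-check under that lock is enough.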
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/dlm_internal.h | 3 ++-
fs/dlm/lockspace.c | 33 +++++++++++++++++----------------
fs/dlm/requestqueue.c | 2 +-
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index fd1c7a8c4485..019931804af9 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -548,8 +548,9 @@ struct dlm_ls {
uint32_t ls_generation;
uint32_t ls_exflags;
int ls_lvblen;
- int ls_count; /* refcount of processes in
+ atomic_t ls_count; /* refcount of processes in
the dlm using this ls */
+ wait_queue_head_t ls_count_wait;
int ls_create_count; /* create/release refcount */
unsigned long ls_flags; /* LSFL_ */
unsigned long ls_scan_time;
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 4e4181304ca1..2e51bd2bdacc 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -314,7 +314,7 @@ struct dlm_ls *dlm_find_lockspace_global(uint32_t id)
list_for_each_entry(ls, &lslist, ls_list) {
if (ls->ls_global_id == id) {
- ls->ls_count++;
+ atomic_inc(&ls->ls_count);
goto out;
}
}
@@ -331,7 +331,7 @@ struct dlm_ls *dlm_find_lockspace_local(dlm_lockspace_t *lockspace)
spin_lock(&lslist_lock);
list_for_each_entry(ls, &lslist, ls_list) {
if (ls->ls_local_handle == lockspace) {
- ls->ls_count++;
+ atomic_inc(&ls->ls_count);
goto out;
}
}
@@ -348,7 +348,7 @@ struct dlm_ls *dlm_find_lockspace_device(int minor)
spin_lock(&lslist_lock);
list_for_each_entry(ls, &lslist, ls_list) {
if (ls->ls_device.minor == minor) {
- ls->ls_count++;
+ atomic_inc(&ls->ls_count);
goto out;
}
}
@@ -360,24 +360,24 @@ struct dlm_ls *dlm_find_lockspace_device(int minor)
void dlm_put_lockspace(struct dlm_ls *ls)
{
- spin_lock(&lslist_lock);
- ls->ls_count--;
- spin_unlock(&lslist_lock);
+ if (atomic_dec_and_test(&ls->ls_count))
+ wake_up(&ls->ls_count_wait);
}
static void remove_lockspace(struct dlm_ls *ls)
{
- for (;;) {
- spin_lock(&lslist_lock);
- if (ls->ls_count == 0) {
- WARN_ON(ls->ls_create_count != 0);
- list_del(&ls->ls_list);
- spin_unlock(&lslist_lock);
- return;
- }
+retry:
+ wait_event(ls->ls_count_wait, atomic_read(&ls->ls_count) == 0);
+
+ spin_lock(&lslist_lock);
+ if (atomic_read(&ls->ls_count) != 0) {
spin_unlock(&lslist_lock);
- ssleep(1);
+ goto retry;
}
+
+ WARN_ON(ls->ls_create_count != 0);
+ list_del(&ls->ls_list);
+ spin_unlock(&lslist_lock);
}
static int threads_start(void)
@@ -481,7 +481,8 @@ static int new_lockspace(const char *name, const char *cluster,
memcpy(ls->ls_name, name, namelen);
ls->ls_namelen = namelen;
ls->ls_lvblen = lvblen;
- ls->ls_count = 0;
+ atomic_set(&ls->ls_count, 0);
+ init_waitqueue_head(&ls->ls_count_wait);
ls->ls_flags = 0;
ls->ls_scan_time = jiffies;
diff --git a/fs/dlm/requestqueue.c b/fs/dlm/requestqueue.c
index d0cf68570dcf..ccb5307c21e9 100644
--- a/fs/dlm/requestqueue.c
+++ b/fs/dlm/requestqueue.c
@@ -127,7 +127,7 @@ static int purge_request(struct dlm_ls *ls, struct dlm_message *ms, int nodeid)
uint32_t type = ms->m_type;
/* the ls is being cleaned up and freed by release_lockspace */
- if (!ls->ls_count)
+ if (!atomic_read(&ls->ls_count))
return 1;
if (dlm_is_removed(ls, nodeid))
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 13/18] fs: dlm: let handle callback data as void
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch changes the type of the private data passed to the
dlm_lowcomms_new_msg() callback from "struct dlm_mhandle *" to "void *",
so that callers can pass structures other than "struct dlm_mhandle".
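Passing the callback context as void * lets each caller hand its own structure through the same lowcomms path and convert it back inside the callback, as midcomms_new_msg_cb() now does with struct dlm_mhandle. The stand-alone sketch below shows the idiom; new_msg(), struct ctx_a and struct ctx_b are hypothetical and only mimic the shape of the cb/data pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dlm_lowcomms_new_msg(): it never looks
 * inside data, it only forwards it to cb. */
static void new_msg(void (*cb)(void *data), void *data)
{
	if (cb)
		cb(data);
}

struct ctx_a { int send_queue_cnt; };   /* shaped like the mhandle user */
struct ctx_b { unsigned int seq; };     /* shaped like a raw-message user */

static void cb_a(void *data)
{
	struct ctx_a *a = data;  /* void * converts implicitly in C */

	a->send_queue_cnt++;
}

static void cb_b(void *data)
{
	struct ctx_b *b = data;

	b->seq++;
}
```

Each callback knows the real type behind its data pointer, while the generic layer stays unaware of it.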
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 19 +++++++++----------
fs/dlm/lowcomms.h | 4 ++--
fs/dlm/midcomms.c | 4 +++-
3 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 6d6dcf0d5ba9..3f8b015ba799 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1205,8 +1205,7 @@ static struct writequeue_entry *new_writequeue_entry(struct connection *con,
static struct writequeue_entry *new_wq_entry(struct connection *con, int len,
gfp_t allocation, char **ppc,
- void (*cb)(struct dlm_mhandle *mh),
- struct dlm_mhandle *mh)
+ void (*cb)(void *data), void *data)
{
struct writequeue_entry *e;
@@ -1218,7 +1217,7 @@ static struct writequeue_entry *new_wq_entry(struct connection *con, int len,
*ppc = page_address(e->page) + e->end;
if (cb)
- cb(mh);
+ cb(data);
e->end += len;
e->users++;
@@ -1240,7 +1239,7 @@ static struct writequeue_entry *new_wq_entry(struct connection *con, int len,
spin_lock(&con->writequeue_lock);
if (cb)
- cb(mh);
+ cb(data);
list_add_tail(&e->list, &con->writequeue);
spin_unlock(&con->writequeue_lock);
@@ -1250,8 +1249,8 @@ static struct writequeue_entry *new_wq_entry(struct connection *con, int len,
static struct dlm_msg *dlm_lowcomms_new_msg_con(struct connection *con, int len,
gfp_t allocation, char **ppc,
- void (*cb)(struct dlm_mhandle *mh),
- struct dlm_mhandle *mh)
+ void (*cb)(void *data),
+ void *data)
{
struct writequeue_entry *e;
struct dlm_msg *msg;
@@ -1274,7 +1273,7 @@ static struct dlm_msg *dlm_lowcomms_new_msg_con(struct connection *con, int len,
kref_init(&msg->ref);
- e = new_wq_entry(con, len, allocation, ppc, cb, mh);
+ e = new_wq_entry(con, len, allocation, ppc, cb, data);
if (!e) {
if (sleepable)
mutex_unlock(&con->wq_alloc);
@@ -1294,8 +1293,8 @@ static struct dlm_msg *dlm_lowcomms_new_msg_con(struct connection *con, int len,
}
struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation,
- char **ppc, void (*cb)(struct dlm_mhandle *mh),
- struct dlm_mhandle *mh)
+ char **ppc, void (*cb)(void *data),
+ void *data)
{
struct connection *con;
struct dlm_msg *msg;
@@ -1316,7 +1315,7 @@ struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation,
return NULL;
}
- msg = dlm_lowcomms_new_msg_con(con, len, allocation, ppc, cb, mh);
+ msg = dlm_lowcomms_new_msg_con(con, len, allocation, ppc, cb, data);
if (!msg) {
srcu_read_unlock(&connections_srcu, idx);
return NULL;
diff --git a/fs/dlm/lowcomms.h b/fs/dlm/lowcomms.h
index 4ccae07cf005..8108ea24ec30 100644
--- a/fs/dlm/lowcomms.h
+++ b/fs/dlm/lowcomms.h
@@ -38,8 +38,8 @@ void dlm_lowcomms_stop(void);
void dlm_lowcomms_exit(void);
int dlm_lowcomms_close(int nodeid);
struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation,
- char **ppc, void (*cb)(struct dlm_mhandle *mh),
- struct dlm_mhandle *mh);
+ char **ppc, void (*cb)(void *data),
+ void *data);
void dlm_lowcomms_commit_msg(struct dlm_msg *msg);
void dlm_lowcomms_put_msg(struct dlm_msg *msg);
int dlm_lowcomms_resend_msg(struct dlm_msg *msg);
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 76bdc3a9dc61..95a5643a950e 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -1020,8 +1020,10 @@ static void dlm_fill_opts_header(struct dlm_opts *opts, uint16_t inner_len,
header_out(&opts->o_header);
}
-static void midcomms_new_msg_cb(struct dlm_mhandle *mh)
+static void midcomms_new_msg_cb(void *data)
{
+ struct dlm_mhandle *mh = data;
+
atomic_inc(&mh->node->send_queue_cnt);
spin_lock(&mh->node->send_queue_lock);
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 14/18] fs: dlm: add debugfs rawmsg send functionality
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds DLM functionality to send a raw DLM message to a
specific cluster node. The raw message can be built in user space and
sent by writing it to the "rawmsg" DLM debugfs file.
There is an in-progress scapy DLM module that makes it easy to build
DLM messages in user space. For example:
DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...)
The goal is to provide an easily reproducible way to put DLM into a
crashing state, or to fuzz the DLM kernel stack for possible crashes.
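On the user-space side, sending a raw message comes down to one write() of the complete message to the node's rawmsg file: dlm_rawmsg_write() rejects writes shorter than struct dlm_header or longer than PAGE_SIZE, and sends each write() as a separate message. The sketch below assumes a 16-byte struct dlm_header and a 4096-byte PAGE_SIZE (both assumptions, not taken from this patch); send_rawmsg() is a hypothetical helper. The path layout follows the /sys/kernel/debug/dlm/comms/<nodeid>/rawmsg file used in the reproducer of patch 18/18.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define RAWMSG_MIN_LEN 16    /* assumed sizeof(struct dlm_header) */
#define RAWMSG_MAX_LEN 4096  /* assumed PAGE_SIZE */

/* Write one raw DLM message to a rawmsg file, e.g.
 * "/sys/kernel/debug/dlm/comms/2/rawmsg". Returns 0 or -errno. */
static int send_rawmsg(const char *path, const void *msg, size_t len)
{
	ssize_t n;
	int fd;

	/* Mirror the kernel's length check so bad input fails early. */
	if (len < RAWMSG_MIN_LEN || len > RAWMSG_MAX_LEN)
		return -EINVAL;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* One write() per message: each write is sent on its own. */
	n = write(fd, msg, len);
	if (n < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);
	return (size_t)n == len ? 0 : -EIO;
}
```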
Note: if the sequence number is zero and the DLM version is not 3.1, the
kernel automatically sets the correct sequence number for DLM_OPTS
messages; without this, DLM stack testing would not be possible.
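The sequence-number rule corresponds to midcomms_new_rawmsg_cb() in the diff below: only a DLM_OPTS message whose header version is not 3.1 and whose sequence number is zero receives the node's next send sequence number. The sketch restates that decision on a simplified, host-endian header; the struct layout, the values DLM_OPTS = 3 and DLM_VERSION_3_1 = 0x00030001, and the name fill_seq() are assumptions for illustration, and byte-order conversion is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define DLM_OPTS        3           /* assumed value of the kernel constant */
#define DLM_VERSION_3_1 0x00030001u /* assumed value of the kernel constant */

/* Simplified view of the header fields the rule looks at. */
struct hdr {
	uint32_t h_version;
	uint32_t h_seq;
	uint8_t  h_cmd;
};

/* Fill in a sequence number only where the kernel callback would. */
static void fill_seq(struct hdr *h, uint32_t *seq_send)
{
	if (h->h_version == DLM_VERSION_3_1)
		return;                 /* version 3.1 headers are left untouched */
	if (h->h_cmd == DLM_OPTS && h->h_seq == 0)
		h->h_seq = (*seq_send)++;
}
```

A user-supplied non-zero sequence number is kept as-is, which still allows deliberately wrong numbers to be sent when fuzzing.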
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/debug_fs.c | 37 ++++++++++++++++++++++++++++++++++++
fs/dlm/midcomms.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
fs/dlm/midcomms.h | 2 ++
3 files changed, 87 insertions(+)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index 47e9d57e4cae..555904eeea8e 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -768,6 +768,42 @@ static int dlm_version_show(struct seq_file *file, void *offset)
}
DEFINE_SHOW_ATTRIBUTE(dlm_version);
+static ssize_t dlm_rawmsg_write(struct file *fp, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ void *buf;
+ int ret;
+
+ if (count > PAGE_SIZE || count < sizeof(struct dlm_header))
+ return -EINVAL;
+
+ buf = kmalloc(PAGE_SIZE, GFP_NOFS);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, user_buf, count)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = dlm_midcomms_rawmsg_send(fp->private_data, buf, count);
+ if (ret)
+ goto out;
+
+ kfree(buf);
+ return count;
+
+out:
+ kfree(buf);
+ return ret;
+}
+
+static const struct file_operations dlm_rawmsg_fops = {
+ .open = simple_open,
+ .write = dlm_rawmsg_write,
+ .llseek = no_llseek,
+};
+
void *dlm_create_debug_comms_file(int nodeid, void *data)
{
struct dentry *d_node;
@@ -782,6 +818,7 @@ void *dlm_create_debug_comms_file(int nodeid, void *data)
debugfs_create_file("send_queue_count", 0444, d_node, data,
&dlm_send_queue_cnt_fops);
debugfs_create_file("version", 0444, d_node, data, &dlm_version_fops);
+ debugfs_create_file("rawmsg", 0200, d_node, data, &dlm_rawmsg_fops);
return d_node;
}
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 95a5643a950e..0b9bce6f04e1 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -1427,3 +1427,51 @@ int dlm_midcomms_close(int nodeid)
return ret;
}
+
+/* debug functionality to send raw dlm msg from user space */
+struct dlm_rawmsg_data {
+ struct midcomms_node *node;
+ void *buf;
+};
+
+static void midcomms_new_rawmsg_cb(void *data)
+{
+ struct dlm_rawmsg_data *rd = data;
+ struct dlm_header *h = rd->buf;
+
+ switch (h->h_version) {
+ case cpu_to_le32(DLM_VERSION_3_1):
+ break;
+ default:
+ switch (h->h_cmd) {
+ case DLM_OPTS:
+ if (!h->u.h_seq)
+ h->u.h_seq = rd->node->seq_send++;
+ break;
+ default:
+ break;
+ }
+ break;
+ }
+}
+
+int dlm_midcomms_rawmsg_send(struct midcomms_node *node, void *buf,
+ int buflen)
+{
+ struct dlm_rawmsg_data rd;
+ struct dlm_msg *msg;
+ char *msgbuf;
+
+ rd.node = node;
+ rd.buf = buf;
+
+ msg = dlm_lowcomms_new_msg(node->nodeid, buflen, GFP_NOFS,
+ &msgbuf, midcomms_new_rawmsg_cb, &rd);
+ if (!msg)
+ return -ENOMEM;
+
+ memcpy(msgbuf, buf, buflen);
+ dlm_lowcomms_commit_msg(msg);
+ return 0;
+}
+
diff --git a/fs/dlm/midcomms.h b/fs/dlm/midcomms.h
index 579abc6929be..bc63cf73aa87 100644
--- a/fs/dlm/midcomms.h
+++ b/fs/dlm/midcomms.h
@@ -28,6 +28,8 @@ const char *dlm_midcomms_state(struct midcomms_node *node);
unsigned long dlm_midcomms_flags(struct midcomms_node *node);
int dlm_midcomms_send_queue_cnt(struct midcomms_node *node);
uint32_t dlm_midcomms_version(struct midcomms_node *node);
+int dlm_midcomms_rawmsg_send(struct midcomms_node *node, void *buf,
+ int buflen);
#endif /* __MIDCOMMS_DOT_H__ */
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 15/18] fs: dlm: allow create lkb with specific id range
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds functionality to create an lkb with an id from a specific range.
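_create_lkb() forwards start and end to idr_alloc(), which allocates an unused id in [start, end) and treats end <= 0 as "no upper bound". create_lkb() therefore keeps the old (1, 0) behaviour, while dlm_debug_add_lkb() in patch 16/18 asks for (lkb_id, lkb_id + 1) to obtain exactly one id. The toy allocator below only illustrates those range semantics over a small table; it is not the IDR implementation, and alloc_id() and MAX_IDS are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_IDS 64               /* hypothetical capacity of the toy table */

static bool used[MAX_IDS];

/* Return the lowest free id in [start, end), or -ENOSPC.
 * As with idr_alloc(), end <= 0 means "no upper limit"
 * (capped at MAX_IDS in this toy). */
static int alloc_id(int start, int end)
{
	if (start < 0)
		return -EINVAL;
	if (end <= 0 || end > MAX_IDS)
		end = MAX_IDS;
	for (int id = start; id < end; id++) {
		if (!used[id]) {
			used[id] = true;
			return id;
		}
	}
	return -ENOSPC;
}
```

Requesting a one-element range either yields exactly that id or fails, which is what makes the exact-id debug path possible.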
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lock.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index feb2e94f5879..8b30c9d9e545 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1180,7 +1180,8 @@ static void detach_lkb(struct dlm_lkb *lkb)
}
}
-static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
+static int _create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret,
+ int start, int end)
{
struct dlm_lkb *lkb;
int rv;
@@ -1201,7 +1202,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
idr_preload(GFP_NOFS);
spin_lock(&ls->ls_lkbidr_spin);
- rv = idr_alloc(&ls->ls_lkbidr, lkb, 1, 0, GFP_NOWAIT);
+ rv = idr_alloc(&ls->ls_lkbidr, lkb, start, end, GFP_NOWAIT);
if (rv >= 0)
lkb->lkb_id = rv;
spin_unlock(&ls->ls_lkbidr_spin);
@@ -1217,6 +1218,11 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
return 0;
}
+static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
+{
+ return _create_lkb(ls, lkb_ret, 1, 0);
+}
+
static int find_lkb(struct dlm_ls *ls, uint32_t lkid, struct dlm_lkb **lkb_ret)
{
struct dlm_lkb *lkb;
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 16/18] fs: dlm: add lkb debugfs functionality
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds functionality to add an lkb at runtime. This is a
debugging-only feature; wrong input can crash the kernel. It is also at
an early stage. The goal is to provide a user interface for manipulating
DLM state and to combine it with the rawmsg feature. Since this is
debugfs functionality, UAPI breakage is not a concern. It is even
possible to add lkbs/rsbs that could never exist in this form through
normal DLM operation. Users of this interface always need to think
before using it: not every crash triggered this way can actually occur
during normal DLM operation.
In the future there should be more functionality to add more realistic
lkbs that reflect normal DLM state inside the kernel. For now this is
enough.
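The commit message does not spell out the input format; table_write2() parses one line of the form "<lkb_id hex> <resource name> <lkb_flags hex> <lkb_nodeid> <lkb_status>" written to the lockspace's <name>_locks debugfs file. A small helper that validates and formats such a line before it reaches the kernel might look like this sketch; format_add_lkb() is hypothetical and DLM_RESNAME_MAXLEN = 64 is an assumed value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESNAME_MAXLEN 64  /* assumed value of DLM_RESNAME_MAXLEN */

/* Build the line table_write2() expects. Returns the line length, or -1
 * if the name could not be parsed back intact by its "%s" conversion. */
static int format_add_lkb(char *buf, size_t size, unsigned int lkb_id,
			  const char *name, unsigned int flags,
			  int nodeid, int status)
{
	size_t len = strlen(name);
	int n;

	/* %s stops at whitespace, and a 64-character name plus its NUL
	 * would not fit the 64-byte name buffer, so stay below the limit. */
	if (len == 0 || len >= RESNAME_MAXLEN || strpbrk(name, " \t\n\v\f\r"))
		return -1;

	n = snprintf(buf, size, "%x %s %x %d %d\n",
		     lkb_id, name, flags, nodeid, status);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The output matches the line the reproducer in patch 18/18 writes with echo "862 fooobaar 1 2 1". Note that, as posted, table_write2() passes lkb_flags and lkb_nodeid to dlm_debug_add_lkb() in the opposite order from its prototype (…, lkb_nodeid, lkb_flags, …), so these two values may end up swapped inside the kernel; the sketch only follows the sscanf() order.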
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/debug_fs.c | 32 +++++++++++++++++++++++++++++++-
fs/dlm/lock.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
fs/dlm/lock.h | 2 ++
3 files changed, 79 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index 555904eeea8e..2ead4751d655 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -635,6 +635,35 @@ static int table_open2(struct inode *inode, struct file *file)
return 0;
}
+static ssize_t table_write2(struct file *file, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq = file->private_data;
+ int n, len, lkb_nodeid, lkb_status, error;
+ char name[DLM_RESNAME_MAXLEN] = {};
+ struct dlm_ls *ls = seq->private;
+ unsigned int lkb_flags;
+ char buf[256] = {};
+ uint32_t lkb_id;
+
+ if (copy_from_user(buf, user_buf,
+ min_t(size_t, sizeof(buf) - 1, count)))
+ return -EFAULT;
+
+ n = sscanf(buf, "%x %" __stringify(DLM_RESNAME_MAXLEN) "s %x %d %d",
+ &lkb_id, name, &lkb_flags, &lkb_nodeid, &lkb_status);
+ if (n != 5)
+ return -EINVAL;
+
+ len = strnlen(name, DLM_RESNAME_MAXLEN);
+ error = dlm_debug_add_lkb(ls, lkb_id, name, len, lkb_flags,
+ lkb_nodeid, lkb_status);
+ if (error)
+ return error;
+
+ return count;
+}
+
static int table_open3(struct inode *inode, struct file *file)
{
struct seq_file *seq;
@@ -675,6 +704,7 @@ static const struct file_operations format2_fops = {
.owner = THIS_MODULE,
.open = table_open2,
.read = seq_read,
+ .write = table_write2,
.llseek = seq_lseek,
.release = seq_release
};
@@ -846,7 +876,7 @@ void dlm_create_debug_file(struct dlm_ls *ls)
snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_locks", ls->ls_name);
ls->ls_debug_locks_dentry = debugfs_create_file(name,
- S_IFREG | S_IRUGO,
+ 0644,
dlm_root,
ls,
&format2_fops);
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 8b30c9d9e545..aeb793693d8c 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -6317,3 +6317,49 @@ int dlm_user_purge(struct dlm_ls *ls, struct dlm_user_proc *proc,
return error;
}
+/* debug functionality */
+int dlm_debug_add_lkb(struct dlm_ls *ls, uint32_t lkb_id, char *name, int len,
+ int lkb_nodeid, unsigned int lkb_flags, int lkb_status)
+{
+ struct dlm_lksb *lksb;
+ struct dlm_lkb *lkb;
+ struct dlm_rsb *r;
+ int error;
+
+ /* we currently can't set a valid user lock */
+ if (lkb_flags & DLM_IFL_USER)
+ return -EOPNOTSUPP;
+
+ lksb = kzalloc(sizeof(*lksb), GFP_NOFS);
+ if (!lksb)
+ return -ENOMEM;
+
+ error = _create_lkb(ls, &lkb, lkb_id, lkb_id + 1);
+ if (error) {
+ kfree(lksb);
+ return error;
+ }
+
+ lkb->lkb_flags = lkb_flags;
+ lkb->lkb_nodeid = lkb_nodeid;
+ lkb->lkb_lksb = lksb;
+ /* user specific pointer, just don't have it NULL for kernel locks */
+ if (~lkb_flags & DLM_IFL_USER)
+ lkb->lkb_astparam = (void *)0xDEADBEEF;
+
+ error = find_rsb(ls, name, len, 0, R_REQUEST, &r);
+ if (error) {
+ kfree(lksb);
+ __put_lkb(ls, lkb);
+ return error;
+ }
+
+ lock_rsb(r);
+ attach_lkb(r, lkb);
+ add_lkb(r, lkb, lkb_status);
+ unlock_rsb(r);
+ put_rsb(r);
+
+ return 0;
+}
+
diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h
index 456c6ec3ef6f..863a66e128a2 100644
--- a/fs/dlm/lock.h
+++ b/fs/dlm/lock.h
@@ -58,6 +58,8 @@ int dlm_user_purge(struct dlm_ls *ls, struct dlm_user_proc *proc,
int nodeid, int pid);
int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid);
void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc);
+int dlm_debug_add_lkb(struct dlm_ls *ls, uint32_t lkb_id, char *name, int len,
+ int lkb_nodeid, unsigned int lkb_flags, int lkb_status);
static inline int is_master(struct dlm_rsb *r)
{
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 17/18] fs: dlm: add lkb waiters debugfs functionality
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch adds functionality to put an lkb into the waiters state. It
can be useful to combine this feature with the "rawmsg" debugfs
functionality: it brings the DLM lkb into a state in which the kernel
will parse a received message for it.
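waiters_write() parses "<lkb_id hex> <mstype> <to_nodeid>" from the lockspace's <name>_waiters debugfs file. The sketch below builds both the path and that line; format_add_waiter() is hypothetical, and it assumes debugfs is mounted at /sys/kernel/debug as in the reproducer of patch 18/18. There, mstype 3 is paired with an unlock reply (m_type=7), which suggests it denotes an unlock request, but the numeric message types are not defined in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the debugfs path and the "<lkb_id hex> <mstype> <to_nodeid>"
 * line that waiters_write() parses. Returns 0, or -1 on truncation. */
static int format_add_waiter(char *path, size_t path_size,
			     char *line, size_t line_size,
			     const char *lsname, unsigned int lkb_id,
			     int mstype, int to_nodeid)
{
	int n;

	/* Assumes debugfs is mounted at /sys/kernel/debug. */
	n = snprintf(path, path_size, "/sys/kernel/debug/dlm/%s_waiters",
		     lsname);
	if (n < 0 || (size_t)n >= path_size)
		return -1;

	n = snprintf(line, line_size, "%x %d %d\n", lkb_id, mstype, to_nodeid);
	if (n < 0 || (size_t)n >= line_size)
		return -1;
	return 0;
}
```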
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/debug_fs.c | 27 ++++++++++++++++++++++++++-
fs/dlm/lock.c | 15 +++++++++++++++
fs/dlm/lock.h | 2 ++
3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index 2ead4751d655..df6f3f107be4 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -754,10 +754,35 @@ static ssize_t waiters_read(struct file *file, char __user *userbuf,
return rv;
}
+static ssize_t waiters_write(struct file *file, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct dlm_ls *ls = file->private_data;
+ int mstype, to_nodeid;
+ char buf[128] = {};
+ uint32_t lkb_id;
+ int n, error;
+
+ if (copy_from_user(buf, user_buf,
+ min_t(size_t, sizeof(buf) - 1, count)))
+ return -EFAULT;
+
+ n = sscanf(buf, "%x %d %d", &lkb_id, &mstype, &to_nodeid);
+ if (n != 3)
+ return -EINVAL;
+
+ error = dlm_debug_add_lkb_to_waiters(ls, lkb_id, mstype, to_nodeid);
+ if (error)
+ return error;
+
+ return count;
+}
+
static const struct file_operations waiters_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.read = waiters_read,
+ .write = waiters_write,
.llseek = default_llseek,
};
@@ -907,7 +932,7 @@ void dlm_create_debug_file(struct dlm_ls *ls)
snprintf(name, DLM_LOCKSPACE_LEN + 8, "%s_waiters", ls->ls_name);
ls->ls_debug_waiters_dentry = debugfs_create_file(name,
- S_IFREG | S_IRUGO,
+ 0644,
dlm_root,
ls,
&waiters_fops);
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index aeb793693d8c..0dbe273566c0 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -6363,3 +6363,18 @@ int dlm_debug_add_lkb(struct dlm_ls *ls, uint32_t lkb_id, char *name, int len,
return 0;
}
+int dlm_debug_add_lkb_to_waiters(struct dlm_ls *ls, uint32_t lkb_id,
+ int mstype, int to_nodeid)
+{
+ struct dlm_lkb *lkb;
+ int error;
+
+ error = find_lkb(ls, lkb_id, &lkb);
+ if (error)
+ return error;
+
+ error = add_to_waiters(lkb, mstype, to_nodeid);
+ dlm_put_lkb(lkb);
+ return error;
+}
+
diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h
index 863a66e128a2..252a5898f908 100644
--- a/fs/dlm/lock.h
+++ b/fs/dlm/lock.h
@@ -60,6 +60,8 @@ int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid);
void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc);
int dlm_debug_add_lkb(struct dlm_ls *ls, uint32_t lkb_id, char *name, int len,
int lkb_nodeid, unsigned int lkb_flags, int lkb_status);
+int dlm_debug_add_lkb_to_waiters(struct dlm_ls *ls, uint32_t lkb_id,
+ int mstype, int to_nodeid);
static inline int is_master(struct dlm_rsb *r)
{
--
2.27.0
* [Cluster-devel] [PATCH RESEND v5.15-rc7 18/18] fs: dlm: filter user dlm messages for kernel locks
From: Alexander Aring @ 2021-11-02 19:17 UTC
To: cluster-devel.redhat.com
This patch fixes the following crash, triggered by receiving an invalid message:
[ 160.672220] ==================================================================
[ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[ 160.681447]
[ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[ 160.685574] Workqueue: dlm_recv process_recv_sockets
[ 160.686721] Call Trace:
[ 160.687310] dump_stack_lvl+0x56/0x6f
[ 160.688169] ? dlm_user_add_ast+0xc3/0x370
[ 160.689116] kasan_report.cold.14+0x116/0x11b
[ 160.690138] ? dlm_user_add_ast+0xc3/0x370
[ 160.690832] dlm_user_add_ast+0xc3/0x370
[ 160.691502] _receive_unlock_reply+0x103/0x170
[ 160.692241] _receive_message+0x11df/0x1ec0
[ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.694427] ? lock_acquire+0x175/0x400
[ 160.695058] ? do_purge.isra.51+0x200/0x200
[ 160.695744] ? lock_acquired+0x360/0x5d0
[ 160.696400] ? lock_contended+0x6a0/0x6a0
[ 160.697055] ? lock_release+0x21d/0x5e0
[ 160.697686] ? lock_is_held_type+0xe0/0x110
[ 160.698352] ? lock_is_held_type+0xe0/0x110
[ 160.699026] ? ___might_sleep+0x1cc/0x1e0
[ 160.699698] ? dlm_wait_requestqueue+0x94/0x140
[ 160.700451] ? dlm_process_requestqueue+0x240/0x240
[ 160.701249] ? down_write_killable+0x2b0/0x2b0
[ 160.701988] ? do_raw_spin_unlock+0xa2/0x130
[ 160.702690] dlm_receive_buffer+0x1a5/0x210
[ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0
[ 160.704210] receive_from_sock+0x1c0/0x3b0
[ 160.704886] ? dlm_tcp_shutdown+0x30/0x30
[ 160.705561] ? lock_acquire+0x175/0x400
[ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.707681] process_recv_sockets+0x32/0x40
[ 160.708366] process_one_work+0x55e/0xad0
[ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110
[ 160.709820] worker_thread+0x65/0x5e0
[ 160.710423] ? process_one_work+0xad0/0xad0
[ 160.711087] kthread+0x1ed/0x220
[ 160.711628] ? set_kthread_struct+0x80/0x80
[ 160.712314] ret_from_fork+0x22/0x30
The issue is that we received a DLM message for a user lock, but the
destination lock is a kernel lock. Note that the address being
dereferenced is 00000000deadbeef, which is the value stored in
lkb->lkb_astparam of a kernel lock; this field should never be
dereferenced by the DLM kernel stack. For a user lock,
lkb->lkb_astparam is lkb->lkb_ua (both share memory through a union).
The structure lkb_ua points to is accessed by the DLM kernel stack, but
for a kernel lock it contains invalid data and will most likely crash
the kernel.
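The failure is a union being read through the wrong member: a kernel lock keeps an opaque pointer (here the 0xDEADBEEF placeholder set by dlm_debug_add_lkb()) in the same storage that a user lock uses for its user-argument pointer. The model below uses hypothetical types, and FLAG_USER is assumed to match DLM_IFL_USER (0x1); check_msg() restates only the check added to validate_message(), which trusts the lock's own flag rather than the message's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FLAG_USER 0x1u   /* assumed to match DLM_IFL_USER */

struct user_args { int ast_pending; };  /* hypothetical user-lock data */

struct lock {
	unsigned int flags;
	union {                       /* same storage, two meanings */
		void *astparam;       /* kernel lock: opaque caller cookie */
		struct user_args *ua; /* user lock: actually dereferenced */
	};
};

/* The added check: a message flagged as "user" must not be
 * applied to a lock that is not itself a user lock. */
static int check_msg(const struct lock *lkb, unsigned int msg_flags)
{
	if ((msg_flags & FLAG_USER) && !(lkb->flags & FLAG_USER))
		return -EINVAL;
	return 0;
}

/* Stand-in for the user-lock completion path; safe only after check_msg(). */
static void deliver_user_ast(struct lock *lkb)
{
	lkb->ua->ast_pending = 1;
}
```

Without check_msg(), calling deliver_user_ast() on the kernel lock would dereference the 0xDEADBEEF cookie, which is the KASAN report shown above.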
It can be reproduced with two cluster nodes.
node 2:
dlm_tool join test
echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters
node 1:
dlm_tool join test
python:
foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
newFile.write(bytes(foo))
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lock.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 0dbe273566c0..54705d367076 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3989,6 +3989,14 @@ static int validate_message(struct dlm_lkb *lkb, struct dlm_message *ms)
int from = ms->m_header.h_nodeid;
int error = 0;
+ /* currently mixing of user/kernel locks are not supported */
+ if (ms->m_flags & DLM_IFL_USER && ~lkb->lkb_flags & DLM_IFL_USER) {
+ log_error(lkb->lkb_resource->res_ls,
+ "got user dlm message for a kernel lock");
+ error = -EINVAL;
+ goto out;
+ }
+
switch (ms->m_type) {
case DLM_MSG_CONVERT:
case DLM_MSG_UNLOCK:
@@ -4017,6 +4025,7 @@ static int validate_message(struct dlm_lkb *lkb, struct dlm_message *ms)
error = -EINVAL;
}
+out:
if (error)
log_error(lkb->lkb_resource->res_ls,
"ignore invalid message %d from %d %x %x %x %d",
--
2.27.0