* [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches
@ 2026-04-27 15:59 Alexander Aring
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
To: teigland; +Cc: aahringo, gfs2
Hi David,
here are the pending DLM patches for dlm/next.
- use the SRCU list iteration macros so the code is aligned with
lockdep
- fix an issue on hardened kernels with the lvb byte array in
dlm_callback that is copied to user address space
- fix an issue where concurrent DLM requests leave the midcomms
send_queue out of order, which ends in refcounting issues
- clean up midcomms node creation by assigning the debugfs entry
before adding the node to the node hash, so concurrent users
iterating over the hash never see an uninitialized debugfs pointer
- Alex
Alexander Aring (2):
dlm: fix add msg handle in send_queue ordered
dlm: init per node debugfs before add to node hash
Li RongQing (1):
dlm: use hlist_for_each_entry_srcu for SRCU protected lists
Ziyi Guo (1):
dlm: add usercopy whitelist to dlm_cb cache
fs/dlm/lowcomms.c | 12 ++++++++----
fs/dlm/memory.c | 4 +++-
fs/dlm/midcomms.c | 21 +++++++++++++--------
3 files changed, 24 insertions(+), 13 deletions(-)
--
2.43.0
* [PATCH RESEND v7.1-rc1 1/4] dlm: use hlist_for_each_entry_srcu for SRCU protected lists
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
To: teigland; +Cc: aahringo, gfs2
From: Li RongQing <lirongqing@baidu.com>
The connection and node hash tables in DLM are protected by SRCU, but
the code currently uses hlist_for_each_entry_rcu() for traversal.
While this works functionally, it is semantically incorrect and
triggers warnings when RCU lockdep debugging is enabled, because
hlist_for_each_entry_rcu() expects a regular RCU read-side critical
section rather than an SRCU one.
This patch replaces the incorrect macros with hlist_for_each_entry_srcu()
and adds the appropriate lockdep expressions using srcu_read_lock_held()
to ensure consistency with the underlying locking mechanism.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 12 ++++++++----
fs/dlm/midcomms.c | 15 ++++++++++-----
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 5b61427879195..2aff1c7c17de4 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -271,7 +271,8 @@ static struct connection *__find_con(int nodeid, int r)
{
struct connection *con;
- hlist_for_each_entry_rcu(con, &connection_hash[r], list) {
+ hlist_for_each_entry_srcu(con, &connection_hash[r], list,
+ srcu_read_lock_held(&connections_srcu)) {
if (con->nodeid == nodeid)
return con;
}
@@ -426,7 +427,8 @@ static int addr_to_nodeid(struct sockaddr_storage *addr, int *nodeid,
idx = srcu_read_lock(&connections_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+ hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+ srcu_read_lock_held(&connections_srcu)) {
WARN_ON_ONCE(!con->addr_count);
spin_lock(&con->addrs_lock);
@@ -1729,7 +1731,8 @@ void dlm_lowcomms_shutdown(void)
idx = srcu_read_lock(&connections_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+ hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+ srcu_read_lock_held(&connections_srcu)) {
shutdown_connection(con, true);
stop_connection_io(con);
flush_workqueue(process_workqueue);
@@ -1968,7 +1971,8 @@ void dlm_lowcomms_exit(void)
idx = srcu_read_lock(&connections_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+ hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+ srcu_read_lock_held(&connections_srcu)) {
spin_lock(&connections_lock);
hlist_del_rcu(&con->list);
spin_unlock(&connections_lock);
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index d54bdd8fc4f2e..a5b363b4785f3 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -275,7 +275,8 @@ static struct midcomms_node *__find_node(int nodeid, int r)
{
struct midcomms_node *node;
- hlist_for_each_entry_rcu(node, &node_hash[r], hlist) {
+ hlist_for_each_entry_srcu(node, &node_hash[r], hlist,
+ srcu_read_lock_held(&nodes_srcu)) {
if (node->nodeid == nodeid)
return node;
}
@@ -1165,7 +1166,8 @@ void dlm_midcomms_exit(void)
idx = srcu_read_lock(&nodes_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+ hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+ srcu_read_lock_held(&nodes_srcu)) {
dlm_delete_debug_comms_file(node->debugfs);
spin_lock(&nodes_lock);
@@ -1325,7 +1327,8 @@ void dlm_midcomms_version_wait(void)
idx = srcu_read_lock(&nodes_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+ hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+ srcu_read_lock_held(&nodes_srcu)) {
ret = wait_event_timeout(node->shutdown_wait,
node->version != DLM_VERSION_NOT_SET ||
node->state == DLM_CLOSED ||
@@ -1396,7 +1399,8 @@ void dlm_midcomms_shutdown(void)
mutex_lock(&close_lock);
idx = srcu_read_lock(&nodes_srcu);
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+ hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+ srcu_read_lock_held(&nodes_srcu)) {
midcomms_shutdown(node);
}
}
@@ -1404,7 +1408,8 @@ void dlm_midcomms_shutdown(void)
dlm_lowcomms_shutdown();
for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+ hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+ srcu_read_lock_held(&nodes_srcu)) {
midcomms_node_reset(node);
}
}
--
2.43.0
* [PATCH RESEND v7.1-rc1 2/4] dlm: add usercopy whitelist to dlm_cb cache
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
To: teigland; +Cc: aahringo, gfs2
From: Ziyi Guo <n7l8m4@u.northwestern.edu>
The dlm_cb slab cache is created with kmem_cache_create(), which
provides no usercopy whitelist. When a callback carries LVB data,
dlm_user_add_ast() copies the LVB into the inline lvbptr[] array within
the slab-allocated struct dlm_callback and redirects ua->lksb.sb_lvbptr
to point to it. copy_result_to_user() then calls copy_to_user() with
this pointer. With CONFIG_HARDENED_USERCOPY enabled, this triggers
usercopy_abort().
Switch to kmem_cache_create_usercopy() with a whitelist covering the
lvbptr field.
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/memory.c b/fs/dlm/memory.c
index 5c35cc67aca4c..ee55994ce90d8 100644
--- a/fs/dlm/memory.c
+++ b/fs/dlm/memory.c
@@ -48,8 +48,10 @@ int __init dlm_memory_init(void)
if (!rsb_cache)
goto rsb;
- cb_cache = kmem_cache_create("dlm_cb", sizeof(struct dlm_callback),
+ cb_cache = kmem_cache_create_usercopy("dlm_cb", sizeof(struct dlm_callback),
__alignof__(struct dlm_callback), 0,
+ offsetof(struct dlm_callback, lvbptr),
+ sizeof_field(struct dlm_callback, lvbptr),
NULL);
if (!cb_cache)
goto cb;
--
2.43.0
* [PATCH RESEND v7.1-rc1 3/4] dlm: fix add msg handle in send_queue ordered
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
To: teigland; +Cc: aahringo, gfs2
In a benchmark scenario that generates many concurrent requests, and
with them many DLM messages on the network, mh->seq can be assigned
out of order with respect to the message's position in
"node->send_queue". dlm_receive_ack() relies on the queue being
ordered oldest-first: "before(mh->seq, seq)" stops walking as soon as
it reaches a sequence number that is not older than the acked one, so
older entries misplaced toward the tail of the queue are never
released.
The side effects of this misordering are refcounting issues and a
use-after-free.
I was only able to reproduce this issue with an experimental DLM
branch and a user space DLM benchmark that uses io_uring. With this
change I no longer experienced any refcounting issues with the send
buffers.
Fixes: 489d8e559c659 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/midcomms.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index a5b363b4785f3..6b522d4e2b7e3 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -969,10 +969,10 @@ static void midcomms_new_msg_cb(void *data)
atomic_inc(&mh->node->send_queue_cnt);
spin_lock_bh(&mh->node->send_queue_lock);
+ /* need to be locked with list_add_tail_rcu() because list is ordered */
+ mh->seq = atomic_fetch_inc(&mh->node->seq_send);
list_add_tail_rcu(&mh->list, &mh->node->send_queue);
spin_unlock_bh(&mh->node->send_queue_lock);
-
- mh->seq = atomic_fetch_inc(&mh->node->seq_send);
}
static struct dlm_msg *dlm_midcomms_get_msg_3_2(struct dlm_mhandle *mh, int nodeid,
--
2.43.0
* [PATCH RESEND v7.1-rc1 4/4] dlm: init per node debugfs before add to node hash
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
To: teigland; +Cc: aahringo, gfs2
Assign the per node debugfs entry before the node is added to the node
hash. This closes a window in which a node is visible in the hash
while its debugfs field is still uninitialized (neither a valid entry
nor NULL/IS_ERR()), so that a concurrent iteration over the hash
followed by debugfs_remove(), as in dlm_midcomms_exit(), cannot fail.
In practice dlm_midcomms_exit() is called from the module init/exit
paths, where the hash should be empty anyway, so this is a cleanup to
avoid potential issues rather than a fix for an observed bug.
Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/gfs2/CAGp+u1ZE7UsQ4sSUHBKQXU8x3M_jwK=ek1urSjEtd3jXQGFmVg@mail.gmail.com
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/midcomms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 6b522d4e2b7e3..8964164600d2d 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -356,6 +356,7 @@ int dlm_midcomms_addr(int nodeid, struct sockaddr_storage *addr)
if (!node)
return -ENOMEM;
+ node->debugfs = dlm_create_debug_comms_file(nodeid, node);
node->nodeid = nodeid;
spin_lock_init(&node->state_lock);
spin_lock_init(&node->send_queue_lock);
@@ -369,7 +370,6 @@ int dlm_midcomms_addr(int nodeid, struct sockaddr_storage *addr)
hlist_add_head_rcu(&node->hlist, &node_hash[r]);
spin_unlock_bh(&nodes_lock);
- node->debugfs = dlm_create_debug_comms_file(nodeid, node);
return 0;
}
--
2.43.0