public inbox for gfs2@lists.linux.dev
* [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches
@ 2026-04-27 15:59 Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 1/4] dlm: use hlist_for_each_entry_srcu for SRCU protected lists Alexander Aring
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
  To: teigland; +Cc: aahringo, gfs2

Hi David,

here are the pending DLM patches for dlm/next.

- use SRCU list iteration primitives so that lockdep agrees
  with the actual locking
- fix an issue on hardened kernels with the lvb byte array
  in dlm_callback that is copied to user address space
- fix an issue where concurrent DLM requests can leave the
  midcomms send_queue out of sequence order, which ends in
  refcounting issues
- clean up midcomms node creation by moving the debugfs
  assignment before adding the node to the node hash, so
  that concurrent users iterating over the hash cannot see
  an uninitialized debugfs entry.

- Alex

Alexander Aring (2):
  dlm: fix add msg handle in send_queue ordered
  dlm: init per node debugfs before add to node hash

Li RongQing (1):
  dlm: use hlist_for_each_entry_srcu for SRCU protected lists

Ziyi Guo (1):
  dlm: add usercopy whitelist to dlm_cb cache

 fs/dlm/lowcomms.c | 12 ++++++++----
 fs/dlm/memory.c   |  4 +++-
 fs/dlm/midcomms.c | 21 +++++++++++++--------
 3 files changed, 24 insertions(+), 13 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH RESEND v7.1-rc1 1/4] dlm: use hlist_for_each_entry_srcu for SRCU protected lists
  2026-04-27 15:59 [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches Alexander Aring
@ 2026-04-27 15:59 ` Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 2/4] dlm: add usercopy whitelist to dlm_cb cache Alexander Aring
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
  To: teigland; +Cc: aahringo, gfs2

From: Li RongQing <lirongqing@baidu.com>

The connection and node hash tables in DLM are protected by SRCU, but
the code currently uses hlist_for_each_entry_rcu() for traversal.
While this works functionally, it is semantically incorrect and triggers
warnings when RCU lockdep debugging is enabled, because
hlist_for_each_entry_rcu() expects a regular RCU read-side critical
section.

This patch replaces the incorrect macros with hlist_for_each_entry_srcu()
and adds the appropriate lockdep expressions using srcu_read_lock_held()
to ensure consistency with the underlying locking mechanism.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 12 ++++++++----
 fs/dlm/midcomms.c | 15 ++++++++++-----
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 5b61427879195..2aff1c7c17de4 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -271,7 +271,8 @@ static struct connection *__find_con(int nodeid, int r)
 {
 	struct connection *con;
 
-	hlist_for_each_entry_rcu(con, &connection_hash[r], list) {
+	hlist_for_each_entry_srcu(con, &connection_hash[r], list,
+			srcu_read_lock_held(&connections_srcu)) {
 		if (con->nodeid == nodeid)
 			return con;
 	}
@@ -426,7 +427,8 @@ static int addr_to_nodeid(struct sockaddr_storage *addr, int *nodeid,
 
 	idx = srcu_read_lock(&connections_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+		hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+				srcu_read_lock_held(&connections_srcu)) {
 			WARN_ON_ONCE(!con->addr_count);
 
 			spin_lock(&con->addrs_lock);
@@ -1729,7 +1731,8 @@ void dlm_lowcomms_shutdown(void)
 
 	idx = srcu_read_lock(&connections_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+		hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+				srcu_read_lock_held(&connections_srcu)) {
 			shutdown_connection(con, true);
 			stop_connection_io(con);
 			flush_workqueue(process_workqueue);
@@ -1968,7 +1971,8 @@ void dlm_lowcomms_exit(void)
 
 	idx = srcu_read_lock(&connections_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(con, &connection_hash[i], list) {
+		hlist_for_each_entry_srcu(con, &connection_hash[i], list,
+				srcu_read_lock_held(&connections_srcu)) {
 			spin_lock(&connections_lock);
 			hlist_del_rcu(&con->list);
 			spin_unlock(&connections_lock);
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index d54bdd8fc4f2e..a5b363b4785f3 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -275,7 +275,8 @@ static struct midcomms_node *__find_node(int nodeid, int r)
 {
 	struct midcomms_node *node;
 
-	hlist_for_each_entry_rcu(node, &node_hash[r], hlist) {
+	hlist_for_each_entry_srcu(node, &node_hash[r], hlist,
+			srcu_read_lock_held(&nodes_srcu)) {
 		if (node->nodeid == nodeid)
 			return node;
 	}
@@ -1165,7 +1166,8 @@ void dlm_midcomms_exit(void)
 
 	idx = srcu_read_lock(&nodes_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+		hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+				srcu_read_lock_held(&nodes_srcu)) {
 			dlm_delete_debug_comms_file(node->debugfs);
 
 			spin_lock(&nodes_lock);
@@ -1325,7 +1327,8 @@ void dlm_midcomms_version_wait(void)
 
 	idx = srcu_read_lock(&nodes_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+		hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+				srcu_read_lock_held(&nodes_srcu)) {
 			ret = wait_event_timeout(node->shutdown_wait,
 						 node->version != DLM_VERSION_NOT_SET ||
 						 node->state == DLM_CLOSED ||
@@ -1396,7 +1399,8 @@ void dlm_midcomms_shutdown(void)
 	mutex_lock(&close_lock);
 	idx = srcu_read_lock(&nodes_srcu);
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+		hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+				srcu_read_lock_held(&nodes_srcu)) {
 			midcomms_shutdown(node);
 		}
 	}
@@ -1404,7 +1408,8 @@ void dlm_midcomms_shutdown(void)
 	dlm_lowcomms_shutdown();
 
 	for (i = 0; i < CONN_HASH_SIZE; i++) {
-		hlist_for_each_entry_rcu(node, &node_hash[i], hlist) {
+		hlist_for_each_entry_srcu(node, &node_hash[i], hlist,
+				srcu_read_lock_held(&nodes_srcu)) {
 			midcomms_node_reset(node);
 		}
 	}
-- 
2.43.0



* [PATCH RESEND v7.1-rc1 2/4] dlm: add usercopy whitelist to dlm_cb cache
  2026-04-27 15:59 [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 1/4] dlm: use hlist_for_each_entry_srcu for SRCU protected lists Alexander Aring
@ 2026-04-27 15:59 ` Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 3/4] dlm: fix add msg handle in send_queue ordered Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 4/4] dlm: init per node debugfs before add to node hash Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
  To: teigland; +Cc: aahringo, gfs2

From: Ziyi Guo <n7l8m4@u.northwestern.edu>

The dlm_cb slab cache is created with kmem_cache_create(), which
provides no usercopy whitelist. When a callback carries LVB data,
dlm_user_add_ast() copies the LVB into the inline lvbptr[] array within
the slab-allocated struct dlm_callback and redirects ua->lksb.sb_lvbptr
to point to it. copy_result_to_user() then calls copy_to_user() with
this pointer. With CONFIG_HARDENED_USERCOPY enabled, this triggers
usercopy_abort().

Switch to kmem_cache_create_usercopy() with a whitelist covering the
lvbptr field.

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/memory.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/memory.c b/fs/dlm/memory.c
index 5c35cc67aca4c..ee55994ce90d8 100644
--- a/fs/dlm/memory.c
+++ b/fs/dlm/memory.c
@@ -48,8 +48,10 @@ int __init dlm_memory_init(void)
 	if (!rsb_cache)
 		goto rsb;
 
-	cb_cache = kmem_cache_create("dlm_cb", sizeof(struct dlm_callback),
+	cb_cache = kmem_cache_create_usercopy("dlm_cb", sizeof(struct dlm_callback),
 				     __alignof__(struct dlm_callback), 0,
+					 offsetof(struct dlm_callback, lvbptr),
+					 sizeof_field(struct dlm_callback, lvbptr),
 				     NULL);
 	if (!cb_cache)
 		goto cb;
-- 
2.43.0



* [PATCH RESEND v7.1-rc1 3/4] dlm: fix add msg handle in send_queue ordered
  2026-04-27 15:59 [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 1/4] dlm: use hlist_for_each_entry_srcu for SRCU protected lists Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 2/4] dlm: add usercopy whitelist to dlm_cb cache Alexander Aring
@ 2026-04-27 15:59 ` Alexander Aring
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 4/4] dlm: init per node debugfs before add to node hash Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
  To: teigland; +Cc: aahringo, gfs2

In benchmark scenarios that generate many DLM requests, and hence many
DLM messages on the network, the mh->seq values on "node->send_queue"
can end up out of order with respect to the list position. This
ordering is required by dlm_receive_ack(): its "before(mh->seq, seq)"
check stops scanning as soon as it sees a newer sequence number, so it
assumes older sequence numbers always sit towards the head of
"node->send_queue".

The side effects of the list not being correctly ordered with respect
to "before(mh->seq, seq)" are refcounting issues and use-after-free.

I was only able to reproduce this issue on an experimental DLM branch
with a user space DLM benchmark that uses io_uring. After this change I
no longer experienced any refcounting issues with the send buffers.

Fixes: 489d8e559c659 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/midcomms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index a5b363b4785f3..6b522d4e2b7e3 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -969,10 +969,10 @@ static void midcomms_new_msg_cb(void *data)
 	atomic_inc(&mh->node->send_queue_cnt);
 
 	spin_lock_bh(&mh->node->send_queue_lock);
+	/* need to be locked with list_add_tail_rcu() because list is ordered */
+	mh->seq = atomic_fetch_inc(&mh->node->seq_send);
 	list_add_tail_rcu(&mh->list, &mh->node->send_queue);
 	spin_unlock_bh(&mh->node->send_queue_lock);
-
-	mh->seq = atomic_fetch_inc(&mh->node->seq_send);
 }
 
 static struct dlm_msg *dlm_midcomms_get_msg_3_2(struct dlm_mhandle *mh, int nodeid,
-- 
2.43.0



* [PATCH RESEND v7.1-rc1 4/4] dlm: init per node debugfs before add to node hash
  2026-04-27 15:59 [PATCH RESEND v7.1-rc1 0/4] dlm: pending DLM patches Alexander Aring
                   ` (2 preceding siblings ...)
  2026-04-27 15:59 ` [PATCH RESEND v7.1-rc1 3/4] dlm: fix add msg handle in send_queue ordered Alexander Aring
@ 2026-04-27 15:59 ` Alexander Aring
  3 siblings, 0 replies; 5+ messages in thread
From: Alexander Aring @ 2026-04-27 15:59 UTC (permalink / raw)
  To: teigland; +Cc: aahringo, gfs2

Assign node->debugfs before the node is added to the node hash. Without
this, a concurrent iteration over the hash, such as the one in
dlm_midcomms_exit(), could see a node whose debugfs field has not been
initialized yet and hand it to debugfs_remove().

However, dlm_midcomms_exit() is only called from the module init/exit
paths, where the hash should be empty anyway at those stages. This
change is a cleanup to avoid potential issues.

Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/gfs2/CAGp+u1ZE7UsQ4sSUHBKQXU8x3M_jwK=ek1urSjEtd3jXQGFmVg@mail.gmail.com
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/midcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 6b522d4e2b7e3..8964164600d2d 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -356,6 +356,7 @@ int dlm_midcomms_addr(int nodeid, struct sockaddr_storage *addr)
 	if (!node)
 		return -ENOMEM;
 
+	node->debugfs = dlm_create_debug_comms_file(nodeid, node);
 	node->nodeid = nodeid;
 	spin_lock_init(&node->state_lock);
 	spin_lock_init(&node->send_queue_lock);
@@ -369,7 +370,6 @@ int dlm_midcomms_addr(int nodeid, struct sockaddr_storage *addr)
 	hlist_add_head_rcu(&node->hlist, &node_hash[r]);
 	spin_unlock_bh(&nodes_lock);
 
-	node->debugfs = dlm_create_debug_comms_file(nodeid, node);
 	return 0;
 }
 
-- 
2.43.0


