Distributed Replicated Block Device (DRBD) development
 help / color / mirror / Atom feed
From: "zhengbing.huang" <zhengbing.huang@easystack.cn>
To: drbd-dev@lists.linbit.com
Subject: [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc()
Date: Wed,  9 Jul 2025 10:55:50 +0800	[thread overview]
Message-ID: <20250709025553.694792-1-zhengbing.huang@easystack.cn> (raw)

Have the crash info as follow:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 51 PID: 748 Comm: kworker/51:1 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-372.19.1.es8_10.x86_64 #1
Hardware name: SuperCloud R5215 G13/R5215 G13, BIOS EG6.17.12 12/20/2024
Workqueue: events dtr_refill_rx_descs_work_fn [drbd_transport_rdma]
RIP: 0010:dtr_create_rx_desc+0xe1/0x310 [drbd_transport_rdma]
Code: 48 85 db 0f 84 85 01 00 00 48 89 5d 20 48 8b 0d e5 dd 4f c7 4c 89 7d 00 4c 8b 05 ea dd 4f c7 c7 45 18 00 00 00 00 48 8b 43 28 <8b> 00 89 45 34 48 8b 53 08 4c 89 f8 48 29 c8 48 8b 12 48 c1 f8 06
RSP: 0018:ff8baaf70e6a3e28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff4b9894e7f1f800 RCX: ffdcf32780000000
RDX: 0000000000000002 RSI: ff8baaf70e6a3dc0 RDI: ff4b98b3edac2fa8
RBP: ff4b989488700280 R08: ff4b989340000000 R09: ff4b989488700280
R10: 0000000000001000 R11: 0000000000000009 R12: ff4b989e12a2b648
R13: ff4b989c5fbaca60 R14: 0000000000001000 R15: ffdcf328136d5100
FS:  0000000000000000(0000) GS:ff4b98b33fac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000015d3410004 CR4: 0000000000773ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 __dtr_refill_rx_desc+0x5d/0xb0 [drbd_transport_rdma]
 process_one_work+0x1a7/0x360
 ? create_worker+0x1a0/0x1a0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x10a/0x120
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x40

(gdb) l *dtr_create_rx_desc+0xe1
0x1e01 is in dtr_create_rx_desc (/.../drbd_transport_rdma.c:2093).
2088                    goto out;
2089            }
2090            rx_desc->cm = cm;
2091            rx_desc->page = page;
2092            rx_desc->size = 0;
2093            rx_desc->sge.lkey = dtr_cm_to_lkey(cm);
2094            rx_desc->sge.addr = ib_dma_map_single(cm->id->device, page_address(page), alloc_size,
2095                                                  DMA_FROM_DEVICE);

static u32 dtr_cm_to_lkey(struct dtr_cm *cm)
{
	return cm->pd->local_dma_lkey;
}

It is safe to obtain cm through dtr_path_get_cm(), so cm is not a null pointer.

In the dtr_path_prepare() function, it first replaces the cm of the path.
After the replacement is successful,
it will alloc pd in the subsequent dtr_cm_alloc_rdma_res() function.

So if in the __dtr_refill_rx_desc() function,
the cm of the path is replaced with a cm that has no pd yet,
this problem will occur.

In the dtr_path_get_cm() function, check the cm status,
if it is not connected, return null.

Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
 drbd/drbd_transport_rdma.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index 14392a33b..442dd8e89 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -842,6 +842,20 @@ static struct dtr_cm *dtr_path_get_cm(struct dtr_path *path)
 {
 	struct dtr_cm *cm;
 
+	rcu_read_lock();
+	cm = __dtr_path_get_cm(path);
+	if (cm && cm->state != DSM_CONNECTED) {
+		kref_put(&cm->kref, dtr_destroy_cm);
+		cm = NULL;
+	}
+	rcu_read_unlock();
+	return cm;
+}
+
+static struct dtr_cm *dtr_path_get_cm_raw(struct dtr_path *path)
+{
+	struct dtr_cm *cm;
+
 	rcu_read_lock();
 	cm = __dtr_path_get_cm(path);
 	rcu_read_unlock();
@@ -2567,9 +2581,6 @@ static int _dtr_cm_alloc_rdma_res(struct dtr_cm *cm,
 		goto createqp_failed;
 	}
 
-	for (i = DATA_STREAM; i <= CONTROL_STREAM ; i++)
-		dtr_create_rx_desc(&path->flow[i], GFP_NOIO);
-
 	return 0;
 
 createqp_failed:
@@ -2756,7 +2767,7 @@ static void __dtr_disconnect_path(struct dtr_path *path)
 		break;
 	}
 
-	cm = dtr_path_get_cm(path);
+	cm = dtr_path_get_cm_raw(path);
 	if (!cm)
 		return;
 
-- 
2.43.0


             reply	other threads:[~2025-07-09  3:01 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-09  2:55 zhengbing.huang [this message]
2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
2025-07-31 12:36   ` Philipp Reisner
2025-07-09  2:55 ` [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr zhengbing.huang
2025-07-31 12:36   ` Philipp Reisner
2025-07-31 12:35 ` [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() Philipp Reisner
2025-08-01  2:59   ` ZhengbingHuang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250709025553.694792-1-zhengbing.huang@easystack.cn \
    --to=zhengbing.huang@easystack.cn \
    --cc=drbd-dev@lists.linbit.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox