Distributed Replicated Block Device (DRBD) development
 help / color / mirror / Atom feed
* [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc()
@ 2025-07-09  2:55 zhengbing.huang
  2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: zhengbing.huang @ 2025-07-09  2:55 UTC (permalink / raw)
  To: drbd-dev

Have the crash info as follow:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 51 PID: 748 Comm: kworker/51:1 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-372.19.1.es8_10.x86_64 #1
Hardware name: SuperCloud R5215 G13/R5215 G13, BIOS EG6.17.12 12/20/2024
Workqueue: events dtr_refill_rx_descs_work_fn [drbd_transport_rdma]
RIP: 0010:dtr_create_rx_desc+0xe1/0x310 [drbd_transport_rdma]
Code: 48 85 db 0f 84 85 01 00 00 48 89 5d 20 48 8b 0d e5 dd 4f c7 4c 89 7d 00 4c 8b 05 ea dd 4f c7 c7 45 18 00 00 00 00 48 8b 43 28 <8b> 00 89 45 34 48 8b 53 08 4c 89 f8 48 29 c8 48 8b 12 48 c1 f8 06
RSP: 0018:ff8baaf70e6a3e28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff4b9894e7f1f800 RCX: ffdcf32780000000
RDX: 0000000000000002 RSI: ff8baaf70e6a3dc0 RDI: ff4b98b3edac2fa8
RBP: ff4b989488700280 R08: ff4b989340000000 R09: ff4b989488700280
R10: 0000000000001000 R11: 0000000000000009 R12: ff4b989e12a2b648
R13: ff4b989c5fbaca60 R14: 0000000000001000 R15: ffdcf328136d5100
FS:  0000000000000000(0000) GS:ff4b98b33fac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000015d3410004 CR4: 0000000000773ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 __dtr_refill_rx_desc+0x5d/0xb0 [drbd_transport_rdma]
 process_one_work+0x1a7/0x360
 ? create_worker+0x1a0/0x1a0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x10a/0x120
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x40

(gdb) l *dtr_create_rx_desc+0xe1
0x1e01 is in dtr_create_rx_desc (/.../drbd_transport_rdma.c:2093).
2088                    goto out;
2089            }
2090            rx_desc->cm = cm;
2091            rx_desc->page = page;
2092            rx_desc->size = 0;
2093            rx_desc->sge.lkey = dtr_cm_to_lkey(cm);
2094            rx_desc->sge.addr = ib_dma_map_single(cm->id->device, page_address(page), alloc_size,
2095                                                  DMA_FROM_DEVICE);

static u32 dtr_cm_to_lkey(struct dtr_cm *cm)
{
	return cm->pd->local_dma_lkey;
}

It is safe to obtain cm through dtr_path_get_cm(), so cm is not a null pointer.

In the dtr_path_prepare() function, it first replaces the cm of the path.
After the replacement is successful,
it will alloc pd in the subsequent dtr_cm_alloc_rdma_res() function.

So if in the __dtr_refill_rx_desc() function,
the cm of the path is replaced with a cm that has no pd yet,
this problem will occur.

In the dtr_path_get_cm() function, check the cm status,
if it is not connected, return null.

Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
 drbd/drbd_transport_rdma.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index 14392a33b..442dd8e89 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -842,6 +842,20 @@ static struct dtr_cm *dtr_path_get_cm(struct dtr_path *path)
 {
 	struct dtr_cm *cm;
 
+	rcu_read_lock();
+	cm = __dtr_path_get_cm(path);
+	if (cm && cm->state != DSM_CONNECTED) {
+		kref_put(&cm->kref, dtr_destroy_cm);
+		cm = NULL;
+	}
+	rcu_read_unlock();
+	return cm;
+}
+
+static struct dtr_cm *dtr_path_get_cm_raw(struct dtr_path *path)
+{
+	struct dtr_cm *cm;
+
 	rcu_read_lock();
 	cm = __dtr_path_get_cm(path);
 	rcu_read_unlock();
@@ -2567,9 +2581,6 @@ static int _dtr_cm_alloc_rdma_res(struct dtr_cm *cm,
 		goto createqp_failed;
 	}
 
-	for (i = DATA_STREAM; i <= CONTROL_STREAM ; i++)
-		dtr_create_rx_desc(&path->flow[i], GFP_NOIO);
-
 	return 0;
 
 createqp_failed:
@@ -2756,7 +2767,7 @@ static void __dtr_disconnect_path(struct dtr_path *path)
 		break;
 	}
 
-	cm = dtr_path_get_cm(path);
+	cm = dtr_path_get_cm_raw(path);
 	if (!cm)
 		return;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr()
  2025-07-09  2:55 [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() zhengbing.huang
@ 2025-07-09  2:55 ` zhengbing.huang
  2025-07-31 12:36   ` Philipp Reisner
  2025-07-09  2:55 ` [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr zhengbing.huang
  2025-07-31 12:35 ` [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() Philipp Reisner
  2 siblings, 1 reply; 7+ messages in thread
From: zhengbing.huang @ 2025-07-09  2:55 UTC (permalink / raw)
  To: drbd-dev

We hava the crash info as follow:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 Workqueue: ib_cm cm_work_handler [ib_cm]
 RIP: 0010:drbd_find_path_by_addr+0x6c/0xd0 [drbd]
 Call Trace:
  dtr_cma_event_handler+0x1c1/0x4ee [drbd_transport_rdma]
  cma_cm_event_handler+0x25/0xd0 [rdma_cm]
  cma_ib_req_handler+0x7cd/0x1250 [rdma_cm]
  ? addr4_resolve+0x67/0xd0 [ib_core]
  cm_process_work+0x22/0xf0 [ib_cm]
  cm_req_handler+0x7ed/0xf40 [ib_cm]
  ? __switch_to_asm+0x35/0x70
  cm_work_handler+0x798/0xf30 [ib_cm]
  ? finish_task_switch+0x18e/0x2e0
  process_one_work+0x1a7/0x360
  ? create_worker+0x1a0/0x1a0
  worker_thread+0x30/0x390
  ? create_worker+0x1a0/0x1a0
  kthread+0x10a/0x120
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x40

The code that crash is traverse the listener->waiters list:
struct drbd_path *drbd_find_path_by_addr(struct drbd_listener *listener, struct sockaddr_storage *addr)
{
	struct drbd_path *path;

	list_for_each_entry(path, &listener->waiters, listener_link) {
		if (addr_equal(&path->peer_addr, addr))
			return path;
	}

	return NULL;
}

The listener->waiters list has a Path node:
crash> struct dtr_listener ff4ba75054797c00
struct dtr_listener {
  listener = {
    kref = {
      refcount = {
        refs = {
          counter = 2
        }
      }
    },
    resource = 0xff4ba766cc325000,
    transport_class = 0xffffffffc037f080 <rdma_transport_class>,
    list = {
      next = 0xff4ba766cc325500,
      prev = 0xff4ba766cc325500
    },
    waiters = {
      next = 0xff4ba74fd578e138,
      prev = 0xff4ba74fd578e138
    },
 ...
}

but this Path has been released:
crash> struct drbd_path 0xff4ba74fd578e000
struct drbd_path {
  my_addr = {
    ss_family = 1,
    __data = "\000\000\000\000"
  },
  peer_addr = {
    ss_family = 0,
    __data = "\000\000\000\000\000\000\0"
  },
  kref = {
    refcount = {
      refs = {
        counter = 0
      }
    }
  },
  net = 0x0,
  my_addr_len = 0,
  peer_addr_len = 0,
  flags = 0,
  // all zero
  ...
}

So this path has been released, but it is still on the listener->waiters list,
which cause problem when traverse the list later.

And the scenario of this problem should be like this:
thread_1:
  remove_path()
    dtr_remove_path()
      drbd_put_listener()
        list_del(&path->listener_link)
                                          thread_2:
                                            ...
                                            dtr_activate_path()
                                              drbd_get_listener()
                                                list_add(&path->listener_link, &listener->waiters);
                                            ...
   ...
   kfree(path)

thread_3:
connect request come in:
dtr_cma_event_handler()
  dtr_cma_accept()
    drbd_find_path_by_addr()
    crash

To avoid this use-after-free, we hold an additional reference to drbd_path
whenever it is added to the listener->waiters list, and drop it when removed.

This ensures the path memory remains valid during list traversal.

Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
 drbd/drbd_transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drbd/drbd_transport.c b/drbd/drbd_transport.c
index 00e7f9269..aff96716f 100644
--- a/drbd/drbd_transport.c
+++ b/drbd/drbd_transport.c
@@ -224,6 +224,7 @@ int drbd_get_listener(struct drbd_path *path)
 
 	spin_lock_bh(&listener->waiters_lock);
 	list_add(&path->listener_link, &listener->waiters);
+	kref_get(&path->kref);
 	path->listener = listener;
 	spin_unlock_bh(&listener->waiters_lock);
 	/* After exposing the listener on a path, drbd_put_listenr() can destroy it. */
@@ -258,6 +259,7 @@ void drbd_put_listener(struct drbd_path *path)
 
 	spin_lock_bh(&listener->waiters_lock);
 	list_del(&path->listener_link);
+	kref_put(&path->kref, drbd_destroy_path);
 	spin_unlock_bh(&listener->waiters_lock);
 	kref_put(&listener->kref, drbd_listener_destroy);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr
  2025-07-09  2:55 [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() zhengbing.huang
  2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
@ 2025-07-09  2:55 ` zhengbing.huang
  2025-07-31 12:36   ` Philipp Reisner
  2025-07-31 12:35 ` [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() Philipp Reisner
  2 siblings, 1 reply; 7+ messages in thread
From: zhengbing.huang @ 2025-07-09  2:55 UTC (permalink / raw)
  To: drbd-dev

In the dtr_cma_accept() function, after obtain the drbd_path
through peer_addr, without take a reference,
the drbd_path may be released concurrently, leade to a use-after-free.

So when we obtain drbd_path, we add a reference count.

Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
 drbd/drbd_transport_rdma.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index 442dd8e89..68c668f7f 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -969,6 +969,8 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
 
 	spin_lock(&listener->listener.waiters_lock);
 	drbd_path = drbd_find_path_by_addr(&listener->listener, peer_addr);
+	if (drbd_path)
+		kref_get(&drbd_path->kref);
 	spin_unlock(&listener->listener.waiters_lock);
 
 	if (!drbd_path) {
@@ -997,16 +999,13 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
 
 	path = container_of(drbd_path, struct dtr_path, path);
 	cs = &path->cs;
-	if (atomic_read(&cs->passive_state) < PCS_CONNECTING) {
-		rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
-		return -EAGAIN;
-	}
+	if (atomic_read(&cs->passive_state) < PCS_CONNECTING)
+		goto reject;
 
 	cm = dtr_alloc_cm(path);
 	if (!cm) {
 		pr_err("rejecting connecting since -ENOMEM for cm\n");
-		rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
-		return -EAGAIN;
+		goto reject;
 	}
 
 	cm->state = DSM_CONNECT_REQ;
@@ -1024,17 +1023,21 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
 	/* Gifting the initial kref to the path->cm pointer */
 	err = dtr_path_prepare(path, cm, false);
 	if (err) {
-		rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
 		/* Returning the cm via ret_cm and an error causes the caller to put one ref */
-
-		return -EAGAIN;
+		goto reject;
 	}
+	kref_put(&drbd_path->kref, drbd_destroy_path);
 
 	err = rdma_accept(new_cm_id, &dtr_conn_param);
 	if (err)
 		kref_put(&cm->kref, dtr_destroy_cm);
 
 	return err;
+
+reject:
+	rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
+	kref_put(&drbd_path->kref, drbd_destroy_path);
+	return -EAGAIN;
 }
 
 static int dtr_start_try_connect(struct dtr_connect_state *cs)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc()
  2025-07-09  2:55 [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() zhengbing.huang
  2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
  2025-07-09  2:55 ` [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr zhengbing.huang
@ 2025-07-31 12:35 ` Philipp Reisner
  2025-08-01  2:59   ` ZhengbingHuang
  2 siblings, 1 reply; 7+ messages in thread
From: Philipp Reisner @ 2025-07-31 12:35 UTC (permalink / raw)
  To: zhengbing.huang; +Cc: drbd-dev

Hi Zhengbing,

Thanks for the analysis and the patch. I took the freedom and slightly
modified it before applying it. Instead of changing the behaviour of
the dtr_path_get_cm() function I introduce a new one:
dtr_path_get_cm_connected().

Please see
https://github.com/LINBIT/drbd/commit/ae1b0bdfa2e4ea59d15199b55a6f0c571844f576

Also ...

[...]
> -       for (i = DATA_STREAM; i <= CONTROL_STREAM ; i++)
> -               dtr_create_rx_desc(&path->flow[i], GFP_NOIO);
> -

I dropped this part. It came without explanation in the commit
message. I believe that some RDMA transports require at least one
rx-descriptor to enable them to establish a connection.


Best regards,
 Philipp

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr()
  2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
@ 2025-07-31 12:36   ` Philipp Reisner
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Reisner @ 2025-07-31 12:36 UTC (permalink / raw)
  To: zhengbing.huang; +Cc: drbd-dev

Thanks applied

On Wed, Jul 9, 2025 at 5:01 AM zhengbing.huang
<zhengbing.huang@easystack.cn> wrote:
>
> We hava the crash info as follow:
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>  Workqueue: ib_cm cm_work_handler [ib_cm]
>  RIP: 0010:drbd_find_path_by_addr+0x6c/0xd0 [drbd]
>  Call Trace:
>   dtr_cma_event_handler+0x1c1/0x4ee [drbd_transport_rdma]
>   cma_cm_event_handler+0x25/0xd0 [rdma_cm]
>   cma_ib_req_handler+0x7cd/0x1250 [rdma_cm]
>   ? addr4_resolve+0x67/0xd0 [ib_core]
>   cm_process_work+0x22/0xf0 [ib_cm]
>   cm_req_handler+0x7ed/0xf40 [ib_cm]
>   ? __switch_to_asm+0x35/0x70
>   cm_work_handler+0x798/0xf30 [ib_cm]
>   ? finish_task_switch+0x18e/0x2e0
>   process_one_work+0x1a7/0x360
>   ? create_worker+0x1a0/0x1a0
>   worker_thread+0x30/0x390
>   ? create_worker+0x1a0/0x1a0
>   kthread+0x10a/0x120
>   ? set_kthread_struct+0x40/0x40
>   ret_from_fork+0x1f/0x40
>
> The code that crash is traverse the listener->waiters list:
> struct drbd_path *drbd_find_path_by_addr(struct drbd_listener *listener, struct sockaddr_storage *addr)
> {
>         struct drbd_path *path;
>
>         list_for_each_entry(path, &listener->waiters, listener_link) {
>                 if (addr_equal(&path->peer_addr, addr))
>                         return path;
>         }
>
>         return NULL;
> }
>
> The listener->waiters list has a Path node:
> crash> struct dtr_listener ff4ba75054797c00
> struct dtr_listener {
>   listener = {
>     kref = {
>       refcount = {
>         refs = {
>           counter = 2
>         }
>       }
>     },
>     resource = 0xff4ba766cc325000,
>     transport_class = 0xffffffffc037f080 <rdma_transport_class>,
>     list = {
>       next = 0xff4ba766cc325500,
>       prev = 0xff4ba766cc325500
>     },
>     waiters = {
>       next = 0xff4ba74fd578e138,
>       prev = 0xff4ba74fd578e138
>     },
>  ...
> }
>
> but this Path has been released:
> crash> struct drbd_path 0xff4ba74fd578e000
> struct drbd_path {
>   my_addr = {
>     ss_family = 1,
>     __data = "\000\000\000\000"
>   },
>   peer_addr = {
>     ss_family = 0,
>     __data = "\000\000\000\000\000\000\0"
>   },
>   kref = {
>     refcount = {
>       refs = {
>         counter = 0
>       }
>     }
>   },
>   net = 0x0,
>   my_addr_len = 0,
>   peer_addr_len = 0,
>   flags = 0,
>   // all zero
>   ...
> }
>
> So this path has been released, but it is still on the listener->waiters list,
> which cause problem when traverse the list later.
>
> And the scenario of this problem should be like this:
> thread_1:
>   remove_path()
>     dtr_remove_path()
>       drbd_put_listener()
>         list_del(&path->listener_link)
>                                           thread_2:
>                                             ...
>                                             dtr_activate_path()
>                                               drbd_get_listener()
>                                                 list_add(&path->listener_link, &listener->waiters);
>                                             ...
>    ...
>    kfree(path)
>
> thread_3:
> connect request come in:
> dtr_cma_event_handler()
>   dtr_cma_accept()
>     drbd_find_path_by_addr()
>     crash
>
> To avoid this use-after-free, we hold an additional reference to drbd_path
> whenever it is added to the listener->waiters list, and drop it when removed.
>
> This ensures the path memory remains valid during list traversal.
>
> Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
> ---
>  drbd/drbd_transport.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drbd/drbd_transport.c b/drbd/drbd_transport.c
> index 00e7f9269..aff96716f 100644
> --- a/drbd/drbd_transport.c
> +++ b/drbd/drbd_transport.c
> @@ -224,6 +224,7 @@ int drbd_get_listener(struct drbd_path *path)
>
>         spin_lock_bh(&listener->waiters_lock);
>         list_add(&path->listener_link, &listener->waiters);
> +       kref_get(&path->kref);
>         path->listener = listener;
>         spin_unlock_bh(&listener->waiters_lock);
>         /* After exposing the listener on a path, drbd_put_listenr() can destroy it. */
> @@ -258,6 +259,7 @@ void drbd_put_listener(struct drbd_path *path)
>
>         spin_lock_bh(&listener->waiters_lock);
>         list_del(&path->listener_link);
> +       kref_put(&path->kref, drbd_destroy_path);
>         spin_unlock_bh(&listener->waiters_lock);
>         kref_put(&listener->kref, drbd_listener_destroy);
>  }
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr
  2025-07-09  2:55 ` [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr zhengbing.huang
@ 2025-07-31 12:36   ` Philipp Reisner
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Reisner @ 2025-07-31 12:36 UTC (permalink / raw)
  To: zhengbing.huang; +Cc: drbd-dev

Thanks! Applied.

Best regards,
 Philipp

On Wed, Jul 9, 2025 at 7:22 AM zhengbing.huang
<zhengbing.huang@easystack.cn> wrote:
>
> In the dtr_cma_accept() function, after obtain the drbd_path
> through peer_addr, without take a reference,
> the drbd_path may be released concurrently, leade to a use-after-free.
>
> So when we obtain drbd_path, we add a reference count.
>
> Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
> ---
>  drbd/drbd_transport_rdma.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
> index 442dd8e89..68c668f7f 100644
> --- a/drbd/drbd_transport_rdma.c
> +++ b/drbd/drbd_transport_rdma.c
> @@ -969,6 +969,8 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
>
>         spin_lock(&listener->listener.waiters_lock);
>         drbd_path = drbd_find_path_by_addr(&listener->listener, peer_addr);
> +       if (drbd_path)
> +               kref_get(&drbd_path->kref);
>         spin_unlock(&listener->listener.waiters_lock);
>
>         if (!drbd_path) {
> @@ -997,16 +999,13 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
>
>         path = container_of(drbd_path, struct dtr_path, path);
>         cs = &path->cs;
> -       if (atomic_read(&cs->passive_state) < PCS_CONNECTING) {
> -               rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
> -               return -EAGAIN;
> -       }
> +       if (atomic_read(&cs->passive_state) < PCS_CONNECTING)
> +               goto reject;
>
>         cm = dtr_alloc_cm(path);
>         if (!cm) {
>                 pr_err("rejecting connecting since -ENOMEM for cm\n");
> -               rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
> -               return -EAGAIN;
> +               goto reject;
>         }
>
>         cm->state = DSM_CONNECT_REQ;
> @@ -1024,17 +1023,21 @@ static int dtr_cma_accept(struct dtr_listener *listener, struct rdma_cm_id *new_
>         /* Gifting the initial kref to the path->cm pointer */
>         err = dtr_path_prepare(path, cm, false);
>         if (err) {
> -               rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
>                 /* Returning the cm via ret_cm and an error causes the caller to put one ref */
> -
> -               return -EAGAIN;
> +               goto reject;
>         }
> +       kref_put(&drbd_path->kref, drbd_destroy_path);
>
>         err = rdma_accept(new_cm_id, &dtr_conn_param);
>         if (err)
>                 kref_put(&cm->kref, dtr_destroy_cm);
>
>         return err;
> +
> +reject:
> +       rdma_reject(new_cm_id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
> +       kref_put(&drbd_path->kref, drbd_destroy_path);
> +       return -EAGAIN;
>  }
>
>  static int dtr_start_try_connect(struct dtr_connect_state *cs)
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re:Re: [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc()
  2025-07-31 12:35 ` [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() Philipp Reisner
@ 2025-08-01  2:59   ` ZhengbingHuang
  0 siblings, 0 replies; 7+ messages in thread
From: ZhengbingHuang @ 2025-08-01  2:59 UTC (permalink / raw)
  To: Philipp Reisner; +Cc: drbd-dev

[-- Attachment #1: Type: text/plain, Size: 1435 bytes --]

Hi Philipp,
Thanks for reply.


From: Philipp Reisner <philipp.reisner@linbit.com>
Date: 2025-07-31 20:35:17
To:  "zhengbing.huang" <zhengbing.huang@easystack.cn>
Cc:  drbd-dev@lists.linbit.com
Subject: Re: [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc()>Hi Zhengbing,
>
>Thanks for the analysis and the patch. I took the freedom and slightly
>modified it before applying it. Instead of changing the behaviour of
>the dtr_path_get_cm() function I introduce a new one:
>dtr_path_get_cm_connected().
>
>Please see
>https://github.com/LINBIT/drbd/commit/ae1b0bdfa2e4ea59d15199b55a6f0c571844f576
>
>Also ...
>
>[...]
>> -       for (i = DATA_STREAM; i <= CONTROL_STREAM ; i++)
>> -               dtr_create_rx_desc(&path->flow[i], GFP_NOIO);
>> -
>
Sorry for the lack of explanation for this part of the code. 


These two lines of code are in the path prepare phase, 
so the cm state at this time is definitely not connected. 
After the dtr_create_rx_desc() function calls the dtr_path_get_cm_connected() function, 
it is definitely impossible to create rx_desc, which is the reason for delete them.

And in the test after delete this code, the rdma connection was normal.
>I dropped this part. It came without explanation in the commit
>message. I believe that some RDMA transports require at least one
>rx-descriptor to enable them to establish a connection.
>
>
>Best regards,
> Philipp
>
Best regards,
  zhengbing








[-- Attachment #2: Type: text/html, Size: 1983 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-01  3:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09  2:55 [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() zhengbing.huang
2025-07-09  2:55 ` [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr() zhengbing.huang
2025-07-31 12:36   ` Philipp Reisner
2025-07-09  2:55 ` [PATCH 3/3] rdma: Get drbd_path->kref when get drbd_path by addr zhengbing.huang
2025-07-31 12:36   ` Philipp Reisner
2025-07-31 12:35 ` [PATCH 1/3] rdma: Fix kernel crash in dtr_create_rx_desc() Philipp Reisner
2025-08-01  2:59   ` ZhengbingHuang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox