Distributed Replicated Block Device (DRBD) development
From: "zhengbing.huang" <zhengbing.huang@easystack.cn>
To: drbd-dev@lists.linbit.com
Subject: [PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr()
Date: Wed,  9 Jul 2025 10:55:51 +0800
Message-ID: <20250709025553.694792-2-zhengbing.huang@easystack.cn>
In-Reply-To: <20250709025553.694792-1-zhengbing.huang@easystack.cn>

We hit the following crash:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 Workqueue: ib_cm cm_work_handler [ib_cm]
 RIP: 0010:drbd_find_path_by_addr+0x6c/0xd0 [drbd]
 Call Trace:
  dtr_cma_event_handler+0x1c1/0x4ee [drbd_transport_rdma]
  cma_cm_event_handler+0x25/0xd0 [rdma_cm]
  cma_ib_req_handler+0x7cd/0x1250 [rdma_cm]
  ? addr4_resolve+0x67/0xd0 [ib_core]
  cm_process_work+0x22/0xf0 [ib_cm]
  cm_req_handler+0x7ed/0xf40 [ib_cm]
  ? __switch_to_asm+0x35/0x70
  cm_work_handler+0x798/0xf30 [ib_cm]
  ? finish_task_switch+0x18e/0x2e0
  process_one_work+0x1a7/0x360
  ? create_worker+0x1a0/0x1a0
  worker_thread+0x30/0x390
  ? create_worker+0x1a0/0x1a0
  kthread+0x10a/0x120
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x40

The crash occurs while traversing the listener->waiters list:
struct drbd_path *drbd_find_path_by_addr(struct drbd_listener *listener, struct sockaddr_storage *addr)
{
	struct drbd_path *path;

	list_for_each_entry(path, &listener->waiters, listener_link) {
		if (addr_equal(&path->peer_addr, addr))
			return path;
	}

	return NULL;
}

The listener->waiters list contains a single path node:
crash> struct dtr_listener ff4ba75054797c00
struct dtr_listener {
  listener = {
    kref = {
      refcount = {
        refs = {
          counter = 2
        }
      }
    },
    resource = 0xff4ba766cc325000,
    transport_class = 0xffffffffc037f080 <rdma_transport_class>,
    list = {
      next = 0xff4ba766cc325500,
      prev = 0xff4ba766cc325500
    },
    waiters = {
      next = 0xff4ba74fd578e138,
      prev = 0xff4ba74fd578e138
    },
 ...
}

but this path has already been freed:
crash> struct drbd_path 0xff4ba74fd578e000
struct drbd_path {
  my_addr = {
    ss_family = 1,
    __data = "\000\000\000\000"
  },
  peer_addr = {
    ss_family = 0,
    __data = "\000\000\000\000\000\000\0"
  },
  kref = {
    refcount = {
      refs = {
        counter = 0
      }
    }
  },
  net = 0x0,
  my_addr_len = 0,
  peer_addr_len = 0,
  flags = 0,
  // all zero
  ...
}

So the path has been freed but is still linked on the listener->waiters list,
which causes the crash when the list is traversed later.
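
As a cross-check of the two dumps above: list_for_each_entry() uses
container_of() to step back from the embedded list_head to the start of the
enclosing structure. A small userspace sketch of that arithmetic follows. The
offset 0x138 is only inferred from the two addresses in the dumps and is an
assumption, not a value read from the DRBD headers; path_from_link() is a
hypothetical helper used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of listener_link inside struct drbd_path, inferred from the
 * dumps (0x...e138 - 0x...e000). Assumption for illustration only.
 */
#define LISTENER_LINK_OFFSET 0x138ULL

/*
 * Same pointer arithmetic container_of() performs: subtract the member
 * offset from the address of the embedded list_head.
 */
static uint64_t path_from_link(uint64_t link_addr)
{
	assert(link_addr >= LISTENER_LINK_OFFSET);
	return link_addr - LISTENER_LINK_OFFSET;
}
```

With waiters.next = 0xff4ba74fd578e138 this yields 0xff4ba74fd578e000, the
address of the zeroed drbd_path shown above.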

The scenario that leads to this is most likely the following:
thread_1:
  remove_path()
    dtr_remove_path()
      drbd_put_listener()
        list_del(&path->listener_link)
                                          thread_2:
                                            ...
                                            dtr_activate_path()
                                              drbd_get_listener()
                                                list_add(&path->listener_link, &listener->waiters);
                                            ...
   ...
   kfree(path)

thread_3:
a connect request comes in:
dtr_cma_event_handler()
  dtr_cma_accept()
    drbd_find_path_by_addr()
    crash

To avoid this use-after-free, we hold an additional reference to drbd_path
whenever it is added to the listener->waiters list, and drop it when removed.

This ensures the path memory remains valid during list traversal.
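
The idea can be sketched in plain userspace C. This is a simplified,
single-threaded model and not the DRBD code: struct path, struct waiters,
path_new(), path_put(), waiters_add() and waiters_del() are hypothetical
stand-ins, the integer counter plays the role of path->kref, and the locking
done under waiters_lock in the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drbd_path; not the DRBD definition. */
struct path {
	int refs;           /* plays the role of path->kref */
	struct path *next;  /* plays the role of listener_link */
	bool *freed_flag;   /* set when the object is destroyed */
};

struct waiters {
	struct path *head;
};

/* Allocate a path with one reference held by its creator. */
static struct path *path_new(bool *freed_flag)
{
	struct path *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refs = 1;
	p->freed_flag = freed_flag;
	*freed_flag = false;
	return p;
}

/* Drop one reference; free the object when the last one goes away. */
static void path_put(struct path *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0) {
		*p->freed_flag = true;
		free(p);
	}
}

/* Link the path and take a reference on behalf of the list. */
static void waiters_add(struct waiters *w, struct path *p)
{
	p->next = w->head;
	w->head = p;
	p->refs++;
}

/* Unlink the path and drop the reference the list was holding. */
static void waiters_del(struct waiters *w, struct path *p)
{
	struct path **pp = &w->head;

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = p->next;
		p->next = NULL;
		path_put(p);
	}
}
```

In this model, a path that is still linked survives its creator's
path_put() and is only freed once waiters_del() drops the list's reference.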

Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
 drbd/drbd_transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drbd/drbd_transport.c b/drbd/drbd_transport.c
index 00e7f9269..aff96716f 100644
--- a/drbd/drbd_transport.c
+++ b/drbd/drbd_transport.c
@@ -224,6 +224,7 @@ int drbd_get_listener(struct drbd_path *path)
 
 	spin_lock_bh(&listener->waiters_lock);
 	list_add(&path->listener_link, &listener->waiters);
+	kref_get(&path->kref);
 	path->listener = listener;
 	spin_unlock_bh(&listener->waiters_lock);
 	/* After exposing the listener on a path, drbd_put_listenr() can destroy it. */
@@ -258,6 +259,7 @@ void drbd_put_listener(struct drbd_path *path)
 
 	spin_lock_bh(&listener->waiters_lock);
 	list_del(&path->listener_link);
+	kref_put(&path->kref, drbd_destroy_path);
 	spin_unlock_bh(&listener->waiters_lock);
 	kref_put(&listener->kref, drbd_listener_destroy);
 }
-- 
2.43.0

