Linux block layer
From: Ming Lei <ming.lei@redhat.com>
To: Guy Eisenberg <geisenberg@nvidia.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	Jared Holzman <jholzman@nvidia.com>, Yoav Cohen <yoav@nvidia.com>,
	Omri Levi <omril@nvidia.com>, Ofer Oshri <ofer@nvidia.com>
Subject: Re: ublk: kernel crash when killing SPDK application
Date: Tue, 15 Apr 2025 20:56:21 +0800
Message-ID: <Z_5XdWPQa7cq1nDJ@fedora>
In-Reply-To: <IA1PR12MB645841796CB4C76F62F24522A9B22@IA1PR12MB6458.namprd12.prod.outlook.com>

On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:
> I am writing to report a kernel crash that occurred after terminating (kill -9) an SPDK application using ublk.
> Below are the details of the incident, including steps to reproduce the issue and the call stack.
> 
> Incident Description:
> After terminating an SPDK application, the system occasionally experiences a kernel crash.
> This issue is not consistent but happens once every few tries under the following conditions.
> We are using kernel 6.14.0-061400-generic
> 
> Steps to Reproduce:
> 1. install SPDK:
>       git clone https://github.com/spdk/spdk 
>       cd spdk
>       ./configure --disable-coverage --disable-debug --disable-tests --enable-unit-tests --without-crypto --without-fio --with-vhost --with-rdma --without-nvme-cuse --without-fuse --without-vfio-user --without-vtune --without-iscsi-initiator --without-shared --with-ublk --with-uring --with-raid5f
>       make
>       make install
> 2.  Create SPDK bdev (here we used PCI 0000.8b.00.0 as the nvme target, and named the bdev as guy_bdev):
>       ./spdk/scripts/setup.sh reset
>       ./spdk/scripts/setup.sh
>       /usr/local/bin/spdk_tgt --mem-size 2048 -m 0xff
>       ./spdk/scripts/rpc.py bdev_nvme_attach_controller -b guy_bdev -t PCIe -a 0000.8b.00.0
> 3. Expose it via ublk
>       modprobe ublk_drv
>       ./spdk/scripts/rpc.py ublk_create_target
>       ./spdk/scripts/rpc.py ublk_start_disk -q 8 -d 128 guy_bdevn1 0
> 4. Run IO to the /dev/ublkb0 that was created
>       Kill the spdk_tgt process (kill -9)
> 
> 
> Call Stack:
>       Below is the call stack captured during one of the crashes:
> 
> [54346.157495] [ T288311] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [54346.157625] [ T288311] #PF: supervisor write access in kernel mode
> [54346.157708] [ T288311] #PF: error_code(0x0002) - not-present page
> [54346.157790] [ T288311] PGD 0 P4D 0 
> [54346.157911] [ T288311] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [54346.158010] [ T288311] CPU: 0 UID: 0 PID: 288311 Comm: reactor_0 Kdump: loaded Tainted: G           OE      6.14.0-061400-generic #202503241442
> [54346.158264] [ T288311] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [54346.158374] [ T288311] Hardware name: Supermicro SYS-2028BT-HNR+/X10DRT-B+, BIOS 2.0 01/10/2017
> [54346.158490] [ T288311] RIP: 0010:percpu_ref_get_many+0x35/0x50

This looks like a uring_cmd use-after-free issue.

The following patchset may avoid it:

	https://lore.kernel.org/linux-block/20250414112554.3025113-1-ming.lei@redhat.com/

If you can build and test a kernel, please apply the following debug patch
against v6.14 and post the panic log.


diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index ca9a67b5b537..6e50e8b9f836 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1127,6 +1127,7 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res,
 
 	/* tell ublksrv one io request is coming */
 	io_uring_cmd_done(io->cmd, res, 0, issue_flags);
+	io->cmd = NULL;
 }
 
 #define UBLK_REQUEUE_DELAY_MS	3
@@ -1498,8 +1499,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
 		io->flags |= UBLK_IO_FLAG_CANCELED;
 	spin_unlock(&ubq->cancel_lock);
 
-	if (!done)
+	if (!done) {
 		io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+		io->cmd = NULL;
+	}
 }
 
 /*
@@ -1770,6 +1773,8 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	if (!ubq || ub_cmd->q_id != ubq->q_id)
 		goto out;
 
+	WARN_ON_ONCE(ubq->canceling);
+
 	if (ubq->ubq_daemon && ubq->ubq_daemon != current)
 		goto out;
 



Thanks,
Ming

