From: Ming Lei <ming.lei@redhat.com>
To: Guy Eisenberg <geisenberg@nvidia.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>,
Jared Holzman <jholzman@nvidia.com>, Yoav Cohen <yoav@nvidia.com>,
Omri Levi <omril@nvidia.com>, Ofer Oshri <ofer@nvidia.com>
Subject: Re: ublk: kernel crash when killing SPDK application
Date: Tue, 15 Apr 2025 20:56:21 +0800
Message-ID: <Z_5XdWPQa7cq1nDJ@fedora>
In-Reply-To: <IA1PR12MB645841796CB4C76F62F24522A9B22@IA1PR12MB6458.namprd12.prod.outlook.com>
On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:
> I am writing to report a kernel crash that occurred after terminating (kill -9) an SPDK application using ublk.
> Below are the details of the incident, including steps to reproduce the issue and the call stack.
>
> Incident Description:
> After terminating an SPDK application, the system occasionally experiences a kernel crash.
> This issue is not consistent but happens once every few tries under the following conditions.
> We are using kernel 6.14.0-061400-generic
>
> Steps to Reproduce:
> 1. install SPDK:
> git clone https://github.com/spdk/spdk
> cd spdk
> ./configure --disable-coverage --disable-debug --disable-tests --enable-unit-tests --without-crypto --without-fio --with-vhost --with-rdma --without-nvme-cuse --without-fuse --without-vfio-user --without-vtune --without-iscsi-initiator --without-shared --with-ublk --with-uring --with-raid5f
> make
> make install
> 2. Create SPDK bdev (here we used PCI 0000.8b.00.0 as the nvme target, and named the bdev as guy_bdev):
> ./spdk/scripts/setup.sh reset
> ./spdk/scripts/setup.sh
> /usr/local/bin/spdk_tgt --mem-size 2048 -m 0xff
> ./spdk/scripts/rpc.py bdev_nvme_attach_controller -b guy_bdev -t PCIe -a 0000.8b.00.0
> 3. Expose it via ublk
> modprobe ublk_drv
> ./spdk/scripts/rpc.py ublk_create_target
> ./spdk/scripts/rpc.py ublk_start_disk -q 8 -d 128 guy_bdevn1 0
> 4. Run IO to the /dev/ublkb0 that was created
> Kill the spdk_tgt process (kill -9)
>
>
> Call Stack:
> Below is the call stack captured during one of the crashes:
>
> [54346.157495] [ T288311] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [54346.157625] [ T288311] #PF: supervisor write access in kernel mode
> [54346.157708] [ T288311] #PF: error_code(0x0002) - not-present page
> [54346.157790] [ T288311] PGD 0 P4D 0
> [54346.157911] [ T288311] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [54346.158010] [ T288311] CPU: 0 UID: 0 PID: 288311 Comm: reactor_0 Kdump: loaded Tainted: G OE 6.14.0-061400-generic #202503241442
> [54346.158264] [ T288311] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [54346.158374] [ T288311] Hardware name: Supermicro SYS-2028BT-HNR+/X10DRT-B+, BIOS 2.0 01/10/2017
> [54346.158490] [ T288311] RIP: 0010:percpu_ref_get_many+0x35/0x50
This looks like a uring_cmd use-after-free issue.
The following patchset may avoid it:
https://lore.kernel.org/linux-block/20250414112554.3025113-1-ming.lei@redhat.com/
If you can build and test a kernel, please apply the following debug patch
against v6.14 and post the panic log.
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index ca9a67b5b537..6e50e8b9f836 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1127,6 +1127,7 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res,

 	/* tell ublksrv one io request is coming */
 	io_uring_cmd_done(io->cmd, res, 0, issue_flags);
+	io->cmd = NULL;
 }

 #define UBLK_REQUEUE_DELAY_MS	3
@@ -1498,8 +1499,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
 		io->flags |= UBLK_IO_FLAG_CANCELED;
 	spin_unlock(&ubq->cancel_lock);

-	if (!done)
+	if (!done) {
 		io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+		io->cmd = NULL;
+	}
 }

 /*
@@ -1770,6 +1773,8 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	if (!ubq || ub_cmd->q_id != ubq->q_id)
 		goto out;

+	WARN_ON_ONCE(ubq->canceling);
+
 	if (ubq->ubq_daemon && ubq->ubq_daemon != current)
 		goto out;

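
The idea behind the io->cmd = NULL lines above can be modeled in user space.
The following is only a minimal sketch, not kernel code: fake_cmd, fake_io and
all fake_* helpers are hypothetical names invented for illustration, and
fake_cmd_done() merely stands in for io_uring_cmd_done(). It assumes the
intent is that once a command has been completed, the stale pointer is
cleared so that any later attempt to complete it again is caught immediately
instead of touching memory that may already have been freed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
struct fake_cmd {
	int result;
	bool done;
};

struct fake_io {
	struct fake_cmd *cmd;
};

/* Stand-in for io_uring_cmd_done(): afterwards the command is not ours. */
static void fake_cmd_done(struct fake_cmd *cmd, int res)
{
	cmd->result = res;
	cmd->done = true;
}

/* Complete the command, then drop the stale reference. */
static void fake_complete_io_cmd(struct fake_io *io, int res)
{
	fake_cmd_done(io->cmd, res);
	io->cmd = NULL;
}

/*
 * Cancel path: refuse to complete a command twice. Returns true if the
 * command was completed here, false if it had already been completed.
 */
static bool fake_cancel_cmd(struct fake_io *io, int res)
{
	if (io->cmd == NULL) {
		fprintf(stderr, "stale completion attempt caught\n");
		return false;
	}
	fake_complete_io_cmd(io, res);
	return true;
}

/* Small demo: complete once, then try to cancel the same io. */
static int fake_demo(void)
{
	struct fake_cmd cmd = { 0, false };
	struct fake_io io = { &cmd };

	fake_complete_io_cmd(&io, 0);
	if (!cmd.done || io.cmd != NULL)
		return 1;
	/* The second attempt must be rejected, not executed. */
	return fake_cancel_cmd(&io, -1) ? 1 : 0;
}
```

Calling fake_demo() from a main() of your own should return 0. In the real
driver there is no such NULL check in the cancel path; with the debug patch a
stale io->cmd would instead be expected to fault or warn close to the point of
misuse, which is what makes the resulting panic log more useful.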
Thanks,
Ming