From: Ming Lei <ming.lei@redhat.com>
To: Guy Eisenberg <geisenberg@nvidia.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>,
Jared Holzman <jholzman@nvidia.com>, Yoav Cohen <yoav@nvidia.com>,
Omri Levi <omril@nvidia.com>, Ofer Oshri <ofer@nvidia.com>
Subject: Re: ublk: kernel crash when killing SPDK application
Date: Tue, 15 Apr 2025 20:56:21 +0800
Message-ID: <Z_5XdWPQa7cq1nDJ@fedora>
In-Reply-To: <IA1PR12MB645841796CB4C76F62F24522A9B22@IA1PR12MB6458.namprd12.prod.outlook.com>

On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:
> I am writing to report a kernel crash that occurred after terminating (kill -9) an SPDK application using ublk.
> Below are the details of the incident, including steps to reproduce the issue and the call stack.
>
> Incident Description:
> After terminating an SPDK application, the system occasionally experiences a kernel crash.
> This issue is not consistent but happens once every few tries under the following conditions.
> We are using kernel 6.14.0-061400-generic
>
> Steps to Reproduce:
> 1. install SPDK:
> git clone https://github.com/spdk/spdk
> cd spdk
> ./configure --disable-coverage --disable-debug --disable-tests --enable-unit-tests --without-crypto --without-fio --with-vhost --with-rdma --without-nvme-cuse --without-fuse --without-vfio-user --without-vtune --without-iscsi-initiator --without-shared --with-ublk --with-uring --with-raid5f
> make
> make install
> 2. Create SPDK bdev (here we used PCI 0000.8b.00.0 as the nvme target, and named the bdev as guy_bdev):
> ./spdk/scripts/setup.sh reset
> ./spdk/scripts/setup.sh
> /usr/local/bin/spdk_tgt --mem-size 2048 -m 0xff
> ./spdk/scripts/rpc.py bdev_nvme_attach_controller -b guy_bdev -t PCIe -a 0000.8b.00.0
> 3. Expose it via ublk
> modprobe ublk_drv
> ./spdk/scripts/rpc.py ublk_create_target
> ./spdk/scripts/rpc.py ublk_start_disk -q 8 -d 128 guy_bdevn1 0
> 4. Run IO to the /dev/ublkb0 that was created
> 5. Kill the spdk_tgt process (kill -9)
>
>
> Call Stack:
> Below is the call stack captured during one of the crashes:
>
> [54346.157495] [ T288311] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [54346.157625] [ T288311] #PF: supervisor write access in kernel mode
> [54346.157708] [ T288311] #PF: error_code(0x0002) - not-present page
> [54346.157790] [ T288311] PGD 0 P4D 0
> [54346.157911] [ T288311] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [54346.158010] [ T288311] CPU: 0 UID: 0 PID: 288311 Comm: reactor_0 Kdump: loaded Tainted: G OE 6.14.0-061400-generic #202503241442
> [54346.158264] [ T288311] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [54346.158374] [ T288311] Hardware name: Supermicro SYS-2028BT-HNR+/X10DRT-B+, BIOS 2.0 01/10/2017
> [54346.158490] [ T288311] RIP: 0010:percpu_ref_get_many+0x35/0x50

This looks like a uring_cmd use-after-free issue.

The following patchset may avoid it:

https://lore.kernel.org/linux-block/20250414112554.3025113-1-ming.lei@redhat.com/

If you can build and test a kernel, please apply the following debug patch
against v6.14 and post the panic log.

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index ca9a67b5b537..6e50e8b9f836 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1127,6 +1127,7 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res,
 
 	/* tell ublksrv one io request is coming */
 	io_uring_cmd_done(io->cmd, res, 0, issue_flags);
+	io->cmd = NULL;
 }
 
 #define UBLK_REQUEUE_DELAY_MS 3
@@ -1498,8 +1499,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
 	io->flags |= UBLK_IO_FLAG_CANCELED;
 	spin_unlock(&ubq->cancel_lock);
 
-	if (!done)
+	if (!done) {
 		io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+		io->cmd = NULL;
+	}
 }
 
 /*
@@ -1770,6 +1773,8 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	if (!ubq || ub_cmd->q_id != ubq->q_id)
 		goto out;
 
+	WARN_ON_ONCE(ubq->canceling);
+
 	if (ubq->ubq_daemon && ubq->ubq_daemon != current)
 		goto out;
 

Thanks,
Ming

Thread overview: 5+ messages
2025-04-15 10:58 ublk: kernel crash when killing SPDK application Guy Eisenberg
2025-04-15 12:56 ` Ming Lei [this message]
2025-04-22 11:43 ` Jared Holzman
2025-04-22 13:42 ` Ming Lei
2025-04-22 13:47 ` Jared Holzman