* [PATCH 0/1] nvme-loop: avoid cancelling/aborting I/O and admin tagset
@ 2026-03-13 11:38 Nilay Shroff
2026-03-13 11:38 ` [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown Nilay Shroff
0 siblings, 1 reply; 4+ messages in thread
From: Nilay Shroff @ 2026-03-13 11:38 UTC
To: linux-nvme; +Cc: hch, kbusch, sagi, kch, gjoyce
Hi,
During nvme-loop controller reset or shutdown, the current code first
cancels/aborts the I/O and admin tagsets and then proceeds to destroy
the corresponding I/O and admin queues.
For the loop controller this cancellation is unnecessary. The queue
destruction path already waits for all in-flight target I/O and admin
operations to complete, which ensures that no outstanding operations
remain before the queues are torn down.
Cancelling the tagsets first also introduces a small race window where
a late completion from the target may arrive after the corresponding
request tag has been cancelled but before the queues are destroyed.
If this occurs, the completion path may attempt to access a request
whose tag has already been cancelled or freed, which can lead to a
kernel crash. The patch in this series therefore avoids cancelling or
aborting the I/O and admin tagsets for the nvme-loop target, as this
step is redundant and exposes the race described above.
This issue was observed while running blktests nvme/040. The kernel crash
encountered is shown below:
run blktests nvme/040 at 2026-03-08 06:34:27
loop0: detected capacity change from 0 to 2097152
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
nvme nvme6: creating 96 I/O queues.
nvme nvme6: new ctrl: "blktests-subsystem-1"
nvme_log_error: 1 callbacks suppressed
block nvme6n1: no usable path - requeuing I/O
nvme6c6n1: Read(0x2) @ LBA 2096384, 128 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
blk_print_req_error: 1 callbacks suppressed
I/O error, dev nvme6c6n1, sector 2096384 op 0x0:(READ) flags 0x2880700 phys_seg 1 prio class 2
block nvme6n1: no usable path - requeuing I/O
Kernel attempted to read user page (286) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000286
Faulting instruction address: 0xc00000000090ca18
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
[...]
NIP [c000000000961274] blk_mq_complete_request_remote+0x28/0x2d4
LR [c008000009af1808] nvme_loop_queue_response+0x110/0x290 [nvme_loop]
Call Trace:
0xc00000000502c640 (unreliable)
nvme_loop_queue_response+0x104/0x290 [nvme_loop]
__nvmet_req_complete+0x80/0x498 [nvmet]
nvmet_req_complete+0x24/0xf8 [nvmet]
nvmet_bio_done+0x58/0xcc [nvmet]
bio_endio+0x250/0x390
blk_update_request+0x2e8/0x68c
blk_mq_end_request+0x30/0x5c
lo_complete_rq+0x94/0x110 [loop]
blk_complete_reqs+0x78/0x98
handle_softirqs+0x148/0x454
do_softirq_own_stack+0x3c/0x50
__irq_exit_rcu+0x18c/0x1b4
irq_exit+0x1c/0x34
do_IRQ+0x114/0x278
hardware_interrupt_common_virt+0x28c/0x290
The above kernel oops occurred in blk_mq_complete_request_remote():
1319 bool blk_mq_complete_request_remote(struct request *rq)
1320 {
1321         WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
1322
1323         /*
1324          * For request which hctx has only one ctx mapping,
1325          * or a polled request, always complete locally,
1326          * it's pointless to redirect the completion.
1327          */
1328         if ((rq->mq_hctx->nr_ctx == 1 &&
1329              rq->mq_ctx->cpu == raw_smp_processor_id()) ||
1330             rq->cmd_flags & REQ_POLLED)
1331                 return false;
In the code above, on line #1328, the kernel crashes when it attempts
to dereference rq->mq_hctx->nr_ctx, because rq->mq_hctx is NULL: the
request had already been aborted/cancelled when the loop controller
reset was initiated.
Nilay Shroff (1):
nvme-loop: do not cancel I/O and admin tagset during ctrl
reset/shutdown
drivers/nvme/target/loop.c | 2 --
1 file changed, 2 deletions(-)
--
2.53.0
* [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
2026-03-13 11:38 [PATCH 0/1] nvme-loop: avoid cancelling/aborting I/O and admin tagset Nilay Shroff
@ 2026-03-13 11:38 ` Nilay Shroff
2026-03-20 7:59 ` Christoph Hellwig
2026-03-24 15:30 ` Keith Busch
0 siblings, 2 replies; 4+ messages in thread
From: Nilay Shroff @ 2026-03-13 11:38 UTC
To: linux-nvme; +Cc: hch, kbusch, sagi, kch, gjoyce
Cancelling the I/O and admin tagsets during nvme-loop controller reset
or shutdown is unnecessary. The subsequent destruction of the I/O and
admin queues already waits for all in-flight target operations to
complete.
Cancelling the tagsets first also opens a race window. After a request
tag has been cancelled, a late completion from the target may still
arrive before the queues are destroyed. In that case the completion path
may access a request whose tag has already been cancelled or freed,
which can lead to a kernel crash. Please see below the kernel crash
encountered while running blktests nvme/040:
run blktests nvme/040 at 2026-03-08 06:34:27
loop0: detected capacity change from 0 to 2097152
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
nvme nvme6: creating 96 I/O queues.
nvme nvme6: new ctrl: "blktests-subsystem-1"
nvme_log_error: 1 callbacks suppressed
block nvme6n1: no usable path - requeuing I/O
nvme6c6n1: Read(0x2) @ LBA 2096384, 128 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
blk_print_req_error: 1 callbacks suppressed
I/O error, dev nvme6c6n1, sector 2096384 op 0x0:(READ) flags 0x2880700 phys_seg 1 prio class 2
block nvme6n1: no usable path - requeuing I/O
Kernel attempted to read user page (236) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000236
Faulting instruction address: 0xc000000000961274
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nvme_loop nvme_fabrics loop nvmet null_blk rpadlpar_io rpaphp xsk_diag bonding rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink pseries_rng dax_pmem vmx_crypto drm drm_panel_orientation_quirks xfs mlx5_core nvme bnx2x sd_mod nd_pmem nd_btt nvme_core sg papr_scm tls libnvdimm ibmvscsi ibmveth scsi_transport_srp nvme_keyring nvme_auth mdio hkdf pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: loop]
CPU: 25 UID: 0 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 7.0.0-rc3+ #14 PREEMPT
Hardware name: IBM,9043-MRX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1120.00 (RF1120_128) hv:phyp pSeries
NIP: c000000000961274 LR: c008000009af1808 CTR: c00000000096124c
REGS: c0000007ffc0f910 TRAP: 0300 Not tainted (7.0.0-rc3+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22222222 XER: 00000000
CFAR: c008000009af232c DAR: 0000000000000236 DSISR: 40000000 IRQMASK: 0
GPR00: c008000009af17fc c0000007ffc0fbb0 c000000001c78100 c0000000be05cc00
GPR04: 0000000000000001 0000000000000000 0000000000000007 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000002 c008000009af2318
GPR12: c00000000096124c c0000007ffdab880 0000000000000000 0000000000000000
GPR16: 0000000000000010 0000000000000000 0000000000000004 0000000000000000
GPR20: 0000000000000001 c000000002ca2b00 0000000100043bb2 000000000000000a
GPR24: 000000000000000a 0000000000000000 0000000000000000 0000000000000000
GPR28: c000000084021d40 c000000084021d50 c0000000be05cd60 c0000000be05cc00
NIP [c000000000961274] blk_mq_complete_request_remote+0x28/0x2d4
LR [c008000009af1808] nvme_loop_queue_response+0x110/0x290 [nvme_loop]
Call Trace:
0xc00000000502c640 (unreliable)
nvme_loop_queue_response+0x104/0x290 [nvme_loop]
__nvmet_req_complete+0x80/0x498 [nvmet]
nvmet_req_complete+0x24/0xf8 [nvmet]
nvmet_bio_done+0x58/0xcc [nvmet]
bio_endio+0x250/0x390
blk_update_request+0x2e8/0x68c
blk_mq_end_request+0x30/0x5c
lo_complete_rq+0x94/0x110 [loop]
blk_complete_reqs+0x78/0x98
handle_softirqs+0x148/0x454
do_softirq_own_stack+0x3c/0x50
__irq_exit_rcu+0x18c/0x1b4
irq_exit+0x1c/0x34
do_IRQ+0x114/0x278
hardware_interrupt_common_virt+0x28c/0x290
Since the queue teardown path already guarantees that all target-side
operations have completed, cancelling the tagsets is redundant and
unsafe. So avoid cancelling the I/O and admin tagsets during controller
reset and shutdown.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/target/loop.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 4b3f4f11928d..d98d0cdc5d6f 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -419,7 +419,6 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_quiesce_io_queues(&ctrl->ctrl);
- nvme_cancel_tagset(&ctrl->ctrl);
nvme_loop_destroy_io_queues(ctrl);
}
@@ -427,7 +426,6 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_LIVE)
nvme_disable_ctrl(&ctrl->ctrl, true);
- nvme_cancel_admin_tagset(&ctrl->ctrl);
nvme_loop_destroy_admin_queue(ctrl);
}
--
2.53.0
* Re: [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
2026-03-13 11:38 ` [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown Nilay Shroff
@ 2026-03-20 7:59 ` Christoph Hellwig
2026-03-24 15:30 ` Keith Busch
1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-03-20 7:59 UTC
To: Nilay Shroff; +Cc: linux-nvme, hch, kbusch, sagi, kch, gjoyce
On Fri, Mar 13, 2026 at 05:08:48PM +0530, Nilay Shroff wrote:
> Cancelling the I/O and admin tagsets during nvme-loop controller reset
> or shutdown is unnecessary. The subsequent destruction of the I/O and
> admin queues already waits for all in-flight target operations to
> complete.
>
> Cancelling the tagsets first also opens a race window. After a request
> tag has been cancelled, a late completion from the target may still
> arrive before the queues are destroyed. In that case the completion path
> may access a request whose tag has already been cancelled or freed,
> which can lead to a kernel crash. Please see below the kernel crash
> encountered while running blktests nvme/040:
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
2026-03-13 11:38 ` [PATCH 1/1] nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown Nilay Shroff
2026-03-20 7:59 ` Christoph Hellwig
@ 2026-03-24 15:30 ` Keith Busch
1 sibling, 0 replies; 4+ messages in thread
From: Keith Busch @ 2026-03-24 15:30 UTC
To: Nilay Shroff; +Cc: linux-nvme, hch, sagi, kch, gjoyce
On Fri, Mar 13, 2026 at 05:08:48PM +0530, Nilay Shroff wrote:
> Cancelling the I/O and admin tagsets during nvme-loop controller reset
> or shutdown is unnecessary. The subsequent destruction of the I/O and
> admin queues already waits for all in-flight target operations to
> complete.
Thanks, applied to nvme-7.1.