* [PATCH] Fix race bug in nvme_poll_irqdisable()
@ 2026-03-07 19:46 Sungwoo Kim
2026-03-07 20:15 ` Sungwoo Kim
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-07 19:46 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: Sungwoo Kim, Chao Shi, Weidong Zhu, Dave Tian, linux-nvme,
linux-kernel
In the following scenario, pdev can be disabled by (2) between (1) and
(3). This sets pdev->msix_enabled = 0, so pci_irq_vector() returns an
MSI-X IRQ (>15) at (1) but an INTx IRQ (<=15) at (3). This triggers an
IRQ warning because enable_irq() is called on an INTx IRQ that was
never disabled.

To fix this, save the IRQ number in a local variable so that
disable_irq() and enable_irq() operate on the same IRQ number.
Even if pci_free_irq_vectors() frees the IRQ concurrently, calling
disable_irq() and enable_irq() on a stale IRQ number is still valid and
safe, and the depth accounting remains balanced.
task 1:
nvme_poll_irqdisable()
  disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(1)
  enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))   ...(3)

task 2:
nvme_reset_work()
  nvme_dev_disable()
    pdev->msix_enabled = 0;  ...(2)
crash log:
------------[ cut here ]------------
Unbalanced enable for IRQ 10
WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26
Modules linked in:
CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753
Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c <67> 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9
RSP: 0018:ffffc900001bf550 EFLAGS: 00010046
RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90
RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0
RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000
R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293
FS: 0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0
Call Trace:
<TASK>
enable_irq+0x121/0x1e0 kernel/irq/manage.c:797
nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494
nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744
blk_mq_rq_timed_out block/blk-mq.c:1653 [inline]
blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721
bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292
__sbitmap_for_each_set include/linux/sbitmap.h:269 [inline]
sbitmap_for_each_set include/linux/sbitmap.h:290 [inline]
bt_for_each block/blk-mq-tag.c:324 [inline]
blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536
blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763
process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
kthread+0x41a/0x930 kernel/kthread.c:463
ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
irq event stamp: 74478
hardirqs last enabled at (74477): [<ffffffffb5720a9c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (74477): [<ffffffffb5720a9c>] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202
hardirqs last disabled at (74478): [<ffffffffb57207b5>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (74478): [<ffffffffb57207b5>] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162
softirqs last enabled at (74304): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last enabled at (74304): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last enabled at (74304): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---
Fixes: fa059b856a59 ("nvme-pci: Simplify nvme_poll_irqdisable")
Acked-by: Chao Shi <cshi008@fiu.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Acked-by: Dave Tian <daveti@purdue.edu>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
drivers/nvme/host/pci.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d86f2565a..a7fb88aa8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1484,14 +1484,16 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
 {
 	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
+	int irq;
 
 	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
-	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
+	disable_irq(irq);
 	spin_lock(&nvmeq->cq_poll_lock);
 	nvme_poll_cq(nvmeq, NULL);
 	spin_unlock(&nvmeq->cq_poll_lock);
-	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	enable_irq(irq);
 }
static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
--
2.47.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
@ 2026-03-07 20:15 ` Sungwoo Kim
2026-03-10 13:57 ` Christoph Hellwig
2026-03-10 14:25 ` Keith Busch
2 siblings, 0 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-07 20:15 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel
Also, this can leave the first IRQ permanently disabled, preventing
delivery of future NVMe completions.
Thus, bdev_open() and bdev_release() may get stuck.
The task hung detector can detect this:
INFO: task syz-executor.2:15754 is blocked on a mutex likely owned by
task systemd-udevd:136.
task:systemd-udevd state:D stack:0 pid:136 tgid:136 ppid:1
task_flags:0x400100 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5260 [inline]
__schedule+0xc92/0x54e0 kernel/sched/core.c:6867
__schedule_loop kernel/sched/core.c:6949 [inline]
schedule+0x87/0x300 kernel/sched/core.c:6964
io_schedule+0xce/0x150 kernel/sched/core.c:7791
folio_wait_bit_common+0x367/0x8b0 mm/filemap.c:1323
folio_wait_bit_killable mm/filemap.c:1468 [inline]
folio_wait_locked_killable include/linux/pagemap.h:1252 [inline]
folio_wait_locked_killable include/linux/pagemap.h:1248 [inline]
filemap_read_folio+0x175/0x2a0 mm/filemap.c:2502
do_read_cache_folio+0x258/0x480 mm/filemap.c:4096
read_cache_folio+0x5e/0x80 mm/filemap.c:4128
read_mapping_folio include/linux/pagemap.h:1028 [inline]
read_part_sector+0xd8/0x320 block/partitions/core.c:722
read_lba+0x1cb/0x370 block/partitions/efi.c:247
find_valid_gpt block/partitions/efi.c:602 [inline]
efi_partition+0x285/0x2b10 block/partitions/efi.c:719
check_partition block/partitions/core.c:141 [inline]
blk_add_partitions block/partitions/core.c:589 [inline]
bdev_disk_changed block/partitions/core.c:693 [inline]
bdev_disk_changed+0x8dc/0x1710 block/partitions/core.c:642
blkdev_get_whole+0x194/0x2a0 block/bdev.c:765
bdev_open+0x4ba/0xeb0 block/bdev.c:974
bdev_file_open_by_dev block/bdev.c:1076 [inline]
bdev_file_open_by_dev+0x1b4/0x250 block/bdev.c:1051
disk_scan_partitions+0x1f5/0x320 block/genhd.c:387
blkdev_common_ioctl+0x3fb/0x29b0 block/ioctl.c:707
blkdev_ioctl+0x299/0x700 block/ioctl.c:786
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x1bf/0x220 fs/ioctl.c:583
x64_sys_call+0x1280/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:17
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x71/0x330 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f16e1d2e277
RSP: 002b:00007ffe72978038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe72978150 RCX: 00007f16e1d2e277
RDX: 0000000000000000 RSI: 000000000000125f RDI: 000000000000000e
RBP: 00007ffe729781d0 R08: 000055ed818ee543 R09: 00007f16e1e0bbe0
R10: 0000000000000000 R11: 0000000000000246 R12: 000055ed818f4071
R13: 000055ed818ee89f R14: 00007ffe72978090 R15: 000055edb8a23bd0
</TASK>
* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
2026-03-07 20:15 ` Sungwoo Kim
@ 2026-03-10 13:57 ` Christoph Hellwig
2026-03-10 14:25 ` Keith Busch
2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2026-03-10 13:57 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
2026-03-07 20:15 ` Sungwoo Kim
2026-03-10 13:57 ` Christoph Hellwig
@ 2026-03-10 14:25 ` Keith Busch
2026-03-11 20:45 ` Sungwoo Kim
2 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2026-03-10 14:25 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
Weidong Zhu, Dave Tian, linux-nvme, linux-kernel
On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> In the following scenario, pdev can be disabled between (1) and (3) by
> (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> return MSI-X IRQ(>15) for (1) whereas return INTx IRQ(<=15) for (2).
> This causes IRQ warning because it tries to enable INTx IRQ that has
> never been disabled before.
Thanks, applied to nvme-7.0 with updated commit subject.
* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
2026-03-10 14:25 ` Keith Busch
@ 2026-03-11 20:45 ` Sungwoo Kim
0 siblings, 0 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-11 20:45 UTC (permalink / raw)
To: Keith Busch
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
Weidong Zhu, Dave Tian, linux-nvme, linux-kernel
On Tue, Mar 10, 2026 at 10:25 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> > In the following scenario, pdev can be disabled between (1) and (3) by
> > (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> > return MSI-X IRQ(>15) for (1) whereas return INTx IRQ(<=15) for (2).
> > This causes IRQ warning because it tries to enable INTx IRQ that has
> > never been disabled before.
>
> Thanks, applied to nvme-7.0 with updated commit subject.
>
Thank you for updating the subject. I missed it :)
Sungwoo.