public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] Fix race bug in nvme_poll_irqdisable()
@ 2026-03-07 19:46 Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-07 19:46 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sungwoo Kim, Chao Shi, Weidong Zhu, Dave Tian, linux-nvme,
	linux-kernel

In the following scenario, pdev can be disabled between (1) and (3) by
(2), which sets pdev->msix_enabled = 0. pci_irq_vector() then returns
the MSI-X IRQ (> 15) at (1) but the INTx IRQ (<= 15) at (3). This
triggers an IRQ warning because the code tries to enable an INTx IRQ
that was never disabled.

To fix this, save the IRQ number in a local variable so that
disable_irq() and enable_irq() operate on the same IRQ number.
Even if pci_free_irq_vectors() frees the IRQ concurrently, calling
disable_irq() and enable_irq() on a stale IRQ number is still valid and
safe, and the depth accounting remains balanced.

task 1:
nvme_poll_irqdisable()
  disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(1)
  enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(3)

task 2:
nvme_reset_work()
  nvme_dev_disable()
    pdev->msix_enabled = 0;  ...(2)

crash log:

------------[ cut here ]------------
Unbalanced enable for IRQ 10
WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26
Modules linked in:
CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753
Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c <67> 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9
RSP: 0018:ffffc900001bf550 EFLAGS: 00010046
RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90
RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0
RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000
R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293
FS:  0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 enable_irq+0x121/0x1e0 kernel/irq/manage.c:797
 nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494
 nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744
 blk_mq_rq_timed_out block/blk-mq.c:1653 [inline]
 blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721
 bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292
 __sbitmap_for_each_set include/linux/sbitmap.h:269 [inline]
 sbitmap_for_each_set include/linux/sbitmap.h:290 [inline]
 bt_for_each block/blk-mq-tag.c:324 [inline]
 blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536
 blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
irq event stamp: 74478
hardirqs last  enabled at (74477): [<ffffffffb5720a9c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last  enabled at (74477): [<ffffffffb5720a9c>] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202
hardirqs last disabled at (74478): [<ffffffffb57207b5>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (74478): [<ffffffffb57207b5>] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---

Fixes: fa059b856a59 ("nvme-pci: Simplify nvme_poll_irqdisable")
Acked-by: Chao Shi <cshi008@fiu.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Acked-by: Dave Tian <daveti@purdue.edu>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
 drivers/nvme/host/pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d86f2565a..a7fb88aa8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1484,14 +1484,16 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
 {
 	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
+	int irq;
 
 	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
 
-	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
+	disable_irq(irq);
 	spin_lock(&nvmeq->cq_poll_lock);
 	nvme_poll_cq(nvmeq, NULL);
 	spin_unlock(&nvmeq->cq_poll_lock);
-	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	enable_irq(irq);
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
@ 2026-03-07 20:15 ` Sungwoo Kim
  2026-03-10 13:57 ` Christoph Hellwig
  2026-03-10 14:25 ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-07 20:15 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

Also, this race can permanently disable the first IRQ, preventing
delivery of future NVMe completions.
As a result, bdev_open() and bdev_release() may get stuck.

The task hung detector can detect this:

INFO: task syz-executor.2:15754 is blocked on a mutex likely owned by
task systemd-udevd:136.
task:systemd-udevd   state:D stack:0     pid:136   tgid:136   ppid:1
   task_flags:0x400100 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5260 [inline]
 __schedule+0xc92/0x54e0 kernel/sched/core.c:6867
 __schedule_loop kernel/sched/core.c:6949 [inline]
 schedule+0x87/0x300 kernel/sched/core.c:6964
 io_schedule+0xce/0x150 kernel/sched/core.c:7791
 folio_wait_bit_common+0x367/0x8b0 mm/filemap.c:1323
 folio_wait_bit_killable mm/filemap.c:1468 [inline]
 folio_wait_locked_killable include/linux/pagemap.h:1252 [inline]
 folio_wait_locked_killable include/linux/pagemap.h:1248 [inline]
 filemap_read_folio+0x175/0x2a0 mm/filemap.c:2502
 do_read_cache_folio+0x258/0x480 mm/filemap.c:4096
 read_cache_folio+0x5e/0x80 mm/filemap.c:4128
 read_mapping_folio include/linux/pagemap.h:1028 [inline]
 read_part_sector+0xd8/0x320 block/partitions/core.c:722
 read_lba+0x1cb/0x370 block/partitions/efi.c:247
 find_valid_gpt block/partitions/efi.c:602 [inline]
 efi_partition+0x285/0x2b10 block/partitions/efi.c:719
 check_partition block/partitions/core.c:141 [inline]
 blk_add_partitions block/partitions/core.c:589 [inline]
 bdev_disk_changed block/partitions/core.c:693 [inline]
 bdev_disk_changed+0x8dc/0x1710 block/partitions/core.c:642
 blkdev_get_whole+0x194/0x2a0 block/bdev.c:765
 bdev_open+0x4ba/0xeb0 block/bdev.c:974
 bdev_file_open_by_dev block/bdev.c:1076 [inline]
 bdev_file_open_by_dev+0x1b4/0x250 block/bdev.c:1051
 disk_scan_partitions+0x1f5/0x320 block/genhd.c:387
 blkdev_common_ioctl+0x3fb/0x29b0 block/ioctl.c:707
 blkdev_ioctl+0x299/0x700 block/ioctl.c:786
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x1bf/0x220 fs/ioctl.c:583
 x64_sys_call+0x1280/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x71/0x330 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f16e1d2e277
RSP: 002b:00007ffe72978038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe72978150 RCX: 00007f16e1d2e277
RDX: 0000000000000000 RSI: 000000000000125f RDI: 000000000000000e
RBP: 00007ffe729781d0 R08: 000055ed818ee543 R09: 00007f16e1e0bbe0
R10: 0000000000000000 R11: 0000000000000246 R12: 000055ed818f4071
R13: 000055ed818ee89f R14: 00007ffe72978090 R15: 000055edb8a23bd0
 </TASK>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
@ 2026-03-10 13:57 ` Christoph Hellwig
  2026-03-10 14:25 ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2026-03-10 13:57 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
  2026-03-10 13:57 ` Christoph Hellwig
@ 2026-03-10 14:25 ` Keith Busch
  2026-03-11 20:45   ` Sungwoo Kim
  2 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2026-03-10 14:25 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
	Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> In the following scenario, pdev can be disabled between (1) and (3) by
> (2), which sets pdev->msix_enabled = 0. pci_irq_vector() then returns
> the MSI-X IRQ (> 15) at (1) but the INTx IRQ (<= 15) at (3). This
> triggers an IRQ warning because the code tries to enable an INTx IRQ
> that was never disabled.

Thanks, applied to nvme-7.0 with updated commit subject.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-10 14:25 ` Keith Busch
@ 2026-03-11 20:45   ` Sungwoo Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Sungwoo Kim @ 2026-03-11 20:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
	Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Tue, Mar 10, 2026 at 10:25 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> > In the following scenario, pdev can be disabled between (1) and (3) by
> > (2), which sets pdev->msix_enabled = 0. pci_irq_vector() then returns
> > the MSI-X IRQ (> 15) at (1) but the INTx IRQ (<= 15) at (3). This
> > triggers an IRQ warning because the code tries to enable an INTx IRQ
> > that was never disabled.
>
> Thanks, applied to nvme-7.0 with updated commit subject.
>

Thank you for updating the subject. I missed it :)
Sungwoo.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-11 20:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
2026-03-07 20:15 ` Sungwoo Kim
2026-03-10 13:57 ` Christoph Hellwig
2026-03-10 14:25 ` Keith Busch
2026-03-11 20:45   ` Sungwoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox