Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix race bug in nvme_poll_irqdisable()
@ 2026-03-07 19:46 Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-07 19:46 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Sungwoo Kim, Chao Shi, Weidong Zhu, Dave Tian, linux-nvme,
	linux-kernel

In the following scenario, pdev can be disabled between (1) and (3) by
(2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
This causes an IRQ warning because the code tries to enable an INTx
IRQ that has never been disabled.

To fix this, save the IRQ number in a local variable and ensure
disable_irq() and enable_irq() operate on the same IRQ number.
Even if pci_free_irq_vectors() frees the IRQ concurrently, calling
disable_irq() and enable_irq() on a stale IRQ number is still valid
and safe, and the depth accounting remains balanced.

task 1:
nvme_poll_irqdisable()
  disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(1)
  enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(3)

task 2:
nvme_reset_work()
  nvme_dev_disable()
    pdev->msix_enabled = 0;  ...(2)

crash log:

------------[ cut here ]------------
Unbalanced enable for IRQ 10
WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26
Modules linked in:
CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753
Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c <67> 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9
RSP: 0018:ffffc900001bf550 EFLAGS: 00010046
RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90
RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0
RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000
R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293
FS:  0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 enable_irq+0x121/0x1e0 kernel/irq/manage.c:797
 nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494
 nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744
 blk_mq_rq_timed_out block/blk-mq.c:1653 [inline]
 blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721
 bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292
 __sbitmap_for_each_set include/linux/sbitmap.h:269 [inline]
 sbitmap_for_each_set include/linux/sbitmap.h:290 [inline]
 bt_for_each block/blk-mq-tag.c:324 [inline]
 blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536
 blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
irq event stamp: 74478
hardirqs last  enabled at (74477): [<ffffffffb5720a9c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last  enabled at (74477): [<ffffffffb5720a9c>] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202
hardirqs last disabled at (74478): [<ffffffffb57207b5>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (74478): [<ffffffffb57207b5>] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last  enabled at (74304): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (74287): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---

Fixes: fa059b856a59 ("nvme-pci: Simplify nvme_poll_irqdisable")
Acked-by: Chao Shi <cshi008@fiu.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Acked-by: Dave Tian <daveti@purdue.edu>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
 drivers/nvme/host/pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d86f2565a..a7fb88aa8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1484,14 +1484,16 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
 {
 	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
+	int irq;
 
 	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
 
-	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
+	disable_irq(irq);
 	spin_lock(&nvmeq->cq_poll_lock);
 	nvme_poll_cq(nvmeq, NULL);
 	spin_unlock(&nvmeq->cq_poll_lock);
-	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	enable_irq(irq);
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
@ 2026-03-07 20:15 ` Sungwoo Kim
  2026-03-10 13:57 ` Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-07 20:15 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

Also, this race can leave the originally disabled IRQ permanently
disabled, preventing delivery of future NVMe completions.
As a result, bdev_open() and bdev_release() may get stuck.

The task hung detector can detect this:

INFO: task syz-executor.2:15754 is blocked on a mutex likely owned by
task systemd-udevd:136.
task:systemd-udevd   state:D stack:0     pid:136   tgid:136   ppid:1
   task_flags:0x400100 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5260 [inline]
 __schedule+0xc92/0x54e0 kernel/sched/core.c:6867
 __schedule_loop kernel/sched/core.c:6949 [inline]
 schedule+0x87/0x300 kernel/sched/core.c:6964
 io_schedule+0xce/0x150 kernel/sched/core.c:7791
 folio_wait_bit_common+0x367/0x8b0 mm/filemap.c:1323
 folio_wait_bit_killable mm/filemap.c:1468 [inline]
 folio_wait_locked_killable include/linux/pagemap.h:1252 [inline]
 folio_wait_locked_killable include/linux/pagemap.h:1248 [inline]
 filemap_read_folio+0x175/0x2a0 mm/filemap.c:2502
 do_read_cache_folio+0x258/0x480 mm/filemap.c:4096
 read_cache_folio+0x5e/0x80 mm/filemap.c:4128
 read_mapping_folio include/linux/pagemap.h:1028 [inline]
 read_part_sector+0xd8/0x320 block/partitions/core.c:722
 read_lba+0x1cb/0x370 block/partitions/efi.c:247
 find_valid_gpt block/partitions/efi.c:602 [inline]
 efi_partition+0x285/0x2b10 block/partitions/efi.c:719
 check_partition block/partitions/core.c:141 [inline]
 blk_add_partitions block/partitions/core.c:589 [inline]
 bdev_disk_changed block/partitions/core.c:693 [inline]
 bdev_disk_changed+0x8dc/0x1710 block/partitions/core.c:642
 blkdev_get_whole+0x194/0x2a0 block/bdev.c:765
 bdev_open+0x4ba/0xeb0 block/bdev.c:974
 bdev_file_open_by_dev block/bdev.c:1076 [inline]
 bdev_file_open_by_dev+0x1b4/0x250 block/bdev.c:1051
 disk_scan_partitions+0x1f5/0x320 block/genhd.c:387
 blkdev_common_ioctl+0x3fb/0x29b0 block/ioctl.c:707
 blkdev_ioctl+0x299/0x700 block/ioctl.c:786
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x1bf/0x220 fs/ioctl.c:583
 x64_sys_call+0x1280/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x71/0x330 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f16e1d2e277
RSP: 002b:00007ffe72978038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe72978150 RCX: 00007f16e1d2e277
RDX: 0000000000000000 RSI: 000000000000125f RDI: 000000000000000e
RBP: 00007ffe729781d0 R08: 000055ed818ee543 R09: 00007f16e1e0bbe0
R10: 0000000000000000 R11: 0000000000000246 R12: 000055ed818f4071
R13: 000055ed818ee89f R14: 00007ffe72978090 R15: 000055edb8a23bd0
 </TASK>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
@ 2026-03-10 13:57 ` Christoph Hellwig
  2026-03-10 14:25 ` Keith Busch
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-10 13:57 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
  2026-03-07 20:15 ` Sungwoo Kim
  2026-03-10 13:57 ` Christoph Hellwig
@ 2026-03-10 14:25 ` Keith Busch
  2026-03-11 20:45   ` Sungwoo Kim
  2026-05-13 18:27 ` Bjorn Helgaas
  2026-05-14  7:56 ` Sagi Grimberg
  4 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-03-10 14:25 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
	Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> In the following scenario, pdev can be disabled between (1) and (3) by
> (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
> This causes an IRQ warning because the code tries to enable an INTx
> IRQ that has never been disabled.

Thanks, applied to nvme-7.0 with updated commit subject.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-10 14:25 ` Keith Busch
@ 2026-03-11 20:45   ` Sungwoo Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-11 20:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi,
	Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Tue, Mar 10, 2026 at 10:25 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> > In the following scenario, pdev can be disabled between (1) and (3) by
> > (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> > return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
> > This causes an IRQ warning because the code tries to enable an INTx
> > IRQ that has never been disabled.
>
> Thanks, applied to nvme-7.0 with updated commit subject.
>

Thank you for updating the subject. I missed it :)
Sungwoo.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
                   ` (2 preceding siblings ...)
  2026-03-10 14:25 ` Keith Busch
@ 2026-05-13 18:27 ` Bjorn Helgaas
  2026-05-14 14:22   ` Keith Busch
  2026-05-14  7:56 ` Sagi Grimberg
  4 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2026-05-13 18:27 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> In the following scenario, pdev can be disabled between (1) and (3) by
> (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
> This causes an IRQ warning because the code tries to enable an INTx
> IRQ that has never been disabled.
> 
> To fix this, save the IRQ number in a local variable and ensure
> disable_irq() and enable_irq() operate on the same IRQ number.
> Even if pci_free_irq_vectors() frees the IRQ concurrently, calling
> disable_irq() and enable_irq() on a stale IRQ number is still valid
> and safe, and the depth accounting remains balanced.
> 
> task 1:
> nvme_poll_irqdisable()
>   disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(1)
>   enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(3)
> 
> task 2:
> nvme_reset_work()
>   nvme_dev_disable()
> >     pdev->msix_enabled = 0;  ...(2)
> ...

>  static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
>  {
>  	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
> +	int irq;
>  
>  	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
>  
> -	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> +	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
> +	disable_irq(irq);
>  	spin_lock(&nvmeq->cq_poll_lock);
>  	nvme_poll_cq(nvmeq, NULL);
>  	spin_unlock(&nvmeq->cq_poll_lock);
> -	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> +	enable_irq(irq);

An internal run of sashiko complained about this, and I think it's
right.  As the commit log mentions, the cached IRQ number is stale if
nvme_reset_work() frees all the vectors between (1) and (3).  It's
likely the pci_alloc_irq_vectors() in nvme_pci_enable() will get the
same IRQ number, but it would be a coincidence, and it doesn't feel
like a good idea to rely on it.

First sashiko review:

  This commit caches the IRQ number in nvme_poll_irqdisable() to
  ensure disable_irq() and enable_irq() operate on the same IRQ
  number, preventing an unbalanced enable warning if the device is
  concurrently disabled.

  If pci_free_irq_vectors() frees the IRQ concurrently, is it possible
  for the IRQ number to be reallocated to a completely different
  device before enable_irq() is called? Could this cause an unbalanced
  enable or incorrectly unmask the interrupt for the new device?

Second sashiko review:

  The commit caches the return value of pci_irq_vector() into a local
  variable to ensure disable_irq() and enable_irq() operate on the
  same IRQ number, preventing an unbalanced enable warning when
  pdev->msix_enabled is cleared concurrently.

  If pdev->msix_enabled is cleared concurrently and nvmeq->cq_vector >
  0, pci_irq_vector() will return -EINVAL. Since disable_irq() takes
  an unsigned int, does this result in passing a very large unsigned
  number (like 0xffffffea) to the IRQ subsystem?

  The commit message notes that operating on a stale IRQ number is
  valid and safe. However, if pci_free_irq_vectors() frees the MSI-X
  vector concurrently, couldn't this cached irq number be recycled and
  reallocated to a completely different device before enable_irq(irq)
  executes? If the new device had explicitly disabled its IRQ, does
  this code inadvertently enable it, breaking the other device's
  synchronization?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
                   ` (3 preceding siblings ...)
  2026-05-13 18:27 ` Bjorn Helgaas
@ 2026-05-14  7:56 ` Sagi Grimberg
  4 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2026-05-14  7:56 UTC (permalink / raw)
  To: Sungwoo Kim, Keith Busch, Jens Axboe, Christoph Hellwig
  Cc: Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
  2026-05-13 18:27 ` Bjorn Helgaas
@ 2026-05-14 14:22   ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2026-05-14 14:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sungwoo Kim, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chao Shi, Weidong Zhu, Dave Tian, linux-nvme, linux-kernel

On Wed, May 13, 2026 at 01:27:36PM -0500, Bjorn Helgaas wrote:
> On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> > In the following scenario, pdev can be disabled between (1) and (3) by
> > (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> > return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
> > This causes an IRQ warning because the code tries to enable an INTx
> > IRQ that has never been disabled.
> > 
> > To fix this, save the IRQ number in a local variable and ensure
> > disable_irq() and enable_irq() operate on the same IRQ number.
> > Even if pci_free_irq_vectors() frees the IRQ concurrently, calling
> > disable_irq() and enable_irq() on a stale IRQ number is still valid
> > and safe, and the depth accounting remains balanced.
> > 
> > task 1:
> > nvme_poll_irqdisable()
> >   disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(1)
> >   enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(3)
> > 
> > task 2:
> > nvme_reset_work()
> >   nvme_dev_disable()
> >     pdev->msix_enabled = 0;  ...(2)
> > ...
> 
> >  static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
> >  {
> >  	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
> > +	int irq;
> >  
> >  	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
> >  
> > -	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> > +	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
> > +	disable_irq(irq);
> >  	spin_lock(&nvmeq->cq_poll_lock);
> >  	nvme_poll_cq(nvmeq, NULL);
> >  	spin_unlock(&nvmeq->cq_poll_lock);
> > -	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> > +	enable_irq(irq);
> 
> An internal run of sashiko complained about this, and I think it's
> right.  As the commit log mentions, the cached IRQ number is stale if
> nvme_reset_work() frees all the vectors between (1) and (3).  It's
> likely the pci_alloc_irq_vectors() in nvme_pci_enable() will get the
> same IRQ number, but it would be a coincidence, and it doesn't feel
> like a good idea to rely on it.
> 
> First sashiko review:
> 
>   This commit caches the IRQ number in nvme_poll_irqdisable() to
>   ensure disable_irq() and enable_irq() operate on the same IRQ
>   number, preventing an unbalanced enable warning if the device is
>   concurrently disabled.
> 
>   If pci_free_irq_vectors() frees the IRQ concurrently, is it possible
>   for the IRQ number to be reallocated to a completely different
>   device before enable_irq() is called? Could this cause an unbalanced
>   enable or incorrectly unmask the interrupt for the new device?

Yeah, that looks legit. This should fix it:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 139a10cd687f9..34845d73cb3ab 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1885,8 +1885,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	 */
 	if (test_bit(NVMEQ_POLLED, &nvmeq->flags))
 		nvme_poll(req->mq_hctx, NULL);
-	else
-		nvme_poll_irqdisable(nvmeq);
+	else {
+		mutex_lock(&dev->shutdown_lock);
+		if (test_bit(NVMEQ_ENABLED, &nvmeq->flags))
+			nvme_poll_irqdisable(nvmeq);
+		mutex_unlock(&dev->shutdown_lock);
+	}
 
 	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
 		dev_warn(dev->ctrl.device,
--


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-05-14 14:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-07 19:46 [PATCH] Fix race bug in nvme_poll_irqdisable() Sungwoo Kim
2026-03-07 20:15 ` Sungwoo Kim
2026-03-10 13:57 ` Christoph Hellwig
2026-03-10 14:25 ` Keith Busch
2026-03-11 20:45   ` Sungwoo Kim
2026-05-13 18:27 ` Bjorn Helgaas
2026-05-14 14:22   ` Keith Busch
2026-05-14  7:56 ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox