Date: Wed, 13 May 2026 13:27:36 -0500
From: Bjorn Helgaas
To: Sungwoo Kim
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chao Shi, Weidong Zhu, Dave Tian,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
Message-ID: <20260513182736.GA324918@bhelgaas>
In-Reply-To: <20260307194636.2755443-2-iam@sung-woo.kim>

On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> In the following scenario, pdev can be disabled between (1) and (3)
> by (2), which sets pdev->msix_enabled = 0.  pci_irq_vector() will
> then return an MSI-X IRQ (>15) at (1) but an INTx IRQ (<=15) at (3).
> This trips an IRQ warning because we try to enable an INTx IRQ that
> was never disabled.
>
> To fix this, save the IRQ number in a local variable so that
> disable_irq() and enable_irq() operate on the same IRQ number.  Even
> if pci_free_irq_vectors() frees the IRQ concurrently, disable_irq()
> and enable_irq() on a stale IRQ number are still valid and safe, and
> the depth accounting remains balanced.
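For reference, the "Unbalanced enable" splat comes from the disable
depth accounting in the genirq core.  Roughly, paraphrasing
kernel/irq/manage.c (details vary by kernel version):

  void __enable_irq(struct irq_desc *desc)
  {
  	switch (desc->depth) {
  	case 0:
  		/* enable_irq() with no matching disable_irq() */
  		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n",
  		     irq_desc_get_irq(desc));
  		break;
  	case 1:
  		/* undoing the last disable: unmask the line */
  		irq_enable(desc);
  		fallthrough;
  	default:
  		desc->depth--;
  	}
  }

Enabling an IRQ whose depth is already 0, which is what happens when
(1) and (3) resolve to different IRQ numbers, is what trips the WARN.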
>
> task 1:
> nvme_poll_irqdisable()
>   disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(1)
>   enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))   ...(3)
>
> task 2:
> nvme_reset_work()
>   nvme_dev_disable()
>     pdev->msix_enabled = 0;                            ...(2)
> ...
>  static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
>  {
>  	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
> +	int irq;
>
>  	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
>
> -	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> +	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
> +	disable_irq(irq);
>  	spin_lock(&nvmeq->cq_poll_lock);
>  	nvme_poll_cq(nvmeq, NULL);
>  	spin_unlock(&nvmeq->cq_poll_lock);
> -	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> +	enable_irq(irq);

An internal run of sashiko complained about this, and I think it's
right.  As the commit log mentions, the cached IRQ number is stale if
nvme_reset_work() frees all the vectors between (1) and (3).  The
pci_alloc_irq_vectors() in nvme_pci_enable() will probably hand back
the same IRQ number, but that would be a coincidence, and it doesn't
seem like a good idea to rely on it.

First sashiko review:

  This commit caches the IRQ number in nvme_poll_irqdisable() to
  ensure disable_irq() and enable_irq() operate on the same IRQ
  number, preventing an unbalanced enable warning if the device is
  concurrently disabled.

  If pci_free_irq_vectors() frees the IRQ concurrently, is it
  possible for the IRQ number to be reallocated to a completely
  different device before enable_irq() is called?  Could this cause
  an unbalanced enable or incorrectly unmask the interrupt for the
  new device?

Second sashiko review:

  The commit caches the return value of pci_irq_vector() in a local
  variable to ensure disable_irq() and enable_irq() operate on the
  same IRQ number, preventing an unbalanced enable warning when
  pdev->msix_enabled is cleared concurrently.

  If pdev->msix_enabled is cleared concurrently and nvmeq->cq_vector
  > 0, pci_irq_vector() will return -EINVAL.  Since disable_irq()
  takes an unsigned int, does this result in passing a very large
  unsigned number (like 0xffffffea) to the IRQ subsystem?

  The commit message notes that operating on a stale IRQ number is
  valid and safe.  However, if pci_free_irq_vectors() frees the MSI-X
  vector concurrently, couldn't this cached irq number be recycled
  and reallocated to a completely different device before
  enable_irq(irq) executes?  If the new device had explicitly
  disabled its IRQ, does this code inadvertently enable it, breaking
  the other device's synchronization?
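Untested sketch of the direction I'd lean, on top of this patch.  It
at least avoids feeding a negative pci_irq_vector() result to
disable_irq() (the -EINVAL case from the second review); it does
nothing about the free/reallocate window, which I think needs real
synchronization against nvme_dev_disable() rather than anything local
to this function:

  static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
  {
  	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
  	int irq;

  	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));

  	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
  	if (irq < 0)
  		return;	/* vectors already torn down; nothing to quiesce */

  	disable_irq(irq);
  	spin_lock(&nvmeq->cq_poll_lock);
  	nvme_poll_cq(nvmeq, NULL);
  	spin_unlock(&nvmeq->cq_poll_lock);
  	enable_irq(irq);
  }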