Date: Thu, 14 May 2026 08:22:39 -0600
From: Keith Busch
To: Bjorn Helgaas
Cc: Sungwoo Kim, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chao Shi, Weidong Zhu, Dave Tian, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix race bug in nvme_poll_irqdisable()
References: <20260307194636.2755443-2-iam@sung-woo.kim> <20260513182736.GA324918@bhelgaas>
In-Reply-To: <20260513182736.GA324918@bhelgaas>

On Wed, May 13, 2026 at 01:27:36PM -0500, Bjorn Helgaas wrote:
> On Sat, Mar 07, 2026 at 02:46:36PM -0500, Sungwoo Kim wrote:
> > In the following scenario, pdev can be disabled between (1) and (3) by
> > (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will
> > return an MSI-X IRQ (>15) for (1) but an INTx IRQ (<=15) for (3).
> > This triggers an IRQ warning because enable_irq() is called on an
> > INTx IRQ that was never disabled.
> >
> > To fix this, save the IRQ number into a local variable and ensure
> > disable_irq() and enable_irq() operate on the same IRQ number.
> > Even if pci_free_irq_vectors() frees the IRQ concurrently,
> > disable_irq() and enable_irq() on a stale IRQ number are still valid
> > and safe, and the depth accounting remains balanced.
> >
> > task 1:
> >   nvme_poll_irqdisable()
> >     disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))  ...(1)
> >     enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector))   ...(3)
> >
> > task 2:
> >   nvme_reset_work()
> >     nvme_dev_disable()
> >       pdev->msix_enabled = 0;                            ...(2)
> >       ...
> >
> > static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
> > {
> > 	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
> > +	int irq;
> >
> > 	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
> >
> > -	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> > +	irq = pci_irq_vector(pdev, nvmeq->cq_vector);
> > +	disable_irq(irq);
> > 	spin_lock(&nvmeq->cq_poll_lock);
> > 	nvme_poll_cq(nvmeq, NULL);
> > 	spin_unlock(&nvmeq->cq_poll_lock);
> > -	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> > +	enable_irq(irq);
>
> An internal run of sashiko complained about this, and I think it's
> right. As the commit log mentions, the cached IRQ number is stale if
> nvme_reset_work() frees all the vectors between (1) and (3). It's
> likely the pci_alloc_irq_vectors() in nvme_pci_enable() will get the
> same IRQ number, but it would be a coincidence, and it doesn't feel
> like a good idea to rely on it.
>
> First sashiko review:
>
>     This commit caches the IRQ number in nvme_poll_irqdisable() to
>     ensure disable_irq() and enable_irq() operate on the same IRQ
>     number, preventing an unbalanced enable warning if the device is
>     concurrently disabled.
>
> If pci_free_irq_vectors() frees the IRQ concurrently, is it possible
> for the IRQ number to be reallocated to a completely different
> device before enable_irq() is called? Could this cause an unbalanced
> enable or incorrectly unmask the interrupt for the new device?

Yeah, that looks legit.
This should fix it:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 139a10cd687f9..34845d73cb3ab 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1885,8 +1885,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	 */
 	if (test_bit(NVMEQ_POLLED, &nvmeq->flags))
 		nvme_poll(req->mq_hctx, NULL);
-	else
-		nvme_poll_irqdisable(nvmeq);
+	else {
+		mutex_lock(&dev->shutdown_lock);
+		if (test_bit(NVMEQ_ENABLED, &nvmeq->flags))
+			nvme_poll_irqdisable(nvmeq);
+		mutex_unlock(&dev->shutdown_lock);
+	}
 
 	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
 		dev_warn(dev->ctrl.device,
--