public inbox for linux-pci@vger.kernel.org
From: Bjorn Helgaas <helgaas@kernel.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: "Jingoo Han" <jingoohan1@gmail.com>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Koichiro Den" <den@valinux.co.jp>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
Date: Tue, 10 Feb 2026 13:32:05 -0600
Message-ID: <20260210193205.GA41950@bhelgaas>
In-Reply-To: <20260210181225.3926165-2-cassel@kernel.org>

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

IIUC, the sequence is like this:

  - nvmet-pci-epf calls pci_epc_raise_irq() leading to
    dw_pcie_ep_raise_msi_irq()

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
    and saves it in ep->msi_msg_addr

  - host updates PCI_MSI_ADDRESS_*

  - nvmet-pci-epf calls pci_epc_raise_irq() again

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
    msg_addr has changed, and WARNs and returns -EINVAL

and this patch makes it so the second time through
dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
old mapping, and map it again with the new address.
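
To make the before/after behavior concrete, here is a minimal user-space
sketch of the sequence above. It is not the driver code: struct msi_cache,
raise_msi_old() and raise_msi_new() are hypothetical stand-ins, the "remaps"
counter merely simulates programming the iATU, and the map_size comparison
is left out for brevity.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached MSI mapping state in struct dw_pcie_ep. */
struct msi_cache {
	bool mapped;		/* mirrors ep->msi_iatu_mapped */
	uint64_t msg_addr;	/* mirrors ep->msi_msg_addr */
	int remaps;		/* times the simulated iATU was programmed */
};

/* Behavior before the patch: a changed address is treated as an error. */
int raise_msi_old(struct msi_cache *c, uint64_t msg_addr)
{
	if (!c->mapped) {
		c->mapped = true;
		c->msg_addr = msg_addr;
		c->remaps++;
	} else if (c->msg_addr != msg_addr) {
		return -EINVAL;	/* the driver also WARNs once here */
	}
	return 0;		/* the writel() to the mapped window would go here */
}

/* Behavior with the patch: drop the stale mapping and map the new address. */
int raise_msi_new(struct msi_cache *c, uint64_t msg_addr)
{
	if (c->mapped && c->msg_addr != msg_addr)
		c->mapped = false;	/* stands in for dw_pcie_ep_unmap_addr() */

	if (!c->mapped) {
		c->mapped = true;
		c->msg_addr = msg_addr;
		c->remaps++;
	}
	return 0;
}
```

With the same two calls (first address, then a different one), the old
variant fails the second call with -EINVAL, while the new variant succeeds
and ends up with the second address cached after two simulated mappings.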

Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
endpoint reads of those registers?  We can't prevent the host from
updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
writel(), so maybe it's impossible to prevent the theoretical race
there, and all we can really do is mitigate what we expect to be a
single change at boot time of the host?

Even for that single change, it looks like the host could update
PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(), 
leading to mapping a half-updated msg_addr.  This part we *could*
prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.
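
A rough sketch of the re-read idea, as a user-space simulation rather than
anything resembling the real register accessors. Everything here is a
hypothetical name: read_reg_fn, struct fake_regs and fake_read() model a
host that updates the lower and upper address dwords at different times,
and read_msi_addr_stable() accepts a value only once two consecutive
64-bit reads agree. This narrows the window for a torn read; it does not
close the race between mapping and the writel().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: returns the upper dword if hi is true, else the lower. */
typedef uint32_t (*read_reg_fn)(void *ctx, bool hi);

/* Simulated registers; the "host" changes lo and hi at chosen read counts. */
struct fake_regs {
	uint32_t lo, hi;
	uint32_t new_lo, new_hi;
	int lo_at, hi_at;	/* read number at which each half is updated */
	int reads;
};

uint32_t fake_read(void *ctx, bool hi)
{
	struct fake_regs *r = ctx;

	r->reads++;
	if (r->reads == r->lo_at)
		r->lo = r->new_lo;
	if (r->reads == r->hi_at)
		r->hi = r->new_hi;
	return hi ? r->hi : r->lo;
}

/* A single, unprotected 64-bit read: lower dword first, then upper. */
uint64_t read_msi_addr_once(read_reg_fn rd, void *ctx)
{
	uint32_t lo = rd(ctx, false);
	uint32_t hi = rd(ctx, true);

	return ((uint64_t)hi << 32) | lo;
}

/* Re-read until two consecutive reads match, up to max_tries re-reads. */
bool read_msi_addr_stable(read_reg_fn rd, void *ctx, int max_tries,
			  uint64_t *out)
{
	uint64_t prev = read_msi_addr_once(rd, ctx);

	for (int i = 0; i < max_tries; i++) {
		uint64_t cur = read_msi_addr_once(rd, ctx);

		if (cur == prev) {
			*out = cur;
			return true;
		}
		prev = cur;
	}
	return false;	/* never settled; caller should not map anything */
}
```

In this model, a host update that lands between the two halves makes the
single read return a mix of new lower and old upper bits, whereas the
re-reading variant only returns once it has seen the same value twice.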

Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
intended?
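
For illustration only, this is the kind of gate I have in mind, written as
a standalone sketch. The two bit definitions mirror the values in
include/uapi/linux/pci_regs.h; enum irq_kind and check_irq_enabled() are
hypothetical names, and whether such a check belongs in this path (or is
already done by callers) is exactly the open question.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirrored from include/uapi/linux/pci_regs.h. */
#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI Message Control: MSI enable */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X Message Control: MSI-X enable */

/* Hypothetical selector for which capability's flags word was read. */
enum irq_kind { IRQ_MSI, IRQ_MSIX };

/*
 * Return 0 if the host has set the enable bit for the given interrupt
 * type in the flags word, -EINVAL otherwise.
 */
int check_irq_enabled(enum irq_kind kind, uint16_t flags)
{
	uint16_t mask = (kind == IRQ_MSI) ? PCI_MSI_FLAGS_ENABLE
					  : PCI_MSIX_FLAGS_ENABLE;

	return (flags & mask) ? 0 : -EINVAL;
}
```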

> ---
>  .../pci/controller/dwc/pcie-designware-ep.c   | 22 +++++++++++--------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 7e7844ff0f7e..5d8024d5e5c6 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -896,6 +896,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	 * supported, so we avoid reprogramming the region on every MSI,
>  	 * specifically unmapping immediately after writel().
>  	 */
> +	if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr ||
> +				    ep->msi_map_size != map_size)) {
> +		/*
> +		 * The host changed the MSI target address or the required
> +		 * mapping size changed. Reprogramming the iATU when there are
> +		 * operations in flight is unsafe on this controller. However,
> +		 * there is no unified way to check if we have operations in
> +		 * flight, thus we don't know if we should WARN() or not.
> +		 */
> +		dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> +		ep->msi_iatu_mapped = false;
> +	}
> +
>  	if (!ep->msi_iatu_mapped) {
>  		ret = dw_pcie_ep_map_addr(epc, func_no, 0,
>  					  ep->msi_mem_phys, msg_addr,
> @@ -906,15 +919,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  		ep->msi_iatu_mapped = true;
>  		ep->msi_msg_addr = msg_addr;
>  		ep->msi_map_size = map_size;
> -	} else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> -				ep->msi_map_size != map_size)) {
> -		/*
> -		 * The host changed the MSI target address or the required
> -		 * mapping size changed. Reprogramming the iATU at runtime is
> -		 * unsafe on this controller, so bail out instead of trying to
> -		 * update the existing region.
> -		 */
> -		return -EINVAL;
>  	}
>  
>  	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> 
> base-commit: 43d324eeb08c3dd9fff7eb9a2c617afd3b96e65c
> -- 
> 2.53.0
> 
