From: Bjorn Helgaas <helgaas@kernel.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: "Jingoo Han" <jingoohan1@gmail.com>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Koichiro Den" <den@valinux.co.jp>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
Date: Tue, 10 Feb 2026 13:32:05 -0600
Message-ID: <20260210193205.GA41950@bhelgaas>
In-Reply-To: <20260210181225.3926165-2-cassel@kernel.org>

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

IIUC, the sequence is like this:

  - nvmet-pci-epf calls pci_epc_raise_irq() leading to
    dw_pcie_ep_raise_msi_irq()

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
    and saves it in ep->msi_msg_addr

  - host updates PCI_MSI_ADDRESS_*

  - nvmet-pci-epf calls pci_epc_raise_irq() again

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
    msg_addr has changed, and WARNs and returns -EINVAL

and this patch makes it so the second time through
dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
old mapping, and map it again with the new address.
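
To make the before/after behavior concrete, here is a small user-space
model of the sequence above. It is only a sketch: struct msi_cache,
raise_msi_old() and raise_msi_new() are hypothetical names, the counters
stand in for dw_pcie_ep_map_addr()/dw_pcie_ep_unmap_addr(), and the
WARN_ON_ONCE() and the final writel() are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached MSI fields in struct dw_pcie_ep. */
struct msi_cache {
	bool mapped;        /* models ep->msi_iatu_mapped */
	uint64_t msg_addr;  /* models ep->msi_msg_addr */
	uint64_t map_size;  /* models ep->msi_map_size */
	int map_calls;      /* how often the iATU would be programmed */
	int unmap_calls;    /* how often the old mapping would be removed */
};

/* Behavior before the patch: a changed address is rejected. */
static int raise_msi_old(struct msi_cache *c, uint64_t msg_addr,
			 uint64_t map_size)
{
	if (!c->mapped) {
		c->map_calls++;
		c->mapped = true;
		c->msg_addr = msg_addr;
		c->map_size = map_size;
	} else if (c->msg_addr != msg_addr || c->map_size != map_size) {
		return -EINVAL;
	}

	return 0; /* the driver would write the MSI data here */
}

/* Behavior with the patch: drop the stale mapping and map again. */
static int raise_msi_new(struct msi_cache *c, uint64_t msg_addr,
			 uint64_t map_size)
{
	if (c->mapped &&
	    (c->msg_addr != msg_addr || c->map_size != map_size)) {
		c->unmap_calls++;
		c->mapped = false;
	}

	if (!c->mapped) {
		c->map_calls++;
		c->mapped = true;
		c->msg_addr = msg_addr;
		c->map_size = map_size;
	}

	return 0; /* the driver would write the MSI data here */
}
```

Calling each function twice with two different addresses (the UEFI one,
then the Linux one) shows the difference: the old variant fails on the
second call, while the new one remaps once and succeeds.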

Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
endpoint reads of those registers?  We can't prevent the host from
updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
writel(), so maybe it's impossible to prevent the theoretical race
there, and all we can really do is mitigate what we expect to be a
single change at boot time of the host?

Even for that single change, it looks like the host could update
PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(),
leading to mapping a half-updated msg_addr.  This part we *could*
prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.
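
A rough, untested sketch of the re-read idea, written as a user-space
model rather than driver code. Everything here is an assumption made for
illustration: read_msi_addr_stable(), the read_reg_fn callback, and the
fake_msi_regs test double (which replays a host writing the low and high
halves at different times) are hypothetical; only the two register
offsets follow the standard MSI capability layout. It narrows the window
for a torn read; it cannot stop the host from changing the address after
the check.

```c
#include <stdint.h>

#define MSI_ADDR_LO 0x04 /* offset of PCI_MSI_ADDRESS_LO in the capability */
#define MSI_ADDR_HI 0x08 /* offset of PCI_MSI_ADDRESS_HI in the capability */

/* Hypothetical 32-bit register accessor. */
typedef uint32_t (*read_reg_fn)(void *ctx, unsigned int offset);

/*
 * Read both halves, then read them again. Accept the value only if both
 * passes agree; otherwise retry up to max_tries times.
 * Returns 0 on success and -1 if no stable value was observed.
 */
static int read_msi_addr_stable(read_reg_fn rd, void *ctx, int max_tries,
				uint64_t *addr)
{
	for (int i = 0; i < max_tries; i++) {
		uint32_t lo = rd(ctx, MSI_ADDR_LO);
		uint32_t hi = rd(ctx, MSI_ADDR_HI);

		if (rd(ctx, MSI_ADDR_LO) == lo && rd(ctx, MSI_ADDR_HI) == hi) {
			*addr = ((uint64_t)hi << 32) | lo;
			return 0;
		}
	}

	return -1;
}

/* Test double: the "host" updates each half just before a given read. */
struct fake_msi_regs {
	uint32_t lo, hi;           /* current register contents */
	uint32_t next_lo, next_hi; /* values the host is about to write */
	int reads;                 /* number of reads served so far */
	int lo_write_at;           /* read index at which lo is updated */
	int hi_write_at;           /* read index at which hi is updated */
};

static uint32_t fake_read(void *ctx, unsigned int offset)
{
	struct fake_msi_regs *r = ctx;

	if (r->reads == r->lo_write_at)
		r->lo = r->next_lo;
	if (r->reads == r->hi_write_at)
		r->hi = r->next_hi;
	r->reads++;

	return offset == MSI_ADDR_LO ? r->lo : r->hi;
}
```

With lo_write_at = 0 and hi_write_at = 3, the first pass sees the new
low half with the old high half, the second read of the high half
disagrees, and the helper retries until both halves match.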

Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
intended?
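
If such a check were wanted, it might look roughly like the sketch
below. This is a hypothetical helper, not something the current path
does: may_raise_irq() and enum irq_kind are made-up names, and the two
flag values are copied from include/uapi/linux/pci_regs.h so the snippet
builds outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSI_FLAGS_ENABLE  0x0001 /* MSI Enable bit in Message Control */
#define PCI_MSIX_FLAGS_ENABLE 0x8000 /* MSI-X Enable bit in Message Control */

/* Hypothetical interrupt kinds; INTx is deliberately not modelled. */
enum irq_kind {
	IRQ_KIND_MSI,
	IRQ_KIND_MSIX,
};

/*
 * Return true only if the host has enabled the interrupt mechanism the
 * endpoint is about to use. msi_flags and msix_flags are the Message
 * Control words read from the respective capabilities.
 */
static bool may_raise_irq(enum irq_kind kind, uint16_t msi_flags,
			  uint16_t msix_flags)
{
	switch (kind) {
	case IRQ_KIND_MSI:
		return (msi_flags & PCI_MSI_FLAGS_ENABLE) != 0;
	case IRQ_KIND_MSIX:
		return (msix_flags & PCI_MSIX_FLAGS_ENABLE) != 0;
	}

	return false;
}
```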

> ---
>  .../pci/controller/dwc/pcie-designware-ep.c   | 22 +++++++++++--------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 7e7844ff0f7e..5d8024d5e5c6 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -896,6 +896,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	 * supported, so we avoid reprogramming the region on every MSI,
>  	 * specifically unmapping immediately after writel().
>  	 */
> +	if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr ||
> +				    ep->msi_map_size != map_size)) {
> +		/*
> +		 * The host changed the MSI target address or the required
> +		 * mapping size changed. Reprogramming the iATU when there are
> +		 * operations in flight is unsafe on this controller. However,
> +		 * there is no unified way to check if we have operations in
> +		 * flight, thus we don't know if we should WARN() or not.
> +		 */
> +		dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> +		ep->msi_iatu_mapped = false;
> +	}
> +
>  	if (!ep->msi_iatu_mapped) {
>  		ret = dw_pcie_ep_map_addr(epc, func_no, 0,
>  					  ep->msi_mem_phys, msg_addr,
> @@ -906,15 +919,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  		ep->msi_iatu_mapped = true;
>  		ep->msi_msg_addr = msg_addr;
>  		ep->msi_map_size = map_size;
> -	} else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> -				ep->msi_map_size != map_size)) {
> -		/*
> -		 * The host changed the MSI target address or the required
> -		 * mapping size changed. Reprogramming the iATU at runtime is
> -		 * unsafe on this controller, so bail out instead of trying to
> -		 * update the existing region.
> -		 */
> -		return -EINVAL;
>  	}
>  
>  	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> 
> base-commit: 43d324eeb08c3dd9fff7eb9a2c617afd3b96e65c
> -- 
> 2.53.0
> 
