From: Bjorn Helgaas <helgaas@kernel.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: "Jingoo Han" <jingoohan1@gmail.com>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Koichiro Den" <den@valinux.co.jp>,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
Date: Tue, 10 Feb 2026 13:32:05 -0600
Message-ID: <20260210193205.GA41950@bhelgaas>
In-Reply-To: <20260210181225.3926165-2-cassel@kernel.org>

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
>
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
>
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
>
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
>
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
>
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
>
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

IIUC, the sequence is like this:

  - nvmet-pci-epf calls pci_epc_raise_irq() leading to
    dw_pcie_ep_raise_msi_irq()

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
    and saves it in ep->msi_msg_addr

  - host updates PCI_MSI_ADDRESS_*

  - nvmet-pci-epf calls pci_epc_raise_irq() again

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
    msg_addr has changed, and WARNs and returns -EINVAL

and this patch makes it so the second time through
dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
old mapping, and map it again with the new address.
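
To make sure I'm reading the before/after behavior correctly, here is a
minimal user-space model of the cached mapping. It is only a sketch:
struct msi_cache, raise_msi_old(), and raise_msi_new() are made-up names,
the map/unmap calls are reduced to counters, and the map_size comparison
is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the MSI fields cached in struct dw_pcie_ep. */
struct msi_cache {
	bool iatu_mapped;	/* stands in for ep->msi_iatu_mapped */
	uint64_t msg_addr;	/* stands in for ep->msi_msg_addr */
	int map_calls;		/* simulated dw_pcie_ep_map_addr() calls */
	int unmap_calls;	/* simulated dw_pcie_ep_unmap_addr() calls */
};

/* Before the patch: a changed target address is rejected. */
int raise_msi_old(struct msi_cache *c, uint64_t msg_addr)
{
	assert(c);
	if (!c->iatu_mapped) {
		c->map_calls++;
		c->iatu_mapped = true;
		c->msg_addr = msg_addr;
	} else if (c->msg_addr != msg_addr) {
		return -EINVAL;	/* the real code also WARNs once here */
	}
	return 0;		/* the real code does the writel() here */
}

/* With the patch: drop the stale mapping, then map the new address. */
int raise_msi_new(struct msi_cache *c, uint64_t msg_addr)
{
	assert(c);
	if (c->iatu_mapped && c->msg_addr != msg_addr) {
		c->unmap_calls++;
		c->iatu_mapped = false;
	}
	if (!c->iatu_mapped) {
		c->map_calls++;
		c->iatu_mapped = true;
		c->msg_addr = msg_addr;
	}
	return 0;		/* the real code does the writel() here */
}
```

Calling raise_msi_old() twice with different addresses fails the second
time, while raise_msi_new() succeeds both times with one unmap and two
maps recorded.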

Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
endpoint reads of those registers?  We can't prevent the host from
updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
writel(), so maybe it's impossible to prevent the theoretical race
there, and all we can really do is mitigate what we expect to be a
single change at boot time of the host?

Even for that single change, it looks like the host could update
PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(),
leading to mapping a half-updated msg_addr.  This part we *could*
prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.
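
Something like the following is what I have in mind for the re-read. This
is a user-space sketch, not driver code: read_reg_fn, MSI_ADDR_LO/HI,
read_msi_addr_stable(), and the fake_host test double are all
hypothetical, and the retry limit is arbitrary. It reads both halves,
reads them again, and only accepts the value when the two snapshots agree.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_LO 0	/* stand-in for PCI_MSI_ADDRESS_LO */
#define MSI_ADDR_HI 1	/* stand-in for PCI_MSI_ADDRESS_HI */

/* Hypothetical accessor returning the 32-bit register 'reg'. */
typedef uint32_t (*read_reg_fn)(void *ctx, unsigned int reg);

/*
 * Read lo/hi, then re-read both. Accept the address only when the two
 * snapshots match; otherwise retry. Returns 0 on success, -1 if no
 * stable value was seen within max_tries.
 */
int read_msi_addr_stable(read_reg_fn rd, void *ctx, int max_tries,
			 uint64_t *addr)
{
	assert(rd && addr);
	for (int i = 0; i < max_tries; i++) {
		uint32_t lo = rd(ctx, MSI_ADDR_LO);
		uint32_t hi = rd(ctx, MSI_ADDR_HI);

		if (rd(ctx, MSI_ADDR_LO) == lo && rd(ctx, MSI_ADDR_HI) == hi) {
			*addr = ((uint64_t)hi << 32) | lo;
			return 0;
		}
	}
	return -1;
}

/* Test double: a host that updates each half at a chosen read count. */
struct fake_host {
	uint32_t lo, hi;		/* current register contents */
	uint32_t new_lo, new_hi;	/* values the host is writing */
	int lo_at, hi_at;		/* read number at which each half flips */
	int reads;			/* reads performed so far */
};

uint32_t fake_read(void *ctx, unsigned int reg)
{
	struct fake_host *h = ctx;

	h->reads++;
	if (h->reads == h->lo_at)
		h->lo = h->new_lo;
	if (h->reads == h->hi_at)
		h->hi = h->new_hi;
	return reg == MSI_ADDR_LO ? h->lo : h->hi;
}
```

With lo_at = 1 and hi_at = 3, the first snapshot pairs the new low half
with the old high half, the re-read catches the mismatch, and the second
attempt returns the fully updated address. This does not help if the host
updates the registers after the re-read, and it cannot detect a partial
update that persists across both reads.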

Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
intended?
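
For reference, the kind of check I mean would look roughly like this. It
is a standalone sketch: enum irq_kind and irq_enabled() are hypothetical,
and 'flags' is assumed to be the 16-bit Message Control word already read
from the relevant capability. The two bit values match
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as in include/uapi/linux/pci_regs.h. */
#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI feature enabled */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X enable */

enum irq_kind { IRQ_MSI, IRQ_MSIX };	/* hypothetical */

/* Return true if the host has enabled the given interrupt type. */
bool irq_enabled(enum irq_kind kind, uint16_t flags)
{
	switch (kind) {
	case IRQ_MSI:
		return flags & PCI_MSI_FLAGS_ENABLE;
	case IRQ_MSIX:
		return flags & PCI_MSIX_FLAGS_ENABLE;
	}
	assert(0 && "unknown irq kind");
	return false;
}
```

A raise path using such a helper would presumably bail out early when it
returns false, but whether that belongs in the EPC core or in each
controller driver is a separate question.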
> ---
> .../pci/controller/dwc/pcie-designware-ep.c | 22 +++++++++++--------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 7e7844ff0f7e..5d8024d5e5c6 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -896,6 +896,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	 * supported, so we avoid reprogramming the region on every MSI,
>  	 * specifically unmapping immediately after writel().
>  	 */
> +	if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr ||
> +				    ep->msi_map_size != map_size)) {
> +		/*
> +		 * The host changed the MSI target address or the required
> +		 * mapping size changed. Reprogramming the iATU when there are
> +		 * operations in flight is unsafe on this controller. However,
> +		 * there is no unified way to check if we have operations in
> +		 * flight, thus we don't know if we should WARN() or not.
> +		 */
> +		dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> +		ep->msi_iatu_mapped = false;
> +	}
> +
>  	if (!ep->msi_iatu_mapped) {
>  		ret = dw_pcie_ep_map_addr(epc, func_no, 0,
>  					  ep->msi_mem_phys, msg_addr,
> @@ -906,15 +919,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  		ep->msi_iatu_mapped = true;
>  		ep->msi_msg_addr = msg_addr;
>  		ep->msi_map_size = map_size;
> -	} else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> -				ep->msi_map_size != map_size)) {
> -		/*
> -		 * The host changed the MSI target address or the required
> -		 * mapping size changed. Reprogramming the iATU at runtime is
> -		 * unsafe on this controller, so bail out instead of trying to
> -		 * update the existing region.
> -		 */
> -		return -EINVAL;
>  	}
>
>  	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
>
> base-commit: 43d324eeb08c3dd9fff7eb9a2c617afd3b96e65c
> --
> 2.53.0
>