public inbox for linux-pci@vger.kernel.org
* [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
@ 2026-02-10 18:12 Niklas Cassel
  2026-02-10 19:32 ` Bjorn Helgaas
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Niklas Cassel @ 2026-02-10 18:12 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Niklas Cassel
  Cc: Shinichiro Kawasaki, linux-pci

When using the nvmet-pci-epf EPF driver, and starting the EP before
starting a host with UEFI, the UEFI performs NVMe commands e.g.
Identify Controller, to get the name of the controller.

nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
Completion Queue, and then raise an IRQ (using
dw_pcie_ep_raise_msi_irq()).

Once the host boots Linux, we will see a WARN_ON_ONCE() from
dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
because it never gets an IRQ when loading the nvme driver.

The reason is that the MSI target address used by UEFI and Linux might
be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
return -EINVAL.

This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
outbound iATU mapping"), so this is a regression.

Also, remove the warning, as we cannot know if there are operations in
flight or not, so it seems wrong to print this warning unconditionally
at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.

Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 .../pci/controller/dwc/pcie-designware-ep.c   | 22 +++++++++++--------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 7e7844ff0f7e..5d8024d5e5c6 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -896,6 +896,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 	 * supported, so we avoid reprogramming the region on every MSI,
 	 * specifically unmapping immediately after writel().
 	 */
+	if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr ||
+				    ep->msi_map_size != map_size)) {
+		/*
+		 * The host changed the MSI target address or the required
+		 * mapping size changed. Reprogramming the iATU when there are
+		 * operations in flight is unsafe on this controller. However,
+		 * there is no unified way to check if we have operations in
+		 * flight, thus we don't know if we should WARN() or not.
+		 */
+		dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
+		ep->msi_iatu_mapped = false;
+	}
+
 	if (!ep->msi_iatu_mapped) {
 		ret = dw_pcie_ep_map_addr(epc, func_no, 0,
 					  ep->msi_mem_phys, msg_addr,
@@ -906,15 +919,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 		ep->msi_iatu_mapped = true;
 		ep->msi_msg_addr = msg_addr;
 		ep->msi_map_size = map_size;
-	} else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
-				ep->msi_map_size != map_size)) {
-		/*
-		 * The host changed the MSI target address or the required
-		 * mapping size changed. Reprogramming the iATU at runtime is
-		 * unsafe on this controller, so bail out instead of trying to
-		 * update the existing region.
-		 */
-		return -EINVAL;
 	}
 
 	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);

base-commit: 43d324eeb08c3dd9fff7eb9a2c617afd3b96e65c
-- 
2.53.0



* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
@ 2026-02-10 19:32 ` Bjorn Helgaas
  2026-02-10 20:22   ` Niklas Cassel
  2026-02-11 16:44 ` Koichiro Den
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-02-10 19:32 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

IIUC, the sequence is like this:

  - nvmet-pci-epf calls pci_epc_raise_irq() leading to
    dw_pcie_ep_raise_msi_irq()

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
    and saves it in ep->msi_msg_addr

  - host updates PCI_MSI_ADDRESS_*

  - nvmet-pci-epf calls pci_epc_raise_irq() again

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
    msg_addr has changed, and WARNs and returns -EINVAL

and this patch makes it so the second time through
dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
old mapping, and map it again with the new address.
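
The sequence above can be modelled in userspace as a rough sketch. Everything
here is an assumption made for illustration: struct msi_cache,
model_map(), raise_msi_before() and raise_msi_after() are hypothetical
names, and the iATU programming is reduced to a counter. It is not the
driver code, only the caching decision before and after the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached MSI fields of struct dw_pcie_ep. */
struct msi_cache {
	bool mapped;
	uint64_t msg_addr;
	size_t map_size;
	unsigned int programmed;	/* model only: times the window was set up */
};

/* Model only: stands in for dw_pcie_ep_map_addr() plus the cache update. */
static void model_map(struct msi_cache *c, uint64_t msg_addr, size_t map_size)
{
	c->mapped = true;
	c->msg_addr = msg_addr;
	c->map_size = map_size;
	c->programmed++;
}

/* Before the patch: a changed address or size is rejected with -EINVAL. */
int raise_msi_before(struct msi_cache *c, uint64_t msg_addr, size_t map_size)
{
	assert(c != NULL);

	if (!c->mapped)
		model_map(c, msg_addr, map_size);
	else if (c->msg_addr != msg_addr || c->map_size != map_size)
		return -EINVAL;

	return 0;	/* the real function would writel() the MSI data here */
}

/* After the patch: a changed address or size drops the stale mapping first. */
int raise_msi_after(struct msi_cache *c, uint64_t msg_addr, size_t map_size)
{
	assert(c != NULL);

	if (c->mapped &&
	    (c->msg_addr != msg_addr || c->map_size != map_size))
		c->mapped = false;	/* models dw_pcie_ep_unmap_addr() */

	if (!c->mapped)
		model_map(c, msg_addr, map_size);

	return 0;
}
```

In this model the window is programmed once per distinct address, so a
single UEFI -> Linux address change costs exactly one extra remap.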

Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
endpoint reads of those registers?  We can't prevent the host from
updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
writel(), so maybe it's impossible to prevent the theoretical race
there, and all we can really do is mitigate what we expect to be a
single change at boot time of the host?

Even for that single change, it looks like the host could update
PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(), 
leading to mapping a half-updated msg_addr.  This part we *could*
prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.

Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
intended?



* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 19:32 ` Bjorn Helgaas
@ 2026-02-10 20:22   ` Niklas Cassel
  2026-02-10 20:33     ` Niklas Cassel
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Niklas Cassel @ 2026-02-10 20:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 01:32:05PM -0600, Bjorn Helgaas wrote:
> IIUC, the sequence is like this:
> 
>   - nvmet-pci-epf calls pci_epc_raise_irq() leading to
>     dw_pcie_ep_raise_msi_irq()
> 
>   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
>     and saves it in ep->msi_msg_addr
> 
>   - host updates PCI_MSI_ADDRESS_*
> 
>   - nvmet-pci-epf calls pci_epc_raise_irq() again
> 
>   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
>     msg_addr has changed, and WARNs and returns -EINVAL
> 
> and this patch makes it so the second time through
> dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
> old mapping, and map it again with the new address.

Correct.


> 
> Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
> endpoint reads of those registers?  We can't prevent the host from
> updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
> writel(), so maybe it's impossible to prevent the theoretical race
> there, and all we can really do is mitigate what we expect to be a
> single change at boot time of the host?

Normally, the MSI target address is not changed during runtime.

The spec allows changing the MSI-X address/data pair when the corresponding
vector is *masked*, and classifies the behavior as undefined if address/data
pair gets changed while the vector is *unmasked*.

AFAICT, it does not mention anything for MSI, so I do not think it is allowed
to be changed during runtime.

The only reason why it is changed here is because UEFI/BIOS will have one
MSI target address, and then once Linux boots, it will use another MSI target
address. (So it only changes once.)


> 
> Even for that single change, it looks like the host could update
> PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(), 
> leading to mapping a half-updated msg_addr.  This part we *could*
> prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.

The problem that commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU
mapping") fixes is that the DWC controller does not handle when the outbound
iATU is re-programmed when there are ongoing outbound transactions.

What commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
did was to map it once on startup, that way we don't re-program the outbound
iATU on every pci_epc_raise_irq() call. Before this commit, every
pci_epc_raise_irq() call could potentially cause ongoing outbound transactions
to be sent untranslated. Often the transactions that were sent untranslated
did not appear to be the MSI writel() itself, but other transactions performed
by the eDMA.

So even with this patch, since we are still not mapping + unmapping the MSI
target address on every pci_epc_raise_irq(), it should just be a single time
where we might trigger this problematic behavior (when MSI target address
changes, UEFI -> Linux). (Which is still much better than possibly triggering
this problematic behavior on every pci_epc_raise_irq().)


> 
> Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
> to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
> intended?

I guess we could improve pci_epc_raise_irq() to check those bits, in a
separate patch.

Mani, thoughts?


Kind regards,
Niklas


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 20:22   ` Niklas Cassel
@ 2026-02-10 20:33     ` Niklas Cassel
  2026-02-10 20:39     ` Bjorn Helgaas
  2026-02-25 14:59     ` Manivannan Sadhasivam
  2 siblings, 0 replies; 15+ messages in thread
From: Niklas Cassel @ 2026-02-10 20:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 09:22:58PM +0100, Niklas Cassel wrote:
> What commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> did was to map it once on startup, that way we don't re-program the outbound
> iATU on every pci_epc_raise_irq() call. Before this commit, every
> pci_epc_raise_irq() call could potentially cause ongoing outbound transactions
> to be sent untranslated. Often the transactions that were sent untranslated
> did not appear to be the MSI writel() itself, but other transactions performed
> by the eDMA.

I guess it could have been the MSI writel() too.

Perhaps another fix could have been to just do a readl() after the writel(),
before the unmap().

If that is preferred, it should be simple to test, since the bug is very easy
to reproduce: just run nvmet-pci-epf, do some transfers with a high queue
depth, and watch for IOMMU errors on the host.

I do like the solution that caches the MSI address though, as that avoids
the extra latency of a map() + unmap() for every pci_epc_raise_irq() call.


Kind regards,
Niklas


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 20:22   ` Niklas Cassel
  2026-02-10 20:33     ` Niklas Cassel
@ 2026-02-10 20:39     ` Bjorn Helgaas
  2026-02-11  8:52       ` Niklas Cassel
  2026-02-25 14:59     ` Manivannan Sadhasivam
  2 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-02-10 20:39 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 09:22:58PM +0100, Niklas Cassel wrote:
> On Tue, Feb 10, 2026 at 01:32:05PM -0600, Bjorn Helgaas wrote:
> ...

> > Isn't there still a race between host updates of PCI_MSI_ADDRESS_*
> > and endpoint reads of those registers?  We can't prevent the host
> > from updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and
> > the writel(), so maybe it's impossible to prevent the theoretical
> > race there, and all we can really do is mitigate what we expect to
> > be a single change at boot time of the host?
> 
> Normally, the MSI target address is not changed during runtime.
> 
> The spec allows changing the MSI-X address/data pair when the
> corresponding vector is *masked*, and classifies the behavior as
> undefined if address/data pair gets changed while the vector is
> *unmasked*.
> 
> AFAICT, it does not mention anything for MSI, so I do not think it
> is allowed to be changed during runtime.
> 
> The only reason why it is changed here is because UEFI/BIOS will
> have one MSI target address, and then once Linux boots, it will use
> another MSI target address. (So it only changes once.)

Yes, hence "we expect a single change at boot time of the host" above.

> > Even for that single change, it looks like the host could update
> > PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(),
> > leading to mapping a half-updated msg_addr.  This part we *could*
> > prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial
> > update.
> 
> The problem that commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping") fixes is that the DWC controller does not
> handle when the outbound iATU is re-programmed when there are
> ongoing outbound transactions.

The scenario I'm asking about is the following, where the single
change of MSI target as the host boots is concurrent with
dw_pcie_ep_raise_msi_irq():

  - host writes PCI_MSI_ADDRESS_LO

  - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_LO and
    PCI_MSI_ADDRESS_HI

  - dw_pcie_ep_raise_msi_irq() maps msg_addr built from an old
    PCI_MSI_ADDRESS_HI and a new PCI_MSI_ADDRESS_LO

  - host writes PCI_MSI_ADDRESS_HI

This could be mitigated by re-reading PCI_MSI_ADDRESS_* to detect the
tearing.
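
A minimal userspace sketch of that re-read idea, under stated
assumptions: enum msi_reg, msi_read_fn, read_msi_addr_stable(),
struct fake_host and fake_read() are all hypothetical names, and the
host is simulated by a test double whose writes land at chosen read
counts. In the driver the two values would come from the function's
MSI capability registers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum msi_reg { MSI_ADDR_LO, MSI_ADDR_HI };

/* Hypothetical accessor for one 32-bit half of the MSI address. */
typedef uint32_t (*msi_read_fn)(void *ctx, enum msi_reg reg);

/*
 * Read LO/HI repeatedly until two consecutive samples agree.
 * Returns 0 and stores the address in *out, or -1 if no two consecutive
 * samples matched within max_tries re-reads.
 */
int read_msi_addr_stable(msi_read_fn rd, void *ctx, int max_tries,
			 uint64_t *out)
{
	assert(rd != NULL && out != NULL);

	uint32_t lo = rd(ctx, MSI_ADDR_LO);
	uint32_t hi = rd(ctx, MSI_ADDR_HI);

	for (int i = 0; i < max_tries; i++) {
		uint32_t lo2 = rd(ctx, MSI_ADDR_LO);
		uint32_t hi2 = rd(ctx, MSI_ADDR_HI);

		if (lo == lo2 && hi == hi2) {
			*out = ((uint64_t)hi << 32) | lo;
			return 0;
		}
		lo = lo2;
		hi = hi2;
	}
	return -1;
}

/* Test double: a host whose LO and HI writes land at given read counts. */
struct fake_host {
	uint32_t lo, hi;		/* current register contents */
	uint32_t new_lo, new_hi;	/* values the host is writing */
	int reads;			/* register reads seen so far */
	int lo_at, hi_at;		/* read number at which each write lands */
};

uint32_t fake_read(void *ctx, enum msi_reg reg)
{
	struct fake_host *h = ctx;

	h->reads++;
	if (h->reads == h->lo_at)
		h->lo = h->new_lo;
	if (h->reads == h->hi_at)
		h->hi = h->new_hi;

	return reg == MSI_ADDR_LO ? h->lo : h->hi;
}
```

This only detects tearing when the second host write lands between two
samples. If the host stalls between its LO and HI writes for longer than
the re-read takes, two consecutive samples can agree on the torn value,
so this is a mitigation and not a guarantee.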


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 20:39     ` Bjorn Helgaas
@ 2026-02-11  8:52       ` Niklas Cassel
  2026-02-11 18:08         ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Niklas Cassel @ 2026-02-11  8:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 02:39:49PM -0600, Bjorn Helgaas wrote:
> On Tue, Feb 10, 2026 at 09:22:58PM +0100, Niklas Cassel wrote:
> > On Tue, Feb 10, 2026 at 01:32:05PM -0600, Bjorn Helgaas wrote:
> 
> The scenario I'm asking about is the following, where the single
> change of MSI target as the host boots is concurrent with
> dw_pcie_ep_raise_msi_irq():
> 
>   - host writes PCI_MSI_ADDRESS_LO
> 
>   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_LO and
>     PCI_MSI_ADDRESS_HI
> 
>   - dw_pcie_ep_raise_msi_irq() maps msg_addr built from an old
>     PCI_MSI_ADDRESS_HI and a new PCI_MSI_ADDRESS_LO
> 
>   - host writes PCI_MSI_ADDRESS_HI
> 
> This could be mitigated by re-reading PCI_MSI_ADDRESS_* to detect the
> tearing.

Ok, now I understand.

This must be extremely unlikely to happen, since the host writes the MSI
target address very early, before even enumerating the bus.

So the EP reading a half-updated 64-bit MSI address seems very
unlikely.

Even in the NVMe EPF case, after UEFI loads Linux, there will be no one
posting new Submission Queue Entries, so the EP will not be raising any
interrupts.

If you want to create a misbehaving device that does an interrupt storm
during boot, it might be possible to hit the half updated 64-bit MSI
address race.

If anyone wants to write a patch that avoids that theoretical race,
fine with me, but I don't think we should do anything to avoid it in
this patch, as this theoretical race could happen even before we did
any caching of the MSI target address.


Kind regards,
Niklas


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
  2026-02-10 19:32 ` Bjorn Helgaas
@ 2026-02-11 16:44 ` Koichiro Den
  2026-02-12  9:42 ` Shinichiro Kawasaki
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Koichiro Den @ 2026-02-11 16:44 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Tested-by: Koichiro Den <den@valinux.co.jp>

Verified on R-Car S4 (just in case) that the change does not reintroduce
the issue originally addressed by commit 8719c64e76bf, nor does it cause
any regression under high load or in other runtime scenarios.

This looks like a pragmatic fix. The previous WARN_ON_ONCE() guard was
overly restrictive, as it turns out the real world is not always as
straightforward as initially assumed.

Best regards,
Koichiro



* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-11  8:52       ` Niklas Cassel
@ 2026-02-11 18:08         ` Bjorn Helgaas
  0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2026-02-11 18:08 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Wed, Feb 11, 2026 at 09:52:34AM +0100, Niklas Cassel wrote:
> On Tue, Feb 10, 2026 at 02:39:49PM -0600, Bjorn Helgaas wrote:
> > On Tue, Feb 10, 2026 at 09:22:58PM +0100, Niklas Cassel wrote:
> > > On Tue, Feb 10, 2026 at 01:32:05PM -0600, Bjorn Helgaas wrote:
> > 
> > The scenario I'm asking about is the following, where the single
> > change of MSI target as the host boots is concurrent with
> > dw_pcie_ep_raise_msi_irq():
> > 
> >   - host writes PCI_MSI_ADDRESS_LO
> > 
> >   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_LO and
> >     PCI_MSI_ADDRESS_HI
> > 
> >   - dw_pcie_ep_raise_msi_irq() maps msg_addr built from an old
> >     PCI_MSI_ADDRESS_HI and a new PCI_MSI_ADDRESS_LO
> > 
> >   - host writes PCI_MSI_ADDRESS_HI
> > 
> > This could be mitigated by re-reading PCI_MSI_ADDRESS_* to detect the
> > tearing.
> 
> Ok, now I understand.
> 
> > This must be extremely unlikely to happen, since the host writes the
> > MSI target address very early, before even enumerating the bus.
> > 
> > So the EP reading a half-updated 64-bit MSI address seems very
> > unlikely.

Very unlikely for sure, and the patch is OK with me as-is, although
writing to the wrong address would be very difficult to debug.

I think it's probably more important to pay attention to the MSI and
MSI-X enable bits to make sure we don't generate MSIs when disabled.
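
As a rough sketch of such a check, assuming the caller has already read
the 16-bit Message Control word of the relevant capability: enum irq_type
and irq_delivery_enabled() are hypothetical names, and only the two bit
definitions are taken from include/uapi/linux/pci_regs.h. Where the check
would live in the pci_epc_raise_irq() path is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in include/uapi/linux/pci_regs.h. */
#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI feature enabled */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X feature enabled */

/* Hypothetical selector for which capability the flags word came from. */
enum irq_type { IRQ_MSI, IRQ_MSIX };

/*
 * Return true if the host has enabled the given interrupt type,
 * judging only by the Message Control flags word passed in.
 */
bool irq_delivery_enabled(enum irq_type type, uint16_t flags)
{
	assert(type == IRQ_MSI || type == IRQ_MSIX);

	if (type == IRQ_MSI)
		return flags & PCI_MSI_FLAGS_ENABLE;

	return flags & PCI_MSIX_FLAGS_ENABLE;
}
```

A caller could return an error instead of issuing the memory write when
this returns false, so that no MSI is generated while the host has it
disabled.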


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
  2026-02-10 19:32 ` Bjorn Helgaas
  2026-02-11 16:44 ` Koichiro Den
@ 2026-02-12  9:42 ` Shinichiro Kawasaki
  2026-02-25 15:01 ` Manivannan Sadhasivam
  2026-02-25 20:05 ` Bjorn Helgaas
  4 siblings, 0 replies; 15+ messages in thread
From: Shinichiro Kawasaki @ 2026-02-12  9:42 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, linux-pci@vger.kernel.org

On Feb 10, 2026 / 19:12, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Niklas, thank you for the fix. I confirmed that this patch avoids the
"WARN_ON_ONCE() from dw_pcie_ep_raise_msi_irq()" and the host hangs in
my environment.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 20:22   ` Niklas Cassel
  2026-02-10 20:33     ` Niklas Cassel
  2026-02-10 20:39     ` Bjorn Helgaas
@ 2026-02-25 14:59     ` Manivannan Sadhasivam
  2 siblings, 0 replies; 15+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-25 14:59 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 09:22:58PM +0100, Niklas Cassel wrote:
> On Tue, Feb 10, 2026 at 01:32:05PM -0600, Bjorn Helgaas wrote:
> > IIUC, the sequence is like this:
> > 
> >   - nvmet-pci-epf calls pci_epc_raise_irq() leading to
> >     dw_pcie_ep_raise_msi_irq()
> > 
> >   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, maps msg_addr,
> >     and saves it in ep->msi_msg_addr
> > 
> >   - host updates PCI_MSI_ADDRESS_*
> > 
> >   - nvmet-pci-epf calls pci_epc_raise_irq() again
> > 
> >   - dw_pcie_ep_raise_msi_irq() reads PCI_MSI_ADDRESS_*, notices that
> >     msg_addr has changed, and WARNs and returns -EINVAL
> > 
> > and this patch makes it so the second time through
> > dw_pcie_ep_raise_msi_irq(), we notice the msg_addr change, remove the
> > old mapping, and map it again with the new address.
> 
> Correct.
> 
> 
> > 
> > Isn't there still a race between host updates of PCI_MSI_ADDRESS_* and
> > endpoint reads of those registers?  We can't prevent the host from
> > updating PCI_MSI_ADDRESS_* between dw_pcie_ep_map_addr() and the
> > writel(), so maybe it's impossible to prevent the theoretical race
> > there, and all we can really do is mitigate what we expect to be a
> > single change at boot time of the host?
> 
> Normally, the MSI target address is not changed during runtime.
> 
> The spec allows changing the MSI-X address/data pair when the corresponding
> vector is *masked*, and classifies the behavior as undefined if address/data
> pair gets changed while the vector is *unmasked*.
> 
> AFAICT, it does not mention anything for MSI, so I do not think it is allowed
> to be changed during runtime.
> 
> The only reason why it is changed here is because UEFI/BIOS will have one
> MSI target address, and then once Linux boots, it will use another MSI target
> address. (So it only changes once.)
> 
> 
> > 
> > Even for that single change, it looks like the host could update
> > PCI_MSI_ADDRESS_* simultaneously with dw_pcie_ep_raise_msi_irq(), 
> > leading to mapping a half-updated msg_addr.  This part we *could*
> > prevent by re-reading PCI_MSI_ADDRESS_* to detect a partial update.
> 
> The problem that commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU
> mapping") fixes is that the DWC controller does not handle when the outbound
> iATU is re-programmed when there are ongoing outbound transactions.
> 
> What commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> did was to map it once on startup, that way we don't re-program the outbound
> iATU on every pci_epc_raise_irq() call. Before this commit, every
> pci_epc_raise_irq() call could potentially cause ongoing outbound transactions
> to be sent untranslated. Often the transactions that were sent untranslated
> did not appear to be the MSI writel() itself, but other transactions performed
> by the eDMA.
> 
> So even with this patch, since we are still not mapping + unmapping the MSI
> target address on every pci_epc_raise_irq(), it should just be a single time
> where we might trigger this problematic behavior (when MSI target address
> changes, UEFI -> Linux). (Which is still much better than possibly triggering
> this problematic behavior on every pci_epc_raise_irq().)
> 
> 
> > 
> > Probably unrelated question: the pci_epc_raise_irq() path doesn't seem
> > to check PCI_MSI_FLAGS_ENABLE or PCI_MSIX_FLAGS_ENABLE.  Is that
> > intended?
> 
> I guess we could improve pci_epc_raise_irq() to check those bits, in a
> separate patch.
> 

Yes, this looks like a bug that should be fixed in separate patch(es).

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
                   ` (2 preceding siblings ...)
  2026-02-12  9:42 ` Shinichiro Kawasaki
@ 2026-02-25 15:01 ` Manivannan Sadhasivam
  2026-02-25 15:51   ` Niklas Cassel
  2026-02-25 20:05 ` Bjorn Helgaas
  4 siblings, 1 reply; 15+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-25 15:01 UTC (permalink / raw)
  To: Niklas Cassel, Bjorn Helgaas
  Cc: Jingoo Han, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> 
> Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Acked-by: Manivannan Sadhasivam <mani@kernel.org>

Bjorn, since this patch fixes a regression introduced in v7.0, could you please
take it for current -rc?

- Mani

> ---
>  .../pci/controller/dwc/pcie-designware-ep.c   | 22 +++++++++++--------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 7e7844ff0f7e..5d8024d5e5c6 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -896,6 +896,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	 * supported, so we avoid reprogramming the region on every MSI,
>  	 * specifically unmapping immediately after writel().
>  	 */
> +	if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr ||
> +				    ep->msi_map_size != map_size)) {
> +		/*
> +		 * The host changed the MSI target address or the required
> +		 * mapping size changed. Reprogramming the iATU when there are
> +		 * operations in flight is unsafe on this controller. However,
> +		 * there is no unified way to check if we have operations in
> +		 * flight, thus we don't know if we should WARN() or not.
> +		 */
> +		dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> +		ep->msi_iatu_mapped = false;
> +	}
> +
>  	if (!ep->msi_iatu_mapped) {
>  		ret = dw_pcie_ep_map_addr(epc, func_no, 0,
>  					  ep->msi_mem_phys, msg_addr,
> @@ -906,15 +919,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  		ep->msi_iatu_mapped = true;
>  		ep->msi_msg_addr = msg_addr;
>  		ep->msi_map_size = map_size;
> -	} else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> -				ep->msi_map_size != map_size)) {
> -		/*
> -		 * The host changed the MSI target address or the required
> -		 * mapping size changed. Reprogramming the iATU at runtime is
> -		 * unsafe on this controller, so bail out instead of trying to
> -		 * update the existing region.
> -		 */
> -		return -EINVAL;
>  	}
>  
>  	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> 
> base-commit: 43d324eeb08c3dd9fff7eb9a2c617afd3b96e65c
> -- 
> 2.53.0
> 

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-25 15:01 ` Manivannan Sadhasivam
@ 2026-02-25 15:51   ` Niklas Cassel
  2026-02-25 16:30     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 15+ messages in thread
From: Niklas Cassel @ 2026-02-25 15:51 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Koichiro Den,
	Shinichiro Kawasaki, linux-pci

On Wed, Feb 25, 2026 at 08:31:24PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> > When using the nvmet-pci-epf EPF driver, and starting the EP before
> > starting a host with UEFI, the UEFI performs NVMe commands e.g.
> > Identify Controller, to get the name of the controller.
> > 
> > nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> > Completion Queue, and then raise an IRQ (using
> > dw_pcie_ep_raise_msi_irq()).
> > 
> > Once the host boots Linux, we will see a WARN_ON_ONCE() from
> > dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> > because it never gets an IRQ when loading the nvme driver.
> > 
> > The reason is that the MSI target address used by UEFI and Linux might
> > be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> > return -EINVAL.
> > 
> > This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> > outbound iATU mapping"), so this is a regression.
> > 
> > Also, remove the warning, as we cannot know if there are operations in
> > flight or not, so it seems wrong to print this warning unconditionally
> > at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> > 
> > Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> 
> Acked-by: Manivannan Sadhasivam <mani@kernel.org>
> 
> Bjorn, since this patch fixes a regression introduced in v7.0, could you please
> take it for current -rc?


While not introduced in v7.0, it would be nice if you could also consider:
https://patchwork.kernel.org/project/linux-pci/patch/20260211175540.105677-2-cassel@kernel.org/

(Such that v7.0 could have both working dw_pcie_ep_raise_msix_irq() and
dw_pcie_ep_raise_msi_irq())


Kind regards,
Niklas


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-25 15:51   ` Niklas Cassel
@ 2026-02-25 16:30     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 15+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-25 16:30 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Koichiro Den,
	Shinichiro Kawasaki, linux-pci

On Wed, Feb 25, 2026 at 04:51:59PM +0100, Niklas Cassel wrote:
> On Wed, Feb 25, 2026 at 08:31:24PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> > > When using the nvmet-pci-epf EPF driver, and starting the EP before
> > > starting a host with UEFI, the UEFI performs NVMe commands e.g.
> > > Identify Controller, to get the name of the controller.
> > > 
> > > nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> > > Completion Queue, and then raise an IRQ (using
> > > dw_pcie_ep_raise_msi_irq()).
> > > 
> > > Once the host boots Linux, we will see a WARN_ON_ONCE() from
> > > dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> > > because it never gets an IRQ when loading the nvme driver.
> > > 
> > > The reason is that the MSI target address used by UEFI and Linux might
> > > be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> > > return -EINVAL.
> > > 
> > > This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> > > outbound iATU mapping"), so this is a regression.
> > > 
> > > Also, remove the warning, as we cannot know if there are operations in
> > > flight or not, so it seems wrong to print this warning unconditionally
> > > at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.
> > > 
> > > Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
> > > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > 
> > Acked-by: Manivannan Sadhasivam <mani@kernel.org>
> > 
> > Bjorn, since this patch fixes a regression introduced in v7.0, could you please
> > take it for current -rc?
> 
> 
> While not introduced in v7.0, it would be nice if you could also consider:
> https://patchwork.kernel.org/project/linux-pci/patch/20260211175540.105677-2-cassel@kernel.org/
> 
> (Such that v7.0 could have both working dw_pcie_ep_raise_msix_irq() and
> dw_pcie_ep_raise_msi_irq())
> 

That's up to Bjorn actually. I'd also prefer to get the fixes in ASAP.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
                   ` (3 preceding siblings ...)
  2026-02-25 15:01 ` Manivannan Sadhasivam
@ 2026-02-25 20:05 ` Bjorn Helgaas
  2026-02-25 21:56   ` Niklas Cassel
  4 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-02-25 20:05 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> When using the nvmet-pci-epf EPF driver, and starting the EP before
> starting a host with UEFI, the UEFI performs NVMe commands e.g.
> Identify Controller, to get the name of the controller.
> 
> nvmet-pci-epf will post the CQE (completion queue entry) to the Admin
> Completion Queue, and then raise an IRQ (using
> dw_pcie_ep_raise_msi_irq()).
> 
> Once the host boots Linux, we will see a WARN_ON_ONCE() from
> dw_pcie_ep_raise_msi_irq(), and then the booting of the host hangs,
> because it never gets an IRQ when loading the nvme driver.
> 
> The reason is that the MSI target address used by UEFI and Linux might
> be different, which will cause dw_pcie_ep_raise_msi_irq() to simply
> return -EINVAL.
> 
> This was working before commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI
> outbound iATU mapping"), so this is a regression.
> 
> Also, remove the warning, as we cannot know if there are operations in
> flight or not, so it seems wrong to print this warning unconditionally
> at every boot when e.g. nvmet-pci-epf is used with a host with UEFI.

I put this on pci/for-linus for v7.0, thanks!

I'd like to make the commit log a little more general, since the issue
affects any endpoint driver.  Here's my proposal; I'll update it based
on your feedback:

  PCI: dwc: ep: Fix dw_pcie_ep_raise_msi_irq() Message Address cache

  Endpoint drivers use dw_pcie_ep_raise_msi_irq() to raise MSI interrupts to
  the host.  After 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU
  mapping"), dw_pcie_ep_raise_msi_irq() caches the Message Address from the
  MSI Capability in ep->msi_msg_addr.  But that Message Address is controlled
  by the host, and it may change.  For example, if:

    - firmware on the host configures the Message Address and triggers an
      MSI,

    - a driver on the Endpoint raises the MSI via dw_pcie_ep_raise_msi_irq(),
      which caches the Message Address,

    - a kernel on the host reconfigures the Message Address and the host
      kernel driver triggers another MSI,

  dw_pcie_ep_raise_msi_irq() notices that the Message Address no longer
  matches the cached ep->msi_msg_addr, warns about it, and returns error
  instead of raising the MSI.  The host kernel may hang because it never
  receives the MSI.

  This was seen with the nvmet_pci_epf_driver: the host UEFI performs NVMe
  commands, e.g. Identify Controller to get the name of the controller,
  nvmet-pci-epf posts the completion queue entry and raises an IRQ using
  dw_pcie_ep_raise_msi_irq().  When the host boots Linux, we see a
  WARN_ON_ONCE() from dw_pcie_ep_raise_msi_irq(), and the host kernel hangs
  because the nvme driver never gets an IRQ.

  Remove the warning when dw_pcie_ep_raise_msi_irq() notices that Message
  Address has changed, remap using the new address, and update the
  ep->msi_msg_addr cache.
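
The resulting cache behaviour can be modelled in plain userspace C. This is
only a sketch of the logic in the diff, under stated assumptions: struct
msi_cache stands in for the relevant fields of struct dw_pcie_ep, and
sim_map()/sim_unmap() are hypothetical stubs that merely count calls where
the driver would call dw_pcie_ep_map_addr()/dw_pcie_ep_unmap_addr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MSI cache fields of struct dw_pcie_ep. */
struct msi_cache {
	bool mapped;			/* models ep->msi_iatu_mapped */
	uint64_t msg_addr;		/* models ep->msi_msg_addr */
	size_t map_size;		/* models ep->msi_map_size */
	unsigned int map_calls;		/* simulated iATU map operations */
	unsigned int unmap_calls;	/* simulated iATU unmap operations */
};

/* Stub: only counts where the driver would unmap the outbound window. */
static void sim_unmap(struct msi_cache *c)
{
	c->unmap_calls++;
}

/* Stub: only counts where the driver would program the outbound window. */
static int sim_map(struct msi_cache *c, uint64_t addr, size_t size)
{
	(void)addr;
	(void)size;
	c->map_calls++;
	return 0;
}

/*
 * Make sure the cached mapping matches the Message Address currently
 * configured by the host. A changed address or size drops the old
 * mapping first; an unchanged one leaves the window untouched.
 */
static int msi_cache_prepare(struct msi_cache *c, uint64_t msg_addr,
			     size_t map_size)
{
	int ret;

	if (c->mapped && (c->msg_addr != msg_addr ||
			  c->map_size != map_size)) {
		sim_unmap(c);
		c->mapped = false;
	}

	if (!c->mapped) {
		ret = sim_map(c, msg_addr, map_size);
		if (ret)
			return ret;

		c->mapped = true;
		c->msg_addr = msg_addr;
		c->map_size = map_size;
	}

	return 0;
}
```

In this model, repeated calls with the same address program the window once,
and a single address change (firmware to kernel, for example) costs exactly
one unmap and one remap.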



* Re: [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq()
  2026-02-25 20:05 ` Bjorn Helgaas
@ 2026-02-25 21:56   ` Niklas Cassel
  0 siblings, 0 replies; 15+ messages in thread
From: Niklas Cassel @ 2026-02-25 21:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Koichiro Den, Shinichiro Kawasaki, linux-pci

On Wed, Feb 25, 2026 at 02:05:53PM -0600, Bjorn Helgaas wrote:
> On Tue, Feb 10, 2026 at 07:12:25PM +0100, Niklas Cassel wrote:
> 
> I put this on pci/for-linus for v7.0, thanks!
> 
> I'd like to make the commit log a little more general, since the issue
> affects any endpoint driver.  Here's my proposal; I'll update it based
> on your feedback:
> 
>   PCI: dwc: ep: Fix dw_pcie_ep_raise_msi_irq() Message Address cache
> 
>   Endpoint drivers use dw_pcie_ep_raise_msi_irq() to raise MSI interrupts to
>   the host.  After 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU
>   mapping"), dw_pcie_ep_raise_msi_irq() caches the Message Address from the
>   MSI Capability in ep->msi_msg_addr.  But that Message Address is controlled
>   by the host, and it may change.  For example, if:
> 
>     - firmware on the host configures the Message Address and triggers an
>       MSI,
> 
>     - a driver on the Endpoint raises the MSI via dw_pcie_ep_raise_msi_irq(),
>       which caches the Message Address,
> 
>     - a kernel on the host reconfigures the Message Address and the host
>       kernel driver triggers another MSI,
> 
>   dw_pcie_ep_raise_msi_irq() notices that the Message Address no longer
>   matches the cached ep->msi_msg_addr, warns about it, and returns error
>   instead of raising the MSI.  The host kernel may hang because it never
>   receives the MSI.
> 
>   This was seen with the nvmet_pci_epf_driver: the host UEFI performs NVMe
>   commands, e.g. Identify Controller to get the name of the controller,
>   nvmet-pci-epf posts the completion queue entry and raises an IRQ using
>   dw_pcie_ep_raise_msi_irq().  When the host boots Linux, we see a
>   WARN_ON_ONCE() from dw_pcie_ep_raise_msi_irq(), and the host kernel hangs
>   because the nvme driver never gets an IRQ.
> 
>   Remove the warning when dw_pcie_ep_raise_msi_irq() notices that Message
>   Address has changed, remap using the new address, and update the
>   ep->msi_msg_addr cache.

Looks good to me!

You've been reviewing/discussing the patch already, so you were already
very familiar with the problem. Thank you for picking it up!


Kind regards,
Niklas


end of thread, other threads:[~2026-02-25 21:56 UTC | newest]

Thread overview: 15+ messages
2026-02-10 18:12 [PATCH] PCI: dwc: ep: Fix regression in dw_pcie_ep_raise_msi_irq() Niklas Cassel
2026-02-10 19:32 ` Bjorn Helgaas
2026-02-10 20:22   ` Niklas Cassel
2026-02-10 20:33     ` Niklas Cassel
2026-02-10 20:39     ` Bjorn Helgaas
2026-02-11  8:52       ` Niklas Cassel
2026-02-11 18:08         ` Bjorn Helgaas
2026-02-25 14:59     ` Manivannan Sadhasivam
2026-02-11 16:44 ` Koichiro Den
2026-02-12  9:42 ` Shinichiro Kawasaki
2026-02-25 15:01 ` Manivannan Sadhasivam
2026-02-25 15:51   ` Niklas Cassel
2026-02-25 16:30     ` Manivannan Sadhasivam
2026-02-25 20:05 ` Bjorn Helgaas
2026-02-25 21:56   ` Niklas Cassel
