* [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
@ 2025-12-10 7:13 Niklas Cassel
2025-12-22 8:19 ` Krishna Chaitanya Chundru
0 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2025-12-10 7:13 UTC (permalink / raw)
To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas
Cc: Frank Li, Damien Le Moal, Koichiro Den, Niklas Cassel, linux-pci
From: Koichiro Den <den@valinux.co.jp>
dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
for the MSI target address on every interrupt and tears it down again
via dw_pcie_ep_unmap_addr().
On systems that heavily use the AXI bridge interface (for example when
the integrated eDMA engine is active), this means the outbound iATU
registers are updated while traffic is in flight. The DesignWare
endpoint databook 5.40a - "3.10.6.1 iATU Outbound Programming Overview"
warns that updating iATU registers in this situation is not supported,
and the behavior is undefined.
Under high MSI and eDMA load this pattern results in occasional bogus
outbound transactions and IOMMU faults, on the RC side, such as:
ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
followed by the system becoming unresponsive. This is the actual output
observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
There is no need to reprogram the iATU region used for MSI on every
interrupt. The host-provided MSI address is stable while MSI is enabled,
and the endpoint driver already dedicates a scratch buffer for MSI
generation.
Cache the aligned MSI address and map size, program the outbound iATU
once, and keep the window enabled. Subsequent interrupts only perform a
write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
the hot path and fixing the lockups seen under load.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
[cassel: do same change for dw_pcie_ep_raise_msix_irq()]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
Notes: Without this patch, I also see IOMMU errors on the host side,
when running drivers/nvme/target/pci-epf.c with a large queue depth.
pci_epc_raise_irq() does take a mutex, so the calls to these functions
are serialized, so it really appears that the DWC controller does not
handle iATU reprogramming while there are outstanding transactions.
Just like Koichiro describes here:
https://lore.kernel.org/linux-pci/ddriorsgyjs6klcb6d7pi2u3ah3wxlzku7v2dpyjlo6tmalvfw@yj5dczlkggt6/
I also see the iova faulting on the RC side to be the start of "addr_space"
on the EP, so it appears that a transaction has gone through untranslated.
(Most likely because the DWC controller does handle the iATU table being
modified while there are outstanding transactions.)
This patch has been tested using pci-epf-test and nvmet-pci-epf on rock5b.
pci-epf-test does change between MSI and MSI-X without calling
dw_pcie_ep_stop(), however, the msg_addr address written by the host
will be the same address, at least when using a Linux host using a DWC
based controller. If another host ends up using different msg_addr for
MSI and MSI-X, then I think that we will need to modify pci-epf-test to
call a function when changing IRQ type, such that pcie-designware-ep.c
can tear down the MSI/MSI-X mapping.
.../pci/controller/dwc/pcie-designware-ep.c | 82 ++++++++++++++++---
drivers/pci/controller/dwc/pcie-designware.h | 5 ++
2 files changed, 75 insertions(+), 12 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 7f2112c2fb21..2bbeddaa73d4 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -601,6 +601,16 @@ static void dw_pcie_ep_stop(struct pci_epc *epc)
struct dw_pcie_ep *ep = epc_get_drvdata(epc);
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+ /*
+ * Tear down the dedicated outbound window used for MSI
+ * generation. This avoids leaking an iATU window across
+ * endpoint stop/start cycles.
+ */
+ if (ep->msi_iatu_mapped) {
+ dw_pcie_ep_unmap_addr(epc, 0, 0, ep->msi_mem_phys);
+ ep->msi_iatu_mapped = false;
+ }
+
dw_pcie_stop_link(pci);
}
@@ -702,14 +712,37 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
msg_addr = ((u64)msg_addr_upper) << 32 | msg_addr_lower;
msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
- ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
- map_size);
- if (ret)
- return ret;
- writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
+ /*
+ * Program the outbound iATU once and keep it enabled.
+ *
+ * The spec warns that updating iATU registers while there are
+ * operations in flight on the AXI bridge interface is not
+ * supported, so we avoid reprogramming the region on every MSI,
+ * specifically unmapping immediately after writel().
+ */
+ if (!ep->msi_iatu_mapped) {
+ ret = dw_pcie_ep_map_addr(epc, func_no, 0,
+ ep->msi_mem_phys, msg_addr,
+ map_size);
+ if (ret)
+ return ret;
+
+ ep->msi_iatu_mapped = true;
+ ep->msi_msg_addr = msg_addr;
+ ep->msi_map_size = map_size;
+ } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
+ ep->msi_map_size != map_size)) {
+ /*
+ * The host changed the MSI target address or the required
+ * mapping size changed. Reprogramming the iATU at runtime is
+ * unsafe on this controller, so bail out instead of trying to
+ * update the existing region.
+ */
+ return -EINVAL;
+ }
- dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
+ writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
return 0;
}
@@ -786,14 +819,36 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
}
msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
- ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
- map_size);
- if (ret)
- return ret;
- writel(msg_data, ep->msi_mem + offset);
+ /*
+ * Program the outbound iATU once and keep it enabled.
+ *
+ * The spec warns that updating iATU registers while there are
+ * operations in flight on the AXI bridge interface is not
+ * supported, so we avoid reprogramming the region on every MSI-X,
+ * specifically unmapping immediately after writel().
+ */
+ if (!ep->msi_iatu_mapped) {
+ ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
+ map_size);
+ if (ret)
+ return ret;
+
+ ep->msi_iatu_mapped = true;
+ ep->msi_msg_addr = msg_addr;
+ ep->msi_map_size = map_size;
+ } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
+ ep->msi_map_size != map_size)) {
+ /*
+ * The host changed the MSI-X target address or the required
+ * mapping size changed. Reprogramming the iATU at runtime is
+ * unsafe on this controller, so bail out instead of trying to
+ * update the existing region.
+ */
+ return -EINVAL;
+ }
- dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
+ writel(msg_data, ep->msi_mem + offset);
return 0;
}
@@ -1086,6 +1141,9 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
struct device *dev = pci->dev;
INIT_LIST_HEAD(&ep->func_list);
+ ep->msi_iatu_mapped = false;
+ ep->msi_msg_addr = 0;
+ ep->msi_map_size = 0;
epc = devm_pci_epc_create(dev, &epc_ops);
if (IS_ERR(epc)) {
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index a31bd93490dc..1093c622826d 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -472,6 +472,11 @@ struct dw_pcie_ep {
void __iomem *msi_mem;
phys_addr_t msi_mem_phys;
struct pci_epf_bar *epf_bar[PCI_STD_NUM_BARS];
+
+ /* MSI outbound iATU state */
+ bool msi_iatu_mapped;
+ u64 msi_msg_addr;
+ size_t msi_map_size;
};
struct dw_pcie_ops {
--
2.52.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-10 7:13 [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping Niklas Cassel
@ 2025-12-22 8:19 ` Krishna Chaitanya Chundru
2025-12-22 8:36 ` Niklas Cassel
2025-12-22 9:58 ` Manivannan Sadhasivam
0 siblings, 2 replies; 17+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-12-22 8:19 UTC (permalink / raw)
To: Niklas Cassel, Jingoo Han, Manivannan Sadhasivam,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas
Cc: Frank Li, Damien Le Moal, Koichiro Den, linux-pci
On 12/10/2025 12:43 PM, Niklas Cassel wrote:
> From: Koichiro Den <den@valinux.co.jp>
>
> dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
> for the MSI target address on every interrupt and tears it down again
> via dw_pcie_ep_unmap_addr().
>
> On systems that heavily use the AXI bridge interface (for example when
> the integrated eDMA engine is active), this means the outbound iATU
> registers are updated while traffic is in flight. The DesignWare
> endpoint databook 5.40a - "3.10.6.1 iATU Outbound Programming Overview"
> warns that updating iATU registers in this situation is not supported,
> and the behavior is undefined.
>
> Under high MSI and eDMA load this pattern results in occasional bogus
> outbound transactions and IOMMU faults, on the RC side, such as:
>
> ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
>
> followed by the system becoming unresponsive. This is the actual output
> observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
>
> There is no need to reprogram the iATU region used for MSI on every
> interrupt. The host-provided MSI address is stable while MSI is enabled,
> and the endpoint driver already dedicates a scratch buffer for MSI
> generation.
>
> Cache the aligned MSI address and map size, program the outbound iATU
> once, and keep the window enabled. Subsequent interrupts only perform a
> write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
> the hot path and fixing the lockups seen under load.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> [cassel: do same change for dw_pcie_ep_raise_msix_irq()]
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
> Notes: Without this patch, I also see IOMMU errors on the host side,
> when running drivers/nvme/target/pci-epf.c with a large queue depth.
>
> pci_epc_raise_irq() does take a mutex, so the calls to these functions
> are serialized, so it really appears that the DWC controller does not
> handle iATU reprogramming while there are outstanding transactions.
>
> Just like Koichiro describes here:
> https://lore.kernel.org/linux-pci/ddriorsgyjs6klcb6d7pi2u3ah3wxlzku7v2dpyjlo6tmalvfw@yj5dczlkggt6/
>
> I also see the iova faulting on the RC side to be the start of "addr_space"
> on the EP, so it appears that a transaction has gone through untranslated.
> (Most likely because the DWC controller does handle the iATU table being
> modified while there are outstanding transactions.)
>
> This patch has been tested using pci-epf-test and nvmet-pci-epf on rock5b.
>
> pci-epf-test does change between MSI and MSI-X without calling
> dw_pcie_ep_stop(), however, the msg_addr address written by the host
> will be the same address, at least when using a Linux host using a DWC
> based controller. If another host ends up using different msg_addr for
> MSI and MSI-X, then I think that we will need to modify pci-epf-test to
> call a function when changing IRQ type, such that pcie-designware-ep.c
> can tear down the MSI/MSI-X mapping.
>
> .../pci/controller/dwc/pcie-designware-ep.c | 82 ++++++++++++++++---
> drivers/pci/controller/dwc/pcie-designware.h | 5 ++
> 2 files changed, 75 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 7f2112c2fb21..2bbeddaa73d4 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -601,6 +601,16 @@ static void dw_pcie_ep_stop(struct pci_epc *epc)
> struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>
> + /*
> + * Tear down the dedicated outbound window used for MSI
> + * generation. This avoids leaking an iATU window across
> + * endpoint stop/start cycles.
> + */
> + if (ep->msi_iatu_mapped) {
> + dw_pcie_ep_unmap_addr(epc, 0, 0, ep->msi_mem_phys);
> + ep->msi_iatu_mapped = false;
> + }
> +
> dw_pcie_stop_link(pci);
> }
>
> @@ -702,14 +712,37 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> msg_addr = ((u64)msg_addr_upper) << 32 | msg_addr_lower;
>
> msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
> - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> - map_size);
> - if (ret)
> - return ret;
>
> - writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> + /*
> + * Program the outbound iATU once and keep it enabled.
> + *
> + * The spec warns that updating iATU registers while there are
> + * operations in flight on the AXI bridge interface is not
> + * supported, so we avoid reprogramming the region on every MSI,
> + * specifically unmapping immediately after writel().
> + */
> + if (!ep->msi_iatu_mapped) {
> + ret = dw_pcie_ep_map_addr(epc, func_no, 0,
> + ep->msi_mem_phys, msg_addr,
> + map_size);
> + if (ret)
> + return ret;
> +
> + ep->msi_iatu_mapped = true;
> + ep->msi_msg_addr = msg_addr;
> + ep->msi_map_size = map_size;
> + } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> + ep->msi_map_size != map_size)) {
> + /*
> + * The host changed the MSI target address or the required
> + * mapping size changed. Reprogramming the iATU at runtime is
> + * unsafe on this controller, so bail out instead of trying to
> + * update the existing region.
> + */
> + return -EINVAL;
> + }
>
> - dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> + writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
>
> return 0;
> }
> @@ -786,14 +819,36 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> }
>
> msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
> - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> - map_size);
> - if (ret)
> - return ret;
>
> - writel(msg_data, ep->msi_mem + offset);
> + /*
> + * Program the outbound iATU once and keep it enabled.
> + *
> + * The spec warns that updating iATU registers while there are
> + * operations in flight on the AXI bridge interface is not
> + * supported, so we avoid reprogramming the region on every MSI-X,
> + * specifically unmapping immediately after writel().
> + */
> + if (!ep->msi_iatu_mapped) {
This is wrong, in MSIX each vector can give you different address, you
can't expect same address for
all the vectors in MSIX table. In ARM based system you might see only
single address for X86 this will
change.
And also we see in MSIX the address are getting updated at runtime with
x86 windows host machines.
Use the MSIX doorbell method which will not use iATU at all,
dw_pcie_ep_raise_msix_irq_doorbell().
- Krishna Chaitanya.
> + ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> + map_size);
> + if (ret)
> + return ret;
> +
> + ep->msi_iatu_mapped = true;
> + ep->msi_msg_addr = msg_addr;
> + ep->msi_map_size = map_size;
> + } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> + ep->msi_map_size != map_size)) {
> + /*
> + * The host changed the MSI-X target address or the required
> + * mapping size changed. Reprogramming the iATU at runtime is
> + * unsafe on this controller, so bail out instead of trying to
> + * update the existing region.
> + */
> + return -EINVAL;
> + }
>
> - dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> + writel(msg_data, ep->msi_mem + offset);
>
> return 0;
> }
> @@ -1086,6 +1141,9 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> struct device *dev = pci->dev;
>
> INIT_LIST_HEAD(&ep->func_list);
> + ep->msi_iatu_mapped = false;
> + ep->msi_msg_addr = 0;
> + ep->msi_map_size = 0;
>
> epc = devm_pci_epc_create(dev, &epc_ops);
> if (IS_ERR(epc)) {
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index a31bd93490dc..1093c622826d 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -472,6 +472,11 @@ struct dw_pcie_ep {
> void __iomem *msi_mem;
> phys_addr_t msi_mem_phys;
> struct pci_epf_bar *epf_bar[PCI_STD_NUM_BARS];
> +
> + /* MSI outbound iATU state */
> + bool msi_iatu_mapped;
> + u64 msi_msg_addr;
> + size_t msi_map_size;
> };
>
> struct dw_pcie_ops {
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 8:19 ` Krishna Chaitanya Chundru
@ 2025-12-22 8:36 ` Niklas Cassel
2025-12-22 8:42 ` Krishna Chaitanya Chundru
2025-12-22 9:58 ` Manivannan Sadhasivam
1 sibling, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2025-12-22 8:36 UTC (permalink / raw)
To: Krishna Chaitanya Chundru
Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On Mon, Dec 22, 2025 at 01:49:21PM +0530, Krishna Chaitanya Chundru wrote:
> On 12/10/2025 12:43 PM, Niklas Cassel wrote:
> > @@ -786,14 +819,36 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> > }
> > msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
> > - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> > - map_size);
> > - if (ret)
> > - return ret;
> > - writel(msg_data, ep->msi_mem + offset);
> > + /*
> > + * Program the outbound iATU once and keep it enabled.
> > + *
> > + * The spec warns that updating iATU registers while there are
> > + * operations in flight on the AXI bridge interface is not
> > + * supported, so we avoid reprogramming the region on every MSI-X,
> > + * specifically unmapping immediately after writel().
> > + */
> > + if (!ep->msi_iatu_mapped) {
> This is wrong, in MSIX each vector can give you different address, you can't
> expect same address for
> all the vectors in MSIX table. In ARM based system you might see only single
> address for X86 this will
> change.
Ok, thank you. I did not know.
>
> And also we see in MSIX the address are getting updated at runtime with x86
> windows host machines.
My idea was to add a pci_epc_set_currently_enabled_irq_type() API, and then
let the EPF driver call that if it wants to change the IRQ type.
But... if the msg_addr can change at runtime, even when the IRQ type does
not change, then a pci_epc_set_currently_enabled_irq_type() API will not
help.
I guess we will need to come up with something else for the MSI-X case.
>
> Use the MSIX doorbell method which will not use iATU at all,
> dw_pcie_ep_raise_msix_irq_doorbell().
AFAICT, right now, the only driver ever calling this function is:
drivers/pci/controller/dwc/pci-layerscape-ep.c
Are you suggesting that we somehow change all the other DWC based EPC
drivers' .raise_irq() callback to call dw_pcie_ep_raise_msix_irq_doorbell()
instead of dw_pcie_ep_raise_msix_irq() for case PCI_IRQ_MSIX ?
That sounds like a big change that would need to be tested and verified
for each DWC based EPC driver.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 8:36 ` Niklas Cassel
@ 2025-12-22 8:42 ` Krishna Chaitanya Chundru
2025-12-22 8:49 ` Niklas Cassel
0 siblings, 1 reply; 17+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-12-22 8:42 UTC (permalink / raw)
To: Niklas Cassel
Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On 12/22/2025 2:06 PM, Niklas Cassel wrote:
> On Mon, Dec 22, 2025 at 01:49:21PM +0530, Krishna Chaitanya Chundru wrote:
>> On 12/10/2025 12:43 PM, Niklas Cassel wrote:
>>> @@ -786,14 +819,36 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>>> }
>>> msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
>>> - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
>>> - map_size);
>>> - if (ret)
>>> - return ret;
>>> - writel(msg_data, ep->msi_mem + offset);
>>> + /*
>>> + * Program the outbound iATU once and keep it enabled.
>>> + *
>>> + * The spec warns that updating iATU registers while there are
>>> + * operations in flight on the AXI bridge interface is not
>>> + * supported, so we avoid reprogramming the region on every MSI-X,
>>> + * specifically unmapping immediately after writel().
>>> + */
>>> + if (!ep->msi_iatu_mapped) {
>> This is wrong, in MSIX each vector can give you different address, you can't
>> expect same address for
>> all the vectors in MSIX table. In ARM based system you might see only single
>> address for X86 this will
>> change.
> Ok, thank you. I did not know.
>
>
>> And also we see in MSIX the address are getting updated at runtime with x86
>> windows host machines.
> My idea was to add a pci_epc_set_currently_enabled_irq_type() API, and then
> let the EPF driver call that if it wants to change the IRQ type.
>
> But... if the msg_addr can change at runtime, even when the IRQ type does
> not change, then a pci_epc_set_currently_enabled_irq_type() API will not
> help.
>
> I guess we will need to come up with something else for the MSI-X case.
>
>
>> Use the MSIX doorbell method which will not use iATU at all,
>> dw_pcie_ep_raise_msix_irq_doorbell().
> AFAICT, right now, the only driver ever calling this function is:
> drivers/pci/controller/dwc/pci-layerscape-ep.c
>
> Are you suggesting that we somehow change all the other DWC based EPC
> drivers' .raise_irq() callback to call dw_pcie_ep_raise_msix_irq_doorbell()
> instead of dw_pcie_ep_raise_msix_irq() for case PCI_IRQ_MSIX ?
Yes.
> That sounds like a big change that would need to be tested and verified
> for each DWC based EPC driver.
I agree, but this will be clean solution to avoid iATU for MSIX.
- Krishna Chaitanya.
> Kind regards,
> Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 8:42 ` Krishna Chaitanya Chundru
@ 2025-12-22 8:49 ` Niklas Cassel
0 siblings, 0 replies; 17+ messages in thread
From: Niklas Cassel @ 2025-12-22 8:49 UTC (permalink / raw)
To: Krishna Chaitanya Chundru
Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On 22 December 2025 09:42:50 CET, Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> wrote:
>> I guess we will need to come up with something else for the MSI-X case.
>>
>>
>>> Use the MSIX doorbell method which will not use iATU at all,
>>> dw_pcie_ep_raise_msix_irq_doorbell().
>> AFAICT, right now, the only driver ever calling this function is:
>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>
>> Are you suggesting that we somehow change all the other DWC based EPC
>> drivers' .raise_irq() callback to call dw_pcie_ep_raise_msix_irq_doorbell()
>> instead of dw_pcie_ep_raise_msix_irq() for case PCI_IRQ_MSIX ?
>Yes.
>> That sounds like a big change that would need to be tested and verified
>> for each DWC based EPC driver.
>I agree, but this will be clean solution to avoid iATU for MSIX.
I agree that this sounds like a nice solution for those platforms that can use it.
But, for platforms that might not be able to use this method (not sure if this is possible with all versions of the DWC core),
we probably still need to come up with some other solution to this problem.
It would also be nice with some solution that can be used until all drivers have been converted.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 8:19 ` Krishna Chaitanya Chundru
2025-12-22 8:36 ` Niklas Cassel
@ 2025-12-22 9:58 ` Manivannan Sadhasivam
2025-12-22 10:42 ` Niklas Cassel
2025-12-22 11:11 ` Niklas Cassel
1 sibling, 2 replies; 17+ messages in thread
From: Manivannan Sadhasivam @ 2025-12-22 9:58 UTC (permalink / raw)
To: Krishna Chaitanya Chundru
Cc: Niklas Cassel, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On Mon, Dec 22, 2025 at 01:49:21PM +0530, Krishna Chaitanya Chundru wrote:
>
>
> On 12/10/2025 12:43 PM, Niklas Cassel wrote:
> > From: Koichiro Den <den@valinux.co.jp>
> >
> > dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
> > for the MSI target address on every interrupt and tears it down again
> > via dw_pcie_ep_unmap_addr().
> >
> > On systems that heavily use the AXI bridge interface (for example when
> > the integrated eDMA engine is active), this means the outbound iATU
> > registers are updated while traffic is in flight. The DesignWare
> > endpoint databook 5.40a - "3.10.6.1 iATU Outbound Programming Overview"
> > warns that updating iATU registers in this situation is not supported,
> > and the behavior is undefined.
> >
> > Under high MSI and eDMA load this pattern results in occasional bogus
> > outbound transactions and IOMMU faults, on the RC side, such as:
> >
> > ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
> >
> > followed by the system becoming unresponsive. This is the actual output
> > observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
> >
> > There is no need to reprogram the iATU region used for MSI on every
> > interrupt. The host-provided MSI address is stable while MSI is enabled,
> > and the endpoint driver already dedicates a scratch buffer for MSI
> > generation.
> >
> > Cache the aligned MSI address and map size, program the outbound iATU
> > once, and keep the window enabled. Subsequent interrupts only perform a
> > write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
> > the hot path and fixing the lockups seen under load.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > [cassel: do same change for dw_pcie_ep_raise_msix_irq()]
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
Thanks for splitting it up!
> > ---
> > Notes: Without this patch, I also see IOMMU errors on the host side,
> > when running drivers/nvme/target/pci-epf.c with a large queue depth.
> >
> > pci_epc_raise_irq() does take a mutex, so the calls to these functions
> > are serialized, so it really appears that the DWC controller does not
> > handle iATU reprogramming while there are outstanding transactions.
> >
> > Just like Koichiro describes here:
> > https://lore.kernel.org/linux-pci/ddriorsgyjs6klcb6d7pi2u3ah3wxlzku7v2dpyjlo6tmalvfw@yj5dczlkggt6/
> >
> > I also see the iova faulting on the RC side to be the start of "addr_space"
> > on the EP, so it appears that a transaction has gone through untranslated.
> > (Most likely because the DWC controller does handle the iATU table being
> > modified while there are outstanding transactions.)
> >
> > This patch has been tested using pci-epf-test and nvmet-pci-epf on rock5b.
> >
> > pci-epf-test does change between MSI and MSI-X without calling
> > dw_pcie_ep_stop(), however, the msg_addr address written by the host
> > will be the same address, at least when using a Linux host using a DWC
> > based controller. If another host ends up using different msg_addr for
> > MSI and MSI-X, then I think that we will need to modify pci-epf-test to
> > call a function when changing IRQ type, such that pcie-designware-ep.c
> > can tear down the MSI/MSI-X mapping.
> >
> > .../pci/controller/dwc/pcie-designware-ep.c | 82 ++++++++++++++++---
> > drivers/pci/controller/dwc/pcie-designware.h | 5 ++
> > 2 files changed, 75 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > index 7f2112c2fb21..2bbeddaa73d4 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -601,6 +601,16 @@ static void dw_pcie_ep_stop(struct pci_epc *epc)
> > struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > + /*
> > + * Tear down the dedicated outbound window used for MSI
> > + * generation. This avoids leaking an iATU window across
> > + * endpoint stop/start cycles.
> > + */
> > + if (ep->msi_iatu_mapped) {
> > + dw_pcie_ep_unmap_addr(epc, 0, 0, ep->msi_mem_phys);
> > + ep->msi_iatu_mapped = false;
> > + }
> > +
> > dw_pcie_stop_link(pci);
> > }
> > @@ -702,14 +712,37 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> > msg_addr = ((u64)msg_addr_upper) << 32 | msg_addr_lower;
> > msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
> > - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> > - map_size);
> > - if (ret)
> > - return ret;
> > - writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> > + /*
> > + * Program the outbound iATU once and keep it enabled.
> > + *
> > + * The spec warns that updating iATU registers while there are
> > + * operations in flight on the AXI bridge interface is not
> > + * supported, so we avoid reprogramming the region on every MSI,
> > + * specifically unmapping immediately after writel().
> > + */
> > + if (!ep->msi_iatu_mapped) {
> > + ret = dw_pcie_ep_map_addr(epc, func_no, 0,
> > + ep->msi_mem_phys, msg_addr,
> > + map_size);
> > + if (ret)
> > + return ret;
> > +
> > + ep->msi_iatu_mapped = true;
> > + ep->msi_msg_addr = msg_addr;
> > + ep->msi_map_size = map_size;
> > + } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr ||
> > + ep->msi_map_size != map_size)) {
> > + /*
> > + * The host changed the MSI target address or the required
> > + * mapping size changed. Reprogramming the iATU at runtime is
> > + * unsafe on this controller, so bail out instead of trying to
> > + * update the existing region.
> > + */
> > + return -EINVAL;
> > + }
> > - dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys);
> > + writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);
> > return 0;
> > }
> > @@ -786,14 +819,36 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> > }
> > msg_addr = dw_pcie_ep_align_addr(epc, msg_addr, &map_size, &offset);
> > - ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
> > - map_size);
> > - if (ret)
> > - return ret;
> > - writel(msg_data, ep->msi_mem + offset);
> > + /*
> > + * Program the outbound iATU once and keep it enabled.
> > + *
> > + * The spec warns that updating iATU registers while there are
> > + * operations in flight on the AXI bridge interface is not
> > + * supported, so we avoid reprogramming the region on every MSI-X,
> > + * specifically unmapping immediately after writel().
> > + */
> > + if (!ep->msi_iatu_mapped) {
> This is wrong, in MSIX each vector can give you different address, you can't
> expect same address for
> all the vectors in MSIX table. In ARM based system you might see only single
> address for X86 this will
> change.
>
> And also we see in MSIX the address are getting updated at runtime with x86
> windows host machines.
>
The spec allows changing the MSI-X address/data pair when the corresponding
vector is *masked*, and classifies the behavior as undefined if address/data
pair gets changed while the vector is *unmasked*.
If we were to enable caching, then we need to take note of the above point. But
the EP implementation doesn't get any notification when the vector gets
masked/unmasked dynamically, which makes it trickier to implement the caching in
SW.
> Use the MSIX doorbell method which will not use iATU at all,
> dw_pcie_ep_raise_msix_irq_doorbell().
>
I think this is the safe bet since this feature doesn't seem like an optional
one.
Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
how you can reliably fix MSI-X generation with AXI slave interface.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 9:58 ` Manivannan Sadhasivam
@ 2025-12-22 10:42 ` Niklas Cassel
2025-12-22 11:11 ` Niklas Cassel
1 sibling, 0 replies; 17+ messages in thread
From: Niklas Cassel @ 2025-12-22 10:42 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Krishna Chaitanya Chundru, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
> > This is wrong, in MSIX each vector can give you different address, you can't
> > expect same address for
> > all the vectors in MSIX table. In ARM based system you might see only single
> > address for X86 this will
> > change.
> >
> > And also we see in MSIX the address are getting updated at runtime with x86
> > windows host machines.
> >
>
> The spec allows changing the MSI-X address/data pair when the corresponding
> vector is *masked*, and classifies the behavior as undefined if address/data
> pair gets changed while the vector is *unmasked*.
MSI also supports masking as an optional feature.
Too bad that the spec just mentions that the cached MSI-X address/data is
allowed to be changed when masked, but does not mention the same for MSI.
>
> If we were to enable caching, then we need to take note of the above point. But
> the EP implementation doesn't get any notification when the vector gets
> masked/unmasked dynamically, which makes it trickier to implement the caching in
> SW.
>
> > Use the MSIX doorbell method which will not use iATU at all,
> > dw_pcie_ep_raise_msix_irq_doorbell().
> >
>
> I think this is the safe bet since this feature doesn't seem like an optional
> one.
>
> Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
> drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
> how you can reliably fix MSI-X generation with AXI slave interface.
Sounds like a plan.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 9:58 ` Manivannan Sadhasivam
2025-12-22 10:42 ` Niklas Cassel
@ 2025-12-22 11:11 ` Niklas Cassel
2025-12-22 11:19 ` Manivannan Sadhasivam
2025-12-22 12:23 ` Krishna Chaitanya Chundru
1 sibling, 2 replies; 17+ messages in thread
From: Niklas Cassel @ 2025-12-22 11:11 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Krishna Chaitanya Chundru, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
>
> > Use the MSIX doorbell method which will not use iATU at all,
> > dw_pcie_ep_raise_msix_irq_doorbell().
> >
>
> I think this is the safe bet since this feature doesn't seem like an optional
> one.
>
> Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
> drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
> how you can reliably fix MSI-X generation with AXI slave interface.
FWIW, I did try to simply change:
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index 8f2cc1ef25e3..00770f9786e3 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
case PCI_IRQ_MSI:
return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
case PCI_IRQ_MSIX:
- return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
+ return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
+ interrupt_num);
default:
dev_err(pci->dev, "UNKNOWN IRQ type\n");
}
For the pcie-dw-rockchip driver, but it is not working:
[ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout, completion polled
Without this change, things work.
Perhaps this feature is not an optional one, but at least we will require
more changes than a simple one liner.
Kind regards,
Niklas
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 11:11 ` Niklas Cassel
@ 2025-12-22 11:19 ` Manivannan Sadhasivam
2025-12-22 12:23 ` Krishna Chaitanya Chundru
1 sibling, 0 replies; 17+ messages in thread
From: Manivannan Sadhasivam @ 2025-12-22 11:19 UTC (permalink / raw)
To: Niklas Cassel
Cc: Krishna Chaitanya Chundru, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On Mon, Dec 22, 2025 at 12:11:07PM +0100, Niklas Cassel wrote:
> On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
> >
> > > Use the MSIX doorbell method which will not use iATU at all,
> > > dw_pcie_ep_raise_msix_irq_doorbell().
> > >
> >
> > I think this is the safe bet since this feature doesn't seem like an optional
> > one.
> >
> > Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
> > drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
> > how you can reliably fix MSI-X generation with AXI slave interface.
>
> FWIW, I did try to simply change:
>
> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> index 8f2cc1ef25e3..00770f9786e3 100644
> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> case PCI_IRQ_MSI:
> return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> case PCI_IRQ_MSIX:
> - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
> + interrupt_num);
> default:
> dev_err(pci->dev, "UNKNOWN IRQ type\n");
> }
>
>
> For the pcie-dw-rockchip driver, but it is not working:
> [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout, completion polled
>
> Without this change, things work.
>
> Perhaps this feature is not an optional one, but at least we will require
> more changes than a simple one liner.
>
Fair enough. We should not just blindly replace the API, that's why I preferred
the vendor drivers themselves to transition (meaning the vendor maintainers
doing the change instead of us).
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 11:11 ` Niklas Cassel
2025-12-22 11:19 ` Manivannan Sadhasivam
@ 2025-12-22 12:23 ` Krishna Chaitanya Chundru
2025-12-22 13:00 ` Niklas Cassel
1 sibling, 1 reply; 17+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-12-22 12:23 UTC (permalink / raw)
To: Niklas Cassel, Manivannan Sadhasivam
Cc: Jingoo Han, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, Bjorn Helgaas, Frank Li, Damien Le Moal,
Koichiro Den, linux-pci
On 12/22/2025 4:41 PM, Niklas Cassel wrote:
> On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
>>> Use the MSIX doorbell method which will not use iATU at all,
>>> dw_pcie_ep_raise_msix_irq_doorbell().
>>>
>> I think this is the safe bet since this feature doesn't seem like an optional
>> one.
>>
>> Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
>> drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
>> how you can reliably fix MSI-X generation with AXI slave interface.
> FWIW, I did try to simply change:
>
> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> index 8f2cc1ef25e3..00770f9786e3 100644
> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> case PCI_IRQ_MSI:
> return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> case PCI_IRQ_MSIX:
> - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
> + interrupt_num);
> default:
> dev_err(pci->dev, "UNKNOWN IRQ type\n");
> }
>
>
> For the pcie-dw-rockchip driver, but it is not working:
> [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout, completion polled
>
> Without this change, things work.
>
> Perhaps this feature is not an optional one, but at least we will require
> more changes than a simple one liner.
Hi Niklas,
It should be automatic only, no extra configurations should be required,
I believe your
HW doesn't support this feature, from spec 6..0a, sec 3.9.1.3 iMSIX-TX:
Integrated MSI-X Transmit (USP)
I believe your HW is not generated with MSIX_TABLE_EN =1. In that case
you can't use this feature.
or
Are using VF for NVMe, in that MSIX doorbell logic needs to be updated
for VF's also.
There is one more option to use MSIX with out using iATU i.e MSIX
address match mode
taken from same section above this also, but it is not clear to me if
this feature also depends
on MSIX_TABLE_EN=1. May be give it try once.
"Automatic triggering of MSI-X generation when your application issues a
MWr at the
AXI subordi nate interface with an address equal to
MSIX_ADDRESS_MATCH_LOW_OFF
and MSIX_ADDRESS_ MATCH_HIGH_OFF.
❑ MWr must be DWORD-aligned and must contain MSI-X vector, TC, and
function number.
■ When your local application issues an MWr on the AXI subordinate
interface to a specific address
(MSIX_ADDRESS_MATCH_LOW_OFF), the controller extracts the vector and TC
information from the
MWr payload and creates the MSI-X request.
■ AXI bridge guarantees ordering coherency by forwarding the MSI-X
request after the preceding MWr has been transmitted.
■ Processes one MSI-X request at a time and halts subordinate access if
a second MSI-X request is submitted before first one is complete"
we can allocated a address and write in this MSIX Address match register
at probe and update that address with the vector
number and function number etc(Same as how we write in MSIX doorbell
register) when we want to send the msix.
- Krishna Chaitanya.
>
>
> Kind regards,
> Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 12:23 ` Krishna Chaitanya Chundru
@ 2025-12-22 13:00 ` Niklas Cassel
2025-12-23 1:12 ` Shawn Lin
0 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2025-12-22 13:00 UTC (permalink / raw)
To: Krishna Chaitanya Chundru
Cc: Manivannan Sadhasivam, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci, Shawn Lin
+ Shawn
On Mon, Dec 22, 2025 at 05:53:27PM +0530, Krishna Chaitanya Chundru wrote:
>
>
> On 12/22/2025 4:41 PM, Niklas Cassel wrote:
> > On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
> > > > Use the MSIX doorbell method which will not use iATU at all,
> > > > dw_pcie_ep_raise_msix_irq_doorbell().
> > > >
> > > I think this is the safe bet since this feature doesn't seem like an optional
> > > one.
> > >
> > > Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
> > > drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
> > > how you can reliably fix MSI-X generation with AXI slave interface.
> > FWIW, I did try to simply change:
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > index 8f2cc1ef25e3..00770f9786e3 100644
> > --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> > case PCI_IRQ_MSI:
> > return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> > case PCI_IRQ_MSIX:
> > - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> > + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
> > + interrupt_num);
> > default:
> > dev_err(pci->dev, "UNKNOWN IRQ type\n");
> > }
> >
> >
> > For the pcie-dw-rockchip driver, but it is not working:
> > [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout, completion polled
> >
> > Without this change, things work.
> >
> > Perhaps this feature is not an optional one, but at least we will require
> > more changes than a simple one liner.
> Hi Niklas,
>
> It should be automatic only, no extra configurations should be
> required, I believe your
> HW doesn't support this feature, from spec 6..0a, sec 3.9.1.3
> iMSIX-TX: Integrated MSI-X Transmit (USP)
> I believe your HW is not generated with MSIX_TABLE_EN =1. In that
> case you can't use this feature.
Looking at the RK3588 TRM, it does have register:
USP_PCIE_PL_MSIX_DOORBELL_OFF
Address: Operational Base + offset (0x0248)
Port Logic registers start at offset 0x700 on this SoC,
so 0x700 + 0x248 == 0x948, which matches:
drivers/pci/controller/dwc/pcie-designware.h:#define PCIE_MSIX_DOORBELL 0x948
I don't think the TRM would include this register if the
DWC coere was not generated with MSIX_TABLE_EN=1.
Shawn, any suggestions?
For full thread, see:
https://lore.kernel.org/linux-pci/3b34aa66-a418-4f6b-930a-0728d87d79b6@oss.qualcomm.com/T/#t
FWIW, Krishna Chaitanya, I did try the dw_pcie_ep_raise_msix_irq_doorbell()
change above also with the pci-epf-test EPF driver, and it also caused the
pci-epf-test driver to stop working.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-22 13:00 ` Niklas Cassel
@ 2025-12-23 1:12 ` Shawn Lin
2025-12-23 4:35 ` Krishna Chaitanya Chundru
0 siblings, 1 reply; 17+ messages in thread
From: Shawn Lin @ 2025-12-23 1:12 UTC (permalink / raw)
To: Niklas Cassel, Krishna Chaitanya Chundru
Cc: shawn.lin, Manivannan Sadhasivam, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
在 2025/12/22 星期一 21:00, Niklas Cassel 写道:
> + Shawn
>
> On Mon, Dec 22, 2025 at 05:53:27PM +0530, Krishna Chaitanya Chundru wrote:
>>
>>
>> On 12/22/2025 4:41 PM, Niklas Cassel wrote:
>>> On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
>>>>> Use the MSIX doorbell method which will not use iATU at all,
>>>>> dw_pcie_ep_raise_msix_irq_doorbell().
>>>>>
>>>> I think this is the safe bet since this feature doesn't seem like an optional
>>>> one.
>>>>
>>>> Niklas, if you can just fix MSI in this patch and leave out MSI-X for the vendor
>>>> drivers to transition to doorbell, I'm OK to merge it. Otherwise, I don't know
>>>> how you can reliably fix MSI-X generation with AXI slave interface.
>>> FWIW, I did try to simply change:
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> index 8f2cc1ef25e3..00770f9786e3 100644
>>> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>> case PCI_IRQ_MSI:
>>> return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
>>> case PCI_IRQ_MSIX:
>>> - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
>>> + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
>>> + interrupt_num);
>>> default:
>>> dev_err(pci->dev, "UNKNOWN IRQ type\n");
>>> }
>>>
>>>
>>> For the pcie-dw-rockchip driver, but it is not working:
>>> [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout, completion polled
>>>
>>> Without this change, things work.
>>>
>>> Perhaps this feature is not an optional one, but at least we will require
>>> more changes than a simple one liner.
>> Hi Niklas,
>>
>> It should be automatic only, no extra configurations should be
>> required, I believe your
>> HW doesn't support this feature, from spec 6..0a, sec 3.9.1.3
>> iMSIX-TX: Integrated MSI-X Transmit (USP)
>> I believe your HW is not generated with MSIX_TABLE_EN =1. In that
>> case you can't use this feature.
>
> Looking at the RK3588 TRM, it does have register:
> USP_PCIE_PL_MSIX_DOORBELL_OFF
> Address: Operational Base + offset (0x0248)
>
> Port Logic registers start at offset 0x700 on this SoC,
> so 0x700 + 0x248 == 0x948, which matches:
> drivers/pci/controller/dwc/pcie-designware.h:#define PCIE_MSIX_DOORBELL 0x948
>
> I don't think the TRM would include this register if the
> DWC coere was not generated with MSIX_TABLE_EN=1.
>
I checked the IP configurtion parameters for RK3588 DM controller for
sure, it sets MSIX_TABLE_EN=1.
Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
match the dwc databook. No matter for non-AXI mode or AXI access mode,
shouldn't we need to generate a MSI-X table RAM with
data/address/vector/TC in advanced? Am I missing anything because I
didn't look
too much regarding to the EPC side?
> Shawn, any suggestions?
> For full thread, see:
> https://lore.kernel.org/linux-pci/3b34aa66-a418-4f6b-930a-0728d87d79b6@oss.qualcomm.com/T/#t
>
>
> FWIW, Krishna Chaitanya, I did try the dw_pcie_ep_raise_msix_irq_doorbell()
> change above also with the pci-epf-test EPF driver, and it also caused the
> pci-epf-test driver to stop working.
>
>
> Kind regards,
> Niklas
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-23 1:12 ` Shawn Lin
@ 2025-12-23 4:35 ` Krishna Chaitanya Chundru
2025-12-23 6:23 ` Shawn Lin
0 siblings, 1 reply; 17+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-12-23 4:35 UTC (permalink / raw)
To: Shawn Lin, Niklas Cassel
Cc: Manivannan Sadhasivam, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
On 12/23/2025 6:42 AM, Shawn Lin wrote:
> 在 2025/12/22 星期一 21:00, Niklas Cassel 写道:
>> + Shawn
>>
>> On Mon, Dec 22, 2025 at 05:53:27PM +0530, Krishna Chaitanya Chundru
>> wrote:
>>>
>>>
>>> On 12/22/2025 4:41 PM, Niklas Cassel wrote:
>>>> On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
>>>>>> Use the MSIX doorbell method which will not use iATU at all,
>>>>>> dw_pcie_ep_raise_msix_irq_doorbell().
>>>>>>
>>>>> I think this is the safe bet since this feature doesn't seem like
>>>>> an optional
>>>>> one.
>>>>>
>>>>> Niklas, if you can just fix MSI in this patch and leave out MSI-X
>>>>> for the vendor
>>>>> drivers to transition to doorbell, I'm OK to merge it. Otherwise,
>>>>> I don't know
>>>>> how you can reliably fix MSI-X generation with AXI slave interface.
>>>> FWIW, I did try to simply change:
>>>>
>>>> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>> b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>> index 8f2cc1ef25e3..00770f9786e3 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>> @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct
>>>> dw_pcie_ep *ep, u8 func_no,
>>>> case PCI_IRQ_MSI:
>>>> return dw_pcie_ep_raise_msi_irq(ep, func_no,
>>>> interrupt_num);
>>>> case PCI_IRQ_MSIX:
>>>> - return dw_pcie_ep_raise_msix_irq(ep, func_no,
>>>> interrupt_num);
>>>> + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
>>>> + interrupt_num);
>>>> default:
>>>> dev_err(pci->dev, "UNKNOWN IRQ type\n");
>>>> }
>>>>
>>>>
>>>> For the pcie-dw-rockchip driver, but it is not working:
>>>> [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout,
>>>> completion polled
>>>>
>>>> Without this change, things work.
>>>>
>>>> Perhaps this feature is not an optional one, but at least we will
>>>> require
>>>> more changes than a simple one liner.
>>> Hi Niklas,
>>>
>>> It should be automatic only, no extra configurations should be
>>> required, I believe your
>>> HW doesn't support this feature, from spec 6..0a, sec 3.9.1.3
>>> iMSIX-TX: Integrated MSI-X Transmit (USP)
>>> I believe your HW is not generated with MSIX_TABLE_EN =1. In that
>>> case you can't use this feature.
>>
>> Looking at the RK3588 TRM, it does have register:
>> USP_PCIE_PL_MSIX_DOORBELL_OFF
>> Address: Operational Base + offset (0x0248)
>>
>> Port Logic registers start at offset 0x700 on this SoC,
>> so 0x700 + 0x248 == 0x948, which matches:
>> drivers/pci/controller/dwc/pcie-designware.h:#define
>> PCIE_MSIX_DOORBELL 0x948
>>
>> I don't think the TRM would include this register if the
>> DWC coere was not generated with MSIX_TABLE_EN=1.
>>
>
> I checked the IP configurtion parameters for RK3588 DM controller for
> sure, it sets MSIX_TABLE_EN=1.
>
> Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
> match the dwc databook. No matter for non-AXI mode or AXI access mode,
> shouldn't we need to generate a MSI-X table RAM with
> data/address/vector/TC in advanced? Am I missing anything because I
> didn't look
The MSI-X table will updated automatically when host updates the MSI-X
table, when MSI-X is enabled
by host.
- Krishna Chaitanya.
> too much regarding to the EPC side?
>
>> Shawn, any suggestions?
>> For full thread, see:
>> https://lore.kernel.org/linux-pci/3b34aa66-a418-4f6b-930a-0728d87d79b6@oss.qualcomm.com/T/#t
>>
>>
>>
>> FWIW, Krishna Chaitanya, I did try the
>> dw_pcie_ep_raise_msix_irq_doorbell()
>> change above also with the pci-epf-test EPF driver, and it also
>> caused the
>> pci-epf-test driver to stop working.
>>
>>
>> Kind regards,
>> Niklas
>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-23 4:35 ` Krishna Chaitanya Chundru
@ 2025-12-23 6:23 ` Shawn Lin
2025-12-23 7:12 ` Niklas Cassel
2026-01-05 11:06 ` Niklas Cassel
0 siblings, 2 replies; 17+ messages in thread
From: Shawn Lin @ 2025-12-23 6:23 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Niklas Cassel
Cc: shawn.lin, Manivannan Sadhasivam, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
在 2025/12/23 星期二 12:35, Krishna Chaitanya Chundru 写道:
>
>
> On 12/23/2025 6:42 AM, Shawn Lin wrote:
>> 在 2025/12/22 星期一 21:00, Niklas Cassel 写道:
>>> + Shawn
>>>
>>> On Mon, Dec 22, 2025 at 05:53:27PM +0530, Krishna Chaitanya Chundru
>>> wrote:
>>>>
>>>>
>>>> On 12/22/2025 4:41 PM, Niklas Cassel wrote:
>>>>> On Mon, Dec 22, 2025 at 03:28:30PM +0530, Manivannan Sadhasivam wrote:
>>>>>>> Use the MSIX doorbell method which will not use iATU at all,
>>>>>>> dw_pcie_ep_raise_msix_irq_doorbell().
>>>>>>>
>>>>>> I think this is the safe bet since this feature doesn't seem like
>>>>>> an optional
>>>>>> one.
>>>>>>
>>>>>> Niklas, if you can just fix MSI in this patch and leave out MSI-X
>>>>>> for the vendor
>>>>>> drivers to transition to doorbell, I'm OK to merge it. Otherwise,
>>>>>> I don't know
>>>>>> how you can reliably fix MSI-X generation with AXI slave interface.
>>>>> FWIW, I did try to simply change:
>>>>>
>>>>> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/
>>>>> drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>>> index 8f2cc1ef25e3..00770f9786e3 100644
>>>>> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>>> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>>>> @@ -319,7 +319,8 @@ static int rockchip_pcie_raise_irq(struct
>>>>> dw_pcie_ep *ep, u8 func_no,
>>>>> case PCI_IRQ_MSI:
>>>>> return dw_pcie_ep_raise_msi_irq(ep, func_no,
>>>>> interrupt_num);
>>>>> case PCI_IRQ_MSIX:
>>>>> - return dw_pcie_ep_raise_msix_irq(ep, func_no,
>>>>> interrupt_num);
>>>>> + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
>>>>> + interrupt_num);
>>>>> default:
>>>>> dev_err(pci->dev, "UNKNOWN IRQ type\n");
>>>>> }
>>>>>
>>>>>
>>>>> For the pcie-dw-rockchip driver, but it is not working:
>>>>> [ 130.042849] nvme nvme0: I/O tag 0 (1000) QID 0 timeout,
>>>>> completion polled
>>>>>
>>>>> Without this change, things work.
>>>>>
>>>>> Perhaps this feature is not an optional one, but at least we will
>>>>> require
>>>>> more changes than a simple one liner.
>>>> Hi Niklas,
>>>>
>>>> It should be automatic only, no extra configurations should be
>>>> required, I believe your
>>>> HW doesn't support this feature, from spec 6..0a, sec 3.9.1.3
>>>> iMSIX-TX: Integrated MSI-X Transmit (USP)
>>>> I believe your HW is not generated with MSIX_TABLE_EN =1. In that
>>>> case you can't use this feature.
>>>
>>> Looking at the RK3588 TRM, it does have register:
>>> USP_PCIE_PL_MSIX_DOORBELL_OFF
>>> Address: Operational Base + offset (0x0248)
>>>
>>> Port Logic registers start at offset 0x700 on this SoC,
>>> so 0x700 + 0x248 == 0x948, which matches:
>>> drivers/pci/controller/dwc/pcie-designware.h:#define
>>> PCIE_MSIX_DOORBELL 0x948
>>>
>>> I don't think the TRM would include this register if the
>>> DWC coere was not generated with MSIX_TABLE_EN=1.
>>>
>>
>> I checked the IP configurtion parameters for RK3588 DM controller for
>> sure, it sets MSIX_TABLE_EN=1.
>>
>> Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
>> match the dwc databook. No matter for non-AXI mode or AXI access mode,
>> shouldn't we need to generate a MSI-X table RAM with data/address/
>> vector/TC in advanced? Am I missing anything because I didn't look
> The MSI-X table will updated automatically when host updates the MSI-X
> table, when MSI-X is enabled
> by host.
Thanks for these details.
I re-read the databook, especially "Figure 3-49 iMSIX-TX: MSIX
Transmit ". Yes, it's updated automatically when host write access
it, which either comes from RX or local DBI(for debug purpose).
>
> - Krishna Chaitanya.
>> too much regarding to the EPC side?
>>
>>> Shawn, any suggestions?
>>> For full thread, see:
>>> https://lore.kernel.org/linux-pci/3b34aa66-
>>> a418-4f6b-930a-0728d87d79b6@oss.qualcomm.com/T/#t
>>>
>>>
>>> FWIW, Krishna Chaitanya, I did try the
>>> dw_pcie_ep_raise_msix_irq_doorbell()
>>> change above also with the pci-epf-test EPF driver, and it also
>>> caused the
>>> pci-epf-test driver to stop working.
>>>
>>>
>>> Kind regards,
>>> Niklas
>>>
>>
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-23 6:23 ` Shawn Lin
@ 2025-12-23 7:12 ` Niklas Cassel
2026-01-05 11:06 ` Niklas Cassel
1 sibling, 0 replies; 17+ messages in thread
From: Niklas Cassel @ 2025-12-23 7:12 UTC (permalink / raw)
To: Shawn Lin
Cc: Krishna Chaitanya Chundru, Manivannan Sadhasivam, Jingoo Han,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas, Frank Li, Damien Le Moal, Koichiro Den, linux-pci
On Tue, Dec 23, 2025 at 02:23:39PM +0800, Shawn Lin wrote:
> > On 12/23/2025 6:42 AM, Shawn Lin wrote:
> > > I checked the IP configurtion parameters for RK3588 DM controller for
> > > sure, it sets MSIX_TABLE_EN=1.
> > >
> > > Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
> > > match the dwc databook. No matter for non-AXI mode or AXI access mode,
> > > shouldn't we need to generate a MSI-X table RAM with data/address/
> > > vector/TC in advanced? Am I missing anything because I didn't look
> > The MSI-X table will updated automatically when host updates the MSI-X
> > table, when MSI-X is enabled
> > by host.
>
> Thanks for these details.
> I re-read the databook, especially "Figure 3-49 iMSIX-TX: MSIX
> Transmit ". Yes, it's updated automatically when host write access
> it, which either comes from RX or local DBI(for debug purpose).
Shawn,
if you have time to help figure out why we cannot simply
replace dw_pcie_ep_raise_msix_irq() with dw_pcie_ep_raise_msix_irq_doorbell()
in rockchip_pcie_raise_irq(), that would be appreciated.
As you can see, we get IOMMU errors on the host side if we
reprogram the outbound iATU while outgoing transactions are in flight.
Mani, perhaps we should even add a WARN_ON() in dw_pcie_ep_raise_msix_irq()
that warns users that this function is broken and will lead to transactions
to random writes on the host side memory, which is quite bad...
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2025-12-23 6:23 ` Shawn Lin
2025-12-23 7:12 ` Niklas Cassel
@ 2026-01-05 11:06 ` Niklas Cassel
2026-01-06 1:37 ` Shawn Lin
1 sibling, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2026-01-05 11:06 UTC (permalink / raw)
To: Shawn Lin, Manivannan Sadhasivam
Cc: Krishna Chaitanya Chundru, Jingoo Han, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas, Frank Li,
Damien Le Moal, Koichiro Den, linux-pci
Hello all,
On Tue, Dec 23, 2025 at 02:23:39PM +0800, Shawn Lin wrote:
> > > I checked the IP configurtion parameters for RK3588 DM controller for
> > > sure, it sets MSIX_TABLE_EN=1.
> > >
> > > Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
> > > match the dwc databook. No matter for non-AXI mode or AXI access mode,
> > > shouldn't we need to generate a MSI-X table RAM with data/address/
> > > vector/TC in advanced? Am I missing anything because I didn't look
> > The MSI-X table will updated automatically when host updates the MSI-X
> > table, when MSI-X is enabled
> > by host.
>
> Thanks for these details.
> I re-read the databook, especially "Figure 3-49 iMSIX-TX: MSIX
> Transmit ". Yes, it's updated automatically when host write access
> it, which either comes from RX or local DBI(for debug purpose).
With the help of Shawn, I managed to get dw_pcie_ep_raise_msix_irq_doorbell()
to work on RK3588.
By default, on RK3588, the MSI-X table is stored in BAR4 at offset 0x4000.
There seems to be a limitation that the iMSIX-TX doorbell feature only
works when the MSI-X table is stored in this default location.
As you know, by default, pci-epf-test stores the MSI-X table in BAR0,
after the registers defined by pci-epf-test itself.
This works fine when triggering a MSI-X using dw_pcie_ep_raise_msix_irq(),
but does not work when using dw_pcie_ep_raise_msix_irq_doorbell() (at least
not or RK3588).
Additionally, on RK3588, at offset 0x0 in BAR4, the eDMA registers are
mapped, so we cannot simply use BAR4 as test_reg_bar, as that would conflict
with the registers defined by pci-epf-test-itself.
The solution here is most likely to let EPC drivers define a
"HW limitation" of the MSI-X table (BAR number and offset), and create a new
API in the PCI endpoint framework that exposes this, and modify all EPF
drivers to use this API before calling pci_epc_set_msix().
Personally, I'm too busy to work on this right now.
For our use case with the NVMe PCI EPF, we can simply force it to only
use MSI (instead of MSI-X).
But... I could submit a WARN() in dw_pcie_ep_raise_msix_irq() that explains
that this function is broken, and that dw_pcie_ep_raise_msix_irq_doorbell()
should be used instead, on platforms that support it.
Or, if it is preferred, we could simply modify all DWC based drivers that is
not using dw_pcie_ep_raise_msix_irq_doorbell() (i.e. all except layerspace)
to simply not set "msix_capable = true" in epc_features, as we know that all
drivers that are using dw_pcie_ep_raise_msix_irq() are broken.
Thoughts?
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping
2026-01-05 11:06 ` Niklas Cassel
@ 2026-01-06 1:37 ` Shawn Lin
0 siblings, 0 replies; 17+ messages in thread
From: Shawn Lin @ 2026-01-06 1:37 UTC (permalink / raw)
To: Niklas Cassel, Manivannan Sadhasivam
Cc: shawn.lin, Krishna Chaitanya Chundru, Jingoo Han,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas, Frank Li, Damien Le Moal, Koichiro Den, linux-pci
在 2026/01/05 星期一 19:06, Niklas Cassel 写道:
> Hello all,
>
> On Tue, Dec 23, 2025 at 02:23:39PM +0800, Shawn Lin wrote:
>>>> I checked the IP configurtion parameters for RK3588 DM controller for
>>>> sure, it sets MSIX_TABLE_EN=1.
>>>>
>>>> Looking into dw_pcie_ep_raise_msix_irq_doorbell(), it doesn't seem to
>>>> match the dwc databook. No matter for non-AXI mode or AXI access mode,
>>>> shouldn't we need to generate a MSI-X table RAM with data/address/
>>>> vector/TC in advanced? Am I missing anything because I didn't look
>>> The MSI-X table will updated automatically when host updates the MSI-X
>>> table, when MSI-X is enabled
>>> by host.
>>
>> Thanks for these details.
>> I re-read the databook, especially "Figure 3-49 iMSIX-TX: MSIX
>> Transmit ". Yes, it's updated automatically when host write access
>> it, which either comes from RX or local DBI(for debug purpose).
>
> With the help of Shawn, I managed to get dw_pcie_ep_raise_msix_irq_doorbell()
> to work on RK3588.
>
> By default, on RK3588, the MSI-X table is stored in BAR4 at offset 0x4000.
>
> There seems to be a limitation that the iMSIX-TX doorbell feature only
> works when the MSI-X table is stored in this default location.
>
>
I believe the limitation comes from IP, the databook implies this:
"When you enable this feature (MSIX_TABLE_EN =1), the upstream
controller implements the logic and *RAM required* to generate MSI-X
requests."
Using non-AXI mode(MSIX_DOORBELL_OFF), MSI-X FSM need fetch MSI-X table
stored in RAM. The IP configuration could decide which type of RAM is
used by setting CX_RAM_AT_TOP_IF value. And it's also IP integration
decision to make sure RAM accessable by which BAR. That being said, BAR0
could store the table to a system memory address(DRAM), not to the RAM.
> As you know, by default, pci-epf-test stores the MSI-X table in BAR0,
> after the registers defined by pci-epf-test itself.
> This works fine when triggering a MSI-X using dw_pcie_ep_raise_msix_irq(),
> but does not work when using dw_pcie_ep_raise_msix_irq_doorbell() (at least
> not or RK3588).
>
> Additionally, on RK3588, at offset 0x0 in BAR4, the eDMA registers are
> mapped, so we cannot simply use BAR4 as test_reg_bar, as that would conflict
> with the registers defined by pci-epf-test-itself.
>
>
> The solution here is most likely to let EPC drivers define a
> "HW limitation" of the MSI-X table (BAR number and offset), and create a new
> API in the PCI endpoint framework that exposes this, and modify all EPF
> drivers to use this API before calling pci_epc_set_msix().
>
Ack, except that it's not HW limitation but IP integration decision,
because limitation sounds like a workaround for defects.
>
> Personally, I'm too busy to work on this right now.
> For our use case with the NVMe PCI EPF, we can simply force it to only
> use MSI (instead of MSI-X).
>
> But... I could submit a WARN() in dw_pcie_ep_raise_msix_irq() that explains
> that this function is broken, and that dw_pcie_ep_raise_msix_irq_doorbell()
> should be used instead, on platforms that support it.
>
> Or, if it is preferred, we could simply modify all DWC based drivers that is
> not using dw_pcie_ep_raise_msix_irq_doorbell() (i.e. all except layerspace)
> to simply not set "msix_capable = true" in epc_features, as we know that all
> drivers that are using dw_pcie_ep_raise_msix_irq() are broken.
>
> Thoughts?
>
>
> Kind regards,
> Niklas
>
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-01-06 8:26 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-10 7:13 [PATCH] PCI: dwc: ep: Cache MSI outbound iATU mapping Niklas Cassel
2025-12-22 8:19 ` Krishna Chaitanya Chundru
2025-12-22 8:36 ` Niklas Cassel
2025-12-22 8:42 ` Krishna Chaitanya Chundru
2025-12-22 8:49 ` Niklas Cassel
2025-12-22 9:58 ` Manivannan Sadhasivam
2025-12-22 10:42 ` Niklas Cassel
2025-12-22 11:11 ` Niklas Cassel
2025-12-22 11:19 ` Manivannan Sadhasivam
2025-12-22 12:23 ` Krishna Chaitanya Chundru
2025-12-22 13:00 ` Niklas Cassel
2025-12-23 1:12 ` Shawn Lin
2025-12-23 4:35 ` Krishna Chaitanya Chundru
2025-12-23 6:23 ` Shawn Lin
2025-12-23 7:12 ` Niklas Cassel
2026-01-05 11:06 ` Niklas Cassel
2026-01-06 1:37 ` Shawn Lin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox