* Re: [PATCH] coresight: fix resource leaks on path build failure
From: James Clark @ 2026-05-20 8:38 UTC
To: Jie Gan
Cc: coresight, linux-arm-kernel, linux-kernel, Suzuki K Poulose,
Mike Leach, Leo Yan, Alexander Shishkin, Mathieu Poirier,
Tingwei Zhang, Greg Kroah-Hartman
In-Reply-To: <5b3785a1-0f53-4b61-bc18-880091c2ce12@oss.qualcomm.com>
On 20/05/2026 2:55 am, Jie Gan wrote:
>
>
> On 5/19/2026 9:57 PM, James Clark wrote:
>>
>>
>> On 13/05/2026 2:32 am, Jie Gan wrote:
>>> Two related leaks when _coresight_build_path() encounters an error after
>>> coresight_grab_device() has already incremented the pm_runtime, module,
>>> and device references for a node:
>>>
>>> 1. In _coresight_build_path(), if kzalloc_obj() for the path node fails
>>> after coresight_grab_device() succeeds, coresight_drop_device() was
>>> never called, permanently leaking all three references.
>>>
>>> 2. In coresight_build_path(), on failure the partial path was freed with
>>> kfree(path) instead of coresight_release_path(path). kfree() only
>>> frees the coresight_path struct itself; it does not iterate path_list
>>> to call coresight_drop_device() and kfree() for each coresight_node
>>> already added by deeper recursive calls, leaking both the pm_runtime,
>>> module, and device references and the node memory for every element
>>> on the partial path.
>>>
>>> Fix both by adding coresight_drop_device() in the OOM unwind of
>>> _coresight_build_path(), and replacing kfree(path) with
>>> coresight_release_path(path) in coresight_build_path().
>>>
>>> Fixes: 32b0707a4182 ("coresight: Add try_get_module() in coresight_grab_device()")
>>> Fixes: b3e94405941e ("coresight: associating path with session rather than tracer")
>>> Signed-off-by: Jie Gan <jie.gan@oss.qualcomm.com>
>>> ---
>>> drivers/hwtracing/coresight/coresight-core.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
>>> index 46f247f73cf6..c1354ea8e11d 100644
>>> --- a/drivers/hwtracing/coresight/coresight-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>>> @@ -825,8 +825,10 @@ static int _coresight_build_path(struct coresight_device *csdev,
>>> return ret;
>>> node = kzalloc_obj(struct coresight_node);
>>> - if (!node)
>>> + if (!node) {
>>> + coresight_drop_device(csdev);
>>> return -ENOMEM;
>>> + }
>>> node->csdev = csdev;
>>> list_add(&node->link, &path->path_list);
>>> @@ -851,7 +853,7 @@ struct coresight_path *coresight_build_path(struct coresight_device *source,
>>> rc = _coresight_build_path(source, source, sink, path);
>>> if (rc) {
>>> - kfree(path);
>>> + coresight_release_path(path);
>>> return ERR_PTR(rc);
>>> }
>>>
>>> ---
>>> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
>>> change-id: 20260513-fix-memory-leak-issue-034b4a45265e
>>>
>>> Best regards,
>>
>> Looks good to me, but sashiko is complaining:
>> https://sashiko.dev/#/patchset/20260513-fix-memory-leak-issue-v1-1-49822d7bc7d4%40oss.qualcomm.com
>>
>> I'm trying to understand why it's saying that, but I think the
>> scenario is that if there are multiple correct paths to a sink, when
>> one path partially fails and a second path succeeds you could get a
>> path_list with some garbage entries in it.
>
> I think coresight_release_path() is added to address this situation.
> The path build partially failed, and we need to release all nodes
> already added to the path.
>
It wouldn't call coresight_release_path() in this case though. If one
path ends up building to success but a parallel path partially failed
then _coresight_build_path() still returns success. During the search it
would have still added the nodes from the partially failed path to the
path_list. This is only an issue if there are multiple correct paths.
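The scenario can be modelled with a small standalone C sketch. It is not
the kernel code: the struct and function names (model_dev, model_path,
model_build_path, model_demo) are hypothetical, and it assumes only what
is described above, namely that nodes are appended to the list while the
recursion unwinds and that the search moves on to the next connection
after a failed one. The stop_on_error flag models the "break early for
any return value other than -ENODEV" idea.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUT  2   /* hypothetical fan-out limit for this model */
#define MAX_PATH 8   /* hypothetical path capacity for this model */

/* Hypothetical stand-in for a coresight device; not the kernel struct. */
struct model_dev {
	struct model_dev *out[MAX_OUT];
	bool fail_grab;   /* simulate a grab or allocation failure here */
};

/* Stand-in for path_list: nodes are appended as the recursion unwinds. */
struct model_path {
	const struct model_dev *nodes[MAX_PATH];
	size_t len;
};

int model_build_path(struct model_dev *dev, struct model_dev *sink,
		     struct model_path *path, bool stop_on_error)
{
	bool found = (dev == sink);
	int ret;

	for (size_t i = 0; !found && i < MAX_OUT; i++) {
		if (!dev->out[i])
			continue;
		ret = model_build_path(dev->out[i], sink, path, stop_on_error);
		if (ret == 0)
			found = true;
		else if (stop_on_error && ret != -ENODEV)
			return ret;   /* hard error: stop trying siblings */
	}

	if (!found)
		return -ENODEV;
	if (dev->fail_grab || path->len == MAX_PATH)
		return -ENOMEM;

	path->nodes[path->len++] = dev;
	return 0;
}

/* Two routes from source to sink; the first route fails at node "a". */
int model_demo(bool stop_on_error, size_t *len)
{
	struct model_dev sink = { .out = { NULL, NULL }, .fail_grab = false };
	struct model_dev a = { .out = { &sink, NULL }, .fail_grab = true };
	struct model_dev b = { .out = { &sink, NULL }, .fail_grab = false };
	struct model_dev source = { .out = { &a, &b }, .fail_grab = false };
	struct model_path path = { .len = 0 };
	int ret = model_build_path(&source, &sink, &path, stop_on_error);

	*len = path.len;
	return ret;
}
```

In this model, model_demo(false, &n) reports success with four entries
for a three-node route, because the sink recorded during the failed
attempt through "a" is still in the list. With stop_on_error set, the
search returns -ENOMEM with one entry recorded, which the caller can then
release.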
>>
>> That's kind of a different and existing issue to the one you've fixed,
>> and assumes that multiple paths to one sink are possible, which I'm
>> not sure is supported?
>
> Each path is unique. We only handle the failing path here, to balance
> the reference count.
>
> Thanks,
> Jie
>
I'm not exactly sure what you mean by unique, but the same source and
sink could potentially be connected through two different sets of links.
>>
>> It might be as easy as breaking the loop early for any return value
>> other than -ENODEV, but I'll leave it to you to decide whether to do
>> that here or not.
>>
>> Reviewed-by: James Clark <james.clark@linaro.org>
>>
>
* Re: [PATCH v3] arm64: Kconfig: drop unneeded dependency on OF_GPIO for ARCH_MVEBU
From: Bartosz Golaszewski @ 2026-05-20 8:32 UTC
To: Arnd Bergmann
Cc: Bartosz Golaszewski, Catalin Marinas, Will Deacon,
linux-arm-kernel, linux-kernel, Linus Walleij
In-Reply-To: <CAD++jLnm2sPEQQ7cN9Ci+jhNfB=WC2g=36d_N4upyxYUsmN6dg@mail.gmail.com>
On Tue, May 5, 2026 at 12:42 PM Linus Walleij <linusw@kernel.org> wrote:
>
> On Thu, Apr 30, 2026 at 4:32 PM Bartosz Golaszewski
> <bartosz.golaszewski@oss.qualcomm.com> wrote:
>
> > OF_GPIO is selected automatically on all OF systems. Any symbols it
> > controls also provide stubs so there's really no reason to select it
> > explicitly. ARCH_MVEBU already selects GPIOLIB, drop the redundant
> > OF_GPIO dependency.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
>
> Reviewed-by: Linus Walleij <linusw@kernel.org>
>
> Yours,
> Linus Walleij
Arnd, can you please queue this directly for v7.2?
Bartosz
* [PATCH net-next v5 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay
From: Daniel Machon @ 2026-05-20 8:12 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Add the fdma interrupt (OIC interrupt 14) to the lan966x PCI device
tree overlay, enabling FDMA-based frame injection/extraction when
the switch is connected over PCIe.
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/misc/lan966x_pci.dtso | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7bb726550caf..5bb12dbc0843 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -141,8 +141,9 @@ switch: switch@e0000000 {
interrupt-parent = <&oic>;
interrupts = <12 IRQ_TYPE_LEVEL_HIGH>,
+ <14 IRQ_TYPE_LEVEL_HIGH>,
<9 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "xtr", "ana";
+ interrupt-names = "xtr", "fdma", "ana";
resets = <&reset 0>;
reset-names = "switch";
--
2.34.1
* [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support
From: Daniel Machon @ 2026-05-20 8:12 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Add XDP support for the PCIe FDMA path. The implementation operates on
contiguous ATU-mapped buffers with memcpy-based XDP_TX, unlike the
platform path which uses page_pool.
XDP sees the frame with IFH and FCS stripped. These are removed in
lan966x_fdma_pci_rx_check_frame() before the BPF program runs, because
after the program returns the driver cannot tell whether the tail
region was modified. The skb_pull/skb_trim previously done in
lan966x_fdma_pci_rx_get_frame() are removed for the same reason; the
frame pointer and length are pre-computed by rx_check_frame() and
passed through rx_get_frame() and lan966x_xdp_pci_run() to the caller.
lan966x_fdma_pci_xmit_xdpf() handles XDP_TX: it rebuilds a fresh IFH
in the TX slot, copies the post-XDP frame after it, and lets HW insert
a new FCS.
lan966x_xdp_setup() is extended so the PCIe path skips the page_pool
reload that the platform path needs.
Only XDP_ACT_BASIC is supported.
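The length bookkeeping described above can be sketched in standalone C.
This is an illustration, not the driver code: frame_view, rx_frame_view()
and tx_blockl() are hypothetical names, and the sizes (28-byte IFH,
4-byte FCS, 256-byte headroom) are taken from the TX slot layout comment
in patch 09/13, with the standard 14-byte Ethernet header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sizes as stated in the series; redefined here so the sketch is standalone. */
#define IFH_LEN_BYTES        28u
#define ETH_HLEN             14u
#define ETH_FCS_LEN           4u
#define XDP_PACKET_HEADROOM 256u

/* Hypothetical view of the Ethernet frame inside an RX slot. */
struct frame_view {
	uint32_t offset;   /* offset of the frame from the start of the IFH */
	uint32_t len;      /* frame length with IFH and FCS stripped */
};

/* Validate the HW-reported block length and derive what XDP would see. */
bool rx_frame_view(uint32_t blockl, uint32_t db_size, struct frame_view *v)
{
	if (blockl < IFH_LEN_BYTES + ETH_HLEN + ETH_FCS_LEN)
		return false;
	if (db_size < XDP_PACKET_HEADROOM ||
	    blockl > db_size - XDP_PACKET_HEADROOM)
		return false;

	v->offset = IFH_LEN_BYTES;
	v->len = blockl - IFH_LEN_BYTES - ETH_FCS_LEN;
	return true;
}

/* Block length for XDP_TX: fresh IFH, frame, room for the HW-inserted FCS. */
uint32_t tx_blockl(uint32_t len)
{
	return IFH_LEN_BYTES + len + ETH_FCS_LEN;
}
```

Under these assumptions an unmodified frame round-trips: the length
derived on RX, fed back through tx_blockl(), gives the original block
length again.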
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
.../ethernet/microchip/lan966x/lan966x_fdma_pci.c | 162 ++++++++++++++++++---
.../net/ethernet/microchip/lan966x/lan966x_main.c | 11 +-
.../net/ethernet/microchip/lan966x/lan966x_main.h | 10 ++
.../net/ethernet/microchip/lan966x/lan966x_xdp.c | 10 ++
4 files changed, 169 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index 3ea6d22ee573..f9db66e3e753 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -1,5 +1,7 @@
// SPDX-License-Identifier: GPL-2.0+
+#include <linux/bpf_trace.h>
+
#include "fdma_api.h"
#include "lan966x_main.h"
@@ -114,7 +116,118 @@ static bool lan966x_fdma_pci_rx_size_fits(struct fdma *fdma, u32 blockl)
blockl <= fdma->db_size - XDP_PACKET_HEADROOM;
}
-static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
+static int lan966x_fdma_pci_xmit_xdpf(struct lan966x_port *port,
+ void *ptr, u32 len)
+{
+ struct lan966x *lan966x = port->lan966x;
+ struct lan966x_tx *tx = &lan966x->tx;
+ struct fdma *fdma = &tx->fdma;
+ int next_to_use, ret = 0;
+ void *virt_addr;
+
+ spin_lock(&lan966x->tx_lock);
+
+ next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+ if (next_to_use < 0) {
+ netif_stop_queue(port->dev);
+ ret = NETDEV_TX_BUSY;
+ goto out;
+ }
+
+ if (!lan966x_fdma_pci_tx_size_fits(fdma, len)) {
+ port->dev->stats.tx_dropped++;
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* virt_addr points to the IFH. */
+ virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+
+ /* Construct a fresh IFH. */
+ memset(virt_addr, 0, IFH_LEN_BYTES);
+ lan966x_ifh_set_bypass(virt_addr, 1);
+ lan966x_ifh_set_port(virt_addr, BIT_ULL(port->chip_port));
+
+ /* Copy the (post-XDP) frame after the IFH. */
+ memcpy(virt_addr + IFH_LEN_BYTES, ptr, len);
+
+ /* Order frame write before DCB status write below. */
+ dma_wmb();
+
+ /* Reserve ETH_FCS_LEN for the HW-inserted FCS (len is FCS-stripped). */
+ fdma_dcb_add(fdma,
+ next_to_use,
+ 0,
+ FDMA_DCB_STATUS_INTR |
+ FDMA_DCB_STATUS_SOF |
+ FDMA_DCB_STATUS_EOF |
+ FDMA_DCB_STATUS_BLOCKO(0) |
+ FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + len + ETH_FCS_LEN));
+
+ /* Start the transmission. */
+ lan966x_fdma_tx_start(tx);
+
+ port->dev->stats.tx_bytes += len;
+ port->dev->stats.tx_packets++;
+
+out:
+ spin_unlock(&lan966x->tx_lock);
+
+ return ret;
+}
+
+static int lan966x_xdp_pci_run(struct lan966x_port *port, void *data,
+ u32 data_len, void **xdp_data, u32 *xdp_len)
+{
+ /* Read once so the NULL check and bpf_prog_run_xdp() see the same
+ * pointer.
+ */
+ struct bpf_prog *xdp_prog = READ_ONCE(port->xdp_prog);
+ struct lan966x *lan966x = port->lan966x;
+ struct fdma *fdma = &lan966x->rx.fdma;
+ struct xdp_buff xdp;
+ u32 act;
+
+ if (!xdp_prog)
+ return FDMA_PASS;
+
+ xdp_init_buff(&xdp, fdma->db_size, &port->xdp_rxq);
+
+ /* hard_start is set to slot start (virt_addr is XDP_PACKET_HEADROOM
+ * into the slot). Headroom includes the IFH; BPF may grow into it
+ * via adjust_head. IFH is rebuilt on XDP_TX and unread on XDP_PASS.
+ */
+ xdp_prepare_buff(&xdp,
+ data - XDP_PACKET_HEADROOM,
+ XDP_PACKET_HEADROOM + IFH_LEN_BYTES,
+ data_len,
+ false);
+
+ act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+ *xdp_data = xdp.data;
+ *xdp_len = xdp.data_end - xdp.data;
+
+ switch (act) {
+ case XDP_PASS:
+ return FDMA_PASS;
+ case XDP_TX:
+ return lan966x_fdma_pci_xmit_xdpf(port, *xdp_data, *xdp_len) ?
+ FDMA_DROP : FDMA_TX;
+ default:
+ bpf_warn_invalid_xdp_action(port->dev, xdp_prog, act);
+ fallthrough;
+ case XDP_ABORTED:
+ trace_xdp_exception(port->dev, xdp_prog, act);
+ fallthrough;
+ case XDP_DROP:
+ return FDMA_DROP;
+ }
+}
+
+static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port,
+ void **data, u32 *data_len)
{
struct lan966x *lan966x = rx->lan966x;
struct fdma *fdma = &rx->fdma;
@@ -146,38 +259,33 @@ static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
return FDMA_ERROR;
- return FDMA_PASS;
+ /* Present the Ethernet frame (no IFH, no FCS). HW re-inserts the
+ * FCS on TX; see lan966x_fdma_pci_xmit_xdpf(). May be overridden
+ * by XDP. The FCS strip is unconditional because NETIF_F_RXFCS
+ * is not advertised in hw_features.
+ */
+ *data = virt_addr + IFH_LEN_BYTES;
+ *data_len = blockl - IFH_LEN_BYTES - ETH_FCS_LEN;
+
+ return lan966x_xdp_pci_run(port, virt_addr, *data_len, data, data_len);
}
static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
- u64 src_port)
+ u64 src_port, void *data,
+ u32 data_len)
{
struct lan966x *lan966x = rx->lan966x;
- struct fdma *fdma = &rx->fdma;
struct sk_buff *skb;
- struct fdma_db *db;
- u32 data_len;
-
- /* Get the received frame and create an SKB for it. */
- db = fdma_db_next_get(fdma);
- data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
skb = napi_alloc_skb(&lan966x->napi, data_len);
if (unlikely(!skb))
return NULL;
- memcpy(skb->data,
- fdma_dataptr_virt_addr_contiguous(fdma,
- fdma->dcb_index,
- fdma->db_index),
- data_len);
+ memcpy(skb->data, data, data_len);
skb_put(skb, data_len);
skb->dev = lan966x->ports[src_port]->dev;
- skb_pull(skb, IFH_LEN_BYTES);
-
- skb_trim(skb, skb->len - ETH_FCS_LEN);
skb->protocol = eth_type_trans(skb, skb->dev);
@@ -266,6 +374,8 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
struct sk_buff *skb;
int counter = 0;
u64 src_port;
+ u32 data_len;
+ void *data;
/* Wake any stopped TX queues if a TX DCB is available. */
spin_lock(&lan966x->tx_lock);
@@ -282,7 +392,10 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
/* Order DONE read before DCB/frame reads below. */
dma_rmb();
counter++;
- switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
+ switch (lan966x_fdma_pci_rx_check_frame(rx,
+ &src_port,
+ &data,
+ &data_len)) {
case FDMA_PASS:
break;
case FDMA_ERROR:
@@ -291,8 +404,17 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
*/
fdma_dcb_advance(fdma);
continue;
+ case FDMA_TX:
+ fdma_dcb_advance(fdma);
+ continue;
+ case FDMA_DROP:
+ fdma_dcb_advance(fdma);
+ continue;
}
- skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
+ skb = lan966x_fdma_pci_rx_get_frame(rx,
+ src_port,
+ data,
+ data_len);
fdma_dcb_advance(fdma);
if (!skb) {
lan966x->ports[src_port]->dev->stats.rx_dropped++;
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 7036b1d937d5..b984e819312d 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -876,10 +876,13 @@ static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
port->phylink = phylink;
- if (lan966x->fdma)
- dev->xdp_features = NETDEV_XDP_ACT_BASIC |
- NETDEV_XDP_ACT_REDIRECT |
- NETDEV_XDP_ACT_NDO_XMIT;
+ if (lan966x->fdma) {
+ dev->xdp_features = NETDEV_XDP_ACT_BASIC;
+
+ if (!lan966x_is_pci(lan966x))
+ dev->xdp_features |= NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_NDO_XMIT;
+ }
err = register_netdev(dev);
if (err) {
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index e7fdd4447fb6..8911825eab77 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -595,6 +595,16 @@ int lan966x_qsys_sw_status(struct lan966x *lan966x);
#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+ return lan966x->ops == &lan966x_fdma_pci_ops;
+}
+#else
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+ return false;
+}
#endif
int lan966x_lag_port_join(struct lan966x_port *port,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
index 9ee61db8690b..b470f731e25c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
@@ -24,6 +24,16 @@ static int lan966x_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp)
old_prog = xchg(&port->xdp_prog, xdp->prog);
new_xdp = lan966x_xdp_present(lan966x);
+ /* PCIe FDMA uses contiguous buffers, so no page_pool reload
+ * is needed. Drain NAPI before freeing the old program so
+ * no in-flight poll holds a stale pointer.
+ */
+ if (lan966x_is_pci(lan966x)) {
+ if (old_prog)
+ napi_synchronize(&lan966x->napi);
+ goto out;
+ }
+
if (old_xdp == new_xdp)
goto out;
--
2.34.1
* [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
From: Daniel Machon @ 2026-05-20 8:12 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
The ATU outbound windows used by the FDMA engine are programmed through
registers at offset 0x400000+, which falls outside the current cpu reg
mapping. Extend the cpu reg size from 0x100000 (1MB) to 0x800000 (8MB)
to cover the full PCIE DBI and iATU register space.
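The size arithmetic can be checked with a short standalone C sketch. The
helper name reg_window_covers() is hypothetical, and the 4-byte access
width used in the usage note is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [offset, offset + len) lies inside a window of 'size' bytes. */
bool reg_window_covers(uint64_t size, uint64_t offset, uint64_t len)
{
	return offset <= size && len <= size - offset;
}
```

With these inputs, a 4-byte access at offset 0x400000 is outside the old
0x100000 window and inside the new 0x800000 window.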
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/misc/lan966x_pci.dtso | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7b196b0a0eb6..7bb726550caf 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -135,7 +135,7 @@ lan966x_phy1: ethernet-lan966x_phy@2 {
switch: switch@e0000000 {
compatible = "microchip,lan966x-switch";
- reg = <0xe0000000 0x0100000>,
+ reg = <0xe0000000 0x0800000>,
<0xe2000000 0x0800000>;
reg-names = "cpu", "gcb";
--
2.34.1
* [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support
From: Daniel Machon @ 2026-05-20 8:12 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Add MTU change support for the PCIe FDMA path. When the MTU changes,
the contiguous ATU-mapped RX and TX buffers are reallocated with the
new size. On allocation failure, the existing buffers are reused
after being reset.
Cap the PCIe DCB ring at 256 (FDMA_PCI_DCB_MAX) to keep the entire
contiguous allocation under MAX_PAGE_ORDER at jumbo MTU, which 512
DCBs would overflow.
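A rough standalone C sketch of the sizing argument follows. All numbers
in it are assumptions for illustration: 4 KiB pages, a MAX_PAGE_ORDER of
10, and, in the usage note, a 256-byte DCB descriptor with a 10240-byte
data block at jumbo MTU. The real values depend on the kernel
configuration and on FDMA_PCI_DB_SIZE(), which is not shown in this
patch; the model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE      4096u   /* assumption: 4 KiB pages */
#define MODEL_MAX_PAGE_ORDER   10u   /* assumption: common default */

/* Largest physically contiguous block under the assumptions above. */
size_t model_max_block(void)
{
	return (size_t)MODEL_PAGE_SIZE << MODEL_MAX_PAGE_ORDER;
}

/* Simplified ring size: each DCB carries a descriptor plus its data blocks. */
size_t model_ring_size(size_t n_dcbs, size_t n_dbs, size_t db_size,
		       size_t dcb_size)
{
	return n_dcbs * (dcb_size + n_dbs * db_size);
}

bool model_ring_fits(size_t n_dcbs, size_t n_dbs, size_t db_size,
		     size_t dcb_size)
{
	return model_ring_size(n_dcbs, n_dbs, db_size, dcb_size) <=
	       model_max_block();
}
```

With one data block per DCB and the illustrative sizes above, 512 DCBs
need 5373952 bytes, which exceeds the 4194304-byte block, while 256 DCBs
need 2686976 bytes and fit.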
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
.../ethernet/microchip/lan966x/lan966x_fdma_pci.c | 157 ++++++++++++++++++++-
1 file changed, 154 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index c27d9e76e735..3ea6d22ee573 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -3,6 +3,11 @@
#include "fdma_api.h"
#include "lan966x_main.h"
+/* Ring must fit in one MAX_PAGE_ORDER DMA block; 512 DCBs overflows
+ * at jumbo MTU.
+ */
+#define FDMA_PCI_DCB_MAX 256
+
static int lan966x_fdma_pci_dataptr_cb(struct fdma *fdma, int dcb, int db,
u64 *dataptr)
{
@@ -332,7 +337,7 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
lan966x->rx.lan966x = lan966x;
lan966x->rx.max_mtu = lan966x_fdma_get_max_frame(lan966x);
rx_fdma->channel_id = FDMA_XTR_CHANNEL;
- rx_fdma->n_dcbs = FDMA_DCB_MAX;
+ rx_fdma->n_dcbs = FDMA_PCI_DCB_MAX;
rx_fdma->n_dbs = FDMA_RX_DCB_MAX_DBS;
rx_fdma->priv = lan966x;
rx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
@@ -342,7 +347,7 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
lan966x->tx.lan966x = lan966x;
tx_fdma->channel_id = FDMA_INJ_CHANNEL;
- tx_fdma->n_dcbs = FDMA_DCB_MAX;
+ tx_fdma->n_dcbs = FDMA_PCI_DCB_MAX;
tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
tx_fdma->priv = lan966x;
tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
@@ -365,9 +370,155 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
return 0;
}
+/* Reset existing rx and tx buffers. */
+static void lan966x_fdma_pci_reset_mem(struct lan966x *lan966x)
+{
+ struct lan966x_rx *rx = &lan966x->rx;
+ struct lan966x_tx *tx = &lan966x->tx;
+
+ memset(rx->fdma.dcbs, 0, rx->fdma.size);
+ memset(tx->fdma.dcbs, 0, tx->fdma.size);
+
+ fdma_dcbs_init(&rx->fdma,
+ FDMA_DCB_INFO_DATAL(rx->fdma.db_size),
+ FDMA_DCB_STATUS_INTR);
+
+ fdma_dcbs_init(&tx->fdma,
+ FDMA_DCB_INFO_DATAL(tx->fdma.db_size),
+ FDMA_DCB_STATUS_DONE);
+
+ lan966x_fdma_llp_configure(lan966x,
+ tx->fdma.atu_region->base_addr,
+ tx->fdma.channel_id);
+ lan966x_fdma_llp_configure(lan966x,
+ rx->fdma.atu_region->base_addr,
+ rx->fdma.channel_id);
+}
+
+/* Drain in-flight xmit callers and stop all TX queues on every port. */
+static void lan966x_fdma_pci_stop_netdev(struct lan966x *lan966x)
+{
+ for (int i = 0; i < lan966x->num_phys_ports; ++i) {
+ struct lan966x_port *port = lan966x->ports[i];
+
+ if (port)
+ netif_tx_disable(port->dev);
+ }
+}
+
+/* Wake all TX queues on every port (undoes lan966x_fdma_pci_stop_netdev). */
+static void lan966x_fdma_pci_wakeup_netdev(struct lan966x *lan966x)
+{
+ for (int i = 0; i < lan966x->num_phys_ports; ++i) {
+ struct lan966x_port *port = lan966x->ports[i];
+
+ if (port)
+ netif_tx_wake_all_queues(port->dev);
+ }
+}
+
+static int lan966x_fdma_pci_reload(struct lan966x *lan966x, int new_mtu)
+{
+ struct fdma tx_fdma_old = lan966x->tx.fdma;
+ struct fdma rx_fdma_old = lan966x->rx.fdma;
+ u32 old_mtu = lan966x->rx.max_mtu;
+ int err;
+
+ napi_synchronize(&lan966x->napi);
+ napi_disable(&lan966x->napi);
+ lan966x_fdma_pci_stop_netdev(lan966x);
+ lan966x_fdma_rx_disable(&lan966x->rx);
+ lan966x_fdma_tx_disable(&lan966x->tx);
+
+ lan966x->rx.max_mtu = new_mtu;
+
+ lan966x->tx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+ lan966x->tx.fdma.size = fdma_get_size_contiguous(&lan966x->tx.fdma);
+ lan966x->rx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+ lan966x->rx.fdma.size = fdma_get_size_contiguous(&lan966x->rx.fdma);
+
+ err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+ if (err)
+ goto restore;
+
+ err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+ if (err) {
+ fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+ goto restore;
+ }
+
+ /* Free and unmap old memory. */
+ fdma_free_coherent_and_unmap(lan966x->dev, &rx_fdma_old);
+ fdma_free_coherent_and_unmap(lan966x->dev, &tx_fdma_old);
+
+ /* Keep this order: rx_start, wakeup_netdev, napi_enable. */
+ lan966x_fdma_rx_start(&lan966x->rx);
+ lan966x_fdma_pci_wakeup_netdev(lan966x);
+ napi_enable(&lan966x->napi);
+
+ return err;
+restore:
+
+ /* No new buffers are allocated at this point. Use the old buffers,
+ * but reset them before starting the FDMA again.
+ */
+
+ memcpy(&lan966x->tx.fdma, &tx_fdma_old, sizeof(struct fdma));
+ memcpy(&lan966x->rx.fdma, &rx_fdma_old, sizeof(struct fdma));
+
+ lan966x->rx.max_mtu = old_mtu;
+
+ lan966x_fdma_pci_reset_mem(lan966x);
+
+ /* Keep this order: rx_start, wakeup_netdev, napi_enable. */
+ lan966x_fdma_rx_start(&lan966x->rx);
+ lan966x_fdma_pci_wakeup_netdev(lan966x);
+ napi_enable(&lan966x->napi);
+
+ return err;
+}
+
+static int __lan966x_fdma_pci_reload(struct lan966x *lan966x, int max_mtu)
+{
+ int err;
+ u32 val;
+
+ /* Disable the CPU port. */
+ lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(0),
+ QSYS_SW_PORT_MODE_PORT_ENA,
+ lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+ /* Flush the CPU queues. */
+ readx_poll_timeout(lan966x_qsys_sw_status,
+ lan966x,
+ val,
+ !(QSYS_SW_STATUS_EQ_AVAIL_GET(val)),
+ READL_SLEEP_US, READL_TIMEOUT_US);
+
+ /* Add a sleep in case there are frames between the queues and the CPU
+ * port
+ */
+ usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC);
+
+ err = lan966x_fdma_pci_reload(lan966x, max_mtu);
+
+ /* Enable back the CPU port. */
+ lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(1),
+ QSYS_SW_PORT_MODE_PORT_ENA,
+ lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+ return err;
+}
+
static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
{
- return -EOPNOTSUPP;
+ int max_mtu;
+
+ max_mtu = lan966x_fdma_get_max_frame(lan966x);
+ if (max_mtu == lan966x->rx.max_mtu)
+ return 0;
+
+ return __lan966x_fdma_pci_reload(lan966x, max_mtu);
}
static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
--
2.34.1
* [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support
From: Daniel Machon @ 2026-05-20 8:12 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Add PCIe FDMA support for lan966x. The PCIe FDMA path uses contiguous
DMA buffers mapped through the endpoint's ATU, with memcpy-based frame
transfer instead of per-page DMA mappings.
With PCIe FDMA, throughput increases from ~33 Mbps (register-based I/O)
to ~620 Mbps on an Intel x86 host with a lan966x PCIe card.
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/net/ethernet/microchip/lan966x/Makefile | 4 +
.../ethernet/microchip/lan966x/lan966x_fdma_pci.c | 394 +++++++++++++++++++++
.../net/ethernet/microchip/lan966x/lan966x_main.c | 11 +
.../net/ethernet/microchip/lan966x/lan966x_main.h | 11 +
.../net/ethernet/microchip/lan966x/lan966x_regs.h | 10 +
5 files changed, 430 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
index 4cdbe263502c..ac0beceb2a0d 100644
--- a/drivers/net/ethernet/microchip/lan966x/Makefile
+++ b/drivers/net/ethernet/microchip/lan966x/Makefile
@@ -18,6 +18,10 @@ lan966x-switch-objs := lan966x_main.o lan966x_phylink.o lan966x_port.o \
lan966x-switch-$(CONFIG_LAN966X_DCB) += lan966x_dcb.o
lan966x-switch-$(CONFIG_DEBUG_FS) += lan966x_vcap_debugfs.o
+ifdef CONFIG_MCHP_LAN966X_PCI
+lan966x-switch-y += lan966x_fdma_pci.o
+endif
+
# Provide include files
ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/vcap
ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/fdma
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
new file mode 100644
index 000000000000..c27d9e76e735
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include "fdma_api.h"
+#include "lan966x_main.h"
+
+static int lan966x_fdma_pci_dataptr_cb(struct fdma *fdma, int dcb, int db,
+ u64 *dataptr)
+{
+ u64 addr;
+
+ addr = fdma_dataptr_dma_addr_contiguous(fdma, dcb, db);
+
+ *dataptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+ return 0;
+}
+
+static int lan966x_fdma_pci_nextptr_cb(struct fdma *fdma, int dcb, u64 *nextptr)
+{
+ u64 addr;
+
+ fdma_nextptr_cb(fdma, dcb, &addr);
+
+ *nextptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+ return 0;
+}
+
+static int lan966x_fdma_pci_rx_alloc(struct lan966x_rx *rx)
+{
+ struct lan966x *lan966x = rx->lan966x;
+ struct fdma *fdma = &rx->fdma;
+ int err;
+
+ err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+ if (err)
+ return err;
+
+ fdma_dcbs_init(fdma,
+ FDMA_DCB_INFO_DATAL(fdma->db_size),
+ FDMA_DCB_STATUS_INTR);
+
+ lan966x_fdma_llp_configure(lan966x,
+ fdma->atu_region->base_addr,
+ fdma->channel_id);
+
+ return 0;
+}
+
+static int lan966x_fdma_pci_tx_alloc(struct lan966x_tx *tx)
+{
+ struct lan966x *lan966x = tx->lan966x;
+ struct fdma *fdma = &tx->fdma;
+ int err;
+
+ err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+ if (err)
+ return err;
+
+ fdma_dcbs_init(fdma,
+ FDMA_DCB_INFO_DATAL(fdma->db_size),
+ FDMA_DCB_STATUS_DONE);
+
+ lan966x_fdma_llp_configure(lan966x,
+ fdma->atu_region->base_addr,
+ fdma->channel_id);
+
+ return 0;
+}
+
+static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
+{
+ struct fdma_db *db;
+
+ for (int i = 0; i < fdma->n_dcbs; i++) {
+ db = fdma_db_get(fdma, i, 0);
+
+ if (!fdma_db_is_done(db))
+ continue;
+ if (fdma_is_last(fdma, &fdma->dcbs[i]))
+ continue;
+
+ return i;
+ }
+
+ return -ENOSPC;
+}
+
+/* TX slot layout (sizes in bytes):
+ *
+ * +---------------------+-----+---------+-----+
+ * | XDP_PACKET_HEADROOM | IFH | payload | FCS |
+ * | 256 | 28 | len | 4 |
+ * +---------------------+-----+---------+-----+
+ * |<---------------- db_size ----------------->|
+ *
+ * Return true if the frame plus required overhead fits.
+ */
+static bool lan966x_fdma_pci_tx_size_fits(struct fdma *fdma, u32 len)
+{
+ return XDP_PACKET_HEADROOM + IFH_LEN_BYTES + len + ETH_FCS_LEN <=
+ fdma->db_size;
+}
+
+/* Return true if blockl is a valid RX frame size. */
+static bool lan966x_fdma_pci_rx_size_fits(struct fdma *fdma, u32 blockl)
+{
+ return blockl >= IFH_LEN_BYTES + ETH_HLEN + ETH_FCS_LEN &&
+ blockl <= fdma->db_size - XDP_PACKET_HEADROOM;
+}
+
+static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
+{
+ struct lan966x *lan966x = rx->lan966x;
+ struct fdma *fdma = &rx->fdma;
+ struct lan966x_port *port;
+ struct fdma_db *db;
+ void *virt_addr;
+ u32 blockl;
+
+ /* virt_addr points to the IFH. */
+ virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
+ fdma->dcb_index,
+ fdma->db_index);
+
+ lan966x_ifh_get_src_port(virt_addr, src_port);
+
+ if (WARN_ON(*src_port >= lan966x->num_phys_ports))
+ return FDMA_ERROR;
+
+ port = lan966x->ports[*src_port];
+ if (!port)
+ return FDMA_ERROR;
+
+ db = fdma_db_next_get(fdma);
+
+ /* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
+ * values before they feed memcpy/XDP sizes.
+ */
+ blockl = FDMA_DCB_STATUS_BLOCKL(db->status);
+ if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
+ return FDMA_ERROR;
+
+ return FDMA_PASS;
+}
+
+static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
+ u64 src_port)
+{
+ struct lan966x *lan966x = rx->lan966x;
+ struct fdma *fdma = &rx->fdma;
+ struct sk_buff *skb;
+ struct fdma_db *db;
+ u32 data_len;
+
+ /* Get the received frame and create an SKB for it. */
+ db = fdma_db_next_get(fdma);
+ data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
+
+ skb = napi_alloc_skb(&lan966x->napi, data_len);
+ if (unlikely(!skb))
+ return NULL;
+
+ memcpy(skb->data,
+ fdma_dataptr_virt_addr_contiguous(fdma,
+ fdma->dcb_index,
+ fdma->db_index),
+ data_len);
+
+ skb_put(skb, data_len);
+
+ skb->dev = lan966x->ports[src_port]->dev;
+ skb_pull(skb, IFH_LEN_BYTES);
+
+ skb_trim(skb, skb->len - ETH_FCS_LEN);
+
+ skb->protocol = eth_type_trans(skb, skb->dev);
+
+ if (lan966x->bridge_mask & BIT(src_port)) {
+ skb->offload_fwd_mark = 1;
+
+ skb_reset_network_header(skb);
+ if (!lan966x_hw_offload(lan966x, src_port, skb))
+ skb->offload_fwd_mark = 0;
+ }
+
+ skb->dev->stats.rx_bytes += skb->len;
+ skb->dev->stats.rx_packets++;
+
+ return skb;
+}
+
+static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
+ struct net_device *dev)
+{
+ struct lan966x_port *port = netdev_priv(dev);
+ struct lan966x *lan966x = port->lan966x;
+ struct lan966x_tx *tx = &lan966x->tx;
+ struct fdma *fdma = &tx->fdma;
+ int next_to_use;
+ void *virt_addr;
+
+ next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+ if (next_to_use < 0) {
+ netif_stop_queue(dev);
+ return NETDEV_TX_BUSY;
+ }
+
+ if (skb_put_padto(skb, ETH_ZLEN)) {
+ dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
+ }
+
+ if (!lan966x_fdma_pci_tx_size_fits(fdma, skb->len)) {
+ dev_kfree_skb_any(skb);
+ dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
+ }
+
+ skb_tx_timestamp(skb);
+
+ /* virt_addr points to the IFH. */
+ virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+ memcpy(virt_addr, ifh, IFH_LEN_BYTES);
+ memcpy(virt_addr + IFH_LEN_BYTES, skb->data, skb->len);
+
+ /* Order frame write before DCB status write below. */
+ dma_wmb();
+
+ fdma_dcb_add(fdma,
+ next_to_use,
+ 0,
+ FDMA_DCB_STATUS_INTR |
+ FDMA_DCB_STATUS_SOF |
+ FDMA_DCB_STATUS_EOF |
+ FDMA_DCB_STATUS_BLOCKO(0) |
+ FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + skb->len + ETH_FCS_LEN));
+
+ /* Start the transmission. */
+ lan966x_fdma_tx_start(tx);
+
+ dev->stats.tx_bytes += skb->len;
+ dev->stats.tx_packets++;
+
+ /* Safe to free: the PCIe DTBO does not enable the PTP interrupt,
+ * so lan966x->ptp stays 0 and lan966x_port_xmit() never enqueues
+ * this skb on port->tx_skbs for a TX timestamp.
+ */
+ dev_consume_skb_any(skb);
+
+ return NETDEV_TX_OK;
+}
+
+static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
+{
+ struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
+ struct lan966x_rx *rx = &lan966x->rx;
+ struct fdma *fdma = &rx->fdma;
+ int dcb_reload, old_dcb;
+ struct sk_buff *skb;
+ int counter = 0;
+ u64 src_port;
+
+ /* Wake any stopped TX queues if a TX DCB is available. */
+ spin_lock(&lan966x->tx_lock);
+ if (lan966x_fdma_pci_get_next_dcb(&lan966x->tx.fdma) >= 0)
+ lan966x_fdma_wakeup_netdev(lan966x);
+ spin_unlock(&lan966x->tx_lock);
+
+ dcb_reload = fdma->dcb_index;
+
+ /* Get all received skbs. */
+ while (counter < weight) {
+ if (!fdma_has_frames(fdma))
+ break;
+ /* Order DONE read before DCB/frame reads below. */
+ dma_rmb();
+ counter++;
+ switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
+ case FDMA_PASS:
+ break;
+ case FDMA_ERROR:
+ /* No rx_dropped increment here because src_port is
+ * invalid.
+ */
+ fdma_dcb_advance(fdma);
+ continue;
+ }
+ skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
+ fdma_dcb_advance(fdma);
+ if (!skb) {
+ lan966x->ports[src_port]->dev->stats.rx_dropped++;
+ continue;
+ }
+
+ napi_gro_receive(&lan966x->napi, skb);
+ }
+ while (dcb_reload != fdma->dcb_index) {
+ old_dcb = dcb_reload;
+ dcb_reload++;
+ dcb_reload &= fdma->n_dcbs - 1;
+
+ fdma_dcb_add(fdma,
+ old_dcb,
+ FDMA_DCB_INFO_DATAL(fdma->db_size),
+ FDMA_DCB_STATUS_INTR);
+
+ lan966x_fdma_rx_reload(rx);
+ }
+
+ if (counter < weight && napi_complete_done(napi, counter))
+ lan_wr(0xff, lan966x, FDMA_INTR_DB_ENA);
+
+ return counter;
+}
+
+static int lan966x_fdma_pci_init(struct lan966x *lan966x)
+{
+ struct fdma *rx_fdma = &lan966x->rx.fdma;
+ struct fdma *tx_fdma = &lan966x->tx.fdma;
+ int err;
+
+ if (!lan966x->fdma)
+ return 0;
+
+ lan_wr(FDMA_CTRL_NRESET_SET(0), lan966x, FDMA_CTRL);
+ lan_wr(FDMA_CTRL_NRESET_SET(1), lan966x, FDMA_CTRL);
+
+ fdma_pci_atu_init(&lan966x->atu, lan966x->regs[TARGET_PCIE_DBI]);
+
+ lan966x->rx.lan966x = lan966x;
+ lan966x->rx.max_mtu = lan966x_fdma_get_max_frame(lan966x);
+ rx_fdma->channel_id = FDMA_XTR_CHANNEL;
+ rx_fdma->n_dcbs = FDMA_DCB_MAX;
+ rx_fdma->n_dbs = FDMA_RX_DCB_MAX_DBS;
+ rx_fdma->priv = lan966x;
+ rx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+ rx_fdma->size = fdma_get_size_contiguous(rx_fdma);
+ rx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+ rx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+ lan966x->tx.lan966x = lan966x;
+ tx_fdma->channel_id = FDMA_INJ_CHANNEL;
+ tx_fdma->n_dcbs = FDMA_DCB_MAX;
+ tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
+ tx_fdma->priv = lan966x;
+ tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+ tx_fdma->size = fdma_get_size_contiguous(tx_fdma);
+ tx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+ tx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+ err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+ if (err)
+ return err;
+
+ err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+ if (err) {
+ fdma_free_coherent_and_unmap(lan966x->dev, rx_fdma);
+ return err;
+ }
+
+ lan966x_fdma_rx_start(&lan966x->rx);
+
+ return 0;
+}
+
+static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
+{
+ return -EOPNOTSUPP;
+}
+
+static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
+{
+ if (!lan966x->fdma)
+ return;
+
+ lan966x_fdma_rx_disable(&lan966x->rx);
+ lan966x_fdma_tx_disable(&lan966x->tx);
+
+ napi_synchronize(&lan966x->napi);
+ napi_disable(&lan966x->napi);
+
+ fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+ fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->tx.fdma);
+}
+
+const struct lan966x_fdma_ops lan966x_fdma_pci_ops = {
+ .fdma_init = &lan966x_fdma_pci_init,
+ .fdma_deinit = &lan966x_fdma_pci_deinit,
+ .fdma_xmit = &lan966x_fdma_pci_xmit,
+ .fdma_poll = &lan966x_fdma_pci_napi_poll,
+ .fdma_resize = &lan966x_fdma_pci_resize,
+};
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index cc3c7b6c65ae..7036b1d937d5 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -7,6 +7,7 @@
#include <linux/ip.h>
#include <linux/of.h>
#include <linux/of_net.h>
+#include <linux/pci.h>
#include <linux/phy/phy.h>
#include <linux/platform_device.h>
#include <linux/reset.h>
@@ -49,6 +50,9 @@ struct lan966x_main_io_resource {
static const struct lan966x_main_io_resource lan966x_main_iomap[] = {
{ TARGET_CPU, 0xc0000, 0 }, /* 0xe00c0000 */
{ TARGET_FDMA, 0xc0400, 0 }, /* 0xe00c0400 */
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+ { TARGET_PCIE_DBI, 0x400000, 0 }, /* 0xe0400000 */
+#endif
{ TARGET_ORG, 0, 1 }, /* 0xe2000000 */
{ TARGET_GCB, 0x4000, 1 }, /* 0xe2004000 */
{ TARGET_QS, 0x8000, 1 }, /* 0xe2008000 */
@@ -1100,6 +1104,13 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
{
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+ for (struct device *p = dev->parent; p; p = p->parent) {
+ if (dev_is_pci(p))
+ return &lan966x_fdma_pci_ops;
+ }
+#endif
+
return &lan966x_fdma_ops;
}
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 5f4dbeda17cd..e7fdd4447fb6 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -17,6 +17,9 @@
#include <net/xdp.h>
#include <fdma_api.h>
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include <fdma_pci.h>
+#endif
#include <vcap_api.h>
#include <vcap_api_client.h>
@@ -288,6 +291,10 @@ struct lan966x {
void __iomem *regs[NUM_TARGETS];
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+ struct fdma_pci_atu atu;
+#endif
+
int shared_queue_sz;
u8 base_mac[ETH_ALEN];
@@ -586,6 +593,10 @@ void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
int lan966x_qsys_sw_status(struct lan966x *lan966x);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+#endif
+
int lan966x_lag_port_join(struct lan966x_port *port,
struct net_device *brport_dev,
struct net_device *bond,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
index aba0d36ae6b5..4778ea217673 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
@@ -20,6 +20,7 @@ enum lan966x_target {
TARGET_FDMA = 21,
TARGET_GCB = 27,
TARGET_ORG = 36,
+ TARGET_PCIE_DBI = 40,
TARGET_PTP = 41,
TARGET_QS = 42,
TARGET_QSYS = 46,
@@ -1009,6 +1010,15 @@ enum lan966x_target {
#define FDMA_CH_CFG_CH_MEM_GET(x)\
FIELD_GET(FDMA_CH_CFG_CH_MEM, x)
+/* FDMA:FDMA:FDMA_CTRL */
+#define FDMA_CTRL __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 424, 0, 1, 4)
+
+#define FDMA_CTRL_NRESET BIT(0)
+#define FDMA_CTRL_NRESET_SET(x)\
+ FIELD_PREP(FDMA_CTRL_NRESET, x)
+#define FDMA_CTRL_NRESET_GET(x)\
+ FIELD_GET(FDMA_CTRL_NRESET, x)
+
/* FDMA:FDMA:FDMA_PORT_CTRL */
#define FDMA_PORT_CTRL(r) __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 376, r, 2, 4)
--
2.34.1
* [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
When lan966x is used as a PCIe endpoint, the FDMA engine runs on the
card and survives a host reboot. Without a shutdown callback, channels
stay active and interrupt sources stay armed across the reset, causing
the shared PCIe INTx to assert before the driver has re-probed.
Add a shutdown callback, shared by the platform and PCI paths, that
masks FDMA interrupts (FDMA_INTR_ENA and FDMA_INTR_DB_ENA) and disables
the RX and TX channels.
FDMA_INTR_ENA persists on the card across a warm reboot, so also
restore the full enable in lan966x_fdma_rx_start() to re-arm interrupts
after a previous shutdown(). rx_start() runs after both the RX and TX
rings are allocated, so the same single-site re-arm works for both the
platform and PCIe backends.
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c | 4 ++++
drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 18 ++++++++++++++++++
drivers/net/ethernet/microchip/lan966x/lan966x_regs.h | 15 +++++++++++++++
3 files changed, 37 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 9bb40383aa56..493aef5ba8d1 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -146,6 +146,10 @@ void lan966x_fdma_rx_start(struct lan966x_rx *rx)
struct fdma *fdma = &rx->fdma;
u32 mask;
+ lan_wr(FDMA_INTR_ENA_INTR_PORT_ENA_SET(GENMASK(1, 0)) |
+ FDMA_INTR_ENA_INTR_CH_ENA_SET(GENMASK(7, 0)),
+ lan966x, FDMA_INTR_ENA);
+
lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index f6d64323d06c..cc3c7b6c65ae 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1313,9 +1313,27 @@ static void lan966x_remove(struct platform_device *pdev)
debugfs_remove_recursive(lan966x->debugfs_root);
}
+static void lan966x_shutdown(struct platform_device *pdev)
+{
+ struct lan966x *lan966x = platform_get_drvdata(pdev);
+
+ if (!lan966x->fdma)
+ return;
+
+ lan966x_fdma_rx_disable(&lan966x->rx);
+ lan966x_fdma_tx_disable(&lan966x->tx);
+
+ napi_synchronize(&lan966x->napi);
+ napi_disable(&lan966x->napi);
+
+ lan_wr(0, lan966x, FDMA_INTR_ENA);
+ lan_wr(0, lan966x, FDMA_INTR_DB_ENA);
+}
+
static struct platform_driver lan966x_driver = {
.probe = lan966x_probe,
.remove = lan966x_remove,
+ .shutdown = lan966x_shutdown,
.driver = {
.name = "lan966x-switch",
.of_match_table = lan966x_match,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
index 4b553927d2e0..aba0d36ae6b5 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
@@ -1039,6 +1039,21 @@ enum lan966x_target {
/* FDMA:FDMA:FDMA_INTR_ERR */
#define FDMA_INTR_ERR __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 400, 0, 1, 4)
+/* FDMA:FDMA:FDMA_INTR_ENA */
+#define FDMA_INTR_ENA __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 404, 0, 1, 4)
+
+#define FDMA_INTR_ENA_INTR_PORT_ENA GENMASK(9, 8)
+#define FDMA_INTR_ENA_INTR_PORT_ENA_SET(x)\
+ FIELD_PREP(FDMA_INTR_ENA_INTR_PORT_ENA, x)
+#define FDMA_INTR_ENA_INTR_PORT_ENA_GET(x)\
+ FIELD_GET(FDMA_INTR_ENA_INTR_PORT_ENA, x)
+
+#define FDMA_INTR_ENA_INTR_CH_ENA GENMASK(7, 0)
+#define FDMA_INTR_ENA_INTR_CH_ENA_SET(x)\
+ FIELD_PREP(FDMA_INTR_ENA_INTR_CH_ENA, x)
+#define FDMA_INTR_ENA_INTR_CH_ENA_GET(x)\
+ FIELD_GET(FDMA_INTR_ENA_INTR_CH_ENA, x)
+
/* FDMA:FDMA:FDMA_ERRORS */
#define FDMA_ERRORS __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 412, 0, 1, 4)
--
2.34.1
* [PATCH net-next v5 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
When in PCI mode, the GCB soft reset issued by the reset controller
can latch spurious bits in the FDMA error stickies. The latched bits
sit in FDMA_INTR_ERR until the FDMA IRQ is requested later in probe,
at which point the handler fires immediately and WARNs.
Clear FDMA_ERRORS, FDMA_INTR_ERR and FDMA_INTR_DB right after the
switch reset so the FDMA comes out clean and the IRQ handler does not
see ghost errors on probe.
The clear runs on both the PCI and platform paths. On the platform
path it has no effect, since there are no spurious stickies to clear,
but keeping it unconditional avoids a PCI-specific code path here.
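For readers unfamiliar with the idiom: writing the value just read back to the same register only clears it if the register is write-one-to-clear (W1C). That is an assumption made here from the term "stickies"; the commit message does not state it. A minimal user-space sketch under that assumption, where struct sticky_reg, sticky_read() and sticky_write() are hypothetical stand-ins for the hardware behind lan_rd()/lan_wr():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a write-one-to-clear (W1C) sticky register:
 * writing 1 to a bit clears it, writing 0 leaves the bit unchanged.
 */
struct sticky_reg {
	uint32_t val;
};

static uint32_t sticky_read(const struct sticky_reg *reg)
{
	return reg->val;
}

static void sticky_write(struct sticky_reg *reg, uint32_t wr)
{
	reg->val &= ~wr;
}

/* Same shape as lan_wr(lan_rd(reg), reg): clears exactly the bits that
 * are currently latched and touches nothing else.
 */
static void sticky_clear_all(struct sticky_reg *reg)
{
	sticky_write(reg, sticky_read(reg));
	assert(sticky_read(reg) == 0);
}
```

If no bits are latched, as on the platform path, the write-back is a write of zero and changes nothing, which is consistent with keeping the clear unconditional.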
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index ff3c6c76f16c..f6d64323d06c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1066,6 +1066,15 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
reset_control_reset(switch_reset);
+ /* When in PCI mode, the GCB soft reset issued by the reset
+ * controller can latch spurious bits in the FDMA error stickies.
+ * Clear them before request_irq hooks up the FDMA IRQ line,
+ * otherwise the handler fires immediately on probe.
+ */
+ lan_wr(lan_rd(lan966x, FDMA_ERRORS), lan966x, FDMA_ERRORS);
+ lan_wr(lan_rd(lan966x, FDMA_INTR_ERR), lan966x, FDMA_INTR_ERR);
+ lan_wr(lan_rd(lan966x, FDMA_INTR_DB), lan966x, FDMA_INTR_DB);
+
/* Don't reinitialize the switch core, if it is already initialized. In
* case it is initialized twice, some pointers inside the queue system
* in HW will get corrupted and then after a while the queue system gets
--
2.34.1
* [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Introduce lan966x_fdma_ops to support different FDMA implementations
for platform and PCIe. Plumb fdma_init, fdma_deinit, fdma_xmit,
fdma_poll and fdma_resize through the ops table, and select the
implementation at probe time based on runtime PCI bus detection.
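As an illustration of the selection step, here is a small user-space sketch. It is not the driver code: struct fake_dev, struct fake_fdma_ops and fake_get_fdma_ops() are hypothetical stand-ins for struct device, struct lan966x_fdma_ops and lan966x_get_fdma_ops(). The parent walk mirrors the one a later patch in this series adds; in this patch the helper still returns the platform ops unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device plus dev_is_pci(). */
struct fake_dev {
	struct fake_dev *parent;
	bool is_pci;
};

/* Hypothetical stand-in for struct lan966x_fdma_ops. */
struct fake_fdma_ops {
	const char *name;
};

static const struct fake_fdma_ops platform_ops = { .name = "platform" };
static const struct fake_fdma_ops pci_ops = { .name = "pci" };

/* Walk up the parent chain; any PCI ancestor selects the PCIe backend,
 * otherwise fall back to the platform backend.
 */
static const struct fake_fdma_ops *fake_get_fdma_ops(const struct fake_dev *dev)
{
	const struct fake_dev *p;

	assert(dev);
	for (p = dev->parent; p; p = p->parent) {
		if (p->is_pci)
			return &pci_ops;
	}

	return &platform_ops;
}
```

Resolving the table once at probe time keeps the callers (xmit, poll, MTU change) free of per-call bus checks; they only dereference the stored pointer.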
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
.../net/ethernet/microchip/lan966x/lan966x_fdma.c | 2 +-
.../net/ethernet/microchip/lan966x/lan966x_main.c | 25 +++++++++++++++++-----
.../net/ethernet/microchip/lan966x/lan966x_main.h | 13 +++++++++++
3 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 25e673bdf084..9bb40383aa56 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -925,7 +925,7 @@ void lan966x_fdma_netdev_init(struct lan966x *lan966x, struct net_device *dev)
return;
lan966x->fdma_ndev = dev;
- netif_napi_add(dev, &lan966x->napi, lan966x_fdma_napi_poll);
+ netif_napi_add(dev, &lan966x->napi, lan966x->ops->fdma_poll);
napi_enable(&lan966x->napi);
}
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 1179a6e127c5..ff3c6c76f16c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -26,6 +26,14 @@
#define IO_RANGES 2
+static const struct lan966x_fdma_ops lan966x_fdma_ops = {
+ .fdma_init = &lan966x_fdma_init,
+ .fdma_deinit = &lan966x_fdma_deinit,
+ .fdma_xmit = &lan966x_fdma_xmit,
+ .fdma_poll = &lan966x_fdma_napi_poll,
+ .fdma_resize = &lan966x_fdma_change_mtu,
+};
+
static const struct of_device_id lan966x_match[] = {
{ .compatible = "microchip,lan966x-switch" },
{ }
@@ -391,7 +399,7 @@ static netdev_tx_t lan966x_port_xmit(struct sk_buff *skb,
spin_lock(&lan966x->tx_lock);
if (port->lan966x->fdma)
- err = lan966x_fdma_xmit(skb, ifh, dev);
+ err = lan966x->ops->fdma_xmit(skb, ifh, dev);
else
err = lan966x_port_ifh_xmit(skb, ifh, dev);
spin_unlock(&lan966x->tx_lock);
@@ -413,7 +421,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
if (!lan966x->fdma)
return 0;
- err = lan966x_fdma_change_mtu(lan966x);
+ err = lan966x->ops->fdma_resize(lan966x);
if (err) {
lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(LAN966X_HW_MTU(old_mtu)),
lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port));
@@ -1081,6 +1089,11 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
return 0;
}
+static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
+{
+ return &lan966x_fdma_ops;
+}
+
static int lan966x_probe(struct platform_device *pdev)
{
struct fwnode_handle *ports, *portnp;
@@ -1095,6 +1108,8 @@ static int lan966x_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, lan966x);
lan966x->dev = &pdev->dev;
+ lan966x->ops = lan966x_get_fdma_ops(&pdev->dev);
+
if (!device_get_mac_address(&pdev->dev, mac_addr)) {
ether_addr_copy(lan966x->base_mac, mac_addr);
} else {
@@ -1234,7 +1249,7 @@ static int lan966x_probe(struct platform_device *pdev)
if (err)
goto cleanup_fdb;
- err = lan966x_fdma_init(lan966x);
+ err = lan966x->ops->fdma_init(lan966x);
if (err)
goto cleanup_ptp;
@@ -1247,7 +1262,7 @@ static int lan966x_probe(struct platform_device *pdev)
return 0;
cleanup_fdma:
- lan966x_fdma_deinit(lan966x);
+ lan966x->ops->fdma_deinit(lan966x);
cleanup_ptp:
lan966x_ptp_deinit(lan966x);
@@ -1275,7 +1290,7 @@ static void lan966x_remove(struct platform_device *pdev)
lan966x_taprio_deinit(lan966x);
lan966x_vcap_deinit(lan966x);
- lan966x_fdma_deinit(lan966x);
+ lan966x->ops->fdma_deinit(lan966x);
lan966x_cleanup_ports(lan966x);
cancel_delayed_work_sync(&lan966x->stats_work);
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 83c361abb789..5f4dbeda17cd 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -193,6 +193,17 @@ enum vcap_is1_port_sel_rt {
VCAP_IS1_PS_RT_FOLLOW_OTHER = 7,
};
+struct lan966x;
+
+struct lan966x_fdma_ops {
+ int (*fdma_init)(struct lan966x *lan966x);
+ void (*fdma_deinit)(struct lan966x *lan966x);
+ int (*fdma_xmit)(struct sk_buff *skb, __be32 *ifh,
+ struct net_device *dev);
+ int (*fdma_poll)(struct napi_struct *napi, int weight);
+ int (*fdma_resize)(struct lan966x *lan966x);
+};
+
struct lan966x_port;
struct lan966x_rx {
@@ -270,6 +281,8 @@ struct lan966x_skb_cb {
struct lan966x {
struct device *dev;
+ const struct lan966x_fdma_ops *ops;
+
u8 num_phys_ports;
struct lan966x_port **ports;
--
2.34.1
* [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
Make shared FDMA helpers non-static, so they can be reused by the PCIe
FDMA implementation.
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
.../net/ethernet/microchip/lan966x/lan966x_fdma.c | 22 +++++++++++-----------
.../net/ethernet/microchip/lan966x/lan966x_main.h | 11 +++++++++++
2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 6c5761e886d4..25e673bdf084 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -109,8 +109,8 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
return PTR_ERR_OR_ZERO(rx->page_pool);
}
-static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
- u8 channel_id)
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+ u8 channel_id)
{
lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
@@ -140,7 +140,7 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
return 0;
}
-static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
+void lan966x_fdma_rx_start(struct lan966x_rx *rx)
{
struct lan966x *lan966x = rx->lan966x;
struct fdma *fdma = &rx->fdma;
@@ -171,7 +171,7 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
lan966x, FDMA_CH_ACTIVATE);
}
-static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
{
struct lan966x *lan966x = rx->lan966x;
struct fdma *fdma = &rx->fdma;
@@ -191,7 +191,7 @@ static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
lan966x, FDMA_CH_DB_DISCARD);
}
-static void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
{
struct lan966x *lan966x = rx->lan966x;
@@ -265,7 +265,7 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
lan966x, FDMA_CH_ACTIVATE);
}
-static void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
{
struct lan966x *lan966x = tx->lan966x;
struct fdma *fdma = &tx->fdma;
@@ -297,7 +297,7 @@ static void lan966x_fdma_tx_reload(struct lan966x_tx *tx)
lan966x, FDMA_CH_RELOAD);
}
-static void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
{
struct lan966x_port *port;
int i;
@@ -471,7 +471,7 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx,
return NULL;
}
-static int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
{
struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
struct lan966x_rx *rx = &lan966x->rx;
@@ -584,7 +584,7 @@ static int lan966x_fdma_get_next_dcb(struct lan966x_tx *tx)
return -1;
}
-static void lan966x_fdma_tx_start(struct lan966x_tx *tx)
+void lan966x_fdma_tx_start(struct lan966x_tx *tx)
{
struct lan966x *lan966x = tx->lan966x;
@@ -802,7 +802,7 @@ static int lan966x_fdma_get_max_mtu(struct lan966x *lan966x)
return max_mtu;
}
-static int lan966x_qsys_sw_status(struct lan966x *lan966x)
+int lan966x_qsys_sw_status(struct lan966x *lan966x)
{
return lan_rd(lan966x, QSYS_SW_STATUS(CPU_PORT));
}
@@ -861,7 +861,7 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
return err;
}
-static int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
{
return lan966x_fdma_get_max_mtu(lan966x) +
IFH_LEN_BYTES +
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index eea286c29474..83c361abb789 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -561,6 +561,17 @@ int lan966x_fdma_init(struct lan966x *lan966x);
void lan966x_fdma_deinit(struct lan966x *lan966x);
irqreturn_t lan966x_fdma_irq_handler(int irq, void *args);
int lan966x_fdma_reload_page_pool(struct lan966x *lan966x);
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight);
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+ u8 channel_id);
+void lan966x_fdma_rx_start(struct lan966x_rx *rx);
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx);
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx);
+void lan966x_fdma_tx_start(struct lan966x_tx *tx);
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx);
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
+int lan966x_qsys_sw_status(struct lan966x *lan966x);
int lan966x_lag_port_join(struct lan966x_port *port,
struct net_device *brport_dev,
--
2.34.1
* [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
The FDMA Link List Pointer (LLP) register points to the first DCB in the
chain and must be written before the channel is activated. This tells
the FDMA engine where to begin DMA transfers.
Move the LLP register writes from the channel start/activate functions
into the allocation functions and introduce a shared
lan966x_fdma_llp_configure() helper. This is needed because the upcoming
PCIe FDMA path writes ATU-translated addresses to the LLP registers
instead of DMA addresses. Keeping the writes in the shared
start/activate path would overwrite these translated addresses.
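To make the point about ownership of the address concrete, here is a user-space sketch of the helper's shape. struct fake_llp_regs and fake_llp_configure() are hypothetical; they only model the FDMA_DCB_LLP/FDMA_DCB_LLP1 pair as two 32-bit fields and are not the driver's register accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the FDMA_DCB_LLP / FDMA_DCB_LLP1 register pair. */
struct fake_llp_regs {
	uint32_t llp;	/* low 32 bits of the first DCB address */
	uint32_t llp1;	/* high 32 bits of the first DCB address */
};

/* Same split as lan966x_fdma_llp_configure(): the caller decides which
 * address is written, so an alloc path can pass either a DMA address or
 * an ATU-translated one.
 */
static void fake_llp_configure(struct fake_llp_regs *regs, uint64_t addr)
{
	assert(regs);
	regs->llp = (uint32_t)(addr & 0xffffffffu);
	regs->llp1 = (uint32_t)(addr >> 32);
}
```

Because the write now happens once in the allocation path, a later start/activate call leaves whatever address the allocator chose in place.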
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
.../net/ethernet/microchip/lan966x/lan966x_fdma.c | 29 ++++++++++------------
1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index f8ce735a7fc0..6c5761e886d4 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -109,6 +109,13 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
return PTR_ERR_OR_ZERO(rx->page_pool);
}
+static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+ u8 channel_id)
+{
+ lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
+ lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
+}
+
static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
{
struct lan966x *lan966x = rx->lan966x;
@@ -127,6 +134,9 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
fdma_dcbs_init(fdma, FDMA_DCB_INFO_DATAL(fdma->db_size),
FDMA_DCB_STATUS_INTR);
+ lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+ fdma->channel_id);
+
return 0;
}
@@ -136,14 +146,6 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
struct fdma *fdma = &rx->fdma;
u32 mask;
- /* When activating a channel, first is required to write the first DCB
- * address and then to activate it
- */
- lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
- FDMA_DCB_LLP(fdma->channel_id));
- lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
- FDMA_DCB_LLP1(fdma->channel_id));
-
lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
@@ -214,6 +216,9 @@ static int lan966x_fdma_tx_alloc(struct lan966x_tx *tx)
fdma_dcbs_init(fdma, 0, 0);
+ lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+ fdma->channel_id);
+
return 0;
out:
@@ -235,14 +240,6 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
struct fdma *fdma = &tx->fdma;
u32 mask;
- /* When activating a channel, first is required to write the first DCB
- * address and then to activate it
- */
- lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
- FDMA_DCB_LLP(fdma->channel_id));
- lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
- FDMA_DCB_LLP1(fdma->channel_id));
-
lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
--
2.34.1
* [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
When lan966x or lan969x operates as a PCIe endpoint, the internal FDMA
engine cannot directly access host memory. Instead, DMA addresses must
be translated through the PCIe Address Translation Unit (ATU). The ATU
provides outbound windows that map internal addresses to PCIe bus
addresses.
The ATU outbound address space (0x10000000-0x1fffffff) is divided into
six equally-sized regions (~42MB each). When FDMA buffers are allocated,
a free ATU region is claimed and programmed with the DMA target address.
The FDMA engine then uses the region's base address in its descriptors,
and the ATU translates these to the actual DMA addresses on the PCIe bus.
Add the required functions and helpers that combine the DMA allocation
with the ATU region mapping, effectively adding support for PCIe FDMA.
This implementation will also be used by the lan969x, when PCIe FDMA is
added for that platform in the future.
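The region arithmetic described above can be sketched as follows. This is a rough illustration only: the window bounds and the region count come from the text, but the plain integer split, the macro names and the atu_*() helpers are assumptions, and real hardware may impose alignment constraints that this ignores.

```c
#include <assert.h>
#include <stdint.h>

#define ATU_WIN_BASE	0x10000000u	/* start of the outbound space */
#define ATU_WIN_SIZE	0x10000000u	/* 0x10000000-0x1fffffff */
#define ATU_NUM_REGIONS	6u

/* Size of one region, assuming a plain integer split with no alignment. */
static uint32_t atu_region_size(void)
{
	return ATU_WIN_SIZE / ATU_NUM_REGIONS;
}

/* Base address of region idx, as seen by the FDMA engine. */
static uint32_t atu_region_base(unsigned int idx)
{
	assert(idx < ATU_NUM_REGIONS);
	return ATU_WIN_BASE + idx * atu_region_size();
}

/* Translate a host DMA address inside the mapped buffer into the
 * address the FDMA engine would use in its descriptors.
 * dma_target is the DMA address the region was programmed with.
 */
static uint32_t atu_translate(unsigned int idx, uint64_t dma_target,
			      uint64_t dma_addr)
{
	assert(dma_addr >= dma_target);
	assert(dma_addr - dma_target < atu_region_size());
	return atu_region_base(idx) + (uint32_t)(dma_addr - dma_target);
}
```

With these assumptions each region is 44739242 bytes, which matches the "~42MB" figure, and a buffer mapped into region 0 is addressed by the FDMA engine starting at 0x10000000.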
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/net/ethernet/microchip/fdma/Makefile | 4 +
drivers/net/ethernet/microchip/fdma/fdma_api.c | 33 +++++
drivers/net/ethernet/microchip/fdma/fdma_api.h | 16 +++
drivers/net/ethernet/microchip/fdma/fdma_pci.c | 182 +++++++++++++++++++++++++
drivers/net/ethernet/microchip/fdma/fdma_pci.h | 42 ++++++
5 files changed, 277 insertions(+)
diff --git a/drivers/net/ethernet/microchip/fdma/Makefile b/drivers/net/ethernet/microchip/fdma/Makefile
index cc9a736be357..eed4df6f7158 100644
--- a/drivers/net/ethernet/microchip/fdma/Makefile
+++ b/drivers/net/ethernet/microchip/fdma/Makefile
@@ -5,3 +5,7 @@
obj-$(CONFIG_FDMA) += fdma.o
fdma-y += fdma_api.o
+
+ifdef CONFIG_MCHP_LAN966X_PCI
+fdma-y += fdma_pci.o
+endif
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.c b/drivers/net/ethernet/microchip/fdma/fdma_api.c
index e78c3590da9e..e0c2b137afef 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.c
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.c
@@ -127,6 +127,39 @@ void fdma_free_phys(struct fdma *fdma)
}
EXPORT_SYMBOL_GPL(fdma_free_phys);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+/* Allocate coherent DMA memory and map it in the ATU. */
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+ struct fdma_pci_atu *atu)
+{
+ struct fdma_pci_atu_region *region;
+ int err;
+
+ err = fdma_alloc_coherent(dev, fdma);
+ if (err)
+ return err;
+
+ region = fdma_pci_atu_region_map(atu, fdma->dma, fdma->size);
+ if (IS_ERR(region)) {
+ fdma_free_coherent(dev, fdma);
+ return PTR_ERR(region);
+ }
+
+ fdma->atu_region = region;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fdma_alloc_coherent_and_map);
+
+/* Free coherent DMA memory and unmap the memory in the ATU. */
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma)
+{
+ fdma_pci_atu_region_unmap(fdma->atu_region);
+ fdma_free_coherent(dev, fdma);
+}
+EXPORT_SYMBOL_GPL(fdma_free_coherent_and_unmap);
+#endif
+
/* Get the size of the FDMA memory */
u32 fdma_get_size(struct fdma *fdma)
{
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index 94f1a6596097..0e0f8af7463f 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -7,6 +7,10 @@
#include <linux/etherdevice.h>
#include <linux/types.h>
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include "fdma_pci.h"
+#endif
+
/* This provides a common set of functions and data structures for interacting
* with the Frame DMA engine on multiple Microchip switchcores.
*
@@ -109,6 +113,11 @@ struct fdma {
u32 channel_id;
struct fdma_ops ops;
+
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+ /* PCI ATU region for this FDMA instance. */
+ struct fdma_pci_atu_region *atu_region;
+#endif
};
/* Advance the DCB index and wrap if required. */
@@ -234,9 +243,16 @@ int __fdma_dcb_add(struct fdma *fdma, int dcb_idx, u64 info, u64 status,
int fdma_alloc_coherent(struct device *dev, struct fdma *fdma);
int fdma_alloc_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+ struct fdma_pci_atu *atu);
+#endif
void fdma_free_coherent(struct device *dev, struct fdma *fdma);
void fdma_free_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma);
+#endif
u32 fdma_get_size(struct fdma *fdma);
u32 fdma_get_size_contiguous(struct fdma *fdma);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.c b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
new file mode 100644
index 000000000000..1bd41eaa58a4
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "fdma_pci.h"
+
+/* When the switch operates as a PCIe endpoint, the FDMA engine needs to
+ * DMA to/from host memory. The FDMA writes to addresses within the endpoint's
+ * internal Outbound (OB) address space, and the PCIe ATU translates these to
+ * DMA addresses on the PCIe bus, targeting host memory.
+ *
+ * The ATU supports up to six outbound regions. This implementation divides
+ * the OB address space into six equally sized chunks.
+ *
+ * +-------------+------------+------------+-----+------------+
+ * | Index | Region 0 | Region 1 | ... | Region 5 |
+ * +-------------+------------+------------+-----+------------+
+ * | Base addr | 0x10000000 | 0x12aa0000 | ... | 0x1d520000 |
+ * | Limit addr | 0x12a9ffff | 0x1553ffff | ... | 0x1ffbffff |
+ * | Target addr | host dma | host dma | ... | host dma |
+ * +-------------+------------+------------+-----+------------+
+ *
+ * Base addr is the start address of the region within the OB address space.
+ * Limit addr is the end address of the region within the OB address space.
+ * Target addr is the host DMA address that the base addr translates to.
+ */
+
+#define FDMA_PCI_ATU_REGION_ALIGN BIT(16) /* 64KB */
+#define FDMA_PCI_ATU_OB_START 0x10000000
+#define FDMA_PCI_ATU_OB_END 0x1fffffff
+
+#define FDMA_PCI_ATU_ADDR 0x300000
+#define FDMA_PCI_ATU_IDX_SIZE 0x200
+#define FDMA_PCI_ATU_ENA_REG 0x4
+#define FDMA_PCI_ATU_ENA_BIT BIT(31)
+#define FDMA_PCI_ATU_LWR_BASE_ADDR 0x8
+#define FDMA_PCI_ATU_UPP_BASE_ADDR 0xc
+#define FDMA_PCI_ATU_LIMIT_ADDR 0x10
+#define FDMA_PCI_ATU_LWR_TARGET_ADDR 0x14
+#define FDMA_PCI_ATU_UPP_TARGET_ADDR 0x18
+
+static u32 fdma_pci_atu_region_size(void)
+{
+ return round_down((FDMA_PCI_ATU_OB_END - FDMA_PCI_ATU_OB_START) /
+ FDMA_PCI_ATU_REGION_MAX, FDMA_PCI_ATU_REGION_ALIGN);
+}
+
+static void __iomem *fdma_pci_atu_addr_get(void __iomem *addr, int offset,
+ int idx)
+{
+ return addr + FDMA_PCI_ATU_ADDR + FDMA_PCI_ATU_IDX_SIZE * idx + offset;
+}
+
+static void fdma_pci_atu_region_enable(struct fdma_pci_atu_region *region)
+{
+ writel(FDMA_PCI_ATU_ENA_BIT,
+ fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+ region->idx));
+}
+
+static void fdma_pci_atu_region_disable(struct fdma_pci_atu_region *region)
+{
+ writel(0, fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+ region->idx));
+}
+
+/* Configure the address translation in the ATU. */
+static void
+fdma_pci_atu_configure_translation(struct fdma_pci_atu_region *region)
+{
+ struct fdma_pci_atu *atu = region->atu;
+ int idx = region->idx;
+
+ writel(lower_32_bits(region->base_addr),
+ fdma_pci_atu_addr_get(atu->addr,
+ FDMA_PCI_ATU_LWR_BASE_ADDR, idx));
+
+ writel(upper_32_bits(region->base_addr),
+ fdma_pci_atu_addr_get(atu->addr,
+ FDMA_PCI_ATU_UPP_BASE_ADDR, idx));
+
+ /* Upper limit register only needed with REGION_SIZE > 4GB. */
+ writel(region->limit_addr,
+ fdma_pci_atu_addr_get(atu->addr, FDMA_PCI_ATU_LIMIT_ADDR, idx));
+
+ writel(lower_32_bits(region->target_addr),
+ fdma_pci_atu_addr_get(atu->addr,
+ FDMA_PCI_ATU_LWR_TARGET_ADDR, idx));
+
+ writel(upper_32_bits(region->target_addr),
+ fdma_pci_atu_addr_get(atu->addr,
+ FDMA_PCI_ATU_UPP_TARGET_ADDR, idx));
+}
+
+/* Find an unused ATU region. */
+static struct fdma_pci_atu_region *
+fdma_pci_atu_region_get_free(struct fdma_pci_atu *atu)
+{
+ struct fdma_pci_atu_region *regions = atu->regions;
+
+ for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+ if (regions[i].in_use)
+ continue;
+
+ return &regions[i];
+ }
+
+ return ERR_PTR(-ENOSPC);
+}
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region)
+{
+ if (IS_ERR_OR_NULL(region))
+ return;
+
+ region->target_addr = 0;
+ region->in_use = false;
+
+ fdma_pci_atu_region_disable(region);
+ fdma_pci_atu_configure_translation(region);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_unmap);
+
+/* Map a host DMA address into a free outbound region. */
+struct fdma_pci_atu_region *
+fdma_pci_atu_region_map(struct fdma_pci_atu *atu, u64 target_addr, int size)
+{
+ struct fdma_pci_atu_region *region;
+
+ if (!atu)
+ return ERR_PTR(-EINVAL);
+
+ if (size <= 0)
+ return ERR_PTR(-EINVAL);
+
+ if (size > fdma_pci_atu_region_size())
+ return ERR_PTR(-E2BIG);
+
+ region = fdma_pci_atu_region_get_free(atu);
+ if (IS_ERR(region))
+ return region;
+
+ region->target_addr = target_addr;
+ region->in_use = true;
+
+ /* Enable first, according to datasheet section 3.24.7.4.1 */
+ fdma_pci_atu_region_enable(region);
+ fdma_pci_atu_configure_translation(region);
+
+ return region;
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_map);
+
+/* Translate a host DMA address to the corresponding OB address. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr)
+{
+ return region->base_addr + (addr - region->target_addr);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_translate_addr);
+
+/* Initialize ATU, dividing the OB space into equally sized regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr)
+{
+ struct fdma_pci_atu_region *regions = atu->regions;
+ u32 region_size = fdma_pci_atu_region_size();
+
+ atu->addr = addr;
+
+ for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+ regions[i].base_addr =
+ FDMA_PCI_ATU_OB_START + (i * region_size);
+ regions[i].limit_addr =
+ regions[i].base_addr + region_size - 1;
+ regions[i].idx = i;
+ regions[i].atu = atu;
+ }
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_init);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.h b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
new file mode 100644
index 000000000000..eccfe5dc25e7
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _FDMA_PCI_H_
+#define _FDMA_PCI_H_
+
+#include <linux/types.h>
+
+#define FDMA_PCI_ATU_REGION_MAX 6
+#define FDMA_PCI_DB_ALIGN 128
+#define FDMA_PCI_DB_SIZE(mtu) ALIGN(mtu, FDMA_PCI_DB_ALIGN)
+
+struct fdma_pci_atu;
+
+struct fdma_pci_atu_region {
+ struct fdma_pci_atu *atu;
+ u64 base_addr; /* Base addr of the OB window */
+ u64 limit_addr; /* Limit addr of the OB window */
+ u64 target_addr; /* Host DMA address this region maps to */
+ int idx;
+ bool in_use;
+};
+
+struct fdma_pci_atu {
+ void __iomem *addr;
+ struct fdma_pci_atu_region regions[FDMA_PCI_ATU_REGION_MAX];
+};
+
+/* Initialize ATU, dividing OB space into regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr);
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region);
+
+/* Map a host DMA address into a free ATU region. */
+struct fdma_pci_atu_region *fdma_pci_atu_region_map(struct fdma_pci_atu *atu,
+ u64 target_addr,
+ int size);
+
+/* Translate a host DMA address to the OB address space. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr);
+
+#endif
--
2.34.1
* [PATCH net-next v5 02/13] net: microchip: fdma: rename contiguous dataptr helpers
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
When the FDMA library was introduced [1], two helpers to get the DMA and
virtual address of a DCB, in contiguous memory, were added. These
helpers have had no callers until this series. I found the naming I
initially used confusing and inconsistent.
Rename fdma_dataptr_get_contiguous() and
fdma_dataptr_virt_get_contiguous() to fdma_dataptr_dma_addr_contiguous()
and fdma_dataptr_virt_addr_contiguous(). This makes the pair symmetric
and clarifies what type of address each returns.
[1]: commit 30e48a75df9c ("net: microchip: add FDMA library")
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
drivers/net/ethernet/microchip/fdma/fdma_api.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index d91affe8bd98..94f1a6596097 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -197,8 +197,9 @@ static inline int fdma_nextptr_cb(struct fdma *fdma, int dcb_idx, u64 *nextptr)
* if the dataptr addresses and DCB's are in contiguous memory and the driver
* supports XDP.
*/
-static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
- int db_idx)
+static inline u64 fdma_dataptr_dma_addr_contiguous(struct fdma *fdma,
+ int dcb_idx,
+ int db_idx)
{
return fdma->dma + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
(dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +
@@ -209,8 +210,8 @@ static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
* applicable if the dataptr addresses and DCB's are in contiguous memory and
* the driver supports XDP.
*/
-static inline void *fdma_dataptr_virt_get_contiguous(struct fdma *fdma,
- int dcb_idx, int db_idx)
+static inline void *fdma_dataptr_virt_addr_contiguous(struct fdma *fdma,
+ int dcb_idx, int db_idx)
{
return (u8 *)fdma->dcbs + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
(dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +
--
2.34.1
* [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
In-Reply-To: <20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com>
The FDMA library under drivers/net/ethernet/microchip/fdma/ is shared by
the lan966x, sparx5 and lan969x drivers, but is not covered by an entry
in the MAINTAINERS file. A subsequent patch will add new files to the
FDMA library, so let's make sure it's covered.
Add drivers/net/ethernet/microchip/fdma/ to the Sparx5 SoC entry, since
I am already listed there.
Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 5db1a2923dd2..2d4eeb855145 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3115,6 +3115,7 @@ M: UNGLinuxDriver@microchip.com
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Supported
F: arch/arm64/boot/dts/microchip/sparx*
+F: drivers/net/ethernet/microchip/fdma/
F: drivers/net/ethernet/microchip/vcap/
F: drivers/pinctrl/pinctrl-microchip-sgpio.c
N: sparx5
--
2.34.1
* [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA
From: Daniel Machon @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
Greg Kroah-Hartman, Mohsin Bashir
Cc: netdev, linux-kernel, bpf, linux-arm-kernel
When lan966x operates as a PCIe endpoint, the driver currently uses
register-based I/O for frame injection and extraction. This approach is
functional but slow, topping out at around 33 Mbps on an Intel x86 host
with a lan966x PCIe card.
This series adds FDMA (Frame DMA) support for the PCIe path. When
operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
directly access host memory, so DMA buffers are allocated as contiguous
coherent memory and mapped through the PCIe Address Translation Unit
(ATU). The ATU provides outbound windows that translate internal FDMA
addresses to PCIe bus addresses, allowing the FDMA engine to read and
write host memory. Because the ATU requires contiguous address regions,
page_pool and normal per-page DMA mappings cannot be used. Instead,
frames are transferred using memcpy between the ATU-mapped buffers and
the network stack. With this, throughput increases from ~33 Mbps to
~620 Mbps for default MTU.
Patch 1 adds the shared drivers/net/ethernet/microchip/fdma/ directory
to the Sparx5 SoC MAINTAINERS entry.
Patches 2-3 prepare the shared FDMA library: patch 2 renames the
contiguous dataptr helpers for clarity, and patch 3 adds PCIe ATU
region management and coherent DMA allocation with ATU mapping.
Patches 4-6 refactor the lan966x FDMA code to support both platform
and PCIe paths: extracting the LLP register write into a helper,
exporting shared functions, and introducing an ops dispatch table
selected at probe time.
Patches 7-8 harden the existing FDMA path for the PCIe endpoint
lifecycle: patch 7 clears latched FDMA error/interrupt stickies after
the switch reset so they don't assert as soon as interrupts are
enabled, and patch 8 adds a shutdown() callback that quiesces the
FDMA engine on host warm reboot (on the PCIe card the FDMA survives
host reset and would otherwise keep the shared INTx asserted into
the next probe).
Patch 9 adds the core PCIe FDMA implementation with RX/TX using
contiguous ATU-mapped buffers. Patches 10 and 11 extend it with MTU
change and XDP support respectively. XDP_PASS, XDP_TX, XDP_DROP and
XDP_ABORTED are supported; XDP_REDIRECT is deliberately not, because
the PCIe data path does not use page_pool.
Patches 12-13 update the lan966x PCI device tree overlay to extend the
cpu register mapping to cover the ATU register space and add the FDMA
interrupt.
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Horatiu Vultur <horatiu.vultur@microchip.com>
To: Steen Hegelund <steen.hegelund@microchip.com>
To: UNGLinuxDriver@microchip.com
To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Jesper Dangaard Brouer <hawk@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
To: Herve Codina <herve.codina@bootlin.com>
To: Arnd Bergmann <arnd@arndb.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Mohsin Bashir <mohsin.bashr@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
Changes in v5:
This version fixes a single AI review issue, flagged by Paolo. The other
AI issues for v4 have been classified as pre-existing or as changes for
follow-ups.
- Fix premature napi_complete_done() in lan966x_fdma_pci_napi_poll() on
FDMA_ERROR and napi_alloc_skb() failure. Bailing out left DONE=1 DCBs
in the ring with no IRQ to drain them. Drop the frame and continue
the poll loop instead. Bump rx_dropped on memory-pressure drop.
(Paolo)
- Link to v4: https://lore.kernel.org/r/20260508-lan966x-pci-fdma-v4-0-14e0c89d8d63@microchip.com
Changes in v4:
- Consolidate rx size checks into lan966x_fdma_pci_rx_size_fits().
Subtract XDP_PACKET_HEADROOM on the max size check, and add ETH_HLEN
on the min size check. This fixes potential OOB reads/writes.
- On xdp_prepare_buff(), update comment to clarify that data is already
offset by XDP_PACKET_HEADROOM.
- Link to v3: https://lore.kernel.org/r/20260504-lan966x-pci-fdma-v3-0-a56f5740d870@microchip.com
Changes in v3:
Version 3 fixes a number of issues reported by Sashiko, mostly
hardening.
- Fix double use of XDP_PACKET_HEADROOM.
- Fix ERR_PTR persistence in fdma->atu_region and add missing
NULL/ERR_PTR guard in fdma_pci_atu_region_unmap().
- Reject size <= 0 in fdma_pci_atu_region_map() and return
-ENOSPC (was -ENOMEM) when no region is free.
- Introduce lan966x_fdma_pci_tx_size_fits() that accounts for
XDP_PACKET_HEADROOM; use it from both xmit paths to keep
bpf_xdp_adjust_tail from writing past the TX slot.
- Validate BLOCKL in rx_check_frame() (reject < IFH+FCS or
> db_size) before it feeds memcpy/XDP sizes.
- READ_ONCE(port->xdp_prog) inside lan966x_xdp_pci_run() to close
a TOCTOU on XDP detach that could deref NULL in
bpf_prog_run_xdp().
- Strip IFH and FCS pre-XDP in rx_check_frame(). After BPF runs
the driver cannot tell whether the tail was modified; drop the
unconditional skb_pull/skb_trim in rx_get_frame().
- Account tx_bytes/tx_packets on XDP_TX success and tx_dropped on
XDP_TX size reject.
- Add dma_wmb()/dma_rmb() around DCB status writes and reads in
xmit, xmit_xdpf, and napi_poll.
- Collected Tested-by: Hervé Codina.
- Link to v2: https://lore.kernel.org/r/20260428-lan966x-pci-fdma-v2-0-d3ec66e06202@microchip.com
Changes in v2:
Version 2 primarily addresses two issues: traffic that stopped working
after a module unload/load cycle (Hervé), and XDP head/tail adjustments
that were discarded (Mohsin).
Apart from that, I ran through issues reported by Sashiko, and fixed a
number of other issues.
- New patch 1: add drivers/net/ethernet/microchip/fdma/ to the Sparx5
SoC MAINTAINERS entry.
- New patch 7: clear latched FDMA error/interrupt stickies after the
switch reset so they don't fire as soon as interrupts are enabled.
- New patch 8: shutdown() callback, quiescing FDMA on host warm reboot.
- Replaced the depth-2 dev_is_pci(parent->parent) backend selector
with a parent-chain walk.
- XDP: use xdp.data/xdp.data_end for the post-XDP frame length so that
bpf_xdp_adjust_head/tail are respected (Mohsin Bashir)
- MTU change: drain in-flight xmits with netif_tx_disable() on every
port before reallocating rings, waking them again on completion.
- MTU change: cap the PCIe DCB ring at 256 entries so a full-ring
coherent DMA allocation fits in a single MAX_PAGE_ORDER block at
jumbo MTU.
- PCIe ATU: disable the region before clearing its translation on
unmap.
- PCIe FDMA: hold tx_lock in napi_poll around the free-DCB check used
to wake stopped netdev queues.
- PCIe FDMA: return -ENOSPC (not -1) when the DCB ring is exhausted.
- Link to v1: https://lore.kernel.org/r/20260320-lan966x-pci-fdma-v1-0-ef54cb9b0c4b@microchip.com
---
Daniel Machon (13):
MAINTAINERS: add FDMA library to Sparx5 SoC entry
net: microchip: fdma: rename contiguous dataptr helpers
net: microchip: fdma: add PCIe ATU support
net: lan966x: add FDMA LLP register write helper
net: lan966x: export FDMA helpers for reuse
net: lan966x: add FDMA ops dispatch for PCIe support
net: lan966x: clear FDMA interrupt stickies after switch reset
net: lan966x: add shutdown callback to stop FDMA on reboot
net: lan966x: add PCIe FDMA support
net: lan966x: add PCIe FDMA MTU change support
net: lan966x: add PCIe FDMA XDP support
misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
misc: lan966x-pci: dts: add fdma interrupt to overlay
MAINTAINERS | 1 +
drivers/misc/lan966x_pci.dtso | 5 +-
drivers/net/ethernet/microchip/fdma/Makefile | 4 +
drivers/net/ethernet/microchip/fdma/fdma_api.c | 33 +
drivers/net/ethernet/microchip/fdma/fdma_api.h | 25 +-
drivers/net/ethernet/microchip/fdma/fdma_pci.c | 182 ++++++
drivers/net/ethernet/microchip/fdma/fdma_pci.h | 42 ++
drivers/net/ethernet/microchip/lan966x/Makefile | 4 +
.../net/ethernet/microchip/lan966x/lan966x_fdma.c | 51 +-
.../ethernet/microchip/lan966x/lan966x_fdma_pci.c | 667 +++++++++++++++++++++
.../net/ethernet/microchip/lan966x/lan966x_main.c | 74 ++-
.../net/ethernet/microchip/lan966x/lan966x_main.h | 45 ++
.../net/ethernet/microchip/lan966x/lan966x_regs.h | 25 +
.../net/ethernet/microchip/lan966x/lan966x_xdp.c | 10 +
14 files changed, 1128 insertions(+), 40 deletions(-)
---
base-commit: bf53bf33206137c2337bd8aacf0ef4c348b97a36
change-id: 20260313-lan966x-pci-fdma-94ed485d23fa
Best regards,
--
Daniel Machon <daniel.machon@microchip.com>
* [PATCH RFC net-next] net: airoha: Add TCP LRO support
From: Lorenzo Bianconi @ 2026-05-20 8:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: linux-arm-kernel, linux-mediatek, netdev, Lorenzo Bianconi
Add hardware TCP Large Receive Offload (LRO) support to the airoha_eth
driver, leveraging the EN7581/AN7583 SoC's 8 dedicated LRO hardware queues
mapped to RX queues 24–31. To meet the hardware requirement for contiguous
memory regions, increase the page_pool allocation order to 2 for RX queues
24–31.
Performance comparison between GRO and hw LRO has been carried out using
a 10Gbps NIC:
GRO: ~2.7 Gbps
LRO: ~3.5 Gbps (~30% improvement)
Note that, compared with the previous implementation, the page_pool
allocation order has been reduced from 5 to 2.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 202 +++++++++++++++++++++++++++---
drivers/net/ethernet/airoha/airoha_eth.h | 23 ++++
drivers/net/ethernet/airoha/airoha_regs.h | 22 +++-
3 files changed, 232 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 5a027cc7ffcb..e2c6231ee6a0 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -12,6 +12,7 @@
#include <net/dst_metadata.h>
#include <net/page_pool/helpers.h>
#include <net/pkt_cls.h>
+#include <net/tcp.h>
#include <uapi/linux/ppp_defs.h>
#include "airoha_regs.h"
@@ -431,6 +432,47 @@ static void airoha_fe_crsn_qsel_init(struct airoha_eth *eth)
CDM_CRSN_QSEL_Q1));
}
+static void airoha_fe_lro_init_rx_queue(struct airoha_eth *eth, int qdma_id,
+ int lro_queue_index, int qid,
+ int nbuf, int buf_size)
+{
+ int id = qdma_id + 1;
+
+ airoha_fe_rmw(eth, REG_CDM_LRO_LIMIT(id),
+ CDM_LRO_AGG_NUM_MASK | CDM_LRO_AGG_SIZE_MASK,
+ FIELD_PREP(CDM_LRO_AGG_NUM_MASK, nbuf) |
+ FIELD_PREP(CDM_LRO_AGG_SIZE_MASK, buf_size));
+ airoha_fe_rmw(eth, REG_CDM_LRO_AGE_TIME(id),
+ CDM_LRO_AGE_TIME_MASK | CDM_LRO_AGG_TIME_MASK,
+ FIELD_PREP(CDM_LRO_AGE_TIME_MASK,
+ AIROHA_RXQ_LRO_MAX_AGE_TIME) |
+ FIELD_PREP(CDM_LRO_AGG_TIME_MASK,
+ AIROHA_RXQ_LRO_MAX_AGG_TIME));
+ airoha_fe_rmw(eth, REG_CDM_LRO_RXQ(id, lro_queue_index),
+ LRO_RXQ_MASK(lro_queue_index),
+ __field_prep(LRO_RXQ_MASK(lro_queue_index), qid));
+ airoha_fe_set(eth, REG_CDM_LRO_EN(id), BIT(lro_queue_index));
+}
+
+static void airoha_fe_lro_disable(struct airoha_eth *eth, int qdma_id)
+{
+ int i, id = qdma_id + 1;
+
+ airoha_fe_clear(eth, REG_CDM_LRO_LIMIT(id),
+ CDM_LRO_AGG_NUM_MASK | CDM_LRO_AGG_SIZE_MASK);
+ airoha_fe_clear(eth, REG_CDM_LRO_AGE_TIME(id),
+ CDM_LRO_AGE_TIME_MASK | CDM_LRO_AGG_TIME_MASK);
+ airoha_fe_clear(eth, REG_CDM_LRO_EN(id), LRO_RXQ_EN_MASK);
+ for (i = 0; i < AIROHA_MAX_NUM_LRO_QUEUES; i++)
+ airoha_fe_clear(eth, REG_CDM_LRO_RXQ(id, i), LRO_RXQ_MASK(i));
+}
+
+static bool airoha_fe_lro_is_enabled(struct airoha_eth *eth, int qdma_id)
+{
+ return airoha_fe_get(eth, REG_CDM_LRO_EN(qdma_id + 1),
+ LRO_RXQ_EN_MASK);
+}
+
static int airoha_fe_init(struct airoha_eth *eth)
{
airoha_fe_maccr_init(eth);
@@ -587,9 +629,78 @@ static int airoha_qdma_get_gdm_port(struct airoha_eth *eth,
return port >= ARRAY_SIZE(eth->ports) ? -EINVAL : port;
}
+static int airoha_qdma_lro_rx_process(struct airoha_queue *q,
+ struct airoha_qdma_desc *desc)
+{
+ u32 msg1 = le32_to_cpu(READ_ONCE(desc->msg1));
+ u32 msg2 = le32_to_cpu(READ_ONCE(desc->msg2));
+ u32 msg3 = le32_to_cpu(READ_ONCE(desc->msg3));
+ bool ipv4 = FIELD_GET(QDMA_ETH_RXMSG_IP4_MASK, msg1);
+ bool ipv6 = FIELD_GET(QDMA_ETH_RXMSG_IP6_MASK, msg1);
+ struct sk_buff *skb = q->skb;
+ u32 th_off, tcp_ack_seq;
+ u16 tcp_win, l2_len;
+ struct tcphdr *th;
+
+ if (FIELD_GET(QDMA_ETH_RXMSG_AGG_COUNT_MASK, msg2) <= 1)
+ return 0;
+
+ if (!ipv4 && !ipv6)
+ return -EOPNOTSUPP;
+
+ l2_len = FIELD_GET(QDMA_ETH_RXMSG_L2_LEN_MASK, msg2);
+ if (ipv4) {
+ u16 agg_len = FIELD_GET(QDMA_ETH_RXMSG_AGG_LEN_MASK, msg3);
+ struct iphdr *iph = (struct iphdr *)(skb->data + l2_len);
+
+ if (iph->protocol != IPPROTO_TCP)
+ return -EOPNOTSUPP;
+
+ iph->tot_len = cpu_to_be16(agg_len);
+ iph->check = 0;
+ iph->check = ip_fast_csum((void *)iph, iph->ihl);
+ th_off = l2_len + (iph->ihl << 2);
+ } else {
+ struct ipv6hdr *ip6h = (struct ipv6hdr *)(skb->data + l2_len);
+ u32 len, desc_ctrl = le32_to_cpu(READ_ONCE(desc->ctrl));
+
+ if (ip6h->nexthdr != NEXTHDR_TCP)
+ return -EOPNOTSUPP;
+
+ len = FIELD_GET(QDMA_DESC_LEN_MASK, desc_ctrl);
+ ip6h->payload_len = cpu_to_be16(len - l2_len - sizeof(*ip6h));
+ th_off = l2_len + sizeof(*ip6h);
+ }
+
+ tcp_win = FIELD_GET(QDMA_ETH_RXMSG_TCP_WIN_MASK, msg3);
+ tcp_ack_seq = le32_to_cpu(READ_ONCE(desc->data));
+
+ th = (struct tcphdr *)(skb->data + th_off);
+ th->ack_seq = cpu_to_be32(tcp_ack_seq);
+ th->window = cpu_to_be16(tcp_win);
+
+ /* Check tcp timestamp option */
+ if (th->doff == (sizeof(*th) + TCPOLEN_TSTAMP_ALIGNED) / 4) {
+ __be32 *topt = (__be32 *)(th + 1);
+
+ if (*topt == cpu_to_be32((TCPOPT_NOP << 24) |
+ (TCPOPT_NOP << 16) |
+ (TCPOPT_TIMESTAMP << 8) |
+ TCPOLEN_TIMESTAMP)) {
+ __le32 tcp_ts_reply = READ_ONCE(desc->tcp_ts_reply);
+
+ put_unaligned_be32(le32_to_cpu(tcp_ts_reply),
+ topt + 2);
+ }
+ }
+
+ return 0;
+}
+
static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
{
enum dma_data_direction dir = page_pool_get_dma_dir(q->page_pool);
+ bool lro_q = airoha_qdma_is_lro_queue(q);
struct airoha_qdma *qdma = q->qdma;
struct airoha_eth *eth = qdma->eth;
int qid = q - &qdma->q_rx[0];
@@ -636,9 +747,14 @@ static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
__skb_put(q->skb, len);
skb_mark_for_recycle(q->skb);
q->skb->dev = port->dev;
- q->skb->protocol = eth_type_trans(q->skb, port->dev);
q->skb->ip_summed = CHECKSUM_UNNECESSARY;
skb_record_rx_queue(q->skb, qid);
+
+ if (lro_q && (port->dev->features & NETIF_F_LRO) &&
+ airoha_qdma_lro_rx_process(q, desc) < 0)
+ goto free_frag;
+
+ q->skb->protocol = eth_type_trans(q->skb, port->dev);
} else { /* scattered frame */
struct skb_shared_info *shinfo = skb_shinfo(q->skb);
int nr_frags = shinfo->nr_frags;
@@ -727,23 +843,19 @@ static int airoha_qdma_rx_napi_poll(struct napi_struct *napi, int budget)
static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
struct airoha_qdma *qdma, int ndesc)
{
- const struct page_pool_params pp_params = {
- .order = 0,
- .pool_size = 256,
+ struct page_pool_params pp_params = {
.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
.dma_dir = DMA_FROM_DEVICE,
- .max_len = PAGE_SIZE,
.nid = NUMA_NO_NODE,
.dev = qdma->eth->dev,
.napi = &q->napi,
};
+ int pp_order, qid = q - &qdma->q_rx[0], thr;
struct airoha_eth *eth = qdma->eth;
- int qid = q - &qdma->q_rx[0], thr;
dma_addr_t dma_addr;
+ bool lro_q;
- q->buf_size = PAGE_SIZE / 2;
q->qdma = qdma;
-
q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
GFP_KERNEL);
if (!q->entry)
@@ -754,6 +866,12 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
if (!q->desc)
return -ENOMEM;
+ lro_q = airoha_qdma_is_lro_queue(q);
+ pp_order = lro_q ? AIROHA_LRO_PAGE_ORDER : 0;
+ pp_params.order = pp_order;
+ pp_params.pool_size = 256 >> pp_order;
+ pp_params.max_len = PAGE_SIZE << pp_order;
+
q->page_pool = page_pool_create(&pp_params);
if (IS_ERR(q->page_pool)) {
int err = PTR_ERR(q->page_pool);
@@ -762,6 +880,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
return err;
}
+ q->buf_size = pp_params.max_len / (2 * (1 + lro_q));
q->ndesc = ndesc;
netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
@@ -1991,6 +2110,63 @@ int airoha_get_fe_port(struct airoha_gdm_port *port)
}
}
+static int airoha_dev_set_features(struct net_device *dev,
+ netdev_features_t features)
+{
+ netdev_features_t diff = dev->features ^ features;
+ struct airoha_gdm_port *port = netdev_priv(dev);
+ struct airoha_qdma *qdma = port->qdma;
+ struct airoha_eth *eth = qdma->eth;
+ int qdma_id = qdma - &eth->qdma[0];
+ int i;
+
+ if (!(diff & NETIF_F_LRO))
+ return 0;
+
+ /* reset LRO configuration */
+ if (features & NETIF_F_LRO) {
+ int lro_queue_index = 0;
+
+ if (airoha_fe_lro_is_enabled(eth, qdma_id))
+ return 0;
+
+ for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
+ struct airoha_queue *q = &qdma->q_rx[i];
+
+ if (!q->ndesc)
+ continue;
+
+ if (!airoha_qdma_is_lro_queue(q))
+ continue;
+
+ airoha_fe_lro_init_rx_queue(eth, qdma_id,
+ lro_queue_index, i,
+ q->page_pool->p.pool_size,
+ q->buf_size);
+ lro_queue_index++;
+ }
+ } else {
+ for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+ struct airoha_gdm_port *p = eth->ports[i];
+
+ if (!p)
+ continue;
+
+ if (p->qdma != qdma)
+ continue;
+
+ if (p->dev == dev)
+ continue;
+
+ if (p->dev->features & NETIF_F_LRO)
+ return 0;
+ }
+ airoha_fe_lro_disable(eth, qdma_id);
+ }
+
+ return 0;
+}
+
static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -2890,6 +3066,7 @@ static const struct net_device_ops airoha_netdev_ops = {
.ndo_stop = airoha_dev_stop,
.ndo_change_mtu = airoha_dev_change_mtu,
.ndo_select_queue = airoha_dev_select_queue,
+ .ndo_set_features = airoha_dev_set_features,
.ndo_start_xmit = airoha_dev_xmit,
.ndo_get_stats64 = airoha_dev_get_stats64,
.ndo_set_mac_address = airoha_dev_set_macaddr,
@@ -2987,12 +3164,9 @@ static int airoha_alloc_gdm_port(struct airoha_eth *eth,
dev->ethtool_ops = &airoha_ethtool_ops;
dev->max_mtu = AIROHA_MAX_MTU;
dev->watchdog_timeo = 5 * HZ;
- dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
- NETIF_F_TSO6 | NETIF_F_IPV6_CSUM |
- NETIF_F_SG | NETIF_F_TSO |
- NETIF_F_HW_TC;
- dev->features |= dev->hw_features;
- dev->vlan_features = dev->hw_features;
+ dev->hw_features = AIROHA_HW_FEATURES | NETIF_F_LRO;
+ dev->features |= AIROHA_HW_FEATURES;
+ dev->vlan_features = AIROHA_HW_FEATURES;
dev->dev.of_node = np;
SET_NETDEV_DEV(dev, eth->dev);
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index d3781103abb5..39ab6bdb1f67 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -43,6 +43,17 @@
(_n) == 15 ? 128 : \
(_n) == 0 ? 1024 : 16)
+#define AIROHA_LRO_PAGE_ORDER 2
+#define AIROHA_MAX_NUM_LRO_QUEUES 8
+#define AIROHA_RXQ_LRO_EN_MASK 0xff000000
+#define AIROHA_RXQ_LRO_MAX_AGG_TIME 100
+#define AIROHA_RXQ_LRO_MAX_AGE_TIME 2000 /* 1ms */
+
+#define AIROHA_HW_FEATURES \
+ (NETIF_F_IP_CSUM | NETIF_F_RXCSUM | \
+ NETIF_F_TSO6 | NETIF_F_IPV6_CSUM | \
+ NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_TC)
+
#define PSE_RSV_PAGES 128
#define PSE_QUEUE_RSV_PAGES 64
@@ -661,6 +672,18 @@ static inline bool airoha_is_7583(struct airoha_eth *eth)
return eth->soc->version == 0x7583;
}
+static inline bool airoha_qdma_is_lro_queue(struct airoha_queue *q)
+{
+ struct airoha_qdma *qdma = q->qdma;
+ int qid = q - &qdma->q_rx[0];
+
+ /* EN7581 SoC supports at most 8 LRO rx queues */
+ BUILD_BUG_ON(hweight32(AIROHA_RXQ_LRO_EN_MASK) >
+ AIROHA_MAX_NUM_LRO_QUEUES);
+
+ return !!(AIROHA_RXQ_LRO_EN_MASK & BIT(qid));
+}
+
int airoha_get_fe_port(struct airoha_gdm_port *port);
bool airoha_is_valid_gdm_port(struct airoha_eth *eth,
struct airoha_gdm_port *port);
diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
index 436f3c8779c1..dfc786583774 100644
--- a/drivers/net/ethernet/airoha/airoha_regs.h
+++ b/drivers/net/ethernet/airoha/airoha_regs.h
@@ -122,6 +122,20 @@
#define CDM_CRSN_QSEL_REASON_MASK(_n) \
GENMASK(4 + (((_n) % 4) << 3), (((_n) % 4) << 3))
+#define REG_CDM_LRO_RXQ(_n, _m) (CDM_BASE(_n) + 0x78 + ((_m) & 0x4))
+#define LRO_RXQ_MASK(_n) GENMASK(4 + (((_n) & 0x3) << 3), ((_n) & 0x3) << 3)
+
+#define REG_CDM_LRO_EN(_n) (CDM_BASE(_n) + 0x80)
+#define LRO_RXQ_EN_MASK GENMASK(7, 0)
+
+#define REG_CDM_LRO_LIMIT(_n) (CDM_BASE(_n) + 0x84)
+#define CDM_LRO_AGG_NUM_MASK GENMASK(23, 16)
+#define CDM_LRO_AGG_SIZE_MASK GENMASK(15, 0)
+
+#define REG_CDM_LRO_AGE_TIME(_n) (CDM_BASE(_n) + 0x88)
+#define CDM_LRO_AGE_TIME_MASK GENMASK(31, 16)
+#define CDM_LRO_AGG_TIME_MASK GENMASK(15, 0)
+
#define REG_GDM_FWD_CFG(_n) GDM_BASE(_n)
#define GDM_PAD_EN_MASK BIT(28)
#define GDM_DROP_CRC_ERR_MASK BIT(23)
@@ -883,9 +897,15 @@
#define QDMA_ETH_RXMSG_SPORT_MASK GENMASK(25, 21)
#define QDMA_ETH_RXMSG_CRSN_MASK GENMASK(20, 16)
#define QDMA_ETH_RXMSG_PPE_ENTRY_MASK GENMASK(15, 0)
+/* RX MSG2 */
+#define QDMA_ETH_RXMSG_AGG_COUNT_MASK GENMASK(31, 24)
+#define QDMA_ETH_RXMSG_L2_LEN_MASK GENMASK(6, 0)
+/* RX MSG3 */
+#define QDMA_ETH_RXMSG_AGG_LEN_MASK GENMASK(31, 16)
+#define QDMA_ETH_RXMSG_TCP_WIN_MASK GENMASK(15, 0)
struct airoha_qdma_desc {
- __le32 rsv;
+ __le32 tcp_ts_reply;
__le32 ctrl;
__le32 addr;
__le32 data;
---
base-commit: bf53bf33206137c2337bd8aacf0ef4c348b97a36
change-id: 20260520-airoha-eth-lro-a5d1c3631811
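The RX queue sizing arithmetic in airoha_qdma_init_rx_queue() can be checked with a small userspace sketch. This is only an illustration under stated assumptions: a 4 KiB page size is assumed (the driver uses the kernel's PAGE_SIZE), and the sketch_* names are hypothetical helpers that do not exist in the driver. The constants mirror AIROHA_LRO_PAGE_ORDER and AIROHA_RXQ_LRO_EN_MASK from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the driver itself uses the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE	4096u
#define LRO_PAGE_ORDER		2		/* AIROHA_LRO_PAGE_ORDER */
#define RXQ_LRO_EN_MASK		0xff000000u	/* AIROHA_RXQ_LRO_EN_MASK */

/* Hypothetical container for the values the patch computes per queue. */
struct rxq_sizing {
	unsigned int order;	/* pp_params.order */
	unsigned int pool_size;	/* pp_params.pool_size */
	unsigned int max_len;	/* pp_params.max_len */
	unsigned int buf_size;	/* q->buf_size */
};

/* Same test as airoha_qdma_is_lro_queue(): is bit 'qid' set in the mask? */
static bool sketch_is_lro_queue(unsigned int qid)
{
	assert(qid < 32);
	return (RXQ_LRO_EN_MASK >> qid) & 1u;
}

/* Reproduces the order/pool_size/max_len/buf_size arithmetic of the patch. */
static struct rxq_sizing sketch_rxq_sizing(unsigned int qid)
{
	bool lro_q = sketch_is_lro_queue(qid);
	struct rxq_sizing s;

	s.order = lro_q ? LRO_PAGE_ORDER : 0;
	s.pool_size = 256u >> s.order;
	s.max_len = SKETCH_PAGE_SIZE << s.order;
	s.buf_size = s.max_len / (2 * (1 + lro_q));
	return s;
}
```

Under these assumptions a non-LRO queue keeps 256 order-0 pages with 2048-byte buffers, while an LRO queue (qid 24..31 with this mask) gets 64 order-2 pages with 4096-byte buffers, so the total pool memory per queue stays the same.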
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Lorenzo Stoakes @ 2026-05-20 8:11 UTC (permalink / raw)
To: Yang Shi
Cc: Barry Song, Matthew Wilcox, surenb, akpm, linux-mm, david, liam,
vbabka, rppt, mhocko, jack, pfalcato, wanglian, chentao,
lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong, shikemeng,
nphamcs, bhe, youngjun.park, linux-arm-kernel, linux-kernel,
loongarch, linuxppc-dev, linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <CAHbLzkrTF7w+T5mGsQuDRuhnTk6evTKBNRcH4oS=nRcUg2zpsg@mail.gmail.com>
On Tue, May 19, 2026 at 02:02:09PM -0700, Yang Shi wrote:
> On Tue, May 19, 2026 at 11:41 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > >
> > > > >
> > > > > Secondly, if vma->anon_vma is NULL, it basically means either no page
> > > > > fault happened or no cow happened, so there is no page table to copy,
> > > > > this is also what copy_page_range() does currently. So we can shrink
> > > > > the critical section to:
> > > >
> > > > Firstly, with no VMA write lock, !vma->anon_vma means a fault can race and
> > > > secondly copy_page_range() checks vma_needs_copy(), there are other cases - PFN
> > > > maps, mixed maps, UFFD W/P (ugh), guard regions.
> > > >
> > > > So yeah this isn't sufficient.
> > >
> > > However this is true...
> >
> > Yes, fault can race with fork. Basically this is actually the purpose
> > of this idea. We can have improved page fault scalability. In my
> > proposal (take write vma lock if vma->anon_vma is not NULL), the race
> > just happens on the VMAs which page fault has not happened on before.
>
> Sorry, this is incorrect. Page fault can't happen on those VMAs
> because page fault needs to create anon_vma, but it requires taking
> mmap_lock.
> If anon_vma is not NULL, vma write lock will serialize against page
> fault. So there should be no race with page fault. Removing vma write
> lock suggested by Barry may increase race.
Firstly, let's none of us be worried about making mistakes here, the anon_vma
stuff is confusing, and I've stared at it more than most, and even so I
managed to make mistakes (as corrected here) and forget details :))
It's a sign it all needs simplifying, but hey that's what my scalable CoW
project is (partly) about :)
Removing the VMA write lock would cause races with page fault which can result
in page tables being installed which are then not correctly duplicated for
ranges that must be.
And again, I think the underlying point here overall is:
1. Clearly many cases require serialisation (any that cause copy_page_range() to
fire).
2. If we were to decide not to take a lock with concurrent page faults, that
lays a trap for any future change that (reasonably) assumes that page tables
cannot be simultaneously copied while being accessible to page fault
handlers, which is bug prone.
3. As per 2, even if we were to only take the lock when we felt we absolutely
needed to, we still cause risk through adding yet another 'you just have to
know' risk to this part of mm.
4. The serialisation is quite likely relied upon by other things, this is often
the case in mm, and we may only realise that such serialisation is critical
at the point a subtle issue arises out of it.
5. Fork is one of the most sensitive, intuition-defying, complicated, and
corner-case-problem-baiting areas of mm and I really oppose us changing
fundamental behaviour here unless incredibly well justified.
On this basis, let's let the sleeping dogs lie and leave fork alone I think :)
I think I am far more inclined to take Barry's fault approach (as I've said to
him) vs. changing fork behaviour.
But I want to make sure there's not a 'third way' that could avoid either!
I am going to have a look through Barry's series in detail so we can have some
movement on this one way or another :)
>
> Thanks,
> Yang
>
Cheers, Lorenzo
* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Aneesh Kumar K.V @ 2026-05-20 8:11 UTC (permalink / raw)
To: Greg KH
Cc: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel,
Catalin Marinas, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
Steven Price
In-Reply-To: <yq5apl2txmav.fsf@kernel.org>
Hi Greg,
Aneesh Kumar K.V <aneesh.kumar@kernel.org> writes:
> Greg KH <gregkh@linuxfoundation.org> writes:
>
>> On Thu, May 14, 2026 at 08:07:27PM +0530, Aneesh Kumar K.V wrote:
>>> Greg KH <gregkh@linuxfoundation.org> writes:
>>>
>>> > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
>>> >> Hi Aneesh
>>> >>
>>> >> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
>>> >> > Make the SMCCC driver responsible for registering the arm-smccc platform
>>> >> > device and after confirming the relevant SMCCC function IDs, create
>>> >> > the arm_cca_guest auxiliary device.
>>> >> >
>>> >>
>>> >> There are a few changes squashed in to this patch. Please could we
>>> >> split the patch in the following order ?
>>> >>
>>> >> 1. Add platform device for arm-smccc
>>> >
>>> > Do not make any more "fake" platform devices please.
>>> >
>>> >> 2. Move TRNG to Auxiliary Device - (Even though it is a later patch, move
>>> >> it before the RSI changes)
>>> >
>>> > No, move it to the faux api please.
>>> >
>>>
>>>
>>> Maybe I was not complete in my previous reply. I did not want to repeat
>>> the entire thread, so I quoted the lore link for more details.
>>>
>>> 1. We have platform firmware-provided SMCCC interfaces. Based on the
>>> support/availability of these function IDs, we want to load multiple
>>> drivers.
>>> 2. This patch series adds a platform device to represent the
>>> firmware-provided SMCCC resource.
>>> 3. Different SMCCC ranges are now represented as auxiliary devices.
>>> 4. Different subsystems, such as TSM, can autoload their backend drivers
>>> based on the availability of these SMCCC ranges, which are now
>>> represented as auxiliary devices.
>>>
>>> You had agreed to all of this in the previous discussion here:
>>> https://lore.kernel.org/all/2025101516-handbook-hyphen-62ec@gregkh
>>
>> Then why did someone say "this is a fake platform device with no actual
>> resources"? That's what I was triggering off of.
>>
>> Again, if you have actual platform resources, GREAT, use a platform
>> device and aux. If you do not, then do NOT use a platform device.
>>
>> totally confused,
>>
>> greg k-h
>
> I have now rewritten the cover letter as below. Let me know if this
> helps.
>
> Switch Arm SMCCC firmware services to auxiliary devices
>
> As discussed here:
> https://lore.kernel.org/all/20250728135216.48084-12-aneesh.kumar@kernel.org
>
> The earlier CCA guest support used an arm-cca-dev platform device as a pure
> software anchor for the TSM class device. That platform device did not
> correspond to a DT/ACPI described device, MMIO range, interrupt, or other
> platform resource; it existed only to make the CCA guest driver bind and to
> place the resulting TSM device in the driver model. The same pattern also
> exists for smccc_trng. Creating separate platform devices for such
> SMCCC-discovered features is misleading, because those features are not
> independent platform devices.
>
> This series changes the model so that there is a single arm-smccc platform
> device representing the SMCCC firmware interface itself. The firmware
> interface, including its discoverable SMCCC function space, is the
> resource: after PSCI/SMCCC conduit discovery, the kernel can query SMCCC
> function IDs and determine whether optional firmware services are present.
> Services such as SMCCC TRNG and Realm Services Interface (RSI) are
> therefore represented as children of the arm-smccc device, and are created
> only when the required SMCCC function IDs and ABI checks succeed.
>
> The child devices use the auxiliary bus deliberately: they are intended to
> bind independent feature drivers, not just to provide a driverless object for
> sysfs or other class-device anchoring. They are firmware-provided functions
> of the parent SMCCC interface that are consumed by separate kernel drivers
> in different subsystems, such as hwrng and virt/coco/TSM. Those drivers
> need normal driver-core matching, probe/remove lifetime, and module
> autoloading based on the discovered firmware feature. The auxiliary bus
> provides a MODALIAS and id-table based binding model for that case, while
> keeping the feature drivers off the platform bus. A faux device was
> considered, but not used because it is suited for simple software objects
> that do not need independent bus/driver binding. The faux bus has no
> feature-driver id-table or MODALIAS matching, so it would not preserve the
> module-autoload flow that the current platform-device based users rely on.
>
> In other words, the parent arm-smccc device represents the firmware
> resource exposed through the SMCCC conduit, and each auxiliary child
> represents one discovered firmware service of that parent. This removes the
> unnecessary per-feature platform devices while retaining automatic loading
> and independent subsystem drivers for the SMCCC services.
>
> The TSM framework uses the device abstraction to provide cross-architecture
> TSM and TEE I/O functionality, including enumerating available platform TEE
> I/O capabilities and provisioning connections between the platform TSM and
> device DSMs. For Arm CCA, the RSI auxiliary device continues to provide the
> device anchor used by the CCA guest TSM provider.
>
> For the CCA platform, the resulting device hierarchy appears as follows.
> Note that the auxiliary device is parented by the arm-smccc platform device,
> so the sysfs path remains under /devices/platform/arm-smccc/:
>
> $ cd /sys/class/tsm/
> $ ls -al
> total 0
> drwxr-xr-x 2 root root 0 Jan 1 00:02 .
> drwxr-xr-x 23 root root 0 Jan 1 00:00 ..
> lrwxrwxrwx 1 root root 0 Jan 1 00:03 tsm0 -> ../../devices/platform/arm-smccc/arm_cca_guest.arm-rsi-dev.0/tsm/tsm0
> $
>
> The series also replaces the old arm-cca-dev userspace-visible dummy device
> with /sys/firmware/cca/realm_guest for detecting whether the kernel is
> running in a Realm. This keeps the guest-state ABI under /sys/firmware and
> separates it from the internal driver-binding device used by the CCA guest
> TSM provider.
>
>
> -aneesh
Gentle ping, could you let me know if the updated cover letter helps
clarify the confusion regarding the platform-device usage here?
-aneesh
* Re: [PATCH 4/4] firmware: arm_scmi: Validate Powercap domains before state access
From: Sudeep Holla @ 2026-05-20 8:10 UTC (permalink / raw)
To: Cristian Marussi; +Cc: arm-scmi, linux-arm-kernel, Sudeep Holla
In-Reply-To: <20260519-utopian-parrot-of-sorcery-eff40c@sudeepholla>
On Tue, May 19, 2026 at 11:04:41AM +0100, Cristian Marussi wrote:
> On Sun, May 17, 2026 at 08:02:43PM +0100, Sudeep Holla wrote:
> > Powercap protocol v2 keeps local enable and last-cap state per
> > domain. Some public operations indexed that state before checking that
> > the supplied domain id was valid, and cap_enable_get() updated it even
> > when cap_get() failed.
> >
> > Validate the domain before touching the per-domain state and only
> > refresh cached enable state after a successful cap_get().
> >
>
[...]
> > /*
> > * Report always real platform state; platform could have ignored
> > * a previous disable request. Default true on any error.
> > */
> > ret = scmi_powercap_cap_get(ph, domain_id, &power_cap);
> > - if (!ret)
> > + if (!ret) {
> > *enable = !!power_cap;
> >
> > - /* Update internal state with current real platform state */
> > - pi->states[domain_id].enabled = *enable;
> > + /* Update internal state with current real platform state */
> > + pi->states[domain_id].enabled = *enable;
> > + }
>
> Mmm, this changes the logic as stated in the above comments...now the
> problem is recalling WHY I adopted this logic :<
>
The comment captures only the ignored-disable-request case, in which
enabled=true is correct. But I fail to understand what happens if the
previous call was scmi_powercap_cap_enable_set(..,false) and the
scmi_powercap_cap_get() issued from scmi_powercap_cap_enable_set() also
reported the domain as disabled: a later failure to read the state in
scmi_powercap_cap_enable_get() will still update it to true.
I wonder if it is due to the SCMI versions to maintain compatibility or the
kernel default sysfs value. scmi_powercap_cap_set() treats cached disabled
state specially. If cap_enable_get() defaulted the cache to false after
a failed read, later cap updates could be suppressed even though the platform
might actually still be enabled. Defaulting to true is the conservative
choice. I will drop this change/hunk. I will assume your Reviewed-by stands
with that dropped, please shout if not.
The generic powercap sysfs layer itself defaults enabled to true, but if a
driver's get_enable() returns an error, it reports false. SCMI avoids that by
returning success with true I assume.
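To make the behaviour under discussion concrete, here is a minimal userspace sketch of the existing "default true on any error" logic. It is not the driver code: struct sketch_state, sketch_cap_enable_get() and the three sketch_read_*() helpers are hypothetical stand-ins, and always returning 0 is my assumption about how the SCMI op reports success with enabled=true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached per-domain state. */
struct sketch_state {
	bool enabled;
};

/* Hypothetical platform read: 0 with *cap filled on success, negative on error. */
typedef int (*sketch_cap_get_fn)(uint32_t *cap);

static int sketch_read_fails(uint32_t *cap)   { (void)cap; return -5; }
static int sketch_read_zero(uint32_t *cap)    { *cap = 0; return 0; }
static int sketch_read_nonzero(uint32_t *cap) { *cap = 1000; return 0; }

/*
 * Models the logic before the hunk in question: report the real platform
 * state when it can be read, default to true otherwise, and refresh the
 * cached state in both cases.
 */
static int sketch_cap_enable_get(struct sketch_state *st,
				 sketch_cap_get_fn cap_get, bool *enable)
{
	uint32_t cap = 0;

	assert(st != NULL && cap_get != NULL && enable != NULL);

	*enable = true;			/* default true on any error */
	if (!cap_get(&cap))
		*enable = !!cap;	/* a zero cap means disabled */

	st->enabled = *enable;		/* cache updated even after a failed read */
	return 0;			/* assumption: success is always reported */
}
```

With sketch_read_fails() a previously disabled domain ends up cached as enabled, which is the case questioned above; with sketch_read_zero() it stays disabled.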
--
Regards,
Sudeep
* Re: [PATCH v14 0/3] of: parsing of multi #{iommu,msi}-cells in maps
From: Vijayanand Jitta @ 2026-05-20 8:05 UTC (permalink / raw)
To: Rob Herring
Cc: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Conor Dooley,
Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia, linux-kernel,
iommu, linux-arm-kernel, devicetree, linux-pci, imx, xen-devel,
linux-arm-msm, Charan Teja Kalla
In-Reply-To: <20260506221915.GA3290640-robh@kernel.org>
On 5/7/2026 3:49 AM, Rob Herring wrote:
> On Fri, Apr 24, 2026 at 11:26:07AM +0530, Vijayanand Jitta wrote:
>> So far our parsing of {iommu,msi}-map properties has always blindly
>> assumed that the output specifiers will always have exactly 1 cell.
>> This typically does happen to be the case, but is not actually enforced
>> (and the PCI msi-map binding even explicitly states support for 0 or 1
>> cells) - as a result we've now ended up with dodgy DTs out in the field
>> which depend on this behaviour to map a 1-cell specifier for a 2-cell
>> provider, despite that being bogus per the bindings themselves.
>>
>> Since there is some potential use[1] in being able to map at least
>> single input IDs to multi-cell output specifiers (and properly support
>> 0-cell outputs as well), add support for properly parsing and using the
>> target nodes' #cells values, albeit with the unfortunate complication of
>> still having to work around expectations of the old behaviour too.
>> -- Robin.
>>
>> Unlike single #{}-cell, it is complex to establish a linear relation
>> between input 'id' and output specifier for multi-cell properties, thus
>> it is always expected that len is never going to be > 1.
>>
>> These changes have been tested on QEMU for the arm64 architecture.
>>
>> Since, this would also need update in dt-schema, raised PR[2] for the
>> same.
>
> Sashiko has some thoughts on the series:
>
> https://sashiko.dev/#/patchset/20260424-parse_iommu_cells-v14-0-fd02f11b6c38%40oss.qualcomm.com
>
> Rob
Thanks for the feedback. I have posted v15 addressing the comments from Sashiko.
v15: https://lore.kernel.org/all/20260520-parse_iommu_cells-v15-0-b5f99ad4e7e8@oss.qualcomm.com/
Thanks,
Vijay
* [PATCH v15 3/3] of: Respect #{iommu,msi}-cells in maps
From: Vijayanand Jitta @ 2026-05-20 8:02 UTC (permalink / raw)
To: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Rob Herring,
Conor Dooley, Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia
Cc: linux-kernel, iommu, linux-arm-kernel, devicetree, linux-pci, imx,
xen-devel, linux-arm-msm, Vijayanand Jitta, Charan Teja Kalla
In-Reply-To: <20260520-parse_iommu_cells-v15-0-b5f99ad4e7e8@oss.qualcomm.com>
From: Robin Murphy <robin.murphy@arm.com>
So far our parsing of {iommu,msi}-map properties has always blindly
assumed that the output specifiers will always have exactly 1 cell.
This typically does happen to be the case, but is not actually enforced
(and the PCI msi-map binding even explicitly states support for 0 or 1
cells) - as a result we've now ended up with dodgy DTs out in the field
which depend on this behaviour to map a 1-cell specifier for a 2-cell
provider, despite that being bogus per the bindings themselves.
Since there is some potential use in being able to map at least single
input IDs to multi-cell output specifiers (and properly support 0-cell
outputs as well), add support for properly parsing and using the target
nodes' #cells values, albeit with the unfortunate complication of still
having to work around expectations of the old behaviour too.
Since there are multi-cell output specifiers, the callers of of_map_id()
may need to get the exact cell output value for further processing.
Update of_map_id() to set args_count in the output to reflect the actual
number of output specifier cells.
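As a rough illustration of the variable-width walk described above, the following userspace sketch parses entries of the form <id-base phandle out[0..cells-1] id-len>. It is a simplified model, not the kernel implementation: map_id_sketch(), struct sketch_result and demo_cells() are hypothetical names, the map is held in host-endian order, the map-mask and the legacy 1-cell workaround are left out, and "no match" returns 1 here instead of falling back to bypass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ARGS 16	/* stands in for MAX_PHANDLE_ARGS */

struct sketch_result {
	uint32_t phandle;
	uint32_t args[SKETCH_MAX_ARGS];
	int args_count;
};

/* Hypothetical lookup of the target's #cells value; negative if unknown. */
typedef int (*sketch_cells_fn)(uint32_t phandle);

/* Demo targets: phandle 1 has 1 cell, phandle 2 has 2 cells, phandle 3 has 0. */
static int demo_cells(uint32_t phandle)
{
	switch (phandle) {
	case 1: return 1;
	case 2: return 2;
	case 3: return 0;
	default: return -1;
	}
}

/* Returns 0 on a match, 1 if no entry matches, -1 on a malformed map. */
static int map_id_sketch(const uint32_t *map, size_t map_len, uint32_t id,
			 sketch_cells_fn cells_of, struct sketch_result *res)
{
	size_t offset = 0;

	assert(map != NULL && cells_of != NULL && res != NULL);

	while (offset < map_len) {
		uint32_t id_base, phandle, id_len, id_off;
		const uint32_t *out_base;
		int cells;

		if (map_len - offset < 2)
			return -1;
		id_base = map[offset];
		phandle = map[offset + 1];

		/* The entry width depends on the target's cell count. */
		cells = cells_of(phandle);
		if (cells < 0 || cells > SKETCH_MAX_ARGS)
			return -1;
		if (map_len - offset < 3 + (size_t)cells)
			return -1;

		out_base = map + offset + 2;
		offset += 3 + (size_t)cells;
		id_len = map[offset - 1];

		id_off = id - id_base;
		if (id < id_base || id_off >= id_len)
			continue;
		/* A multi-ID range has no defined mapping onto multiple cells. */
		if (id_len > 1 && cells > 1)
			return -1;

		res->phandle = phandle;
		for (int i = 0; i < cells; i++)
			res->args[i] = id_off + out_base[i];
		res->args_count = cells;
		return 0;
	}
	return 1;
}
```

The point of the sketch is that the entry stride is 3 + cells rather than a fixed 4, so a truncated or mis-sized map is detected per entry instead of by a single length check up front.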
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
---
drivers/of/base.c | 166 +++++++++++++++++++++++++++++++++++++++++------------
include/linux/of.h | 6 +-
2 files changed, 134 insertions(+), 38 deletions(-)
diff --git a/drivers/of/base.c b/drivers/of/base.c
index d658c2620135..f436e2676381 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2116,19 +2116,49 @@ int of_find_last_cache_level(unsigned int cpu)
return cache_level;
}
+/*
+ * Some DTs have an iommu-map targeting a 2-cell IOMMU node while
+ * specifying only 1 cell. Fortunately they all consist of value '1'
+ * as the 2nd cell entry with the same target, so check for that pattern.
+ *
+ * Example:
+ * IOMMU node:
+ * #iommu-cells = <2>;
+ *
+ * Device node:
+ * iommu-map = <0x0000 &smmu 0x0000 0x1>,
+ * <0x0100 &smmu 0x0100 0x1>;
+ */
+static bool of_check_bad_map(const __be32 *map, int len)
+{
+ __be32 phandle = map[1];
+
+ if (len % 4)
+ return false;
+ for (int i = 0; i < len; i += 4) {
+ if (map[i + 1] != phandle || map[i + 3] != cpu_to_be32(1))
+ return false;
+ }
+ return true;
+}
+
/**
* of_map_id - Translate an ID through a downstream mapping.
* @np: root complex device node.
* @id: device ID to map.
* @map_name: property name of the map to use.
+ * @cells_name: property name of target specifier cells.
* @map_mask_name: optional property name of the mask to use.
* @filter_np: pointer to an optional filter node, or NULL to allow bypass.
* If non-NULL, the map property must exist (-ENODEV if absent). If
* *filter_np is also non-NULL, only entries targeting that node match.
* @arg: pointer to a &struct of_phandle_args for the result. On success,
- * @arg->args[0] will contain the translated ID. If a map entry was
- * matched, @arg->np will be set to the target node with a reference
- * held that the caller must release with of_node_put().
+ * @arg->args_count will be set to the number of output specifier cells
+ * as defined by @cells_name in the target node, and
+ * @arg->args[0..args_count-1] will contain the translated output
+ * specifier values. If a map entry was matched, @arg->np will be set
+ * to the target node with a reference held that the caller must release
+ * with of_node_put().
*
* Given a device ID, look up the appropriate implementation-defined
* platform ID and/or the target device which receives transactions on that
@@ -2137,11 +2167,13 @@ int of_find_last_cache_level(unsigned int cpu)
* Return: 0 on success or a standard error code on failure.
*/
int of_map_id(const struct device_node *np, u32 id,
- const char *map_name, const char *map_mask_name,
+ const char *map_name, const char *cells_name,
+ const char *map_mask_name,
struct device_node * const *filter_np, struct of_phandle_args *arg)
{
u32 map_mask, masked_id;
- int map_len;
+ int map_bytes, map_len, offset = 0;
+ bool bad_map = false;
const __be32 *map = NULL;
if (!np || !map_name || !arg)
@@ -2149,7 +2181,7 @@ int of_map_id(const struct device_node *np, u32 id,
/* Ensure bypass/no-match success never returns a stale target node. */
arg->np = NULL;
- map = of_get_property(np, map_name, &map_len);
+ map = of_get_property(np, map_name, &map_bytes);
if (!map) {
if (filter_np)
return -ENODEV;
@@ -2159,11 +2191,9 @@ int of_map_id(const struct device_node *np, u32 id,
return 0;
}
- if (!map_len || map_len % (4 * sizeof(*map))) {
- pr_err("%pOF: Error: Bad %s length: %d\n", np,
- map_name, map_len);
- return -EINVAL;
- }
+ if (map_bytes % sizeof(*map))
+ goto err_map_len;
+ map_len = map_bytes / sizeof(*map);
/* The default is to select all bits. */
map_mask = 0xffffffff;
@@ -2176,39 +2206,93 @@ int of_map_id(const struct device_node *np, u32 id,
of_property_read_u32(np, map_mask_name, &map_mask);
masked_id = map_mask & id;
- for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
+
+ while (offset < map_len) {
struct device_node *phandle_node;
- u32 id_base = be32_to_cpup(map + 0);
- u32 phandle = be32_to_cpup(map + 1);
- u32 out_base = be32_to_cpup(map + 2);
- u32 id_len = be32_to_cpup(map + 3);
+ u32 id_base, phandle, id_len, id_off, cells = 0;
+ const __be32 *out_base;
+
+ if (map_len - offset < 2)
+ goto err_map_len;
+
+ id_base = be32_to_cpup(map + offset);
if (id_base & ~map_mask) {
- pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n",
- np, map_name, map_name,
- map_mask, id_base);
+ pr_err("%pOF: Invalid %s translation - %s (0x%x) ignores id-base (0x%x)\n",
+ np, map_name, map_mask_name, map_mask, id_base);
return -EFAULT;
}
- if (masked_id < id_base || masked_id >= id_base + id_len)
- continue;
-
+ phandle = be32_to_cpup(map + offset + 1);
phandle_node = of_find_node_by_phandle(phandle);
if (!phandle_node)
return -ENODEV;
+ if (bad_map) {
+ cells = 1;
+ } else if (of_property_read_u32(phandle_node, cells_name, &cells)) {
+ pr_err("%pOF: missing %s property\n", phandle_node, cells_name);
+ of_node_put(phandle_node);
+ return -EINVAL;
+ }
+
+ if (cells > MAX_PHANDLE_ARGS) {
+ pr_err("%pOF: %s cell count %d exceeds maximum\n",
+ phandle_node, cells_name, cells);
+ of_node_put(phandle_node);
+ return -EINVAL;
+ }
+
+ if (offset == 0 && cells == 2) {
+ bad_map = of_check_bad_map(map, map_len);
+ if (bad_map) {
+ pr_warn_once("%pOF: %s has 1-cell entries targeting 2-cell %s, treating as 1-cell output\n",
+ np, map_name, cells_name);
+ cells = 1;
+ }
+ }
+
+ if (map_len - offset < 3 + cells) {
+ of_node_put(phandle_node);
+ goto err_map_len;
+ }
+
+ out_base = map + offset + 2;
+ offset += 3 + cells;
+
+ id_len = be32_to_cpup(map + offset - 1);
+ id_off = masked_id - id_base;
+ if (masked_id < id_base || id_off >= id_len) {
+ of_node_put(phandle_node);
+ continue;
+ }
+ if (id_len > 1 && cells > 1) {
+ /*
+ * With 1 output cell we reasonably assume its value
+ * has a linear relationship to the input; with more,
+ * we'd need help from the provider to know what to do.
+ */
+ pr_err("%pOF: Unsupported %s - cannot handle %d-ID range with %d-cell output specifier\n",
+ np, map_name, id_len, cells);
+ of_node_put(phandle_node);
+ return -EINVAL;
+ }
+
if (filter_np && *filter_np && *filter_np != phandle_node) {
of_node_put(phandle_node);
continue;
}
arg->np = phandle_node;
- arg->args[0] = masked_id - id_base + out_base;
- arg->args_count = 1;
+ for (int i = 0; i < cells; i++)
+ arg->args[i] = id_off + be32_to_cpu(out_base[i]);
+ arg->args_count = cells;
pr_debug("%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n",
- np, map_name, map_mask, id_base, out_base,
- id_len, id, masked_id - id_base + out_base);
+ np, map_name, map_mask, id_base,
+ cells ? be32_to_cpup(out_base) : 0,
+ id_len, id,
+ cells ? id_off + be32_to_cpup(out_base) : id_off);
return 0;
}
@@ -2219,6 +2303,10 @@ int of_map_id(const struct device_node *np, u32 id,
arg->args[0] = id;
arg->args_count = 1;
return 0;
+
+err_map_len:
+ pr_err("%pOF: Error: Bad %s length: %d\n", np, map_name, map_bytes);
+ return -EINVAL;
}
EXPORT_SYMBOL_GPL(of_map_id);
@@ -2228,18 +2316,21 @@ EXPORT_SYMBOL_GPL(of_map_id);
* @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
* stream/device ID) used as the lookup key in the iommu-map table.
* @arg: pointer to a &struct of_phandle_args for the result. On success,
- * @arg->args[0] contains the translated ID. If a map entry was matched,
- * @arg->np holds a reference to the target node that the caller must
- * release with of_node_put().
+ * @arg->args_count will be set to the number of output specifier cells
+ * and @arg->args[0..args_count-1] will contain the translated output
+ * specifier values. If a map entry was matched, @arg->np holds a
+ * reference to the target node that the caller must release with
+ * of_node_put().
*
- * Convenience wrapper around of_map_id() using "iommu-map" and "iommu-map-mask".
+ * Convenience wrapper around of_map_id() using "iommu-map", "#iommu-cells",
+ * and "iommu-map-mask".
*
* Return: 0 on success or a standard error code on failure.
*/
int of_map_iommu_id(const struct device_node *np, u32 id,
struct of_phandle_args *arg)
{
- return of_map_id(np, id, "iommu-map", "iommu-map-mask", NULL, arg);
+ return of_map_id(np, id, "iommu-map", "#iommu-cells", "iommu-map-mask", NULL, arg);
}
EXPORT_SYMBOL_GPL(of_map_iommu_id);
@@ -2252,17 +2343,20 @@ EXPORT_SYMBOL_GPL(of_map_iommu_id);
* If non-NULL, the map property must exist (-ENODEV if absent). If
* *filter_np is also non-NULL, only entries targeting that node match.
* @arg: pointer to a &struct of_phandle_args for the result. On success,
- * @arg->args[0] contains the translated ID. If a map entry was matched,
- * @arg->np holds a reference to the target node that the caller must
- * release with of_node_put().
+ * @arg->args_count will be set to the number of output specifier cells
+ * and @arg->args[0..args_count-1] will contain the translated output
+ * specifier values. If a map entry was matched, @arg->np holds a
+ * reference to the target node that the caller must release with
+ * of_node_put().
*
- * Convenience wrapper around of_map_id() using "msi-map" and "msi-map-mask".
+ * Convenience wrapper around of_map_id() using "msi-map", "#msi-cells",
+ * and "msi-map-mask".
*
* Return: 0 on success or a standard error code on failure.
*/
int of_map_msi_id(const struct device_node *np, u32 id,
struct device_node * const *filter_np, struct of_phandle_args *arg)
{
- return of_map_id(np, id, "msi-map", "msi-map-mask", filter_np, arg);
+ return of_map_id(np, id, "msi-map", "#msi-cells", "msi-map-mask", filter_np, arg);
}
EXPORT_SYMBOL_GPL(of_map_msi_id);
diff --git a/include/linux/of.h b/include/linux/of.h
index ea50b45d9ff7..374b249766a2 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -465,7 +465,8 @@ const char *of_prop_next_string(const struct property *prop, const char *cur);
bool of_console_check(const struct device_node *dn, char *name, int index);
int of_map_id(const struct device_node *np, u32 id,
- const char *map_name, const char *map_mask_name,
+ const char *map_name, const char *cells_name,
+ const char *map_mask_name,
struct device_node * const *filter_np,
struct of_phandle_args *arg);
@@ -950,7 +951,8 @@ static inline void of_property_clear_flag(struct property *p, unsigned long flag
}
static inline int of_map_id(const struct device_node *np, u32 id,
- const char *map_name, const char *map_mask_name,
+ const char *map_name, const char *cells_name,
+ const char *map_mask_name,
struct device_node * const *filter_np,
struct of_phandle_args *arg)
{
--
2.34.1
* [PATCH v15 2/3] of: Factor arguments passed to of_map_id() into a struct
From: Vijayanand Jitta @ 2026-05-20 8:02 UTC (permalink / raw)
To: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Rob Herring,
Conor Dooley, Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia
Cc: linux-kernel, iommu, linux-arm-kernel, devicetree, linux-pci, imx,
xen-devel, linux-arm-msm, Vijayanand Jitta, Charan Teja Kalla
In-Reply-To: <20260520-parse_iommu_cells-v15-0-b5f99ad4e7e8@oss.qualcomm.com>
From: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
Change of_map_id() to take a pointer to struct of_phandle_args
instead of passing target device node and translated IDs separately.
Update all callers accordingly.
Add an explicit filter_np parameter to of_map_id() and of_map_msi_id()
to separate the filter input from the output. Previously, the target
parameter served dual purpose: as an input filter (if non-NULL, only
match entries targeting that node) and as an output (receiving the
matched node with a reference held). Now filter_np is the explicit
input filter and arg->np is the pure output.
Previously, of_map_id() would call of_node_put() on the matched node
when a filter was provided, making reference ownership inconsistent.
Remove this internal of_node_put() call so that of_map_id() now always
transfers ownership of the matched node reference to the caller via
arg->np. Callers are now consistently responsible for releasing this
reference with of_node_put(arg->np) when done.
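To illustrate the ownership rule described above, here is a minimal
userspace sketch. It is not kernel code: every mock_* name, the single
hard-coded map entry and the refcount model are assumptions made up for
this example. It only models the match and no-match paths (the real
of_map_id() additionally returns -ENODEV when the map is absent and a
filter pointer is passed).

```c
/* Userspace mock of the reference-ownership rule; not kernel code. */
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_ARGS 16	/* assumed bound, for illustration only */

struct mock_node {
	int refcount;
};

struct mock_args {
	struct mock_node *np;
	int args_count;
	unsigned int args[MOCK_MAX_ARGS];
};

static struct mock_node *mock_node_get(struct mock_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Like of_node_put(), putting a NULL node is a no-op. */
static void mock_node_put(struct mock_node *n)
{
	if (n)
		n->refcount--;
}

/*
 * Hypothetical stand-in for of_map_id() with one map entry that covers
 * [id_base, id_base + len) and targets 'target'.
 */
static int mock_map_id(struct mock_node *target, unsigned int id,
		       unsigned int id_base, unsigned int out_base,
		       unsigned int len,
		       struct mock_node *const *filter_np,
		       struct mock_args *arg)
{
	assert(arg);

	/* Never hand back a stale node on the bypass path. */
	arg->np = NULL;

	if (id >= id_base && id - id_base < len &&
	    !(filter_np && *filter_np && *filter_np != target)) {
		/* Match: the reference now belongs to the caller. */
		arg->np = mock_node_get(target);
		arg->args[0] = id - id_base + out_base;
		arg->args_count = 1;
		return 0;
	}

	/* No match: bypass translation, arg->np stays NULL. */
	arg->args[0] = id;
	arg->args_count = 1;
	return 0;
}

/* Caller pattern: read the result, then put unconditionally. */
static unsigned int mock_translate(struct mock_node *target, unsigned int id)
{
	struct mock_args spec = {0};
	unsigned int out;

	if (mock_map_id(target, id, 0x100, 0x8000, 0x10, NULL, &spec))
		return id;

	out = spec.args[0];
	mock_node_put(spec.np);	/* safe whether or not an entry matched */
	return out;
}
```

Because the mock clears arg->np up front, the caller can drop the
reference without checking whether an entry matched, which is the same
shape the converted callers in this patch use.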
Acked-by: Frank Li <Frank.Li@nxp.com>
Suggested-by: Rob Herring (Arm) <robh@kernel.org>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
---
drivers/cdx/cdx_msi.c | 7 ++--
drivers/iommu/of_iommu.c | 4 +-
drivers/irqchip/irq-gic-its-msi-parent.c | 10 +++--
drivers/of/base.c | 71 ++++++++++++++++++--------------
drivers/of/irq.c | 24 ++++++++---
drivers/pci/controller/dwc/pci-imx6.c | 53 ++++++++++++------------
drivers/pci/controller/pcie-apple.c | 5 ++-
drivers/xen/grant-dma-ops.c | 4 +-
include/linux/of.h | 16 ++++---
9 files changed, 114 insertions(+), 80 deletions(-)
diff --git a/drivers/cdx/cdx_msi.c b/drivers/cdx/cdx_msi.c
index 78edb7308856..c8d832b0b1f5 100644
--- a/drivers/cdx/cdx_msi.c
+++ b/drivers/cdx/cdx_msi.c
@@ -121,22 +121,23 @@ static int cdx_msi_prepare(struct irq_domain *msi_domain,
struct device *dev,
int nvec, msi_alloc_info_t *info)
{
+ struct of_phandle_args msi_spec = {};
struct cdx_device *cdx_dev = to_cdx_device(dev);
struct device *parent = cdx_dev->cdx->dev;
struct msi_domain_info *msi_info;
- u32 dev_id;
int ret;
/* Retrieve device ID from requestor ID using parent device */
- ret = of_map_msi_id(parent->of_node, cdx_dev->msi_dev_id, NULL, &dev_id);
+ ret = of_map_msi_id(parent->of_node, cdx_dev->msi_dev_id, NULL, &msi_spec);
if (ret) {
dev_err(dev, "of_map_msi_id failed for MSI: %d\n", ret);
return ret;
}
+ of_node_put(msi_spec.np);
#ifdef GENERIC_MSI_DOMAIN_OPS
/* Set the device Id to be passed to the GIC-ITS */
- info->scratchpad[0].ul = dev_id;
+ info->scratchpad[0].ul = msi_spec.args[0];
#endif
msi_info = msi_get_domain_info(msi_domain->parent);
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a511ecf21fcd..a18bb60f6f3d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -45,10 +45,10 @@ static int of_iommu_configure_dev_id(struct device_node *master_np,
struct device *dev,
const u32 *id)
{
- struct of_phandle_args iommu_spec = { .args_count = 1 };
+ struct of_phandle_args iommu_spec = {};
int err;
- err = of_map_iommu_id(master_np, *id, &iommu_spec.np, iommu_spec.args);
+ err = of_map_iommu_id(master_np, *id, &iommu_spec);
if (err)
return err;
diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
index b63343a227a9..b9257103a999 100644
--- a/drivers/irqchip/irq-gic-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-its-msi-parent.c
@@ -152,6 +152,8 @@ static int its_v5_pci_msi_prepare(struct irq_domain *domain, struct device *dev,
static int of_pmsi_get_msi_info(struct irq_domain *domain, struct device *dev, u32 *dev_id,
phys_addr_t *pa)
{
+ struct device_node *msi_ctrl = NULL;
+ struct of_phandle_args msi_spec = {};
struct of_phandle_iterator it;
int ret;
@@ -178,9 +180,11 @@ static int of_pmsi_get_msi_info(struct irq_domain *domain, struct device *dev, u
}
}
- struct device_node *msi_ctrl __free(device_node) = NULL;
-
- return of_map_msi_id(dev->of_node, dev->id, &msi_ctrl, dev_id);
+ ret = of_map_msi_id(dev->of_node, dev->id, &msi_ctrl, &msi_spec);
+ if (!ret)
+ *dev_id = msi_spec.args[0];
+ of_node_put(msi_spec.np);
+ return ret;
}
static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 1e9b9692c0d9..d658c2620135 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2122,36 +2122,40 @@ int of_find_last_cache_level(unsigned int cpu)
* @id: device ID to map.
* @map_name: property name of the map to use.
* @map_mask_name: optional property name of the mask to use.
- * @target: optional pointer to a target device node.
- * @id_out: optional pointer to receive the translated ID.
+ * @filter_np: pointer to an optional filter node, or NULL to allow bypass.
+ * If non-NULL, the map property must exist (-ENODEV if absent). If
+ * *filter_np is also non-NULL, only entries targeting that node match.
+ * @arg: pointer to a &struct of_phandle_args for the result. On success,
+ * @arg->args[0] will contain the translated ID. If a map entry was
+ * matched, @arg->np will be set to the target node with a reference
+ * held that the caller must release with of_node_put().
*
* Given a device ID, look up the appropriate implementation-defined
* platform ID and/or the target device which receives transactions on that
- * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or
- * @id_out may be NULL if only the other is required. If @target points to
- * a non-NULL device node pointer, only entries targeting that node will be
- * matched; if it points to a NULL value, it will receive the device node of
- * the first matching target phandle, with a reference held.
+ * ID, as per the "iommu-map" and "msi-map" bindings.
*
* Return: 0 on success or a standard error code on failure.
*/
int of_map_id(const struct device_node *np, u32 id,
const char *map_name, const char *map_mask_name,
- struct device_node **target, u32 *id_out)
+ struct device_node * const *filter_np, struct of_phandle_args *arg)
{
u32 map_mask, masked_id;
int map_len;
const __be32 *map = NULL;
- if (!np || !map_name || (!target && !id_out))
+ if (!np || !map_name || !arg)
return -EINVAL;
+ /* Ensure bypass/no-match success never returns a stale target node. */
+ arg->np = NULL;
map = of_get_property(np, map_name, &map_len);
if (!map) {
- if (target)
+ if (filter_np)
return -ENODEV;
/* Otherwise, no map implies no translation */
- *id_out = id;
+ arg->args[0] = id;
+ arg->args_count = 1;
return 0;
}
@@ -2193,18 +2197,14 @@ int of_map_id(const struct device_node *np, u32 id,
if (!phandle_node)
return -ENODEV;
- if (target) {
- if (*target)
- of_node_put(phandle_node);
- else
- *target = phandle_node;
-
- if (*target != phandle_node)
- continue;
+ if (filter_np && *filter_np && *filter_np != phandle_node) {
+ of_node_put(phandle_node);
+ continue;
}
- if (id_out)
- *id_out = masked_id - id_base + out_base;
+ arg->np = phandle_node;
+ arg->args[0] = masked_id - id_base + out_base;
+ arg->args_count = 1;
pr_debug("%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n",
np, map_name, map_mask, id_base, out_base,
@@ -2213,11 +2213,11 @@ int of_map_id(const struct device_node *np, u32 id,
}
pr_info("%pOF: no %s translation for id 0x%x on %pOF\n", np, map_name,
- id, target && *target ? *target : NULL);
+ id, filter_np && *filter_np ? *filter_np : NULL);
/* Bypasses translation */
- if (id_out)
- *id_out = id;
+ arg->args[0] = id;
+ arg->args_count = 1;
return 0;
}
EXPORT_SYMBOL_GPL(of_map_id);
@@ -2227,17 +2227,19 @@ EXPORT_SYMBOL_GPL(of_map_id);
* @np: root complex device node.
* @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
* stream/device ID) used as the lookup key in the iommu-map table.
- * @target: optional pointer to a target device node.
- * @id_out: optional pointer to receive the translated ID.
+ * @arg: pointer to a &struct of_phandle_args for the result. On success,
+ * @arg->args[0] contains the translated ID. If a map entry was matched,
+ * @arg->np holds a reference to the target node that the caller must
+ * release with of_node_put().
*
* Convenience wrapper around of_map_id() using "iommu-map" and "iommu-map-mask".
*
* Return: 0 on success or a standard error code on failure.
*/
int of_map_iommu_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out)
+ struct of_phandle_args *arg)
{
- return of_map_id(np, id, "iommu-map", "iommu-map-mask", target, id_out);
+ return of_map_id(np, id, "iommu-map", "iommu-map-mask", NULL, arg);
}
EXPORT_SYMBOL_GPL(of_map_iommu_id);
@@ -2246,16 +2248,21 @@ EXPORT_SYMBOL_GPL(of_map_iommu_id);
* @np: root complex device node.
* @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
* stream/device ID) used as the lookup key in the msi-map table.
- * @target: optional pointer to a target device node.
- * @id_out: optional pointer to receive the translated ID.
+ * @filter_np: pointer to an optional filter node, or NULL to allow bypass.
+ * If non-NULL, the map property must exist (-ENODEV if absent). If
+ * *filter_np is also non-NULL, only entries targeting that node match.
+ * @arg: pointer to a &struct of_phandle_args for the result. On success,
+ * @arg->args[0] contains the translated ID. If a map entry was matched,
+ * @arg->np holds a reference to the target node that the caller must
+ * release with of_node_put().
*
* Convenience wrapper around of_map_id() using "msi-map" and "msi-map-mask".
*
* Return: 0 on success or a standard error code on failure.
*/
int of_map_msi_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out)
+ struct device_node * const *filter_np, struct of_phandle_args *arg)
{
- return of_map_id(np, id, "msi-map", "msi-map-mask", target, id_out);
+ return of_map_id(np, id, "msi-map", "msi-map-mask", filter_np, arg);
}
EXPORT_SYMBOL_GPL(of_map_msi_id);
diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index e37c1b3f8736..e63a43be6c4a 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -796,14 +796,15 @@ static int of_check_msi_parent(struct device_node *dev_node, struct device_node
/**
* of_msi_xlate - map a MSI ID and find relevant MSI controller node
* @dev: device for which the mapping is to be done.
- * @msi_np: Pointer to target MSI controller node
+ * @msi_np: Pointer to target MSI controller node, or NULL if the caller
+ * only needs the translated ID without receiving the controller node.
+ * If non-NULL and pointing to a non-NULL node, only entries targeting
+ * that node will be matched. If non-NULL and pointing to NULL, it will
+ * receive the first matching target node with a reference held.
* @id_in: Device ID.
*
* Walk up the device hierarchy looking for devices with a "msi-map"
* or "msi-parent" property. If found, apply the mapping to @id_in.
- * If @msi_np points to a non-NULL device node pointer, only entries targeting
- * that node will be matched; if it points to a NULL value, it will receive the
- * device node of the first matching target phandle, with a reference held.
*
* Returns: The mapped MSI id.
*/
@@ -817,8 +818,21 @@ u32 of_msi_xlate(struct device *dev, struct device_node **msi_np, u32 id_in)
* "msi-map" or an "msi-parent" property.
*/
for (parent_dev = dev; parent_dev; parent_dev = parent_dev->parent) {
- if (!of_map_msi_id(parent_dev->of_node, id_in, msi_np, &id_out))
+ struct of_phandle_args msi_spec = {};
+
+ if (!of_map_msi_id(parent_dev->of_node, id_in, msi_np, &msi_spec)) {
+ if (msi_spec.np) {
+ /* msi-map matched: use the translated ID and target node */
+ if (msi_spec.args_count > 0)
+ id_out = msi_spec.args[0];
+ if (msi_np && !*msi_np)
+ *msi_np = of_node_get(msi_spec.np);
+ of_node_put(msi_spec.np);
+ }
+ /* msi-map present but no match → stop walking */
break;
+ }
+ /* -ENODEV: msi-map absent → check for msi-parent */
if (!of_check_msi_parent(parent_dev->of_node, msi_np))
break;
}
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index c863c7b02289..105038c15aa8 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1121,41 +1121,42 @@ static void imx_pcie_remove_lut(struct imx_pcie *imx_pcie, u16 rid)
static int imx_pcie_add_lut_by_rid(struct imx_pcie *imx_pcie, u32 rid)
{
+ struct of_phandle_args iommu_spec = {};
+ struct of_phandle_args msi_spec = {};
struct device *dev = imx_pcie->pci->dev;
- struct device_node *target;
+ struct device_node *msi_filter = NULL;
u32 sid_i, sid_m;
int err_i, err_m;
u32 sid = 0;
- target = NULL;
- err_i = of_map_iommu_id(dev->of_node, rid, &target, &sid_i);
- if (target) {
- of_node_put(target);
- } else {
- /*
- * "target == NULL && err_i == 0" means RID out of map range.
- * Use 1:1 map RID to streamID. Hardware can't support this
- * because the streamID is only 6 bits
- */
- err_i = -EINVAL;
+ err_i = of_map_iommu_id(dev->of_node, rid, &iommu_spec);
+ if (!err_i) {
+ if (!iommu_spec.np)
+ /*
+ * "iommu_spec.np == NULL && err_i == 0" means RID out of map
+ * range. Use 1:1 map RID to streamID. Hardware can't support
+ * this because the streamID is only 6 bits.
+ */
+ err_i = -EINVAL;
+ else
+ sid_i = iommu_spec.args[0];
}
+ of_node_put(iommu_spec.np);
- target = NULL;
- err_m = of_map_msi_id(dev->of_node, rid, &target, &sid_m);
-
+ err_m = of_map_msi_id(dev->of_node, rid, &msi_filter, &msi_spec);
/*
- * err_m target
- * 0 NULL RID out of range. Use 1:1 map RID to
- * streamID, Current hardware can't
- * support it, so return -EINVAL.
- * != 0 NULL msi-map does not exist, use built-in MSI
- * 0 != NULL Get correct streamID from RID
- * != 0 != NULL Invalid combination
+ * err_m msi_spec.np
+ * 0 != NULL Got correct streamID from RID via msi-map
+ * 0 NULL msi-map present but RID out of range
+ * -ENODEV NULL msi-map absent, use built-in MSI controller
*/
- if (!err_m && !target)
- return -EINVAL;
- else if (target)
- of_node_put(target); /* Find streamID map entry for RID in msi-map */
+ if (!err_m) {
+ if (!msi_spec.np)
+ /* msi-map present but RID out of range */
+ return -EINVAL;
+ sid_m = msi_spec.args[0];
+ }
+ of_node_put(msi_spec.np);
/*
* msi-map iommu-map
diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
index a0937b7b3c4d..c2cffc0659f4 100644
--- a/drivers/pci/controller/pcie-apple.c
+++ b/drivers/pci/controller/pcie-apple.c
@@ -755,6 +755,7 @@ static int apple_pcie_enable_device(struct pci_host_bridge *bridge, struct pci_d
{
u32 sid, rid = pci_dev_id(pdev);
struct apple_pcie_port *port;
+ struct of_phandle_args iommu_spec = {};
int idx, err;
port = apple_pcie_get_port(pdev);
@@ -764,10 +765,12 @@ static int apple_pcie_enable_device(struct pci_host_bridge *bridge, struct pci_d
dev_dbg(&pdev->dev, "added to bus %s, index %d\n",
pci_name(pdev->bus->self), port->idx);
- err = of_map_iommu_id(port->pcie->dev->of_node, rid, NULL, &sid);
+ err = of_map_iommu_id(port->pcie->dev->of_node, rid, &iommu_spec);
if (err)
return err;
+ of_node_put(iommu_spec.np);
+ sid = iommu_spec.args[0];
mutex_lock(&port->pcie->lock);
idx = bitmap_find_free_region(port->sid_map, port->sid_map_sz, 0);
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 1b7696b2d762..2aa1a772a0ff 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -319,13 +319,13 @@ static int xen_dt_grant_init_backend_domid(struct device *dev,
struct device_node *np,
domid_t *backend_domid)
{
- struct of_phandle_args iommu_spec = { .args_count = 1 };
+ struct of_phandle_args iommu_spec = {};
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);
u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);
- if (of_map_iommu_id(np, rid, &iommu_spec.np, iommu_spec.args)) {
+ if (of_map_iommu_id(np, rid, &iommu_spec)) {
dev_dbg(dev, "Cannot translate ID\n");
return -ESRCH;
}
diff --git a/include/linux/of.h b/include/linux/of.h
index 721525334b4b..ea50b45d9ff7 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -466,13 +466,15 @@ bool of_console_check(const struct device_node *dn, char *name, int index);
int of_map_id(const struct device_node *np, u32 id,
const char *map_name, const char *map_mask_name,
- struct device_node **target, u32 *id_out);
+ struct device_node * const *filter_np,
+ struct of_phandle_args *arg);
int of_map_iommu_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out);
+ struct of_phandle_args *arg);
int of_map_msi_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out);
+ struct device_node * const *filter_np,
+ struct of_phandle_args *arg);
phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
@@ -949,19 +951,21 @@ static inline void of_property_clear_flag(struct property *p, unsigned long flag
static inline int of_map_id(const struct device_node *np, u32 id,
const char *map_name, const char *map_mask_name,
- struct device_node **target, u32 *id_out)
+ struct device_node * const *filter_np,
+ struct of_phandle_args *arg)
{
return -EINVAL;
}
static inline int of_map_iommu_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out)
+ struct of_phandle_args *arg)
{
return -EINVAL;
}
static inline int of_map_msi_id(const struct device_node *np, u32 id,
- struct device_node **target, u32 *id_out)
+ struct device_node * const *filter_np,
+ struct of_phandle_args *arg)
{
return -EINVAL;
}
--
2.34.1
* [PATCH v15 1/3] of: Add convenience wrappers for of_map_id()
From: Vijayanand Jitta @ 2026-05-20 8:02 UTC (permalink / raw)
To: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Rob Herring,
Conor Dooley, Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia
Cc: linux-kernel, iommu, linux-arm-kernel, devicetree, linux-pci, imx,
xen-devel, linux-arm-msm, Vijayanand Jitta
In-Reply-To: <20260520-parse_iommu_cells-v15-0-b5f99ad4e7e8@oss.qualcomm.com>
From: Robin Murphy <robin.murphy@arm.com>
Since we now have quite a few users parsing "iommu-map" and "msi-map"
properties, give them some wrappers to conveniently encapsulate the
appropriate sets of property names. This will also make it easier to
then change of_map_id() to correctly account for specifier cells.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
---
drivers/cdx/cdx_msi.c | 5 ++---
drivers/iommu/of_iommu.c | 4 +---
drivers/irqchip/irq-gic-its-msi-parent.c | 2 +-
drivers/of/base.c | 38 ++++++++++++++++++++++++++++++++
drivers/of/irq.c | 3 +--
drivers/pci/controller/dwc/pci-imx6.c | 6 ++---
drivers/pci/controller/pcie-apple.c | 3 +--
drivers/xen/grant-dma-ops.c | 3 +--
include/linux/of.h | 18 +++++++++++++++
9 files changed, 65 insertions(+), 17 deletions(-)
diff --git a/drivers/cdx/cdx_msi.c b/drivers/cdx/cdx_msi.c
index 91b95422b263..78edb7308856 100644
--- a/drivers/cdx/cdx_msi.c
+++ b/drivers/cdx/cdx_msi.c
@@ -128,10 +128,9 @@ static int cdx_msi_prepare(struct irq_domain *msi_domain,
int ret;
/* Retrieve device ID from requestor ID using parent device */
- ret = of_map_id(parent->of_node, cdx_dev->msi_dev_id, "msi-map", "msi-map-mask",
- NULL, &dev_id);
+ ret = of_map_msi_id(parent->of_node, cdx_dev->msi_dev_id, NULL, &dev_id);
if (ret) {
- dev_err(dev, "of_map_id failed for MSI: %d\n", ret);
+ dev_err(dev, "of_map_msi_id failed for MSI: %d\n", ret);
return ret;
}
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 6b989a62def2..a511ecf21fcd 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -48,9 +48,7 @@ static int of_iommu_configure_dev_id(struct device_node *master_np,
struct of_phandle_args iommu_spec = { .args_count = 1 };
int err;
- err = of_map_id(master_np, *id, "iommu-map",
- "iommu-map-mask", &iommu_spec.np,
- iommu_spec.args);
+ err = of_map_iommu_id(master_np, *id, &iommu_spec.np, iommu_spec.args);
if (err)
return err;
diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
index d36b278ae66c..b63343a227a9 100644
--- a/drivers/irqchip/irq-gic-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-its-msi-parent.c
@@ -180,7 +180,7 @@ static int of_pmsi_get_msi_info(struct irq_domain *domain, struct device *dev, u
struct device_node *msi_ctrl __free(device_node) = NULL;
- return of_map_id(dev->of_node, dev->id, "msi-map", "msi-map-mask", &msi_ctrl, dev_id);
+ return of_map_msi_id(dev->of_node, dev->id, &msi_ctrl, dev_id);
}
static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
diff --git a/drivers/of/base.c b/drivers/of/base.c
index a650c91897cc..1e9b9692c0d9 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2221,3 +2221,41 @@ int of_map_id(const struct device_node *np, u32 id,
return 0;
}
EXPORT_SYMBOL_GPL(of_map_id);
+
+/**
+ * of_map_iommu_id - Translate an ID using "iommu-map" bindings.
+ * @np: root complex device node.
+ * @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
+ * stream/device ID) used as the lookup key in the iommu-map table.
+ * @target: optional pointer to a target device node.
+ * @id_out: optional pointer to receive the translated ID.
+ *
+ * Convenience wrapper around of_map_id() using "iommu-map" and "iommu-map-mask".
+ *
+ * Return: 0 on success or a standard error code on failure.
+ */
+int of_map_iommu_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out)
+{
+ return of_map_id(np, id, "iommu-map", "iommu-map-mask", target, id_out);
+}
+EXPORT_SYMBOL_GPL(of_map_iommu_id);
+
+/**
+ * of_map_msi_id - Translate an ID using "msi-map" bindings.
+ * @np: root complex device node.
+ * @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
+ * stream/device ID) used as the lookup key in the msi-map table.
+ * @target: optional pointer to a target device node.
+ * @id_out: optional pointer to receive the translated ID.
+ *
+ * Convenience wrapper around of_map_id() using "msi-map" and "msi-map-mask".
+ *
+ * Return: 0 on success or a standard error code on failure.
+ */
+int of_map_msi_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out)
+{
+ return of_map_id(np, id, "msi-map", "msi-map-mask", target, id_out);
+}
+EXPORT_SYMBOL_GPL(of_map_msi_id);
diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index 6367c67732d2..e37c1b3f8736 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -817,8 +817,7 @@ u32 of_msi_xlate(struct device *dev, struct device_node **msi_np, u32 id_in)
* "msi-map" or an "msi-parent" property.
*/
for (parent_dev = dev; parent_dev; parent_dev = parent_dev->parent) {
- if (!of_map_id(parent_dev->of_node, id_in, "msi-map",
- "msi-map-mask", msi_np, &id_out))
+ if (!of_map_msi_id(parent_dev->of_node, id_in, msi_np, &id_out))
break;
if (!of_check_msi_parent(parent_dev->of_node, msi_np))
break;
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 1034ac5c5f5c..c863c7b02289 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1128,8 +1128,7 @@ static int imx_pcie_add_lut_by_rid(struct imx_pcie *imx_pcie, u32 rid)
u32 sid = 0;
target = NULL;
- err_i = of_map_id(dev->of_node, rid, "iommu-map", "iommu-map-mask",
- &target, &sid_i);
+ err_i = of_map_iommu_id(dev->of_node, rid, &target, &sid_i);
if (target) {
of_node_put(target);
} else {
@@ -1142,8 +1141,7 @@ static int imx_pcie_add_lut_by_rid(struct imx_pcie *imx_pcie, u32 rid)
}
target = NULL;
- err_m = of_map_id(dev->of_node, rid, "msi-map", "msi-map-mask",
- &target, &sid_m);
+ err_m = of_map_msi_id(dev->of_node, rid, &target, &sid_m);
/*
* err_m target
diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
index 2d92fc79f6dd..a0937b7b3c4d 100644
--- a/drivers/pci/controller/pcie-apple.c
+++ b/drivers/pci/controller/pcie-apple.c
@@ -764,8 +764,7 @@ static int apple_pcie_enable_device(struct pci_host_bridge *bridge, struct pci_d
dev_dbg(&pdev->dev, "added to bus %s, index %d\n",
pci_name(pdev->bus->self), port->idx);
- err = of_map_id(port->pcie->dev->of_node, rid, "iommu-map",
- "iommu-map-mask", NULL, &sid);
+ err = of_map_iommu_id(port->pcie->dev->of_node, rid, NULL, &sid);
if (err)
return err;
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index c2603e700178..1b7696b2d762 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -325,8 +325,7 @@ static int xen_dt_grant_init_backend_domid(struct device *dev,
struct pci_dev *pdev = to_pci_dev(dev);
u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);
- if (of_map_id(np, rid, "iommu-map", "iommu-map-mask", &iommu_spec.np,
- iommu_spec.args)) {
+ if (of_map_iommu_id(np, rid, &iommu_spec.np, iommu_spec.args)) {
dev_dbg(dev, "Cannot translate ID\n");
return -ESRCH;
}
diff --git a/include/linux/of.h b/include/linux/of.h
index 959786f8f196..721525334b4b 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -468,6 +468,12 @@ int of_map_id(const struct device_node *np, u32 id,
const char *map_name, const char *map_mask_name,
struct device_node **target, u32 *id_out);
+int of_map_iommu_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out);
+
+int of_map_msi_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out);
+
phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
struct kimage;
@@ -948,6 +954,18 @@ static inline int of_map_id(const struct device_node *np, u32 id,
return -EINVAL;
}
+static inline int of_map_iommu_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out)
+{
+ return -EINVAL;
+}
+
+static inline int of_map_msi_id(const struct device_node *np, u32 id,
+ struct device_node **target, u32 *id_out)
+{
+ return -EINVAL;
+}
+
static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
{
return PHYS_ADDR_MAX;
--
2.34.1
* [PATCH v15 0/3] of: parsing of multi #{iommu,msi}-cells in maps
From: Vijayanand Jitta @ 2026-05-20 8:02 UTC (permalink / raw)
To: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Rob Herring,
Conor Dooley, Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia
Cc: linux-kernel, iommu, linux-arm-kernel, devicetree, linux-pci, imx,
xen-devel, linux-arm-msm, Vijayanand Jitta, Charan Teja Kalla
So far our parsing of {iommu,msi}-map properties has always blindly
assumed that the output specifiers will always have exactly 1 cell.
This typically does happen to be the case, but is not actually enforced
(and the PCI msi-map binding even explicitly states support for 0 or 1
cells) - as a result we've now ended up with dodgy DTs out in the field
which depend on this behaviour to map a 1-cell specifier for a 2-cell
provider, despite that being bogus per the bindings themselves.
Since there is some potential use[1] in being able to map at least
single input IDs to multi-cell output specifiers (and properly support
0-cell outputs as well), add support for properly parsing and using the
target nodes' #cells values, albeit with the unfortunate complication of
still having to work around expectations of the old behaviour too.
-- Robin.
Unlike the single-cell case, it is complex to establish a linear relation
between the input 'id' and the output specifier for multi-cell properties,
so len is expected never to exceed 1 for such entries.
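As a rough illustration of the entry layout and the len restriction, the
following userspace sketch walks a flat array of 32-bit words where each
entry is (id-base, target, out[0..cells-1], length). It is not the code
from this series: the sketch_* names, the integer target IDs standing in
for phandles, the callback used to look up the cell count, and the choice
to reject a matching multi-cell entry with len > 1 are all assumptions
made for the example. SKETCH_MAX_CELLS plays the role of the
MAX_PHANDLE_ARGS bound, with its value assumed here.

```c
/* Userspace sketch of multi-cell map parsing; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CELLS 16	/* assumed value of the bound */

struct sketch_out {
	int matched;		/* 1 if a map entry matched */
	int args_count;		/* number of output cells */
	uint32_t args[SKETCH_MAX_CELLS];
};

/* Hypothetical "#cells" lookup: target 2 takes 2 cells, others take 1. */
static uint32_t sketch_cells_of(uint32_t target)
{
	return target == 2 ? 2 : 1;
}

static int sketch_map_id(const uint32_t *map, size_t words, uint32_t mask,
			 uint32_t id, uint32_t (*cells_of)(uint32_t),
			 struct sketch_out *res)
{
	uint32_t masked = id & mask;
	size_t pos = 0;

	assert(map && cells_of && res);
	res->matched = 0;

	while (pos < words) {
		uint32_t id_base, cells, len, i;
		const uint32_t *out;

		if (words - pos < 2)
			return -EINVAL;
		id_base = map[pos];
		cells = cells_of(map[pos + 1]);

		/* Bound cells before using it as a length or an index. */
		if (cells > SKETCH_MAX_CELLS)
			return -EINVAL;
		if (words - pos < 3 + cells)
			return -EINVAL;

		out = &map[pos + 2];
		len = map[pos + 2 + cells];
		pos += 3 + cells;

		if (masked < id_base || masked - id_base >= len)
			continue;

		/* No linear id -> specifier relation for multi-cell outputs. */
		if (cells > 1 && len > 1)
			return -EINVAL;

		for (i = 0; i < cells; i++)
			res->args[i] = out[i];
		if (cells == 1)
			res->args[0] += masked - id_base;
		res->args_count = (int)cells;
		res->matched = 1;
		return 0;
	}

	/* No entry matched: pass the ID through untranslated. */
	res->args[0] = id;
	res->args_count = 1;
	return 0;
}
```

With a map of { 0x0, 1, 0x8000, 0x100, 0x100, 2, 0xa, 0xb, 1 }, an input
ID of 0x42 falls in the first (1-cell) entry and is offset linearly, while
0x100 hits the second (2-cell, len 1) entry and the two output cells are
copied as-is.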
These changes have been tested on QEMU for the arm64 architecture.
Since this also needs an update in dt-schema, a PR [2] has been raised
for the same.
[1] https://lore.kernel.org/all/20250627-video_cb-v3-0-51e18c0ffbce@quicinc.com/
[2] PR for iommu-map dtschema: https://github.com/devicetree-org/dt-schema/pull/184
V15:
Address Sashiko AI review comments on v14:
Patch 2:
- [Critical] pci-imx6: pass &msi_filter (not NULL) to of_map_msi_id()
so that of_map_id() returns -ENODEV when msi-map is absent, preventing
the '!err_m && !msi_spec.np' path from incorrectly returning -EINVAL
- [High] of_map_id(): explicitly set arg->np = NULL before any bypass
path so callers can safely call of_node_put(arg->np) on all return paths
- [Medium] of_msi_xlate(): pass msi_np directly to of_map_msi_id() and
of_check_msi_parent() (removing the local_np/np indirection), and use
'break' (not 'continue') when msi_spec.np is NULL so that msi-parent
bindings are still checked when msi-map is present but has no match
- Guard 'id_out = msi_spec.args[0]' with 'args_count > 0' in
of_msi_xlate() to correctly handle 0-cell MSI output specifiers
- Use of_node_get() + unconditional of_node_put() in of_msi_xlate()
for clearer reference ownership
Patch 3:
- [Critical] of_map_id(): add 'cells > MAX_PHANDLE_ARGS' check before
using cells as an array index to prevent stack buffer overflow
- [High] of_map_id(): the MAX_PHANDLE_ARGS bound on cells also prevents
integer overflow in the '3 + cells' length check
- [High] of_map_id(): fix misleading bad-map workaround message from
"assuming extra cell of 0" to "treating as 1-cell output" to accurately
describe the actual behavior
- [Medium] of_msi_xlate(): guard 'id_out = msi_spec.args[0]' with
'args_count > 0' to preserve id_in for 0-cell MSI output specifiers
Link to v14:
https://patch.msgid.link/20260424-parse_iommu_cells-v14-0-fd02f11b6c38@oss.qualcomm.com
V14:
- Updated Patch 2 ("of: Factor arguments passed to of_map_id() into a struct") to
fix the following two issues in of_msi_xlate() that were introduced by the API refactoring:
1) The refactoring changed of_map_id()'s dual-purpose **target parameter to
an explicit filter_np parameter. In of_msi_xlate(), this caused
of_map_msi_id() to return 0 (pass-through) instead of -ENODEV when a node
has no msi-map, terminating the device hierarchy walk prematurely before
reaching the root complex node that has the msi-map. This broke MSI
allocation for PCIe endpoint devices (e.g., wcn7850 Wi-Fi on ARM64).
2) Additionally, fsl_mc_get_msi_id() passes msi_np == NULL to of_msi_xlate(),
which would dereference NULL with the new API.
Link to v13:
https://patch.msgid.link/20260408-parse_iommu_cells-v13-0-fa921e92661b@oss.qualcomm.com
V13:
- Fix bad_map handling in of_map_id(): 'cells' is re-initialized to 0
on each loop iteration, so the !bad_map guard was insufficient, cells
stayed 0 for all entries after the first. Fix by explicitly setting
cells=1 when bad_map is true on every iteration.
- Collected Acked-by from Frank Li.
Link to v12:
https://patch.msgid.link/20260331-parse_iommu_cells-v12-0-decfd305eea9@oss.qualcomm.com
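The V13 bad_map fix is easier to see in a reduced form. The sketch below
is a user-space illustration, not the of_map_id() implementation:
count_entries() and target_cells() are hypothetical helpers, and the
entry layout (id-base, phandle, output cells, length) is an assumption.
It only shows the pattern described above, where 'cells' is reset on
every iteration and therefore has to be forced to 1 on every iteration
when bad_map is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: pretend every target declares 2 output cells. */
static uint32_t target_cells(uint32_t phandle)
{
	(void)phandle;
	return 2;
}

/*
 * Count the entries in a map whose entries are laid out as
 * id-base, phandle, <cells> output cells, length.
 * Returns -1 if the map ends in the middle of an entry.
 */
static int count_entries(const uint32_t *map, size_t len, bool bad_map)
{
	int n = 0;

	while (len > 0) {
		uint32_t cells = 0;	/* reset on every iteration */

		if (len < 2)
			return -1;

		/* Apply the workaround per entry, not only to the first one. */
		if (bad_map)
			cells = 1;
		else
			cells = target_cells(map[1]);

		if (len < 3 + (size_t)cells)
			return -1;

		map += 3 + cells;
		len -= 3 + cells;
		n++;
	}

	return n;
}
```

An 8-cell legacy map is walked as two 4-cell entries when bad_map is
true, while the same map fails to parse if the 2-cell value reported by
the hypothetical target is trusted.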
V12:
- Call of_node_put() unconditionally in imx_pcie_add_lut_by_rid()
thereby addressing comments from Bjorn Helgaas.
Link to v11:
https://lore.kernel.org/r/20260325-parse_iommu_cells-v11-0-1fefa5c0e82c@oss.qualcomm.com
V11:
- Added explicit filter_np parameter to of_map_id() and of_map_msi_id()
per Dmitry Baryshkov's review feedback, making the filter explicit
instead of overloading arg->np as both input filter and output parameter.
- Removed of_node_put() from inside of_map_id(), making the caller responsible
for reference management. Updated of_msi_xlate() to properly handle reference counting.
- Collected Acked-by tags, and fixed minor typos.
Link to v10:
https://lore.kernel.org/r/20260309-parse_iommu_cells-v10-0-c62fcaa5a1d8@oss.qualcomm.com
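The reference ownership rule from V11 (the caller drops the reference),
combined with the V15 note about clearing arg->np before any early
return, can be sketched with a toy refcount. Everything here is a
hypothetical stand-in: struct node, struct map_args, node_get(),
node_put(), map_id() and caller() are illustration-only names, not the
kernel's struct device_node, struct of_phandle_args or of_node_*() API.

```c
#include <stddef.h>

struct node { int refcount; };		/* toy stand-in for a DT node */
struct map_args { struct node *np; };	/* toy stand-in for the args struct */

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Like the kernel helper it imitates, accepts NULL and does nothing. */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Hypothetical mapper: on a match, hands a counted reference to the caller. */
static int map_id(struct node *target, int match, struct map_args *arg)
{
	arg->np = NULL;		/* cleared before any early return */

	if (!match)
		return -1;

	arg->np = node_get(target);
	return 0;
}

static int caller(struct node *target, int match)
{
	struct map_args arg;
	int ret = map_id(target, match, &arg);

	/* Unconditional put: safe because np is NULL on the failure path. */
	node_put(arg.np);
	return ret;
}
```

Because the mapper never drops the reference itself and always leaves
np in a defined state, the caller can use a single put on every return
path without leaking or underflowing the count.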
V10:
- Move of_map_iommu_id()/of_map_msi_id() from include/linux/of.h to
drivers/of/base.c as out-of-line helpers per feedback from Marc Zyngier
and Rob Herring.
- Add kernel-doc to document both helpers for discoverability and
usage clarity.
- Fix of_map_msi_id() wrapper and all its callers (cdx_msi.c,
irq-gic-its-msi-parent.c, drivers/of/irq.c) to correctly use the new
struct of_phandle_args-based API with proper of_node_put() handling
as per feedback from Dmitry.
Link to v9:
https://lore.kernel.org/r/20260301-parse_iommu_cells-v9-0-4d1bceecc5e1@oss.qualcomm.com
V9:
- Updated TO/CC list based on feedback to include all relevant
maintainers.
- No functional changes to the patches themselves.
Link to V8:
https://lore.kernel.org/all/20260226074245.3098486-1-vijayanand.jitta@oss.qualcomm.com/
V8:
- Removed mentions of of_map_args from commit message to match code.
Link to V7:
https://lore.kernel.org/all/20260210101157.2145113-1-vijayanand.jitta@oss.qualcomm.com/
V7:
- Removed of_map_id_args structure and replaced it with
of_phandle_args as suggested by Dmitry.
Link to V6:
https://lore.kernel.org/all/20260121055400.937856-1-vijayanand.jitta@oss.qualcomm.com/
V6:
- Fixed build error reported by kernel test bot.
Link to V5:
https://lore.kernel.org/all/20260118181125.1436036-1-vijayanand.jitta@oss.qualcomm.com/
V5:
- Fixed Build Warnings.
- Raised PR for iommu-map dtschema: https://github.com/devicetree-org/dt-schema/pull/184
Link to V4:
https://lore.kernel.org/all/20251231114257.2382820-1-vijayanand.jitta@oss.qualcomm.com/
V4:
- Added Reviewed-by tag.
- Resolved warnings reported by kernel test bot, minor code
reorganization.
Link to V3:
https://lore.kernel.org/all/20251221213602.2413124-1-vijayanand.jitta@oss.qualcomm.com/
V3:
- Added Reviewed-by tag.
- Updated of_map_id_args struct as a wrapper to of_phandle_args and
added comment description as suggested by Rob Herring.
Link to V2:
https://lore.kernel.org/all/20251204095530.8627-1-vijayanand.jitta@oss.qualcomm.com/
V2:
- Incorporated the patches from Robin that do the clean implementation.
- Dropped the patches that were adding multi-map support from this series
as suggested.
V1:
https://lore.kernel.org/all/cover.1762235099.git.charan.kalla@oss.qualcomm.com/
RFC:
https://lore.kernel.org/all/20250928171718.436440-1-charan.kalla@oss.qualcomm.com/#r
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
---
To: Nipun Gupta <nipun.gupta@amd.com>
To: Nikhil Agarwal <nikhil.agarwal@amd.com>
To: Joerg Roedel <joro@8bytes.org>
To: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
To: Lorenzo Pieralisi <lpieralisi@kernel.org>
To: Marc Zyngier <maz@kernel.org>
To: Thomas Gleixner <tglx@kernel.org>
To: Rob Herring <robh@kernel.org>
To: Saravana Kannan <saravanak@kernel.org>
To: Richard Zhu <hongxing.zhu@nxp.com>
To: Lucas Stach <l.stach@pengutronix.de>
To: Krzysztof Wilczyński <kwilczynski@kernel.org>
To: Manivannan Sadhasivam <mani@kernel.org>
To: Bjorn Helgaas <bhelgaas@google.com>
To: Frank Li <Frank.Li@nxp.com>
To: Sascha Hauer <s.hauer@pengutronix.de>
To: Pengutronix Kernel Team <kernel@pengutronix.de>
To: Fabio Estevam <festevam@gmail.com>
To: Juergen Gross <jgross@suse.com>
To: Stefano Stabellini <sstabellini@kernel.org>
To: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: iommu@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: imx@lists.linux.dev
Cc: xen-devel@lists.xenproject.org
---
Charan Teja Kalla (1):
of: Factor arguments passed to of_map_id() into a struct
Robin Murphy (2):
of: Add convenience wrappers for of_map_id()
of: Respect #{iommu,msi}-cells in maps
drivers/cdx/cdx_msi.c | 10 +-
drivers/iommu/of_iommu.c | 6 +-
drivers/irqchip/irq-gic-its-msi-parent.c | 10 +-
drivers/of/base.c | 227 +++++++++++++++++++++++++------
drivers/of/irq.c | 25 +++-
drivers/pci/controller/dwc/pci-imx6.c | 55 ++++----
drivers/pci/controller/pcie-apple.c | 6 +-
drivers/xen/grant-dma-ops.c | 5 +-
include/linux/of.h | 32 ++++-
9 files changed, 277 insertions(+), 99 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260301-parse_iommu_cells-1c33768aebba
Best regards,
--
Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>