* [net-next v18 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver
@ 2025-08-13 7:07 Lukasz Majewski
[not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
[not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
0 siblings, 2 replies; 6+ messages in thread
From: Lukasz Majewski @ 2025-08-13 7:07 UTC (permalink / raw)
To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Shawn Guo
Cc: Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Richard Cochran, netdev, devicetree, linux-kernel, imx,
linux-arm-kernel, Stefan Wahren, Simon Horman, Lukasz Majewski
This patch series adds support for More Than IP's L2 switch driver embedded
in some NXP's SoCs. This one has been tested on imx287, but is also available
in the vf610.
In the past there has been performed some attempts to upstream this driver:
1. The 4.19-cip based one [1]
2. DSA based one for 5.12 [2] - i.e. the switch itself was treat as a DSA switch
with NO tag appended.
3. The extension for FEC driver for 5.12 [3] - the trick here was to fully reuse
FEC when the in-HW switching is disabled. When bridge offloading is enabled,
the driver uses already configured MAC and PHY to also configure PHY.
All three approaches were not accepted as eligible for upstreaming.
The driver from this series has floowing features:
1. It is fully separated from fec_main - i.e. can be used interchangeable
with it. To be more specific - one can build them as modules and
if required switch between them when e.g. bridge offloading is required.
To be more specific:
- Use FEC_MAIN: When one needs support for two ETH ports with separate
uDMAs used for both and bridging can be realized in SW.
- Use MTIPL2SW: When it is enough to support two ports with only uDMA0
attached to switch and bridging shall be offloaded to HW.
2. This driver uses MTIP's L2 switch internal VLAN feature to provide port
separation at boot time. Port separation is disabled when bridging is
required.
3. Example usage:
Configuration:
ip link set lan0 up; sleep 1;
ip link set lan1 up; sleep 1;
ip link add name br0 type bridge;
ip link set br0 up; sleep 1;
ip link set lan0 master br0;
ip link set lan1 master br0;
bridge link;
ip addr add 192.168.2.17/24 dev br0;
ping -c 5 192.168.2.222
Removal:
ip link set br0 down;
ip link delete br0 type bridge;
ip link set dev lan1 down
ip link set dev lan0 down
4. Limitations:
- Driver enables and disables switch operation with learning and ageing.
- Missing is the advanced configuration (e.g. adding entries to FBD). This is
on purpose, as up till now we didn't had consensus about how the driver
shall be added to Linux.
5. Clang build:
make LLVM_SUFFIX=-19 LLVM=1 mrproper
cp ./arch/arm/configs/mxs_defconfig .config
make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 menuconfig
make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 LOADADDR=0x40008000 uImage dtbs
make LLVM_SUFFIX=-19 LLVM=1 mrproper
make LLVM_SUFFIX=-19 LLVM=1 allmodconfig
make LLVM_SUFFIX=-19 LLVM=1 W=1 drivers/net/ethernet/freescale/mtipsw/ | tee llvm_build.log
make LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 | tee llvm_build.log
6. Kernel compliance checks:
make coccicheck MODE=report J=4 M=drivers/net/ethernet/freescale/mtipsw/
~/work/src/smatch/smatch_scripts/kchecker drivers/net/ethernet/freescale/mtipsw/
7. GCC
make mrproper
make allmodconfig
make W=1 drivers/net/ethernet/freescale/mtipsw/
Links:
[1] - https://github.com/lmajewski/linux-imx28-l2switch/commits/master
[2] - https://github.com/lmajewski/linux-imx28-l2switch/tree/imx28-v5.12-L2-upstream-RFC_v1
[3] - https://source.denx.de/linux/linux-imx28-l2switch/-/tree/imx28-v5.12-L2-upstream-switchdev-RFC_v1?ref_type=heads
Lukasz Majewski (7):
dt-bindings: net: Add MTIP L2 switch description
net: mtip: The L2 switch driver for imx287
net: mtip: Add buffers management functions to the L2 switch driver
net: mtip: Add net_device_ops functions to the L2 switch driver
net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
net: mtip: Extend the L2 switch driver with management operations
net: mtip: Extend the L2 switch driver for imx287 with bridge
operations
.../bindings/net/nxp,imx28-mtip-switch.yaml | 150 ++
MAINTAINERS | 7 +
drivers/net/ethernet/freescale/Kconfig | 1 +
drivers/net/ethernet/freescale/Makefile | 1 +
drivers/net/ethernet/freescale/mtipsw/Kconfig | 13 +
.../net/ethernet/freescale/mtipsw/Makefile | 4 +
.../net/ethernet/freescale/mtipsw/mtipl2sw.c | 1962 +++++++++++++++++
.../net/ethernet/freescale/mtipsw/mtipl2sw.h | 651 ++++++
.../ethernet/freescale/mtipsw/mtipl2sw_br.c | 132 ++
.../ethernet/freescale/mtipsw/mtipl2sw_mgnt.c | 443 ++++
10 files changed, 3364 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
create mode 100644 drivers/net/ethernet/freescale/mtipsw/Kconfig
create mode 100644 drivers/net/ethernet/freescale/mtipsw/Makefile
create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
--
2.39.5
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [net-next v18 2/7] net: mtip: The L2 switch driver for imx287
[not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
@ 2025-08-16 1:29 ` Jakub Kicinski
2025-08-18 20:07 ` Łukasz Majewski
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-16 1:29 UTC (permalink / raw)
To: Lukasz Majewski
Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
Simon Horman, Andrew Lunn
On Wed, 13 Aug 2025 09:07:50 +0200 Lukasz Majewski wrote:
> + pkts = mtip_switch_rx(napi->dev, budget, &port);
> + if (pkts == -ENOMEM) {
> + napi_complete(napi);
> + return 0;
And what happens next? looks like you're not unmasking the interrupt in
this case so we'll never get an IRQ until timeout kicks in?
> + }
> +
> + if ((port == 1 || port == 2) && fep->ndev[port - 1])
> + mtip_switch_tx(fep->ndev[port - 1]);
> + else
> + mtip_switch_tx(napi->dev);
> +
> + if (pkts < budget) {
> + napi_complete_done(napi, pkts);
Please take napi_complete_done()'s return value into account
> + /* Set default interrupt mask for L2 switch */
> + writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
> + fep->hwp + ESW_IMR);
> + }
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
[not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
@ 2025-08-16 1:33 ` Jakub Kicinski
2025-08-19 8:31 ` Łukasz Majewski
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-16 1:33 UTC (permalink / raw)
To: Lukasz Majewski
Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
Simon Horman
On Wed, 13 Aug 2025 09:07:53 +0200 Lukasz Majewski wrote:
> + page = fep->page[bdp - fep->rx_bd_base];
> + /* Process the incoming frame */
> + pkt_len = bdp->cbd_datlen;
> +
> + dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> + pkt_len, DMA_FROM_DEVICE);
> + net_prefetch(page_address(page));
> + data = page_address(page);
> +
> + if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> + swap_buffer(data, pkt_len);
> +
> + eth_hdr = (struct ethhdr *)data;
> + mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> + &rx_port);
> + if (rx_port == MTIP_PORT_FORWARDING_INIT)
> + mtip_atable_dynamicms_learn_migration(fep,
> + mtip_get_time(),
> + eth_hdr->h_source,
> + &rx_port);
> +
> + if ((rx_port == 1 || rx_port == 2) && fep->ndev[rx_port - 1])
> + pndev = fep->ndev[rx_port - 1];
> + else
> + pndev = dev;
> +
> + *port = rx_port;
> +
> + /* This does 16 byte alignment, exactly what we need.
> + * The packet length includes FCS, but we don't want to
> + * include that when passing upstream as it messes up
> + * bridging applications.
> + */
> + skb = netdev_alloc_skb(pndev, pkt_len + NET_IP_ALIGN);
> + if (unlikely(!skb)) {
> + dev_dbg(&fep->pdev->dev,
> + "%s: Memory squeeze, dropping packet.\n",
> + pndev->name);
> + page_pool_recycle_direct(fep->page_pool, page);
> + pndev->stats.rx_dropped++;
> + return -ENOMEM;
> + }
> +
> + skb_reserve(skb, NET_IP_ALIGN);
> + skb_put(skb, pkt_len); /* Make room */
> + skb_copy_to_linear_data(skb, data, pkt_len);
> + skb->protocol = eth_type_trans(skb, pndev);
> + skb->offload_fwd_mark = fep->br_offload;
> + napi_gro_receive(&fep->napi, skb);
The rx buffer circulation is very odd. You seem to pre-allocate buffers
for the full ring from a page_pool. And then copy the data out of those
pages. The normal process is that after packet is received a new page is
allocated to give to HW, and old is attached to an skb, and sent up the
stack.
Also you are releasing the page to be recycled without clearing it from
the ring. I think you'd free it again on shutdown, so it's a
double-free.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [net-next v18 2/7] net: mtip: The L2 switch driver for imx287
2025-08-16 1:29 ` [net-next v18 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski
@ 2025-08-18 20:07 ` Łukasz Majewski
0 siblings, 0 replies; 6+ messages in thread
From: Łukasz Majewski @ 2025-08-18 20:07 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
Simon Horman, Andrew Lunn
Hi Jakub,
> On Wed, 13 Aug 2025 09:07:50 +0200 Lukasz Majewski wrote:
> > + pkts = mtip_switch_rx(napi->dev, budget, &port);
> > + if (pkts == -ENOMEM) {
> > + napi_complete(napi);
> > + return 0;
>
> And what happens next? looks like you're not unmasking the interrupt
> in this case so we'll never get an IRQ until timeout kicks in?
Good point - I shall set the "default" set of interrupts before return
0;
>
> > + }
> > +
> > + if ((port == 1 || port == 2) && fep->ndev[port - 1])
> > + mtip_switch_tx(fep->ndev[port - 1]);
> > + else
> > + mtip_switch_tx(napi->dev);
> > +
> > + if (pkts < budget) {
> > + napi_complete_done(napi, pkts);
>
> Please take napi_complete_done()'s return value into account
Ok.
>
> > + /* Set default interrupt mask for L2 switch */
> > + writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
> > + fep->hwp + ESW_IMR);
> > + }
--
Best regards,
Łukasz Majewski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
2025-08-16 1:33 ` [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver Jakub Kicinski
@ 2025-08-19 8:31 ` Łukasz Majewski
2025-08-19 14:42 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Łukasz Majewski @ 2025-08-19 8:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
Simon Horman
Hi Jakub,
> On Wed, 13 Aug 2025 09:07:53 +0200 Lukasz Majewski wrote:
> > + page = fep->page[bdp - fep->rx_bd_base];
> > + /* Process the incoming frame */
> > + pkt_len = bdp->cbd_datlen;
> > +
> > + dma_sync_single_for_cpu(&fep->pdev->dev,
> > bdp->cbd_bufaddr,
> > + pkt_len, DMA_FROM_DEVICE);
> > + net_prefetch(page_address(page));
> > + data = page_address(page);
> > +
> > + if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > + swap_buffer(data, pkt_len);
> > +
> > + eth_hdr = (struct ethhdr *)data;
> > + mtip_atable_get_entry_port_number(fep,
> > eth_hdr->h_source,
> > + &rx_port);
> > + if (rx_port == MTIP_PORT_FORWARDING_INIT)
> > + mtip_atable_dynamicms_learn_migration(fep,
> > +
> > mtip_get_time(),
> > +
> > eth_hdr->h_source,
> > +
> > &rx_port); +
> > + if ((rx_port == 1 || rx_port == 2) &&
> > fep->ndev[rx_port - 1])
> > + pndev = fep->ndev[rx_port - 1];
> > + else
> > + pndev = dev;
> > +
> > + *port = rx_port;
> > +
> > + /* This does 16 byte alignment, exactly what we
> > need.
> > + * The packet length includes FCS, but we don't
> > want to
> > + * include that when passing upstream as it messes
> > up
> > + * bridging applications.
> > + */
> > + skb = netdev_alloc_skb(pndev, pkt_len +
> > NET_IP_ALIGN);
> > + if (unlikely(!skb)) {
> > + dev_dbg(&fep->pdev->dev,
> > + "%s: Memory squeeze, dropping
> > packet.\n",
> > + pndev->name);
> > + page_pool_recycle_direct(fep->page_pool,
> > page);
> > + pndev->stats.rx_dropped++;
> > + return -ENOMEM;
> > + }
> > +
> > + skb_reserve(skb, NET_IP_ALIGN);
> > + skb_put(skb, pkt_len); /* Make room */
> > + skb_copy_to_linear_data(skb, data, pkt_len);
> > + skb->protocol = eth_type_trans(skb, pndev);
> > + skb->offload_fwd_mark = fep->br_offload;
> > + napi_gro_receive(&fep->napi, skb);
>
> The rx buffer circulation is very odd.
The fec_main.c uses page_pool_alloc_pages() to allocate RX page from
the pool.
At the RX function the __build_skb(data, ...) is called to create skb.
Last step with the RX function is to call skb_mark_for_recycle(skb),
which sets skb->pp_recycle = 1.
And yes, in the MTIP I do copy the data to the newly created skb in RX
function (anyway, I need to swap bytes in the buffer).
It seems like extra copy is performed in the RX function.
> You seem to pre-allocate
> buffers for the full ring from a page_pool. And then copy the data
> out of those pages.
Yes, correct.
> The normal process is that after packet is
> received a new page is allocated to give to HW, and old is attached
> to an skb, and sent up the stack.
Ok.
>
> Also you are releasing the page to be recycled without clearing it
> from the ring. I think you'd free it again on shutdown, so it's a
> double-free.
No, the page is persistent. It will be removed when the driver is
closed and memory for pages and descriptors is released.
--
Best regards,
Łukasz Majewski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
2025-08-19 8:31 ` Łukasz Majewski
@ 2025-08-19 14:42 ` Jakub Kicinski
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-19 14:42 UTC (permalink / raw)
To: Łukasz Majewski
Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
Simon Horman
On Tue, 19 Aug 2025 10:31:19 +0200 Łukasz Majewski wrote:
> > The rx buffer circulation is very odd.
>
> The fec_main.c uses page_pool_alloc_pages() to allocate RX page from
> the pool.
>
> At the RX function the __build_skb(data, ...) is called to create skb.
>
> Last step with the RX function is to call skb_mark_for_recycle(skb),
> which sets skb->pp_recycle = 1.
>
> And yes, in the MTIP I do copy the data to the newly created skb in RX
> function (anyway, I need to swap bytes in the buffer).
>
> It seems like extra copy is performed in the RX function.
Right, so the use of page pool is entirely pointless.
The strength of the page pool is recycling the pages.
If you don't free / allocate pages on the fast path you're
just paying the extra overhead (of having a populated cache)
> > Also you are releasing the page to be recycled without clearing it
> > from the ring. I think you'd free it again on shutdown, so it's a
> > double-free.
>
> No, the page is persistent. It will be removed when the driver is
> closed and memory for pages and descriptors is released.
So remove the page_pool_recycle_direct() call, please:
static int mtip_switch_rx(struct net_device *dev, int budget, int *port)
...
skb = netdev_alloc_skb(pndev, pkt_len + NET_IP_ALIGN);
if (unlikely(!skb)) {
...
page_pool_recycle_direct(fep->page_pool, page);
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-19 21:10 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13 7:07 [net-next v18 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
[not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
2025-08-16 1:29 ` [net-next v18 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski
2025-08-18 20:07 ` Łukasz Majewski
[not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
2025-08-16 1:33 ` [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver Jakub Kicinski
2025-08-19 8:31 ` Łukasz Majewski
2025-08-19 14:42 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).