linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [net-next v18 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver
@ 2025-08-13  7:07 Lukasz Majewski
       [not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
       [not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Lukasz Majewski @ 2025-08-13  7:07 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Shawn Guo
  Cc: Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Richard Cochran, netdev, devicetree, linux-kernel, imx,
	linux-arm-kernel, Stefan Wahren, Simon Horman, Lukasz Majewski

This patch series adds support for More Than IP's L2 switch driver embedded
in some NXP's SoCs. This one has been tested on imx287, but is also available
in the vf610.

In the past there has been performed some attempts to upstream this driver:

1. The 4.19-cip based one [1]
2. DSA based one for 5.12 [2] - i.e. the switch itself was treat as a DSA switch
   with NO tag appended.
3. The extension for FEC driver for 5.12 [3] - the trick here was to fully reuse
   FEC when the in-HW switching is disabled. When bridge offloading is enabled,
   the driver uses already configured MAC and PHY to also configure PHY.

All three approaches were not accepted as eligible for upstreaming.

The driver from this series has floowing features:

1. It is fully separated from fec_main - i.e. can be used interchangeable
   with it. To be more specific - one can build them as modules and
   if required switch between them when e.g. bridge offloading is required.

   To be more specific:
        - Use FEC_MAIN: When one needs support for two ETH ports with separate
          uDMAs used for both and bridging can be realized in SW.

        - Use MTIPL2SW: When it is enough to support two ports with only uDMA0
          attached to switch and bridging shall be offloaded to HW. 

2. This driver uses MTIP's L2 switch internal VLAN feature to provide port
   separation at boot time. Port separation is disabled when bridging is
   required.

3. Example usage:
        Configuration:
        ip link set lan0 up; sleep 1;
        ip link set lan1 up; sleep 1;
        ip link add name br0 type bridge;
        ip link set br0 up; sleep 1;
        ip link set lan0 master br0;
        ip link set lan1 master br0;
        bridge link;
        ip addr add 192.168.2.17/24 dev br0;
        ping -c 5 192.168.2.222

        Removal:
        ip link set br0 down;
        ip link delete br0 type bridge;
        ip link set dev lan1 down
        ip link set dev lan0 down

4. Limitations:
        - Driver enables and disables switch operation with learning and ageing.
        - Missing is the advanced configuration (e.g. adding entries to FBD). This is
          on purpose, as up till now we didn't had consensus about how the driver
          shall be added to Linux.

5. Clang build:
	make LLVM_SUFFIX=-19 LLVM=1 mrproper
	cp ./arch/arm/configs/mxs_defconfig .config
	make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 menuconfig
	make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 LOADADDR=0x40008000 uImage dtbs

        make LLVM_SUFFIX=-19 LLVM=1 mrproper
        make LLVM_SUFFIX=-19 LLVM=1 allmodconfig
        make LLVM_SUFFIX=-19 LLVM=1 W=1 drivers/net/ethernet/freescale/mtipsw/ | tee llvm_build.log
        make LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 | tee llvm_build.log

6. Kernel compliance checks:
	make coccicheck MODE=report J=4 M=drivers/net/ethernet/freescale/mtipsw/
	~/work/src/smatch/smatch_scripts/kchecker drivers/net/ethernet/freescale/mtipsw/

7. GCC
        make mrproper
        make allmodconfig
        make W=1 drivers/net/ethernet/freescale/mtipsw/

Links:
[1] - https://github.com/lmajewski/linux-imx28-l2switch/commits/master
[2] - https://github.com/lmajewski/linux-imx28-l2switch/tree/imx28-v5.12-L2-upstream-RFC_v1
[3] - https://source.denx.de/linux/linux-imx28-l2switch/-/tree/imx28-v5.12-L2-upstream-switchdev-RFC_v1?ref_type=heads

Lukasz Majewski (7):
  dt-bindings: net: Add MTIP L2 switch description
  net: mtip: The L2 switch driver for imx287
  net: mtip: Add buffers management functions to the L2 switch driver
  net: mtip: Add net_device_ops functions to the L2 switch driver
  net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  net: mtip: Extend the L2 switch driver with management operations
  net: mtip: Extend the L2 switch driver for imx287 with bridge
    operations

 .../bindings/net/nxp,imx28-mtip-switch.yaml   |  150 ++
 MAINTAINERS                                   |    7 +
 drivers/net/ethernet/freescale/Kconfig        |    1 +
 drivers/net/ethernet/freescale/Makefile       |    1 +
 drivers/net/ethernet/freescale/mtipsw/Kconfig |   13 +
 .../net/ethernet/freescale/mtipsw/Makefile    |    4 +
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 1962 +++++++++++++++++
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |  651 ++++++
 .../ethernet/freescale/mtipsw/mtipl2sw_br.c   |  132 ++
 .../ethernet/freescale/mtipsw/mtipl2sw_mgnt.c |  443 ++++
 10 files changed, 3364 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Makefile
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c

-- 
2.39.5



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v18 2/7] net: mtip: The L2 switch driver for imx287
       [not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
@ 2025-08-16  1:29   ` Jakub Kicinski
  2025-08-18 20:07     ` Łukasz Majewski
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-16  1:29 UTC (permalink / raw)
  To: Lukasz Majewski
  Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
	devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
	Simon Horman, Andrew Lunn

On Wed, 13 Aug 2025 09:07:50 +0200 Lukasz Majewski wrote:
> +	pkts = mtip_switch_rx(napi->dev, budget, &port);
> +	if (pkts == -ENOMEM) {
> +		napi_complete(napi);
> +		return 0;

And what happens next? looks like you're not unmasking the interrupt in
this case so we'll never get an IRQ until timeout kicks in?

> +	}
> +
> +	if ((port == 1 || port == 2) && fep->ndev[port - 1])
> +		mtip_switch_tx(fep->ndev[port - 1]);
> +	else
> +		mtip_switch_tx(napi->dev);
> +
> +	if (pkts < budget) {
> +		napi_complete_done(napi, pkts);

Please take napi_complete_done()'s return value into account

> +		/* Set default interrupt mask for L2 switch */
> +		writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
> +		       fep->hwp + ESW_IMR);
> +	}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
       [not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
@ 2025-08-16  1:33   ` Jakub Kicinski
  2025-08-19  8:31     ` Łukasz Majewski
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-16  1:33 UTC (permalink / raw)
  To: Lukasz Majewski
  Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
	devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
	Simon Horman

On Wed, 13 Aug 2025 09:07:53 +0200 Lukasz Majewski wrote:
> +		page = fep->page[bdp - fep->rx_bd_base];
> +		/* Process the incoming frame */
> +		pkt_len = bdp->cbd_datlen;
> +
> +		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> +					pkt_len, DMA_FROM_DEVICE);
> +		net_prefetch(page_address(page));
> +		data = page_address(page);
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);
> +
> +		eth_hdr = (struct ethhdr *)data;
> +		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> +						  &rx_port);
> +		if (rx_port == MTIP_PORT_FORWARDING_INIT)
> +			mtip_atable_dynamicms_learn_migration(fep,
> +							      mtip_get_time(),
> +							      eth_hdr->h_source,
> +							      &rx_port);
> +
> +		if ((rx_port == 1 || rx_port == 2) && fep->ndev[rx_port - 1])
> +			pndev = fep->ndev[rx_port - 1];
> +		else
> +			pndev = dev;
> +
> +		*port = rx_port;
> +
> +		/* This does 16 byte alignment, exactly what we need.
> +		 * The packet length includes FCS, but we don't want to
> +		 * include that when passing upstream as it messes up
> +		 * bridging applications.
> +		 */
> +		skb = netdev_alloc_skb(pndev, pkt_len + NET_IP_ALIGN);
> +		if (unlikely(!skb)) {
> +			dev_dbg(&fep->pdev->dev,
> +				"%s: Memory squeeze, dropping packet.\n",
> +				pndev->name);
> +			page_pool_recycle_direct(fep->page_pool, page);
> +			pndev->stats.rx_dropped++;
> +			return -ENOMEM;
> +		}
> +
> +		skb_reserve(skb, NET_IP_ALIGN);
> +		skb_put(skb, pkt_len);      /* Make room */
> +		skb_copy_to_linear_data(skb, data, pkt_len);
> +		skb->protocol = eth_type_trans(skb, pndev);
> +		skb->offload_fwd_mark = fep->br_offload;
> +		napi_gro_receive(&fep->napi, skb);

The rx buffer circulation is very odd. You seem to pre-allocate buffers
for the full ring from a page_pool. And then copy the data out of those
pages. The normal process is that after packet is received a new page is
allocated to give to HW, and old is attached to an skb, and sent up the
stack.

Also you are releasing the page to be recycled without clearing it from
the ring. I think you'd free it again on shutdown, so it's a
double-free.
-- 
pw-bot: cr


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v18 2/7] net: mtip: The L2 switch driver for imx287
  2025-08-16  1:29   ` [net-next v18 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski
@ 2025-08-18 20:07     ` Łukasz Majewski
  0 siblings, 0 replies; 6+ messages in thread
From: Łukasz Majewski @ 2025-08-18 20:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
	devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
	Simon Horman, Andrew Lunn

Hi Jakub,

> On Wed, 13 Aug 2025 09:07:50 +0200 Lukasz Majewski wrote:
> > +	pkts = mtip_switch_rx(napi->dev, budget, &port);
> > +	if (pkts == -ENOMEM) {
> > +		napi_complete(napi);
> > +		return 0;  
> 
> And what happens next? looks like you're not unmasking the interrupt
> in this case so we'll never get an IRQ until timeout kicks in?

Good point - I shall set the "default" set of interrupts before return
0;

> 
> > +	}
> > +
> > +	if ((port == 1 || port == 2) && fep->ndev[port - 1])
> > +		mtip_switch_tx(fep->ndev[port - 1]);
> > +	else
> > +		mtip_switch_tx(napi->dev);
> > +
> > +	if (pkts < budget) {
> > +		napi_complete_done(napi, pkts);  
> 
> Please take napi_complete_done()'s return value into account

Ok.

> 
> > +		/* Set default interrupt mask for L2 switch */
> > +		writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
> > +		       fep->hwp + ESW_IMR);
> > +	}  



-- 
Best regards,

Łukasz Majewski


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  2025-08-16  1:33   ` [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver Jakub Kicinski
@ 2025-08-19  8:31     ` Łukasz Majewski
  2025-08-19 14:42       ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Łukasz Majewski @ 2025-08-19  8:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
	devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
	Simon Horman

Hi Jakub,

> On Wed, 13 Aug 2025 09:07:53 +0200 Lukasz Majewski wrote:
> > +		page = fep->page[bdp - fep->rx_bd_base];
> > +		/* Process the incoming frame */
> > +		pkt_len = bdp->cbd_datlen;
> > +
> > +		dma_sync_single_for_cpu(&fep->pdev->dev,
> > bdp->cbd_bufaddr,
> > +					pkt_len, DMA_FROM_DEVICE);
> > +		net_prefetch(page_address(page));
> > +		data = page_address(page);
> > +
> > +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > +			swap_buffer(data, pkt_len);
> > +
> > +		eth_hdr = (struct ethhdr *)data;
> > +		mtip_atable_get_entry_port_number(fep,
> > eth_hdr->h_source,
> > +						  &rx_port);
> > +		if (rx_port == MTIP_PORT_FORWARDING_INIT)
> > +			mtip_atable_dynamicms_learn_migration(fep,
> > +
> > mtip_get_time(),
> > +
> > eth_hdr->h_source,
> > +
> > &rx_port); +
> > +		if ((rx_port == 1 || rx_port == 2) &&
> > fep->ndev[rx_port - 1])
> > +			pndev = fep->ndev[rx_port - 1];
> > +		else
> > +			pndev = dev;
> > +
> > +		*port = rx_port;
> > +
> > +		/* This does 16 byte alignment, exactly what we
> > need.
> > +		 * The packet length includes FCS, but we don't
> > want to
> > +		 * include that when passing upstream as it messes
> > up
> > +		 * bridging applications.
> > +		 */
> > +		skb = netdev_alloc_skb(pndev, pkt_len +
> > NET_IP_ALIGN);
> > +		if (unlikely(!skb)) {
> > +			dev_dbg(&fep->pdev->dev,
> > +				"%s: Memory squeeze, dropping
> > packet.\n",
> > +				pndev->name);
> > +			page_pool_recycle_direct(fep->page_pool,
> > page);
> > +			pndev->stats.rx_dropped++;
> > +			return -ENOMEM;
> > +		}
> > +
> > +		skb_reserve(skb, NET_IP_ALIGN);
> > +		skb_put(skb, pkt_len);      /* Make room */
> > +		skb_copy_to_linear_data(skb, data, pkt_len);
> > +		skb->protocol = eth_type_trans(skb, pndev);
> > +		skb->offload_fwd_mark = fep->br_offload;
> > +		napi_gro_receive(&fep->napi, skb);  
> 
> The rx buffer circulation is very odd.

The fec_main.c uses page_pool_alloc_pages() to allocate RX page from
the pool.

At the RX function the __build_skb(data, ...) is called to create skb.

Last step with the RX function is to call skb_mark_for_recycle(skb),
which sets skb->pp_recycle = 1.

And yes, in the MTIP I do copy the data to the newly created skb in RX
function (anyway, I need to swap bytes in the buffer). 

It seems like extra copy is performed in the RX function.

> You seem to pre-allocate
> buffers for the full ring from a page_pool. And then copy the data
> out of those pages.

Yes, correct.

> The normal process is that after packet is
> received a new page is allocated to give to HW, and old is attached
> to an skb, and sent up the stack.

Ok.

> 
> Also you are releasing the page to be recycled without clearing it
> from the ring. I think you'd free it again on shutdown, so it's a
> double-free.

No, the page is persistent. It will be removed when the driver is
closed and memory for pages and descriptors is released.

-- 
Best regards,

Łukasz Majewski


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  2025-08-19  8:31     ` Łukasz Majewski
@ 2025-08-19 14:42       ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-19 14:42 UTC (permalink / raw)
  To: Łukasz Majewski
  Cc: Andrew Lunn, davem, Eric Dumazet, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Richard Cochran, netdev,
	devicetree, linux-kernel, imx, linux-arm-kernel, Stefan Wahren,
	Simon Horman

On Tue, 19 Aug 2025 10:31:19 +0200 Łukasz Majewski wrote:
> > The rx buffer circulation is very odd.  
> 
> The fec_main.c uses page_pool_alloc_pages() to allocate RX page from
> the pool.
> 
> At the RX function the __build_skb(data, ...) is called to create skb.
> 
> Last step with the RX function is to call skb_mark_for_recycle(skb),
> which sets skb->pp_recycle = 1.
> 
> And yes, in the MTIP I do copy the data to the newly created skb in RX
> function (anyway, I need to swap bytes in the buffer). 
> 
> It seems like extra copy is performed in the RX function.

Right, so the use of page pool is entirely pointless.
The strength of the page pool is recycling the pages.
If you don't free / allocate pages on the fast path you're 
just paying the extra overhead (of having a populated cache)

> > Also you are releasing the page to be recycled without clearing it
> > from the ring. I think you'd free it again on shutdown, so it's a
> > double-free.  
> 
> No, the page is persistent. It will be removed when the driver is
> closed and memory for pages and descriptors is released.

So remove the page_pool_recycle_direct() call, please:

  static int mtip_switch_rx(struct net_device *dev, int budget, int *port)
	...
	skb = netdev_alloc_skb(pndev, pkt_len + NET_IP_ALIGN);
	if (unlikely(!skb)) {
			...
			page_pool_recycle_direct(fep->page_pool, page);


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-08-19 21:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13  7:07 [net-next v18 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
     [not found] ` <20250813070755.1523898-3-lukasz.majewski@mailbox.org>
2025-08-16  1:29   ` [net-next v18 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski
2025-08-18 20:07     ` Łukasz Majewski
     [not found] ` <20250813070755.1523898-6-lukasz.majewski@mailbox.org>
2025-08-16  1:33   ` [net-next v18 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver Jakub Kicinski
2025-08-19  8:31     ` Łukasz Majewski
2025-08-19 14:42       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).