[net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver


* [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver
@ 2026-06-01 11:24 Lukasz Majewski
  2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

This patch series adds a driver for the More Than IP L2 switch embedded
in some NXP SoCs. It has been tested on imx287; the same switch is also
available in the vf610.

There have been several past attempts to upstream this driver:

1. The 4.19-cip based one [1]
2. DSA based one for 5.12 [2] - i.e. the switch itself was treated as a DSA switch
   with NO tag appended.
3. The extension for FEC driver for 5.12 [3] - the trick here was to fully reuse
   FEC when the in-HW switching is disabled. When bridge offloading is enabled,
   the driver uses already configured MAC and PHY to also configure PHY.

None of the three approaches was accepted for upstreaming.

The driver from this series has the following features:

1. It is fully separated from fec_main - i.e. can be used interchangeably
   with it. Both can be built as modules and, if required, one can switch
   between them, e.g. when bridge offloading is needed.

   To be more specific:
        - Use FEC_MAIN: When one needs support for two ETH ports with separate
          uDMAs used for both and bridging can be realized in SW.

        - Use MTIPL2SW: When it is enough to support two ports with only uDMA0
          attached to switch and bridging shall be offloaded to HW. 

2. This driver uses MTIP's L2 switch internal VLAN feature to provide port
   separation at boot time. Port separation is disabled when bridging is
   required.

3. Example usage:
        Configuration:
        ip link set lan0 up; sleep 1;
        ip link set lan1 up; sleep 1;
        ip link add name br0 type bridge;
        ip link set br0 up; sleep 1;
        ip link set lan0 master br0;
        ip link set lan1 master br0;
        bridge link;
        ip addr add 192.168.2.17/24 dev br0;
        ping -c 5 192.168.2.222

        Removal:
        ip link set br0 down;
        ip link delete br0 type bridge;
        ip link set dev lan1 down
        ip link set dev lan0 down

4. Limitations:
        - Driver enables and disables switch operation with learning and ageing.
        - The advanced configuration (e.g. adding entries to FDB) is missing. This is
          on purpose, as up till now we did not have consensus about how the driver
          shall be added to Linux.
          However, on top of this patch set the code with SWITCHDEV support (v6.6,
          PREEMPT_RT enabled) has been implemented in [4].

5. Clang build:
	make LLVM_SUFFIX=-19 LLVM=1 mrproper
	cp ./arch/arm/configs/mxs_defconfig .config
	make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 menuconfig
	make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 LOADADDR=0x40008000 uImage dtbs

        make LLVM_SUFFIX=-19 LLVM=1 mrproper
        make LLVM_SUFFIX=-19 LLVM=1 allmodconfig
        make LLVM_SUFFIX=-19 LLVM=1 W=1 drivers/net/ethernet/freescale/mtipsw/ | tee llvm_build.log
        make LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 | tee llvm_build.log

6. Kernel compliance checks:
	make coccicheck MODE=report J=4 M=drivers/net/ethernet/freescale/mtipsw/
        make allmodconfig; ~/work/src/smatch/smatch_scripts/kchecker drivers/net/ethernet/freescale/mtipsw/

7. GCC:
        make mrproper
        make allmodconfig
        make W=1 drivers/net/ethernet/freescale/mtipsw/

        
        [source OE/Yocto SDK build environment]
        CROSS_COMPILE=arm-poky-linux-gnueabi- ARCH=arm make mrproper
        cp ./arch/arm/configs/mxs_defconfig .config
        CROSS_COMPILE=arm-poky-linux-gnueabi- ARCH=arm make menuconfig
        CROSS_COMPILE=arm-poky-linux-gnueabi- ARCH=arm make -j8 LOADADDR=0x40008000 uImage dtbs

8. DT_SCHEMA checks:
        source ~/.venv/bin/activate
        source /opt/poky/3.1.31/environment-setup-armv5e-poky-linux-gnueabi
        make dt_binding_check DT_SCHEMA_FILES=nxp,imx28-mtip-switch.yaml
        make CHECK_DTBS=y DT_SCHEMA_FILES=nxp,imx28-mtip-switch.yaml nxp/mxs/imx28-xea.dtb
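
Feature 1 above mentions switching between fec_main and this driver when
both are built as modules. A minimal sketch of such a swap could look like
the helper below. It is only an illustration: the helper names (run,
switch_eth_driver) and the DRY_RUN variable are hypothetical, and the module
names "fec" and "nxp-mtipl2sw" are assumptions derived from the Makefiles.

```shell
#!/bin/sh
# Hypothetical helper: swap between the FEC and the MTIP L2 switch modules.
# Assumed module names: "fec" (fec_main) and "nxp-mtipl2sw" (this series).
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
	if [ "${DRY_RUN:-0}" = "1" ]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

switch_eth_driver() {
	case "$1" in
	fec)  unload=nxp-mtipl2sw; load=fec ;;
	mtip) unload=fec; load=nxp-mtipl2sw ;;
	*)    echo "usage: switch_eth_driver fec|mtip" >&2; return 2 ;;
	esac
	# Both drivers claim the same MACs, so unload one before loading the other
	run modprobe -r "$unload" || return 1
	run modprobe "$load"
}
```

With the helper sourced, "switch_eth_driver mtip" would unload fec and load
the switch driver. Any bridge should be removed and the ports brought down
(as in the "Removal" steps of the example usage) before swapping.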


Links:
[1] - https://github.com/lmajewski/linux-imx28-l2switch/commits/master
[2] - https://github.com/lmajewski/linux-imx28-l2switch/tree/imx28-v5.12-L2-upstream-RFC_v1
[3] - https://github.com/lmajewski/linux-imx28-l2switch/commits/imx28-v5.12-L2-upstream-RFC_v1
[4] - https://github.com/lmajewski/linux-imx28-l2switch/commits/vf610-linux-6.6.y-mtipl2sw


Lukasz Majewski (7):
  dt-bindings: net: Add MTIP L2 switch description
  net: mtip: The L2 switch driver for imx287
  net: mtip: Add buffers management functions to the L2 switch driver
  net: mtip: Add net_device_ops functions to the L2 switch driver
  net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  net: mtip: Extend the L2 switch driver with management operations
  net: mtip: Extend the L2 switch driver for imx287 with bridge
    operations

 .../bindings/net/nxp,imx28-mtip-switch.yaml   |  150 ++
 MAINTAINERS                                   |    7 +
 drivers/net/ethernet/freescale/Kconfig        |    1 +
 drivers/net/ethernet/freescale/Makefile       |    1 +
 drivers/net/ethernet/freescale/mtipsw/Kconfig |   13 +
 .../net/ethernet/freescale/mtipsw/Makefile    |    4 +
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 2046 +++++++++++++++++
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |  648 ++++++
 .../ethernet/freescale/mtipsw/mtipl2sw_br.c   |  138 ++
 .../ethernet/freescale/mtipsw/mtipl2sw_mgnt.c |  442 ++++
 10 files changed, 3450 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Makefile
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c

-- 
2.39.5




* [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski, Stefan Wahren

This patch provides a description of the MTIP L2 switch available in some
NXP SoCs - e.g. imx287.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>

---
Changes for v2:
- Rename the file to match exactly the compatible
  (nxp,imx287-mtip-switch)

Changes for v3:
- Remove '-' from const:'nxp,imx287-mtip-switch'
- Use '^port@[12]+$' for port patternProperties
- Drop status = "okay";
- Provide proper indentation for 'example' binding (replace 8
  spaces with 4 spaces)
- Remove smsc,disable-energy-detect; property
- Remove interrupt-parent and interrupts properties as not required
- Remove #address-cells and #size-cells from required properties check
- remove description from reg:
- Add $ref: ethernet-switch.yaml#

Changes for v4:
- Use $ref: ethernet-switch.yaml#/$defs/ethernet-ports and remove already
  referenced properties
- Rename file to nxp,imx28-mtip-switch.yaml

Changes for v5:
- Provide proper description for 'ethernet-port' node

Changes for v6:
- Proper usage of
  $ref: ethernet-switch.yaml#/$defs/ethernet-ports/patternProperties
  when specifying the 'ethernet-ports' property
- Add description and check for interrupt-names property

Changes for v7:
- Change switch interrupt name from 'mtipl2sw' to 'enet_switch'

Changes for v8:
- None

Changes for v9:
- Add GPIO_ACTIVE_LOW to reset-gpios mdio phandle

Changes for v10:
- None

Changes for v11:
- None

Changes for v12:
- Remove 'label' from required properties
- Move the reference to $ref: ethernet-switch.yaml#/$defs/ethernet-ports
  the proper place (under 'allOf:')

Changes for v13 - v22:
- None

Changes for v23:
- Update e-mail address

Changes for v24:
- None
---
 .../bindings/net/nxp,imx28-mtip-switch.yaml   | 150 ++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml

diff --git a/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml b/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
new file mode 100644
index 000000000000..f201951e73d1
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
@@ -0,0 +1,150 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/nxp,imx28-mtip-switch.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP SoC Ethernet Switch Controller (L2 MoreThanIP switch)
+
+maintainers:
+  - Lukasz Majewski <lukma@nabladev.com>
+
+description:
+  The 2-port switch ethernet subsystem provides ethernet packet (L2)
+  communication and can be configured as an ethernet switch. It provides the
+  reduced media independent interface (RMII), the management data input
+  output (MDIO) for physical layer device (PHY) management.
+
+allOf:
+  - $ref: ethernet-switch.yaml#/$defs/ethernet-ports
+
+properties:
+  compatible:
+    const: nxp,imx28-mtip-switch
+
+  reg:
+    maxItems: 1
+
+  phy-supply:
+    description:
+      Regulator that powers Ethernet PHYs.
+
+  clocks:
+    items:
+      - description: Register accessing clock
+      - description: Bus access clock
+      - description: Output clock for external device - e.g. PHY source clock
+      - description: IEEE1588 timer clock
+
+  clock-names:
+    items:
+      - const: ipg
+      - const: ahb
+      - const: enet_out
+      - const: ptp
+
+  interrupts:
+    items:
+      - description: Switch interrupt
+      - description: ENET0 interrupt
+      - description: ENET1 interrupt
+
+  interrupt-names:
+    items:
+      - const: enet_switch
+      - const: enet0
+      - const: enet1
+
+  pinctrl-names: true
+
+  ethernet-ports:
+    type: object
+    additionalProperties: true
+
+    patternProperties:
+      '^ethernet-port@[12]$':
+        type: object
+        additionalProperties: true
+        properties:
+          reg:
+            items:
+              - enum: [1, 2]
+            description: MTIP L2 switch port number
+
+        required:
+          - reg
+          - phy-mode
+          - phy-handle
+
+  mdio:
+    type: object
+    $ref: mdio.yaml#
+    unevaluatedProperties: false
+    description:
+      Specifies the mdio bus in the switch, used as a container for phy nodes.
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - interrupts
+  - interrupt-names
+  - mdio
+  - ethernet-ports
+
+unevaluatedProperties: false
+
+examples:
+  - |
+    #include<dt-bindings/interrupt-controller/irq.h>
+    #include<dt-bindings/gpio/gpio.h>
+    switch@800f0000 {
+        compatible = "nxp,imx28-mtip-switch";
+        reg = <0x800f0000 0x20000>;
+        pinctrl-names = "default";
+        pinctrl-0 = <&mac0_pins_a>, <&mac1_pins_a>;
+        phy-supply = <&reg_fec_3v3>;
+        interrupts = <100>, <101>, <102>;
+        interrupt-names = "enet_switch", "enet0", "enet1";
+        clocks = <&clks 57>, <&clks 57>, <&clks 64>, <&clks 35>;
+        clock-names = "ipg", "ahb", "enet_out", "ptp";
+
+        ethernet-ports {
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            mtip_port1: ethernet-port@1 {
+                reg = <1>;
+                label = "lan0";
+                local-mac-address = [ 00 00 00 00 00 00 ];
+                phy-mode = "rmii";
+                phy-handle = <&ethphy0>;
+            };
+
+            mtip_port2: ethernet-port@2 {
+                reg = <2>;
+                label = "lan1";
+                local-mac-address = [ 00 00 00 00 00 00 ];
+                phy-mode = "rmii";
+                phy-handle = <&ethphy1>;
+            };
+        };
+
+        mdio_sw: mdio {
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            reset-gpios = <&gpio2 13 GPIO_ACTIVE_LOW>;
+            reset-delay-us = <25000>;
+            reset-post-delay-us = <10000>;
+
+            ethphy0: ethernet-phy@0 {
+                reg = <0>;
+            };
+
+            ethphy1: ethernet-phy@1 {
+                reg = <1>;
+            };
+        };
+    };
-- 
2.39.5



* [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
  2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski, Stefan Wahren, Andrew Lunn

This patch series provides support for More Than IP L2 switch embedded
in the imx287 SoC.

This is a two port switch (placed between uDMA[01] and MAC-NET[01]),
which can be used for offloading the network traffic.

It can be used interchangeably with the current FEC driver - to be more
specific: one can use either of them, depending on the requirements.

The biggest difference is the usage of DMA - when FEC is used, separate
DMAs are available for each ENET-MAC block.
However, with the switch enabled, only DMA0 is used to send/receive data
to/from the switch (and then the switch sends it to the respective ports).

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>

---
Changes for v2:

- Remove not needed comments
- Restore udelay(10) for switch reset (such delay is explicitly specified
  in the documentation)
- Add COMPILE_TEST
- replace pr_* with dev_*
- Use for_each_available_child_of_node_scoped()
- Use devm_* function for memory allocation
- Remove printing information about the HW and SW revision of the driver
- Use devm_regulator_get_optional()
- Change compatible prefix from 'fsl' to more up to date 'nxp'
- Remove .owner = THIS_MODULE
- Use devm_platform_ioremap_resource(pdev, 0);
- Use devm_request_irq()
- Use devm_regulator_get_enable_optional()
- Replace clk_prepare_enable() and devm_clk_get() with single
  call to devm_clk_get_optional_enabled()
- Cleanup error path when function calls in probe fail
- Refactor the mtip_reset_phy() to serve as mdio bus reset callback
- Add myself as the MTIP L2 switch maintainer (squashed the separated
  commit)
- More descriptive help paragraphs (> 4 lines)

Changes for v3:
- Remove 'bridge_offloading' module parameter (to bridge ports just after probe)
- Remove forward references
- Fix reverse christmas tree formatting in functions
- Convert eligible comments to kernel doc format
- Remove extra MAC address validation check at esw_mac_addr_static()
- Remove mtip_print_link_status() and replace it with phy_print_status()
- Avoid changing phy device state in the driver (instead use functions
  exported by the phy API)
- Do not print extra information regarding PHY (which is printed by phylib) -
  e.g. net lan0: lan0: MTIP eth L2 switch 1e:ce:a5:0b:4c:12
- Remove VERSION from the driver - now we rely on the SHA1 in Linux
  mainline tree
- Remove zeroing of the net device private area (shall be already done
  during allocation)
- Refactor the code to remove mtip_ndev_setup()
- Use -ENOMEM instead of -1 return code when allocation fails
- Replace dev_info() with dev_dbg() to reduce the amount of information printed
  on normal operation
- Return ret instead of 0 from mtip_ndev_init()
- Remove fep->mii_timeout flag from the driver
- Remove not used stop_gpr_* fields in mtip_devinfo struct
- Remove platform_device_id description for mtipl2sw driver
- Add MODULE_DEVICE_TABLE() for mtip_of_match
- Remove MODULE_ALIAS()

Changes for v4:
- Rename imx287 with imx28 (as the former is not used in kernel anymore)
- Reorder the place where ENET interface is initialized - without this
  change the enet_out clock has default (25 MHz) value, which causes
  issues during reset (RMII's 50 MHz is required for proper PHY reset).
- Use PAUR instead of PAUR register to program MAC address
- Replace eth_mac_addr() with eth_hw_addr_set()
- Write to HW the randomly generated MAC address (if required)
- Adjust the reset code
- s/read_atable/mtip_read_atable/g and s/write_atable/mtip_write_atable/g
- Add clk_disable() and netif_napi_del() when errors occur during
  mtip_open() - refactor the error handling path.
- Refactor the mtip_set_multicast_list() to write (now) correct values to
  ENET-FEC registers.
- Replace dev_warn() with dev_err()
- Use GPIO_ACTIVE_LOW to indicate polarity in DTS
- Refactor code to check if network device is the switch device
- Remove mtip_port_dev_check()
- Refactor mtip_ndev_port_link() to avoid starting HW offloading for bridge
  when MTIP ports are parts of two distinct bridges
- Replace del_timer() with timer_delete_sync()

Changes for v5:
- Fix spelling in Kconfig
- Replace tmp with reg or register name
- Replace tmpaddr with mac_addr
- Use mac address assignment (from registers) code similar to fec_main.c (as it
  shall properly handle generic endianness)
- Add description for fep: in the mtip_update_atable_static() kernel doc
- Replace writel(bdp, &fep->cur_rx) with fep->cur_rx = bdp;
- Fix spelling of transmit
- Remove not needed white spaces in mtipl2sw.h
- Remove '_t' from struct mtip_addr_table_t
- Provide proper alignment in the mtipl2sw.h
- Add blank line after local header in mtipl2sw_br.c
- Use %p instead of %x (and cast) for fep in debug message
- Disable L2 switch in-HW offloading when only one
  of eligible ports is removed from the bridge
- Sort includes in the patch set alphabetically
- Introduce FEC_QUIRK_SWAP_FRAME to avoid #ifdef for imx28 proper operation
- Move 'mtip_port_info g_info' to struct switch_enet_private
- Replace some unsigned int with u32 (on data fields with 32 bit size)
- Remove not relevant comments from mtip_enet_init()
- Refactor functions definitions to be void when no other
  value than 0 is returned.
- Use capital letters in HEX constants
- Use u32 instead of unsigned int when applicable
- Add error handling code after the dma_map_single() is called
- The MCF_FEC_MSCR register can be written unconditionally
  for all supported platforms.
- Use IS_ENABLED() instead of #ifdef in mtip_timeout()
- Replace dev_info() with dev_warn_ratelimited() in mtip_switch_rx()
- Add code to handle situation when there is no memory
- Remove kfree(fep->mii_bus->irq)
- Provide more verbose output of mdio_{read|write} functions
- Handle error when clk_enable() fails in mtip_open()
- Use dev_dbg() at mtip_set_multicast_list()
- Simplify the mtip_is_switch_netdev_port() function to return condition check value
- Add dev_dbg() when of_get_mac_address() fails (as it may not be provided)
- Remove return ret; in mtip_register_notifiers()
- Replace int to bool in mtipl2sw_mgnt.c file's function definitions
- Replace unsigned int/long with u32 where applicable (where access to
  32 bit registers is performed)
- Refactor code in mtip_{read|write}_atable() to be more readable
- Remove code added for not (yet) supported IMX's vf610 SoC
- Remove do { } while(); loop from mtip_interrupt() function
- Introduce MTIP_PORT_FORWARDING_INIT to indicate initial value for
  port forwarding
- Replace 'unsigned long' to 'u32' in mtipl2sw.h
- Replace 'unsigned short' to 'u16' in mtipl2sw.h
- use %#x in dev_dbg()
- Call SET_NETDEV_DEV() macro to set network device's parent - otherwise
  phy_attach_direct() will fail.

Changes for v6:
- Use dev_name(&pdev->dev) when requesting IRQ (to be in sync with other subsystems)
- Use platform_get_irq_byname() for better readability
- Replace ARCH_MXS with SOC_IMX28
- Replace 2048 with MTIP_ATABLE_MEM_NUM_ENTRIES
- Remove check if fep == NULL in mtip_aging_timer() as timer can be set up only
  after the fep structure is allocated and already filled during probe()
  execution

Changes for v7:
- Change switch interrupt name from 'mtipl2sw' to 'enet_switch'

Changes for v8:
- Replace struct switch_t with set of #define(s) for MTIP L2
  switch IP block registers offsets. This helps to keep the '__iomem'
  annotation when accessing them with readl/writel() and fix warnings
  from sparse on GCC and CLANG. No functional changes, just registers'
  access coding paradigm has been updated.
- Fix warnings regarding access to atable - by adding '__iomem' attribute
- Remove not used struct mtip_port_statistics_status

Changes for v9:
- Adjust Makefile to properly build mtipl2sw driver as a module
  (otherwise make allmodconfig build fails).

Changes for v10:
- Remove __init attribute from mtip_switch_dma_init() to avoid clang
  modpost warning
  Reproduction steps:
  make LLVM_SUFFIX=-19 LLVM=1 mrproper
  cp ./arch/arm/configs/mxs_defconfig .config
  make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 menuconfig
  make ARCH=arm LLVM_SUFFIX=-19 LLVM=1 W=1 -j8 LOADADDR=0x40008000 uImage dtbs

Changes for v11:
- Replace of_find_node_by_name() with of_get_child_by_name()
- Replace of_match_node() with dedicated of_device_get_match_data()
- Replace devm_err() with dev_err_probe() in the probe function
- Remove kfree(fep) from mtip_sw_remove() to fix:
  ./mtipl2sw.c:1953:1-6: WARNING: invalid free of devm_ allocated data
- the *bus pointer provided as an argument to mtip_mdiobus_reset() cannot
  be NULL itself, as then the "reset" callback couldn't be referenced.
  Considering the above - the "!bus" check can be removed.
  It fixes the following error from coccinelle
  ./mtipl2sw.c:1237:16-19: ERROR: bus is NULL but dereferenced.
- Fix smatch errors:
  mtip_atable_dynamicms_learn_migration() error: uninitialized symbol 'rx_mac_lo/hi'.
  by initializing rx_mac_lo and rx_mac_hi variables
  Replace fep->irq with ret in dev_err_probe()

Changes for v12:
- Clear fep->rx_skbuff[i] when buffers are freed

Changes for v13:
- Replace spin_lock_irqsave() with spin_lock_bh() when eligible (NAPI or softirq context)
- Increment dev->stats.tx_bytes after data is really sent
- Add wmb() before descriptor is used for transmission (either RX or TX)
- Add work queue to handle bottom half of network interface timeout
- Move the network statistics update (only when packet is correctly received) just before
  finishing processing of data.
- Remove dev_kfree_skb_any(skb); and finally rely on mtip_free_buffers()
- Use mtip_set_last_buf_to_wrap() helper function
- Cleanup defines for internal routing table entries
- Use GENMASK() when applicable
- Replace extra pair of dma_map_single()/dma_unmap_single() in RX function with
  dma_sync_single_for_cpu() as there is already allocated buffer (its pointer is
  stored in bdp->cbd_bufaddr)
- Use proper network devices when offloading is enabled
- Use FIELD_PREP() and FIELD_GET() macros
- Replace preprocessor macros with FIELD_PREP()
- Remove not used defines
- Replace AT_EXTRACT* macros with FIELD_PREP()
- Introduce mtip_timedelta() static inline function instead of macro
- Add missing error check in mtip_alloc_buffers()
- Use page pool to allocate and sync space for incoming data (instead
  of using page map and unmap).
- Replace from_timer() with timer_container_of()
- Move mtipl2sw_br.c related code to a separate commit
- Move mtipl2sw_mgnt.c related code to a separate commit
- Exclude struct net_device_ops callback function to a separate commit

Changes for v14:
- Increase the maximal received frame size to 1536 (for VLAN)
- Use spin_{un}lock_irq{save|restore} when altering dynamic table of the
  switch and mtip_adjust_link() as both cannot be done when switch IRQ is
  potentially enabled (the previous one alters entries in switching table
  the latter one may reset the whole IP block)
- Remove separate kthread for learning
- Use jiffies to calculate time stamps for the entries in dynamic switching
  table
- Remove not needed mtip_mii_unregister()
- Rename timer to reflect its function

Changes for v15:
- Remove mii_lock, as it is not used in the series
- Remove extra space
- Use proper error check for devm_regulator_get_enable_optional()
- Use spin_{un}lock() instead of spin_{un}lock_irq{restore|save}()
  Context of altering switching table doesn't require IRQs disabled.
- Set larger number of descriptors to allow higher throughput

Changes for v16:
- Remove not used at_curr_entries
- Set the fep->ndev[] to NULL after ndev being unregistered
- Reorder the initialization code to register network devices as the last
  step.
- Change error execution path labels
- Remove spin_{un}lock_irq{save|restore}
- Stop netif queue before switch is going to be restarted

Changes for v17:
- Rewrite error handling code in mtip_mdev_init()

Changes for v18:
- Update e-mail address in MAINTAINERS

Changes for v19:
- Restore interrupts when mtip_rx_napi() exits due to no memory error
  (avoiding waiting for timeout)
- Handle return value of napi_complete_done() - i.e. only when it
  returns true interrupts are restored

Changes for v20:
- Change mtip_switch_{rx|tx} function prototypes
- Remove the need to have the port information in the NAPI RX context.
- Remove fep->skb_cur ( = fep->cur_tx - fep->tx_bd_base)
- Remove fep->skb_dirty (= fep->dirty_tx - fep->tx_bd_base)
- Remove fep->tx_full flag - as replaced with check for
  fep-> cur_tx == fep->dirty_tx and at completion of TX path with
  additional check if queues for network devices were disabled.
- Add routines to manage net devices' queues - code to stop and
  wake both network interfaces' queues (when either TX ring
  descriptors are all full or during link adjustments we need to
  re-configure the switch HW).

Changes for v21:
- Stop and wake not only first network device
- Add missing mdiobus_unregister() before mdiobus_free()
- Free DMA memory used to hold MTIP uDMA0 memory descriptors
- Add explicit check for port numbers when parsing switch DTB description

Changes for v22:
- Add extra check if phy_dev is not NULL in mtip_mii_probe
- Check if ethernet port's DTS description has proper phandle defined
  for its PHY

Changes for v23:
- Fix driver description in Kconfig
- Remove wrong comment
- Change dev_err() to dev_dbg() in mtip_mii_init
- Move timer_delete_sync() to the beginning of mtip_sw_remove()
- Set indexold to value of current block_index (not 0 by default)
- Call mtip_ndev_cleanup() unconditionally on mtip_ndev_init() error path
- Properly cleanup references to phy nodes
- Update copyright notice
- Remove tx_timeout_work queue cancelling as it is not yet defined
- Protect atable's "static address" adjustment path with lear_lock spin_lock

Changes for v24:
- Enable removal of CRC field from received frames
- Do not clear MTIP L2 switch address table at each switch restart
- Clear address table each time the switch driver is probed
- Remove unused struct mtip_ndev_priv *priv declaration from mtip_ndev_cleanup()
- Fix naming convention pointed out by AI review
- Replace timer_container_of() with container_of()
- Always call mtip_parse_of_cleanup(fep) when error in mtip_parse_of() is detected
---
 MAINTAINERS                                   |    7 +
 drivers/net/ethernet/freescale/Kconfig        |    1 +
 drivers/net/ethernet/freescale/Makefile       |    1 +
 drivers/net/ethernet/freescale/mtipsw/Kconfig |   13 +
 .../net/ethernet/freescale/mtipsw/Makefile    |    4 +
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 1363 +++++++++++++++++
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |  623 ++++++++
 7 files changed, 2012 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Makefile
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9ce5542ab2f3..8507c8745634 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10330,6 +10330,13 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/i2c/i2c-mpc.yaml
 F:	drivers/i2c/busses/i2c-mpc.c
 
+FREESCALE MTIP ETHERNET SWITCH DRIVER
+M:	Lukasz Majewski <lukasz.majewski@mailbox.org>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
+F:	drivers/net/ethernet/freescale/mtipsw/*
+
 FREESCALE QORIQ DPAA ETHERNET DRIVER
 M:	Madalin Bucur <madalin.bucur@nxp.com>
 L:	netdev@vger.kernel.org
diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index 11edbb46a118..b97435d4841f 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -61,6 +61,7 @@ config FEC_MPC52xx_MDIO
 
 source "drivers/net/ethernet/freescale/fs_enet/Kconfig"
 source "drivers/net/ethernet/freescale/fman/Kconfig"
+source "drivers/net/ethernet/freescale/mtipsw/Kconfig"
 
 config FSL_PQ_MDIO
 	tristate "Freescale PQ MDIO"
diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
index d0a259e47960..04229056cdd3 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_FSL_DPAA_ETH) += dpaa/
 
 obj-y += dpaa2/
 obj-y += enetc/
+obj-y += mtipsw/
diff --git a/drivers/net/ethernet/freescale/mtipsw/Kconfig b/drivers/net/ethernet/freescale/mtipsw/Kconfig
new file mode 100644
index 000000000000..3c55b4b7018e
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config FEC_MTIP_L2SW
+	tristate "MoreThanIP L2 switch support to FEC driver"
+	depends on OF
+	depends on NET_SWITCHDEV
+	depends on BRIDGE
+	depends on SOC_IMX28 || COMPILE_TEST
+	help
+	  This enables support for the MoreThanIP L2 switch on i.MX
+	  SoCs (e.g. i.MX287). It offloads bridging to this IP block's
+	  hardware and allows switch management with standard Linux tools.
+	  This switch driver can be used interchangeably with the already
+	  available FEC driver, depending on the use case's requirements.
diff --git a/drivers/net/ethernet/freescale/mtipsw/Makefile b/drivers/net/ethernet/freescale/mtipsw/Makefile
new file mode 100644
index 000000000000..bd8ffb30939a
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_FEC_MTIP_L2SW) += nxp-mtipl2sw.o
+nxp-mtipl2sw-objs := mtipl2sw.o
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
new file mode 100644
index 000000000000..1d1005a42306
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -0,0 +1,1363 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  L2 switch Controller (Ethernet L2 switch) driver for MTIP block.
+ *
+ *  Copyright (C) 2026 NABLADEV Software Engineering GmbH
+ *  Lukasz Majewski <lukma@nabladev.com>
+ *
+ *  Based on a previous work by:
+ *
+ *  Copyright 2010-2012 Freescale Semiconductor, Inc.
+ *  Alison Wang (b18965@freescale.com)
+ *  Jason Jin (Jason.jin@freescale.com)
+ *
+ *  Copyright (C) 2010-2013 Freescale Semiconductor, Inc. All Rights Reserved.
+ *  Shrek Wu (B16972@freescale.com)
+ */
+
+#include <linux/bitops.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/etherdevice.h>
+#include <linux/gpio/consumer.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/regulator/consumer.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <net/page_pool/helpers.h>
+
+#include "mtipl2sw.h"
+
+/* Set the last buffer to wrap */
+static void mtip_set_last_buf_to_wrap(struct cbd_t *bdp)
+{
+	bdp--;
+	bdp->cbd_sc |= BD_SC_WRAP;
+}
+
+static void mtip_netif_stop_queues(struct switch_enet_private *fep)
+{
+	for (int i = 0; i < SWITCH_EPORT_NUMBER; i++)
+		netif_stop_queue(fep->ndev[i]);
+}
+
+static void mtip_netif_wake_queues(struct switch_enet_private *fep)
+{
+	for (int i = 0; i < SWITCH_EPORT_NUMBER; i++)
+		netif_wake_queue(fep->ndev[i]);
+}
+
+static bool mtip_netif_queues_stopped(struct switch_enet_private *fep)
+{
+	return netif_queue_stopped(fep->ndev[0]) &&
+		netif_queue_stopped(fep->ndev[1]);
+}
+
+struct mtip_devinfo {
+	u32 quirks;
+};
+
+static void mtip_enet_init(struct switch_enet_private *fep, int port)
+{
+	void __iomem *enet_addr = fep->enet_addr;
+	u32 mii_speed, holdtime, reg;
+
+	if (port == 2)
+		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
+
+	reg = MCF_FEC_RCR_PROM | MCF_FEC_RCR_MII_MODE | MCF_FEC_RCR_CRC_FWD |
+		FIELD_PREP(MCF_FEC_RCR_MAX_FL_MASK, 1522);
+
+	if (fep->phy_interface[port - 1] == PHY_INTERFACE_MODE_RMII)
+		reg |= MCF_FEC_RCR_RMII_MODE;
+
+	writel(reg, enet_addr + MCF_FEC_RCR);
+
+	writel(MCF_FEC_TCR_FDEN, enet_addr + MCF_FEC_TCR);
+	writel(MCF_FEC_ECR_ETHER_EN, enet_addr + MCF_FEC_ECR);
+
+	mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
+	mii_speed--;
+
+	holdtime = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 100000000) - 1;
+
+	fep->phy_speed = mii_speed << 1 | holdtime << 8;
+
+	writel(fep->phy_speed, enet_addr + MCF_FEC_MSCR);
+}
+
+static void mtip_setup_mac(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	unsigned char *iap, mac_addr[ETH_ALEN];
+
+	/* Use MAC address from DTS */
+	iap = &fep->mac[priv->portnum - 1][0];
+
+	/* Use MAC address set by bootloader */
+	if (!is_valid_ether_addr(iap)) {
+		*((__be32 *)&mac_addr[0]) =
+			cpu_to_be32(readl(fep->enet_addr + MCF_FEC_PALR));
+		*((__be16 *)&mac_addr[4]) =
+			cpu_to_be16(readl(fep->enet_addr +
+					  MCF_FEC_PAUR) >> 16);
+		iap = &mac_addr[0];
+	}
+
+	/* Use random MAC address */
+	if (!is_valid_ether_addr(iap)) {
+		eth_hw_addr_random(dev);
+		dev_info(&fep->pdev->dev, "Using random MAC address: %pM\n",
+			 dev->dev_addr);
+		iap = (unsigned char *)dev->dev_addr;
+	}
+
+	/* Adjust MAC if using macaddr (and increment if needed) */
+	eth_hw_addr_gen(dev, iap, priv->portnum - 1);
+}
+
+/**
+ * crc8_calc - calculate CRC for MAC storage
+ *
+ * @pmacaddress: A 6-byte array with the MAC address. The first byte is
+ *               the first byte transmitted.
+ *
+ * Calculate Galois Field Arithmetic CRC for polynomial x^8+x^2+x+1.
+ * It omits the final shift in of 8 zeroes a "normal" CRC would do
+ * (getting the remainder).
+ *
+ *  Examples (hexadecimal values):
+ *   10-11-12-13-14-15  => CRC=0xc2
+ *   10-11-cc-dd-ee-00  => CRC=0xe6
+ *
+ * Return: The 8-bit CRC in bits 7:0
+ */
+static int crc8_calc(unsigned char *pmacaddress)
+{
+	int byt; /* byte index */
+	int bit; /* bit index */
+	int crc = 0x12;
+	int inval;
+
+	for (byt = 0; byt < ETH_ALEN; byt++) {
+		inval = (((int)pmacaddress[byt]) & 0xFF);
+		/* shift bit 0 to bit 8 so all our bits
+		 * travel through bit 8
+		 * (simplifies below calc)
+		 */
+		inval <<= 8;
+
+		for (bit = 0; bit < 8; bit++) {
+			/* next input bit comes into d7 after shift */
+			crc |= inval & 0x100;
+			if (crc & 0x01)
+				/* before shift  */
+				crc ^= 0x1C0;
+
+			crc >>= 1;
+			inval >>= 1;
+		}
+	}
+	/* upper bits are clean as we shifted in zeroes! */
+	return crc;
+}
+
+static void mtip_read_atable(struct switch_enet_private *fep, int index,
+			     u32 *read_lo, u32 *read_hi)
+{
+	struct addr_table64b_entry __iomem *atable_base =
+		fep->hwentry->mtip_table64b_entry;
+
+	*read_lo = readl(&atable_base[index].lo);
+	*read_hi = readl(&atable_base[index].hi);
+}
+
+static void mtip_write_atable(struct switch_enet_private *fep, int index,
+			      u32 write_lo, u32 write_hi)
+{
+	struct addr_table64b_entry __iomem *atable_base =
+		fep->hwentry->mtip_table64b_entry;
+
+	writel(write_lo, &atable_base[index].lo);
+	writel(write_hi, &atable_base[index].hi);
+}
+
+/**
+ * mtip_portinfofifo_read - Read element from receive FIFO
+ *
+ * @fep: Structure describing switch
+ *
+ * Read one element from the HW receive FIFO (Queue)
+ * if available and return it.
+ *
+ * Return: mtip_port_info or NULL if no data is available.
+ */
+static
+struct mtip_port_info *mtip_portinfofifo_read(struct switch_enet_private *fep)
+{
+	struct mtip_port_info *info = &fep->g_info;
+	u32 reg;
+
+	reg = readl(fep->hwp + ESW_LSR);
+	if (reg == 0) {
+		dev_dbg(&fep->pdev->dev, "%s: ESW_LSR = 0x%x\n", __func__, reg);
+		return NULL;
+	}
+
+	/* read word from FIFO */
+	info->maclo = readl(fep->hwp + ESW_LREC0);
+	if (info->maclo == 0) {
+		dev_dbg(&fep->pdev->dev, "%s: mac lo 0x%x\n", __func__,
+			info->maclo);
+		return NULL;
+	}
+
+	/* read 2nd word from FIFO */
+	reg = readl(fep->hwp + ESW_LREC1);
+	info->machi = reg & 0xFFFF;
+	info->hash  = (reg >> 16) & 0xFF;
+	info->port  = (reg >> 24) & 0xF;
+
+	return info;
+}
+
+/* Clear complete MAC Look Up Table */
+void mtip_clear_atable(struct switch_enet_private *fep)
+{
+	unsigned long flags;
+	int index;
+
+	spin_lock_irqsave(&fep->learn_lock, flags);
+
+	for (index = 0; index < MTIP_ATABLE_MEM_NUM_ENTRIES; index++)
+		mtip_write_atable(fep, index, 0, 0);
+
+	spin_unlock_irqrestore(&fep->learn_lock, flags);
+}
+
+/**
+ * mtip_update_atable_static - Update switch static address table
+ *
+ * @mac_addr: Pointer to the array containing the MAC address to
+ *            be stored as a static entry
+ * @port:     Port bitmask to be set in the static entry,
+ *            valid values are 1-7
+ * @priority: The priority for the static entry in the table
+ * @fep:      Pointer to the structure describing the switch
+ *
+ * Updates the MAC address lookup table with a static entry.
+ *
+ * Searches the block selected by the address hash for the MAC address.
+ * If the address is already present, the existing entry is replaced
+ * (even if it is a dynamic one). Otherwise the new entry is written to
+ * the first empty slot available in the block.
+ *
+ * Return: 0 on a successful update, -ENOSPC when no slot is available
+ */
+static int mtip_update_atable_static(unsigned char *mac_addr, unsigned int port,
+				     unsigned int priority,
+				     struct switch_enet_private *fep)
+{
+	unsigned long block_index, entry, index_end, flags;
+	u32 write_lo, write_hi, read_lo, read_hi;
+
+	spin_lock_irqsave(&fep->learn_lock, flags);
+
+	write_lo = (u32)((mac_addr[3] << 24) | (mac_addr[2] << 16) |
+			 (mac_addr[1] << 8) | mac_addr[0]);
+	write_hi = (u32)(0 | (port << AT_SENTRY_PORTMASK_shift) |
+			 (priority << AT_SENTRY_PRIO_shift) |
+			 (AT_ENTRY_TYPE_STATIC << AT_ENTRY_TYPE_shift) |
+			 (AT_ENTRY_RECORD_VALID << AT_ENTRY_VALID_shift) |
+			 (mac_addr[5] << 8) | (mac_addr[4]));
+
+	block_index = GET_BLOCK_PTR(crc8_calc(mac_addr));
+	index_end = block_index + ATABLE_ENTRY_PER_SLOT;
+	/* Now search all the entries in the selected block */
+	for (entry = block_index; entry < index_end; entry++) {
+		mtip_read_atable(fep, entry, &read_lo, &read_hi);
+		/* MAC address matched, so update the
+		 * existing entry
+		 * even if it's a dynamic one
+		 */
+		if (read_lo == write_lo &&
+		    ((read_hi & 0x0000FFFF) ==
+		     (write_hi & 0x0000FFFF))) {
+			mtip_write_atable(fep, entry, write_lo, write_hi);
+			spin_unlock_irqrestore(&fep->learn_lock, flags);
+			return 0;
+		} else if (!(read_hi & (1 << 16))) {
+			/* Fill this empty slot (valid bit zero),
+			 * assuming no holes in the block
+			 */
+			mtip_write_atable(fep, entry, write_lo, write_hi);
+			spin_unlock_irqrestore(&fep->learn_lock, flags);
+			return 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fep->learn_lock, flags);
+	/* No space available for this static entry */
+	return -ENOSPC;
+}
+
+static bool mtip_update_atable_dynamic1(u32 write_lo, u32 write_hi,
+					int block_index, unsigned int port,
+					unsigned int curr_time,
+					struct switch_enet_private *fep)
+{
+	unsigned long entry, index_end;
+	int time, timeold, indexold;
+	u32 read_lo, read_hi;
+	unsigned long conf;
+
+	/* prepare update port and timestamp */
+	conf = AT_ENTRY_RECORD_VALID << AT_ENTRY_VALID_shift;
+	conf |= AT_ENTRY_TYPE_DYNAMIC << AT_ENTRY_TYPE_shift;
+	conf |= curr_time << AT_DENTRY_TIME_shift;
+	conf |= port << AT_DENTRY_PORT_shift;
+	conf |= write_hi;
+
+	/* linear search through all slot
+	 * entries and update if found
+	 */
+	index_end = block_index + ATABLE_ENTRY_PER_SLOT;
+	/* Now search all the entries in the selected block */
+	for (entry = block_index; entry < index_end; entry++) {
+		mtip_read_atable(fep, entry, &read_lo, &read_hi);
+		if (read_lo == write_lo &&
+		    ((read_hi & 0x0000FFFF) ==
+		     (write_hi & 0x0000FFFF))) {
+			/* found correct address,
+			 * update timestamp.
+			 */
+			mtip_write_atable(fep, entry, write_lo, conf);
+
+			return false;
+		} else if (!(read_hi & (1 << 16))) {
+			/* The slot is empty, so use it
+			 * for the new entry.
+			 * Note: There are no holes, so
+			 * no later entry in the block
+			 * needs to be compared.
+			 */
+			mtip_write_atable(fep, entry, write_lo, conf);
+			return true;
+		}
+	}
+
+	/* No free entry is left in the block, so overwrite the oldest one */
+	timeold = 0;
+	indexold = block_index;
+	for (entry = block_index; entry < index_end; entry++) {
+		mtip_read_atable(fep, entry, &read_lo, &read_hi);
+		time = FIELD_GET(AT_TIMESTAMP_MASK, read_hi);
+		dev_dbg(&fep->pdev->dev, "%s : time %x currtime %x\n",
+			__func__, time, curr_time);
+		time = mtip_timedelta(curr_time, time);
+		if (time > timeold) {
+			/* is it older ? */
+			timeold = time;
+			indexold = entry;
+		}
+	}
+
+	mtip_write_atable(fep, indexold, write_lo, conf);
+
+	fep->at_block_overflows++;
+	dev_err(&fep->pdev->dev, "%s update time, at_block_overflows %x\n",
+		__func__, fep->at_block_overflows);
+	/* newly inserted */
+	return true;
+}
+
+/* Dynamic MAC address table learning and migration */
+static void
+mtip_atable_dynamicms_learn_migration(struct switch_enet_private *fep,
+				      int curr_time, unsigned char *mac,
+				      u8 *rx_port)
+{
+	u8 port = MTIP_PORT_FORWARDING_INIT;
+	struct mtip_port_info *port_info;
+	u32 rx_mac_lo = 0, rx_mac_hi = 0;
+	int index;
+
+	spin_lock(&fep->learn_lock);
+
+	if (mac && is_valid_ether_addr(mac)) {
+		rx_mac_lo = (u32)((mac[3] << 24) | (mac[2] << 16) |
+				  (mac[1] << 8) | mac[0]);
+		rx_mac_hi = (u32)((mac[5] << 8) | (mac[4]));
+	}
+
+	port_info = mtip_portinfofifo_read(fep);
+	while (port_info) {
+		/* get block index from lookup table */
+		index = GET_BLOCK_PTR(port_info->hash);
+		mtip_update_atable_dynamic1(port_info->maclo, port_info->machi,
+					    index, port_info->port,
+					    curr_time, fep);
+
+		if (mac && is_valid_ether_addr(mac) &&
+		    port == MTIP_PORT_FORWARDING_INIT) {
+			if (rx_mac_lo == port_info->maclo &&
+			    rx_mac_hi == port_info->machi) {
+				/* The newly learned MAC is the source of
+				 * our filtered frame.
+				 */
+				port = (u8)port_info->port;
+			}
+		}
+		port_info = mtip_portinfofifo_read(fep);
+	}
+
+	if (rx_port)
+		*rx_port = port;
+
+	spin_unlock(&fep->learn_lock);
+}
+
+static void mtip_mgnt_timer(struct timer_list *t)
+{
+	struct switch_enet_private *fep =
+		container_of(t, struct switch_enet_private, timer_mgnt);
+
+	mtip_atable_dynamicms_learn_migration(fep, mtip_get_time(),
+					      NULL, NULL);
+	mod_timer(&fep->timer_mgnt,
+		  jiffies + msecs_to_jiffies(LEARNING_AGING_INTERVAL));
+}
+
+static void esw_mac_addr_static(struct switch_enet_private *fep)
+{
+	int i;
+
+	for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
+		mtip_update_atable_static((unsigned char *)
+					  fep->ndev[i]->dev_addr, 7, 7, fep);
+}
+
+static void mtip_config_switch(struct switch_enet_private *fep)
+{
+	esw_mac_addr_static(fep);
+
+	writel(0, fep->hwp + ESW_BKLR);
+
+	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
+	       fep->hwp + ESW_IMR);
+}
+
+static void mtip_configure_enet_mii(struct switch_enet_private *fep, int port)
+{
+	struct phy_device *phydev = fep->phy_dev[port - 1];
+	struct net_device *dev = fep->ndev[port - 1];
+	void __iomem *enet_addr = fep->enet_addr;
+	int duplex = fep->full_duplex[port - 1];
+	u32 rcr;
+
+	if (port == 2)
+		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
+
+	/* ECR */
+	writel(MCF_FEC_ECR_MAGIC_ENA, enet_addr + MCF_FEC_ECR);
+
+	/* EMRBR */
+	writel(PKT_MAXBLR_SIZE, enet_addr + MCF_FEC_EMRBR);
+
+	/* set the receive and transmit BDs ring base to
+	 * hardware registers(ERDSR & ETDSR)
+	 */
+	writel(fep->bd_dma, enet_addr + MCF_FEC_ERDSR);
+	writel((unsigned long)fep->bd_dma + sizeof(struct cbd_t) * RX_RING_SIZE,
+	       enet_addr + MCF_FEC_ETDSR);
+
+	writel(fep->phy_speed, enet_addr + MCF_FEC_MSCR);
+
+	/* EIR */
+	writel(0, enet_addr + MCF_FEC_EIR);
+
+	/* IAUR */
+	writel(0, enet_addr + MCF_FEC_IAUR);
+
+	/* IALR */
+	writel(0, enet_addr + MCF_FEC_IALR);
+
+	/* GAUR */
+	writel(0, enet_addr + MCF_FEC_GAUR);
+
+	/* GALR */
+	writel(0, enet_addr + MCF_FEC_GALR);
+
+	/* EMRBR */
+	writel(PKT_MAXBLR_SIZE, enet_addr + MCF_FEC_EMRBR);
+
+	/* EIMR */
+	writel(0, enet_addr + MCF_FEC_EIMR);
+
+	/* PALR PAUR */
+	/* Set the station address for the ENET Adapter */
+	writel(dev->dev_addr[3] |
+	       dev->dev_addr[2] << 8 |
+	       dev->dev_addr[1] << 16 |
+	       dev->dev_addr[0] << 24, enet_addr + MCF_FEC_PALR);
+	writel(dev->dev_addr[5] << 16 |
+	       dev->dev_addr[4] << 24,
+	       enet_addr + MCF_FEC_PAUR);
+
+	/* RCR */
+	rcr = readl(enet_addr + MCF_FEC_RCR);
+	if (phydev && phydev->speed == SPEED_100)
+		rcr &= ~MCF_FEC_RCR_RMII_10BASET;
+	else
+		rcr |= MCF_FEC_RCR_RMII_10BASET;
+
+	if (duplex == DUPLEX_FULL)
+		rcr &= ~MCF_FEC_RCR_DRT;
+	else
+		rcr |= MCF_FEC_RCR_DRT;
+
+	writel(rcr, enet_addr + MCF_FEC_RCR);
+
+	/* TCR */
+	if (duplex == DUPLEX_FULL)
+		writel(0x1C, enet_addr + MCF_FEC_TCR);
+	else
+		writel(0x18, enet_addr + MCF_FEC_TCR);
+
+	/* ECR */
+	writel(readl(enet_addr + MCF_FEC_ECR) | MCF_FEC_ECR_ETHER_EN,
+	       enet_addr + MCF_FEC_ECR);
+}
+
+/* This function is called to start or restart the switch when the
+ * interface is opened, when the link comes up, or when the link
+ * switches between half and full duplex.
+ */
+static void mtip_switch_restart(struct net_device *dev, int duplex0,
+				int duplex1)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	int i;
+
+	/* Perform a reset. We should wait for this. */
+	writel(MCF_ESW_MODE_SW_RST, fep->hwp + ESW_MODE);
+
+	/* The documentation specifies a delay of 10us for the switch
+	 * to complete its internal SW reset.
+	 */
+	udelay(10);
+	writel(MCF_ESW_MODE_STATRST, fep->hwp + ESW_MODE);
+	writel(MCF_ESW_MODE_SW_EN, fep->hwp + ESW_MODE);
+
+	/* Management port configuration,
+	 * make port 0 the management port
+	 */
+	writel(0, fep->hwp + ESW_BMPC);
+
+	/* Clear any outstanding interrupt */
+	writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
+
+	/* Set backpressure threshold to minimize frames discarded
+	 * due to congestion.
+	 */
+	writel(P0BC_THRESHOLD, fep->hwp + ESW_P0BCT);
+
+	/* Set maximum receive buffer size */
+	writel(PKT_MAXBLR_SIZE, fep->hwp + ESW_MRBR);
+
+	/* Set receive and transmit descriptor base */
+	writel(fep->bd_dma, fep->hwp + ESW_RDSR);
+	writel((unsigned long)fep->bd_dma
+		+ sizeof(struct cbd_t) * RX_RING_SIZE,
+		fep->hwp + ESW_TDSR);
+
+	fep->cur_tx = fep->tx_bd_base;
+	fep->cur_rx = fep->rx_bd_base;
+	fep->dirty_tx = fep->cur_tx;
+
+	/* Reset SKB transmit buffers */
+	for (i = 0; i <= TX_RING_MOD_MASK; i++) {
+		if (fep->tx_skbuff[i]) {
+			dev_kfree_skb_any(fep->tx_skbuff[i]);
+			fep->tx_skbuff[i] = NULL;
+		}
+	}
+
+	fep->full_duplex[0] = duplex0;
+	fep->full_duplex[1] = duplex1;
+
+	mtip_configure_enet_mii(fep, 1);
+	mtip_configure_enet_mii(fep, 2);
+
+	/* And last, enable the transmit and receive processing */
+	writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
+
+	/* Enable interrupts we wish to service */
+	writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
+	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
+	       fep->hwp + ESW_IMR);
+
+	mtip_config_switch(fep);
+}
+
+static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
+{
+	struct switch_enet_private *fep = ptr_fep;
+	irqreturn_t ret = IRQ_NONE;
+	u32 int_events, int_imask;
+
+	/* Get the interrupt events that caused us to be here */
+	int_events = readl(fep->hwp + ESW_ISR);
+	writel(int_events, fep->hwp + ESW_ISR);
+
+	if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
+		ret = IRQ_HANDLED;
+		/* Disable the RX interrupt */
+		if (napi_schedule_prep(&fep->napi)) {
+			int_imask = readl(fep->hwp + ESW_IMR);
+			int_imask &= ~MCF_ESW_IMR_RXF;
+			writel(int_imask, fep->hwp + ESW_IMR);
+			__napi_schedule(&fep->napi);
+		}
+	}
+
+	return ret;
+}
+
+static void mtip_switch_tx(struct switch_enet_private *fep)
+{
+}
+
+static int mtip_switch_rx(struct net_device *dev, int budget)
+{
+	return -ENOMEM;
+}
+
+static void mtip_adjust_link(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	struct phy_device *phy_dev;
+	int status_change = 0, idx;
+
+	idx = priv->portnum - 1;
+	phy_dev = fep->phy_dev[idx];
+
+	/* Duplex link change */
+	if (phy_dev->link && fep->full_duplex[idx] != phy_dev->duplex) {
+		mtip_netif_stop_queues(fep);
+		if (idx == 0)
+			mtip_switch_restart(dev, phy_dev->duplex,
+					    fep->full_duplex[!idx]);
+		else
+			mtip_switch_restart(dev, fep->full_duplex[!idx],
+					    phy_dev->duplex);
+
+		if (mtip_netif_queues_stopped(fep))
+			mtip_netif_wake_queues(fep);
+
+		status_change = 1;
+	}
+
+	/* Link on or off change */
+	if (phy_dev->link != fep->link[idx]) {
+		fep->link[idx] = phy_dev->link;
+		if (phy_dev->link) {
+			mtip_netif_stop_queues(fep);
+			if (idx == 0)
+				mtip_switch_restart(dev, phy_dev->duplex,
+						    fep->full_duplex[!idx]);
+			else
+				mtip_switch_restart(dev, fep->full_duplex[!idx],
+						    phy_dev->duplex);
+
+			if (mtip_netif_queues_stopped(fep))
+				mtip_netif_wake_queues(fep);
+		}
+		status_change = 1;
+	}
+
+	if (status_change)
+		phy_print_status(phy_dev);
+}
+
+static int mtip_mdio_wait(struct switch_enet_private *fep)
+{
+	uint ievent = 0;
+	int ret;
+
+	ret = readl_poll_timeout_atomic(fep->enet_addr + MCF_FEC_EIR, ievent,
+					ievent & MCF_ENET_MII, 2, 30000);
+	if (!ret)
+		writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);
+
+	return ret;
+}
+
+static int mtip_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+	struct switch_enet_private *fep = bus->priv;
+	int ret;
+
+	/* start a read op */
+	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
+	       FIELD_PREP(FEC_MMFR_PA_MASK, mii_id) |
+	       FIELD_PREP(FEC_MMFR_RA_MASK, regnum) |
+	       FEC_MMFR_TA, fep->enet_addr + MCF_FEC_MII_DATA);
+
+	/* wait for end of transfer */
+	ret = mtip_mdio_wait(fep);
+	if (ret) {
+		dev_err(&fep->pdev->dev, "MTIP: MDIO (%s:%d) read timeout\n",
+			bus->id, mii_id);
+		return ret;
+	}
+
+	/* return value */
+	return FIELD_GET(FEC_MMFR_DATA_MASK,
+			 readl(fep->enet_addr + MCF_FEC_MII_DATA));
+}
+
+static int mtip_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
+			   u16 value)
+{
+	struct switch_enet_private *fep = bus->priv;
+	int ret;
+
+	/* start a write op */
+	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
+	       FIELD_PREP(FEC_MMFR_PA_MASK, mii_id) |
+	       FIELD_PREP(FEC_MMFR_RA_MASK, regnum) |
+	       FEC_MMFR_TA | FIELD_PREP(FEC_MMFR_DATA_MASK, value),
+	       fep->enet_addr + MCF_FEC_MII_DATA);
+
+	/* wait for end of transfer */
+	ret = mtip_mdio_wait(fep);
+	if (ret)
+		dev_err(&fep->pdev->dev, "MTIP: MDIO (%s:%d) write timeout\n",
+			bus->id, mii_id);
+
+	return ret;
+}
+
+static int mtip_mii_probe(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	struct phy_device *phy_dev = NULL;
+	int port_idx = priv->portnum - 1;
+
+	if (fep->phy_np[port_idx])
+		phy_dev = of_phy_connect(dev, fep->phy_np[port_idx],
+					 &mtip_adjust_link, 0,
+					 fep->phy_interface[port_idx]);
+
+	if (!phy_dev) {
+		netdev_err(dev, "Unable to connect to phy\n");
+		return -ENODEV;
+	}
+
+	phy_set_max_speed(phy_dev, 100);
+	fep->phy_dev[port_idx] = phy_dev;
+	fep->link[port_idx] = 0;
+	fep->full_duplex[port_idx] = 0;
+
+	dev_dbg(&dev->dev,
+		"MTIP PHY driver [%s] (mii_bus:phy_addr=%s, irq=%d)\n",
+		fep->phy_dev[port_idx]->drv->name,
+		phydev_name(fep->phy_dev[port_idx]),
+		fep->phy_dev[port_idx]->irq);
+
+	return 0;
+}
+
+static int mtip_mdiobus_reset(struct mii_bus *bus)
+{
+	if (!bus->reset_gpiod) {
+		dev_err(&bus->dev, "Reset GPIO pin not provided!\n");
+		return -EINVAL;
+	}
+
+	gpiod_set_value_cansleep(bus->reset_gpiod, 0);
+
+	/* Extra time to allow:
+	 * 1. The GPIO RESET pin to go high, to prevent a situation where its
+	 *    value is "LOW" because it is NOT configured.
+	 * 2. The ENET CLK to stabilize before GPIO RESET is asserted
+	 */
+	usleep_range(200, 300);
+
+	gpiod_set_value_cansleep(bus->reset_gpiod, 1);
+	usleep_range(bus->reset_delay_us, bus->reset_delay_us + 1000);
+	gpiod_set_value_cansleep(bus->reset_gpiod, 0);
+
+	if (bus->reset_post_delay_us > 0)
+		usleep_range(bus->reset_post_delay_us,
+			     bus->reset_post_delay_us + 1000);
+
+	return 0;
+}
+
+static int mtip_mii_init(struct switch_enet_private *fep,
+			 struct platform_device *pdev)
+{
+	struct device_node *node;
+	int err = -ENXIO;
+
+	/* Clear MMFR to avoid generating an MII event when writing MSCR.
+	 * MII event generation condition:
+	 * - writing MSCR:
+	 *      - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
+	 *        mscr_reg_data_in[7:0] != 0
+	 * - writing MMFR:
+	 *      - mscr[7:0]_not_zero
+	 */
+	writel(0, fep->enet_addr + MCF_FEC_MII_DATA);
+	/* Clear any pending transaction complete indication */
+	writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);
+
+	fep->mii_bus = mdiobus_alloc();
+	if (!fep->mii_bus) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	fep->mii_bus->name = "mtip_mii_bus";
+	fep->mii_bus->read = mtip_mdio_read;
+	fep->mii_bus->write = mtip_mdio_write;
+	fep->mii_bus->reset = mtip_mdiobus_reset;
+	snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", 0);
+	fep->mii_bus->priv = fep;
+	fep->mii_bus->parent = &pdev->dev;
+
+	node = of_get_child_by_name(pdev->dev.of_node, "mdio");
+	if (node)
+		dev_dbg(&fep->pdev->dev, "%s: PHY name: %s\n",
+			__func__, node->name);
+
+	err = of_mdiobus_register(fep->mii_bus, node);
+	if (node)
+		of_node_put(node);
+	if (err)
+		goto err_out_free_mdiobus;
+
+	return 0;
+
+err_out_free_mdiobus:
+	mdiobus_free(fep->mii_bus);
+err_out:
+	return err;
+}
+
+static void mtip_mii_remove(struct switch_enet_private *fep)
+{
+	int i;
+
+	for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
+		if (fep->phy_dev[i])
+			phy_disconnect(fep->phy_dev[i]);
+
+	mdiobus_unregister(fep->mii_bus);
+	mdiobus_free(fep->mii_bus);
+}
+
+static void mtip_get_drvinfo(struct net_device *dev,
+			     struct ethtool_drvinfo *info)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+
+	strscpy(info->driver, fep->pdev->dev.driver->name,
+		sizeof(info->driver));
+	strscpy(info->bus_info, dev_name(&dev->dev),
+		sizeof(info->bus_info));
+}
+
+static void mtip_free_buffers(struct net_device *dev)
+{
+}
+
+static int mtip_alloc_buffers(struct net_device *dev)
+{
+	return 0;
+}
+
+static int mtip_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(napi->dev);
+	struct switch_enet_private *fep = priv->fep;
+	int pkts;
+
+	pkts = mtip_switch_rx(napi->dev, budget);
+	if (pkts == -ENOMEM) {
+		napi_complete(napi);
+		/* Set default interrupt mask for L2 switch */
+		writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
+		       fep->hwp + ESW_IMR);
+		return 0;
+	}
+
+	mtip_switch_tx(fep);
+
+	if (pkts < budget) {
+		if (likely(napi_complete_done(napi, pkts)))
+			/* Set default interrupt mask for L2 switch */
+			writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
+			       fep->hwp + ESW_IMR);
+	}
+	return pkts;
+}
+
+static int mtip_open(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	int ret, port_idx = priv->portnum - 1;
+
+	if (fep->usage_count == 0) {
+		ret = clk_enable(fep->clk_ipg);
+		if (ret) {
+			dev_err(&fep->pdev->dev,
+				"Cannot enable switch IPG clock\n");
+			return ret;
+		}
+
+		netif_napi_add(dev, &fep->napi, mtip_rx_napi);
+
+		ret = mtip_alloc_buffers(dev);
+		if (ret)
+			goto mtip_alloc_buffers_err;
+	}
+
+	fep->link[port_idx] = 0;
+
+	/* Probe and connect to the PHY when opening the interface, if this
+	 * has NOT already been done in the switch driver probe (or when the
+	 * device is re-opened).
+	 */
+	ret = mtip_mii_probe(dev);
+	if (ret)
+		goto mtip_mii_probe_err;
+
+	phy_start(fep->phy_dev[port_idx]);
+
+	if (fep->usage_count == 0) {
+		napi_enable(&fep->napi);
+		mtip_switch_restart(dev, 1, 1);
+
+		netif_start_queue(dev);
+	}
+
+	fep->usage_count++;
+	return 0;
+
+ mtip_mii_probe_err:
+	mtip_free_buffers(dev);
+ mtip_alloc_buffers_err:
+	if (fep->usage_count == 0) {
+		netif_napi_del(&fep->napi);
+		clk_disable(fep->clk_ipg);
+	}
+	return ret;
+}
+
+static int mtip_close(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	int idx = priv->portnum - 1;
+
+	fep->link[idx] = 0;
+
+	if (fep->phy_dev[idx]) {
+		phy_stop(fep->phy_dev[idx]);
+		netif_stop_queue(dev);
+		phy_disconnect(fep->phy_dev[idx]);
+		fep->phy_dev[idx] = NULL;
+	}
+
+	if (fep->usage_count == 1) {
+		napi_disable(&fep->napi);
+		netif_napi_del(&fep->napi);
+		mtip_free_buffers(dev);
+		clk_disable(fep->clk_ipg);
+	}
+
+	fep->usage_count--;
+
+	return 0;
+}
+
+static const struct ethtool_ops mtip_ethtool_ops = {
+	.get_link_ksettings     = phy_ethtool_get_link_ksettings,
+	.set_link_ksettings     = phy_ethtool_set_link_ksettings,
+	.get_drvinfo            = mtip_get_drvinfo,
+	.get_link               = ethtool_op_get_link,
+	.get_ts_info		= ethtool_op_get_ts_info,
+};
+
+static const struct net_device_ops mtip_netdev_ops = {
+	.ndo_open		= mtip_open,
+	.ndo_stop		= mtip_close,
+};
+
+bool mtip_is_switch_netdev_port(const struct net_device *ndev)
+{
+	return ndev->netdev_ops == &mtip_netdev_ops;
+}
+
+static int mtip_switch_dma_init(struct switch_enet_private *fep)
+{
+	struct cbd_t *bdp, *cbd_base;
+	int ret, i;
+
+	/* Set the DMA mask for both the streaming and coherent APIs */
+	ret = dma_set_mask_and_coherent(&fep->pdev->dev, DMA_BIT_MASK(32));
+	if (ret < 0) {
+		dev_err(&fep->pdev->dev, "No suitable DMA available\n");
+		return ret;
+	}
+
+	/* Allocate memory for buffer descriptors */
+	cbd_base = dma_alloc_coherent(&fep->pdev->dev, PAGE_SIZE, &fep->bd_dma,
+				      GFP_KERNEL);
+	if (!cbd_base)
+		return -ENOMEM;
+
+	/* Set receive and transmit descriptor base */
+	fep->rx_bd_base = cbd_base;
+	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
+
+	/* Initialize the receive buffer descriptors */
+	bdp = fep->rx_bd_base;
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		bdp->cbd_sc = 0;
+		bdp++;
+	}
+
+	mtip_set_last_buf_to_wrap(bdp);
+	/* ...and the same for transmit */
+	bdp = fep->tx_bd_base;
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		/* Initialize the BD for every fragment in the page */
+		bdp->cbd_sc = 0;
+		bdp->cbd_bufaddr = 0;
+		bdp++;
+	}
+
+	mtip_set_last_buf_to_wrap(bdp);
+	return 0;
+}
+
+static void mtip_ndev_cleanup(struct switch_enet_private *fep)
+{
+	int i;
+
+	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
+		if (fep->ndev[i]) {
+			unregister_netdev(fep->ndev[i]);
+			free_netdev(fep->ndev[i]);
+			fep->ndev[i] = NULL;
+		}
+	}
+}
+
+static int mtip_ndev_init(struct switch_enet_private *fep,
+			  struct platform_device *pdev)
+{
+	struct mtip_ndev_priv *priv;
+	int i, ret = 0;
+
+	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
+		fep->ndev[i] = alloc_netdev(sizeof(struct mtip_ndev_priv),
+					    fep->ndev_name[i], NET_NAME_USER,
+					    ether_setup);
+		if (!fep->ndev[i]) {
+			ret = -ENOMEM;
+			goto cleanup_created_ndev;
+		}
+
+		fep->ndev[i]->ethtool_ops = &mtip_ethtool_ops;
+		fep->ndev[i]->netdev_ops = &mtip_netdev_ops;
+		SET_NETDEV_DEV(fep->ndev[i], &pdev->dev);
+
+		priv = netdev_priv(fep->ndev[i]);
+		priv->dev = fep->ndev[i];
+		priv->fep = fep;
+		priv->portnum = i + 1;
+		fep->ndev[i]->irq = fep->irq;
+
+		mtip_setup_mac(fep->ndev[i]);
+
+		ret = register_netdev(fep->ndev[i]);
+		if (ret) {
+			dev_err(&fep->ndev[i]->dev,
+				"%s: ndev %s register err: %d\n", __func__,
+				fep->ndev[i]->name, ret);
+			free_netdev(fep->ndev[i]);
+			fep->ndev[i] = NULL;
+			goto cleanup_created_ndev;
+		}
+
+		dev_dbg(&fep->ndev[i]->dev, "%s: MTIP eth L2 switch %pM\n",
+			fep->ndev[i]->name, fep->ndev[i]->dev_addr);
+	}
+
+	return 0;
+
+ cleanup_created_ndev:
+	mtip_ndev_cleanup(fep);
+
+	return ret;
+}
+
+static void mtip_parse_of_cleanup(struct switch_enet_private *fep)
+{
+	int i;
+
+	for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
+		if (fep->phy_np[i])
+			of_node_put(fep->phy_np[i]);
+}
+
+static int mtip_parse_of(struct switch_enet_private *fep,
+			 struct device_node *np)
+{
+	struct device_node *p;
+	unsigned int port_num;
+	int ret = 0;
+
+	p = of_get_child_by_name(np, "ethernet-ports");
+
+	for_each_available_child_of_node_scoped(p, port) {
+		if (of_property_read_u32(port, "reg", &port_num))
+			continue;
+
+		if (!(port_num == 1 || port_num == 2)) {
+			dev_err(&fep->pdev->dev, "%s: The switch supports up to %d ports!\n",
+				__func__, SWITCH_EPORT_NUMBER);
+			ret = -EINVAL;
+			goto of_get_err;
+		}
+
+		fep->n_ports = port_num;
+		ret = of_get_mac_address(port, &fep->mac[port_num - 1][0]);
+		if (ret)
+			dev_dbg(&fep->pdev->dev,
+				"of_get_mac_address(%pOF) failed (%d)!\n",
+				port, ret);
+
+		ret = of_property_read_string(port, "label",
+					      &fep->ndev_name[port_num - 1]);
+		if (ret < 0) {
+			dev_err(&fep->pdev->dev,
+				"%s: Cannot get ethernet port name (%d)!\n",
+				__func__, ret);
+			goto of_get_err;
+		}
+
+		ret = of_get_phy_mode(port, &fep->phy_interface[port_num - 1]);
+		if (ret < 0) {
+			dev_err(&fep->pdev->dev,
+				"%s: Cannot get PHY mode (%d)!\n", __func__,
+				ret);
+			goto of_get_err;
+		}
+
+		fep->phy_np[port_num - 1] = of_parse_phandle(port,
+							     "phy-handle", 0);
+		if (!fep->phy_np[port_num - 1]) {
+			dev_err(&fep->pdev->dev,
+				"%s: PHY handle not defined!\n", __func__);
+			ret = -ENODEV;
+			goto of_get_err;
+		}
+	}
+
+	of_node_put(p);
+
+	return 0;
+
+ of_get_err:
+	mtip_parse_of_cleanup(fep);
+	of_node_put(p);
+
+	return ret;
+}
+
+static const struct mtip_devinfo mtip_imx28_l2switch_info = {
+	.quirks = FEC_QUIRK_BUG_CAPTURE | FEC_QUIRK_SINGLE_MDIO |
+		  FEC_QUIRK_SWAP_FRAME,
+};
+
+static const struct of_device_id mtipl2_of_match[] = {
+	{ .compatible = "nxp,imx28-mtip-switch",
+	  .data = &mtip_imx28_l2switch_info},
+	{ /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, mtipl2_of_match);
+
+static int mtip_sw_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	const struct mtip_devinfo *dev_info;
+	struct switch_enet_private *fep;
+	int ret;
+
+	fep = devm_kzalloc(&pdev->dev, sizeof(*fep), GFP_KERNEL);
+	if (!fep)
+		return -ENOMEM;
+
+	dev_info = of_device_get_match_data(&pdev->dev);
+	if (dev_info)
+		fep->quirks = dev_info->quirks;
+
+	fep->pdev = pdev;
+	platform_set_drvdata(pdev, fep);
+
+	fep->enet_addr = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(fep->enet_addr))
+		return PTR_ERR(fep->enet_addr);
+
+	fep->irq = platform_get_irq_byname(pdev, "enet_switch");
+	if (fep->irq < 0)
+		return fep->irq;
+
+	/* Create an Ethernet device instance.
+	 * The switch lookup address memory starts at 0x800FC000
+	 */
+	fep->hwp_enet = fep->enet_addr;
+	fep->hwp = fep->enet_addr + ENET_SWI_PHYS_ADDR_OFFSET;
+	fep->hwentry = (struct mtip_addr_table __iomem *)
+		(fep->hwp + MCF_ESW_LOOKUP_MEM_OFFSET);
+
+	ret = devm_regulator_get_enable_optional(&pdev->dev, "phy");
+	if (ret < 0 && ret != -ENODEV)
+		return dev_err_probe(&pdev->dev, ret,
+				     "Unable to get and enable 'phy'\n");
+
+	fep->clk_ipg = devm_clk_get_enabled(&pdev->dev, "ipg");
+	if (IS_ERR(fep->clk_ipg))
+		return dev_err_probe(&pdev->dev, PTR_ERR(fep->clk_ipg),
+				     "Unable to acquire 'ipg' clock\n");
+
+	fep->clk_ahb = devm_clk_get_enabled(&pdev->dev, "ahb");
+	if (IS_ERR(fep->clk_ahb))
+		return dev_err_probe(&pdev->dev, PTR_ERR(fep->clk_ahb),
+				     "Unable to acquire 'ahb' clock\n");
+
+	fep->clk_enet_out = devm_clk_get_optional_enabled(&pdev->dev,
+							  "enet_out");
+	if (IS_ERR(fep->clk_enet_out))
+		return dev_err_probe(&pdev->dev, PTR_ERR(fep->clk_enet_out),
+				     "Unable to acquire 'enet_out' clock\n");
+
+	ret = mtip_parse_of(fep, np);
+	if (ret < 0)
+		return dev_err_probe(&pdev->dev, ret, "OF parse error\n");
+
+	/* setup MII interface for external switch ports */
+	mtip_enet_init(fep, 1);
+	mtip_enet_init(fep, 2);
+
+	spin_lock_init(&fep->learn_lock);
+	spin_lock_init(&fep->hw_lock);
+
+	ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
+			       dev_name(&pdev->dev), fep);
+	if (ret) {
+		mtip_parse_of_cleanup(fep);
+		return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
+	}
+
+	mtip_clear_atable(fep);
+
+	ret = mtip_switch_dma_init(fep);
+	if (ret) {
+		dev_err(&pdev->dev, "%s: ethernet switch init fail (%d)!\n",
+			__func__, ret);
+		goto of_free_references;
+	}
+
+	ret = mtip_mii_init(fep, pdev);
+	if (ret) {
+		dev_err(&pdev->dev, "%s: Cannot init phy bus (%d)!\n", __func__,
+			ret);
+		goto dma_free_coherent_memory;
+	}
+
+	ret = mtip_ndev_init(fep, pdev);
+	if (ret) {
+		dev_err(&pdev->dev, "%s: Failed to create virtual ndev (%d)\n",
+			__func__, ret);
+		goto mdiobus_free_memory;
+	}
+
+	/* setup timer for learning aging function */
+	timer_setup(&fep->timer_mgnt, mtip_mgnt_timer, 0);
+	mod_timer(&fep->timer_mgnt,
+		  jiffies + msecs_to_jiffies(LEARNING_AGING_INTERVAL));
+
+	return 0;
+
+ mdiobus_free_memory:
+	mdiobus_unregister(fep->mii_bus);
+	mdiobus_free(fep->mii_bus);
+ dma_free_coherent_memory:
+	dma_free_coherent(&fep->pdev->dev, PAGE_SIZE, fep->rx_bd_base,
+			  fep->bd_dma);
+	fep->rx_bd_base = NULL;
+	fep->tx_bd_base = NULL;
+ of_free_references:
+	mtip_parse_of_cleanup(fep);
+
+	return ret;
+}
+
+static void mtip_sw_remove(struct platform_device *pdev)
+{
+	struct switch_enet_private *fep = platform_get_drvdata(pdev);
+
+	timer_delete_sync(&fep->timer_mgnt);
+
+	mtip_ndev_cleanup(fep);
+
+	mtip_parse_of_cleanup(fep);
+	mtip_mii_remove(fep);
+
+	dma_free_coherent(&fep->pdev->dev, PAGE_SIZE, fep->rx_bd_base,
+			  fep->bd_dma);
+	fep->rx_bd_base = NULL;
+	fep->tx_bd_base = NULL;
+
+	platform_set_drvdata(pdev, NULL);
+}
+
+static struct platform_driver mtipl2plat_driver = {
+	.driver         = {
+		.name   = "mtipl2sw",
+		.of_match_table = mtipl2_of_match,
+		.suppress_bind_attrs = true,
+	},
+	.probe          = mtip_sw_probe,
+	.remove         = mtip_sw_remove,
+};
+
+module_platform_driver(mtipl2plat_driver);
+
+MODULE_AUTHOR("Lukasz Majewski <lukma@denx.de>");
+MODULE_DESCRIPTION("Driver for MTIP L2 on SOC switch");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
new file mode 100644
index 000000000000..7fbad157ff83
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
@@ -0,0 +1,623 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ *  Copyright (C) 2026 NABLADEV Software Engineering GmbH
+ *  Lukasz Majewski <lukma@nabladev.com>
+ */
+
+#ifndef __MTIP_L2SWITCH_H_
+#define __MTIP_L2SWITCH_H_
+
+#include <linux/clocksource.h>
+#include <linux/net_tstamp.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/ptp_clock_kernel.h>
+#include <linux/timecounter.h>
+
+#define PKT_MAXBUF_SIZE         1518
+#define PKT_MINBUF_SIZE         64
+#define PKT_MAXBLR_SIZE         1536
+
+/* The number of Tx and Rx buffers. These are allocated from the page
+ * pool. The code may assume these are power of two, so it is best
+ * to keep them that size.
+ * We don't need to allocate pages for the transmitter.  We just use
+ * the skbuffer directly.
+ */
+#define MTIP_SWITCH_RX_PAGES       128
+#define MTIP_SWITCH_RX_FRSIZE      2048
+#define MTIP_SWITCH_RX_FRPPG       (PAGE_SIZE / MTIP_SWITCH_RX_FRSIZE)
+#define RX_RING_SIZE            (MTIP_SWITCH_RX_FRPPG * MTIP_SWITCH_RX_PAGES)
+#define MTIP_SWITCH_TX_FRSIZE      2048
+
+#define TX_RING_SIZE            256      /* Must be power of two */
+#define TX_RING_MOD_MASK        GENMASK(7, 0)      /*   for this to work */
+
+#define SWITCH_EPORT_NUMBER	2
+
+#if (((RX_RING_SIZE + TX_RING_SIZE) * 8) > PAGE_SIZE)
+#error "L2SWITCH: descriptor ring size constants too large"
+#endif
+
+#define ESW_REVISION        (0x000)
+#define ESW_SCRATCH         (0x004)
+#define ESW_PER             (0x008)
+#define ESW_VLANV           (0x010)
+#define ESW_DBCR            (0x014)
+#define ESW_DMCR            (0x018)
+#define ESW_BKLR            (0x01C)
+#define ESW_BMPC            (0x020)
+#define ESW_MODE            (0x024)
+#define ESW_VIMSEL          (0x028)
+#define ESW_VOMSEL          (0x02C)
+#define ESW_VIMEN           (0x030)
+#define ESW_VID             (0x034)
+
+#define ESW_MCR             (0x040)
+#define ESW_EGMAP           (0x044)
+#define ESW_INGMAP          (0x048)
+#define ESW_INGSAL          (0x04C)
+#define ESW_INGSAH          (0x050)
+#define ESW_INGDAL          (0x054)
+#define ESW_INGDAH          (0x058)
+#define ESW_ENGSAL          (0x05C)
+#define ESW_ENGSAH          (0x060)
+#define ESW_ENGDAL          (0x064)
+#define ESW_ENGDAH          (0x068)
+#define ESW_MCVAL           (0x06C)
+
+#define ESW_MMSR            (0x080)
+#define ESW_LMT             (0x084)
+#define ESW_LFC             (0x088)
+#define ESW_PCSR            (0x08C)
+#define ESW_IOSR            (0x090)
+#define ESW_QWT             (0x094)
+
+#define ESW_P0BCT           (0x09C)
+
+#define ESW_P0FFEN          (0x0BC)
+
+#define ESW_PSNP_BASE       (0x0C0)
+#define ESW_PSNP(x)         (ESW_PSNP_BASE + (4 * (x)))
+
+#define ESW_IPSNP_BASE      (0x0EC)
+#define ESW_IPSNP(x)        (ESW_IPSNP_BASE + (4 * (x)))
+
+#define ESW_PVRES_BASE      (0x100)
+#define ESW_PVRES(x)        (ESW_PVRES_BASE + (4 * (x)))
+
+#define ESW_IPRES           (0x140)
+
+#define ESW_PRES_BASE       (0x180)
+#define ESW_PRES(x)         (ESW_PRES_BASE + (4 * (x)))
+
+#define ESW_PID_BASE        (0x200)
+#define ESW_PID(x)          (ESW_PID_BASE + (4 * (x)))
+
+#define ESW_VRES_BASE       (0x280)
+#define ESW_VRES(x)         (ESW_VRES_BASE + (4 * (x)))
+
+#define ESW_DISCN           (0x300)
+#define ESW_DISCB           (0x304)
+#define ESW_NDISCN          (0x308)
+#define ESW_NDISCB          (0x30C)
+
+#define ESW_ISR             (0x400)
+#define ESW_IMR             (0x404)
+#define ESW_RDSR            (0x408)
+#define ESW_TDSR            (0x40C)
+#define ESW_MRBR            (0x410)
+#define ESW_RDAR            (0x414)
+#define ESW_TDAR            (0x418)
+
+#define ESW_LREC0           (0x500)
+#define ESW_LREC1           (0x504)
+#define ESW_LSR             (0x508)
+
+struct addr_table64b_entry {
+	u32 lo;  /* lower 32 bits */
+	u32 hi;  /* upper 32 bits */
+};
+
+struct mtip_addr_table {
+	struct addr_table64b_entry  mtip_table64b_entry[2048];
+};
+
+#define MCF_ESW_LOOKUP_MEM_OFFSET     0x4000
+#define MCF_ESW_ENET_PORT_OFFSET      0x4000
+#define ENET_SWI_PHYS_ADDR_OFFSET     0x8000
+#define MCF_ESW_PER	(0x08)
+#define MCF_ESW_DBCR	(0x14)
+#define MCF_ESW_IMR	(0x404)
+
+#define MCF_FEC_BASE_ADDR	(fep->enet_addr)
+#define MCF_FEC_EIR		(0x04)
+#define MCF_FEC_EIMR		(0x08)
+#define MCF_FEC_MMFR		(0x40)
+#define MCF_FEC_MSCR		(0x44)
+
+#define MCF_FEC_RCR		(0x84)
+#define MCF_FEC_TCR		(0xC4)
+#define MCF_FEC_ECR		(0x24)
+
+#define MCF_FEC_PALR          (0xE4)
+#define MCF_FEC_PAUR          (0xE8)
+
+#define MCF_FEC_ERDSR         (0x180)
+#define MCF_FEC_ETDSR         (0x184)
+
+#define MCF_FEC_IAUR          (0x118)
+#define MCF_FEC_IALR          (0x11C)
+
+#define MCF_FEC_GAUR          (0x120)
+#define MCF_FEC_GALR          (0x124)
+
+#define MCF_FEC_EMRBR         (0x188)
+
+#define MCF_FEC_RCR_DRT	  BIT(1)
+#define MCF_FEC_RCR_MII_MODE      BIT(2)
+#define MCF_FEC_RCR_PROM          BIT(3)
+#define MCF_FEC_RCR_FCE	  BIT(5)
+#define MCF_FEC_RCR_RMII_MODE     BIT(8)
+#define MCF_FEC_RCR_RMII_10BASET  BIT(9)
+#define MCF_FEC_RCR_MAX_FL_MASK   GENMASK(29, 16)
+#define MCF_FEC_RCR_CRC_FWD       BIT(14)
+#define MCF_FEC_RCR_NO_LGTH_CHECK BIT(30)
+#define MCF_FEC_TCR_FDEN          BIT(2)
+
+#define MCF_FEC_ECR_RESET      BIT(0)
+#define MCF_FEC_ECR_ETHER_EN   BIT(1)
+#define MCF_FEC_ECR_MAGIC_ENA  BIT(2)
+#define MCF_FEC_ECR_ENA_1588   BIT(4)
+
+#define MTIP_ALIGNMENT  GENMASK(3, 0)
+#define MCF_ENET_MII	BIT(23)
+
+/* FEC MII MMFR bits definition */
+#define FEC_MMFR_ST             BIT(30)
+#define FEC_MMFR_OP_READ        BIT(29)
+#define FEC_MMFR_OP_WRITE       BIT(28)
+#define FEC_MMFR_PA_MASK        GENMASK(27, 23)
+#define FEC_MMFR_RA_MASK        GENMASK(22, 18)
+#define FEC_MMFR_TA             BIT(17)
+#define FEC_MMFR_DATA_MASK      GENMASK(15, 0)
+
+/* Port 0 backpressure congestion threshold */
+#define P0BC_THRESHOLD		0x40
+#define LEARNING_AGING_INTERVAL 100
+/* Info received from Hardware Learning FIFO,
+ * holding MAC address and corresponding Hash Value and
+ * port number where the frame was received (disassembled).
+ */
+struct mtip_port_info {
+	/* MAC lower 32 bits (first byte is 7:0). */
+	u32 maclo;
+	/* MAC upper 16 bits (47:32). */
+	u32 machi;
+	/* the hash value for this MAC address. */
+	u32 hash;
+	/* the port number this MAC address is associated with. */
+	u32 port;
+};
+
+/* Define the buffer descriptor structure. */
+struct cbd_t {
+	u16 cbd_datlen;		/* Data length */
+	u16 cbd_sc;			/* Control and status info */
+	u32 cbd_bufaddr;		/* Buffer address */
+};
+
+/* The switch buffer descriptors track the ring buffers. The rx_bd_base and
+ * tx_bd_base always point to the base of the buffer descriptors.  The
+ * cur_rx and cur_tx point to the currently available buffer.
+ * The dirty_tx tracks the current buffer that is being sent by the
+ * controller. The cur_tx and dirty_tx are equal under both completely
+ * empty and completely full conditions.  The empty/ready indicator in
+ * the buffer descriptor determines the actual condition.
+ */
+struct switch_enet_private {
+	/* Base addresses for HW registers of the switch device */
+	void __iomem *hwp_enet, *hwp, *enet_addr;
+	struct mtip_addr_table __iomem *hwentry;
+
+	struct platform_device *pdev;
+
+	/* Switch internals */
+	struct mtip_port_info g_info;
+
+	/* Clocks */
+	struct clk *clk_ipg;
+	struct clk *clk_ahb;
+	struct clk *clk_enet_out;
+
+	/* skbuff */
+	unsigned char *tx_bounce[TX_RING_SIZE];
+	struct sk_buff *tx_skbuff[TX_RING_SIZE];
+
+	/* page_pool */
+	struct page_pool *page_pool;
+	struct page *page[RX_RING_SIZE];
+
+	/* DMA */
+	dma_addr_t bd_dma;
+	struct cbd_t *rx_bd_base;	/* Address of Rx and Tx buffers. */
+	struct cbd_t *tx_bd_base;
+	struct cbd_t *cur_rx, *cur_tx;	/* The next free ring entry */
+	struct cbd_t *dirty_tx;      /* The ring entries to be free()ed. */
+
+	/* Locking */
+	spinlock_t hw_lock; /* Lock for HW configuration */
+	spinlock_t learn_lock; /* Lock for learning DB adjustments */
+
+	/* NAPI support */
+	struct napi_struct napi;
+
+	/* Timer for switch management (learning and aging) */
+	struct timer_list timer_mgnt;
+	int at_block_overflows;
+
+	/* PHY and MDIO */
+	struct mii_bus *mii_bus;
+	struct phy_device *phy_dev[SWITCH_EPORT_NUMBER];
+	uint phy_speed;
+	int link[SWITCH_EPORT_NUMBER];
+	int full_duplex[SWITCH_EPORT_NUMBER];
+	phy_interface_t phy_interface[SWITCH_EPORT_NUMBER];
+	struct device_node *phy_np[SWITCH_EPORT_NUMBER];
+
+	/* IRQ number */
+	int irq;
+
+	/* lan[01] ports */
+	int n_ports;
+	const char *ndev_name[SWITCH_EPORT_NUMBER];
+	struct net_device *ndev[SWITCH_EPORT_NUMBER];
+	unsigned char mac[SWITCH_EPORT_NUMBER][ETH_ALEN];
+
+	/* Switch state */
+	u8 br_members; /* Bit field with active members */
+	u8 br_offload; /* Bridge in-HW offloading flag */
+	int usage_count; /* Number of configured ports */
+
+	/* Driver related */
+	u32 quirks;
+};
+
+struct mtip_ndev_priv {
+	int portnum;
+	struct net_device *dev;
+	struct net_device_stats stats;
+	struct net_device *master_dev;
+	struct switch_enet_private *fep;
+	struct work_struct tx_timeout_work;
+};
+
+#define MCF_FEC_MII_DATA	0x040 /* MII manage frame reg */
+#define MCF_FEC_GRP_HASH_TABLE_HIGH	0x120 /* High 32bits hash table */
+#define MCF_FEC_GRP_HASH_TABLE_LOW	0x124 /* Low 32bits hash table */
+
+#define BD_SC_EMPTY     ((ushort)0x8000) /* Receive is empty */
+#define BD_SC_READY     ((ushort)0x8000) /* Transmit is ready */
+#define BD_SC_WRAP      ((ushort)0x2000) /* Last buffer descriptor */
+#define BD_SC_INTRPT    ((ushort)0x1000) /* Interrupt on change */
+#define BD_SC_CM        ((ushort)0x0200) /* Continuous mode */
+#define BD_SC_ID        ((ushort)0x0100) /* Rec'd too many idles */
+#define BD_SC_P         ((ushort)0x0100) /* xmt preamble */
+#define BD_SC_BR        ((ushort)0x0020) /* Break received */
+#define BD_SC_FR        ((ushort)0x0010) /* Framing error */
+#define BD_SC_PR        ((ushort)0x0008) /* Parity error */
+#define BD_SC_OV        ((ushort)0x0002) /* Overrun */
+#define BD_SC_CD        ((ushort)0x0001)
+
+/* Buffer descriptor control/status used by Ethernet receive. */
+#define BD_ENET_RX_EMPTY        ((ushort)0x8000)
+#define BD_ENET_RX_WRAP         ((ushort)0x2000)
+#define BD_ENET_RX_INTR         ((ushort)0x1000)
+#define BD_ENET_RX_LAST         ((ushort)0x0800)
+#define BD_ENET_RX_FIRST        ((ushort)0x0400)
+#define BD_ENET_RX_MISS         ((ushort)0x0100)
+#define BD_ENET_RX_LG           ((ushort)0x0020)
+#define BD_ENET_RX_NO           ((ushort)0x0010)
+#define BD_ENET_RX_SH           ((ushort)0x0008)
+#define BD_ENET_RX_CR           ((ushort)0x0004)
+#define BD_ENET_RX_OV           ((ushort)0x0002)
+#define BD_ENET_RX_CL           ((ushort)0x0001)
+/* All status bits */
+#define BD_ENET_RX_STATS        ((ushort)0x013f)
+
+/* Buffer descriptor control/status used by Ethernet transmit. */
+#define BD_ENET_TX_READY        ((ushort)0x8000)
+#define BD_ENET_TX_PAD          ((ushort)0x4000)
+#define BD_ENET_TX_WRAP         ((ushort)0x2000)
+#define BD_ENET_TX_INTR         ((ushort)0x1000)
+#define BD_ENET_TX_LAST         ((ushort)0x0800)
+#define BD_ENET_TX_TC           ((ushort)0x0400)
+#define BD_ENET_TX_DEF          ((ushort)0x0200)
+#define BD_ENET_TX_HB           ((ushort)0x0100)
+#define BD_ENET_TX_LC           ((ushort)0x0080)
+#define BD_ENET_TX_RL           ((ushort)0x0040)
+#define BD_ENET_TX_RCMASK       ((ushort)0x003c)
+#define BD_ENET_TX_UN           ((ushort)0x0002)
+#define BD_ENET_TX_CSL          ((ushort)0x0001)
+/* All status bits */
+#define BD_ENET_TX_STATS        ((ushort)0x03ff)
+
+/* Copy from validation code */
+#define RX_BUFFER_SIZE 256
+#define TX_BUFFER_SIZE 256
+
+#define TX_BD_R                 BIT(15)
+#define TX_BD_TO1               BIT(14)
+#define TX_BD_W                 BIT(13)
+#define TX_BD_TO2               BIT(12)
+#define TX_BD_L                 BIT(11)
+#define TX_BD_TC                BIT(10)
+
+#define TX_BD_INT       BIT(30)
+#define TX_BD_TS        BIT(29)
+#define TX_BD_PINS      BIT(28)
+#define TX_BD_IINS      BIT(27)
+#define TX_BD_TXE       BIT(15)
+#define TX_BD_UE        BIT(13)
+#define TX_BD_EE        BIT(12)
+#define TX_BD_FE        BIT(11)
+#define TX_BD_LCE       BIT(10)
+#define TX_BD_OE        BIT(9)
+#define TX_BD_TSE       BIT(8)
+#define TX_BD_BDU       BIT(31)
+
+#define RX_BD_E                 BIT(15)
+#define RX_BD_R01               BIT(14)
+#define RX_BD_W                 BIT(13)
+#define RX_BD_R02               BIT(12)
+#define RX_BD_L                 BIT(11)
+#define RX_BD_M                 BIT(8)
+#define RX_BD_BC                BIT(7)
+#define RX_BD_MC                BIT(6)
+#define RX_BD_LG                BIT(5)
+#define RX_BD_NO                BIT(4)
+#define RX_BD_CR                BIT(2)
+#define RX_BD_OV                BIT(1)
+#define RX_BD_TR                BIT(0)
+
+#define RX_BD_ME               BIT(31)
+#define RX_BD_PE               BIT(26)
+#define RX_BD_CE               BIT(25)
+#define RX_BD_UC               BIT(24)
+#define RX_BD_INT              BIT(23)
+#define RX_BD_ICE              BIT(5)
+#define RX_BD_PCR              BIT(4)
+#define RX_BD_VLAN             BIT(2)
+#define RX_BD_IPV6             BIT(1)
+#define RX_BD_FRAG             BIT(0)
+#define RX_BD_BDU              BIT(31)
+/****************************************************************************/
+
+/* Address table size in bytes (2048 64-bit entries) */
+#define MTIP_ATABLE_MEM_SIZE         (2048 * 8)
+/* How many 64-bit elements fit in the address table */
+#define MTIP_ATABLE_MEM_NUM_ENTRIES  (2048)
+/* Maximum number of address table entries in each slot */
+#define ATABLE_ENTRY_PER_SLOT 8
+/* log2(ATABLE_ENTRY_PER_SLOT) */
+#define ATABLE_ENTRY_PER_SLOT_bits 3
+/* Entry size in bytes */
+#define ATABLE_ENTRY_SIZE 8
+/* Slot size in bytes */
+#define ATABLE_SLOT_SIZE (ATABLE_ENTRY_PER_SLOT * ATABLE_ENTRY_SIZE)
+/* timestamp variable mask within address table entry */
+#define AT_DENTRY_TIMESTAMP_MASK GENMASK(9, 0)
+
+/* number of bits for port bitmask number storage */
+#define AT_SENTRY_PORT_WIDTH 11
+/* address table static entry port bitmask start address bit */
+#define AT_SENTRY_PORTMASK_shift 21
+/* address table static entry priority start address bit */
+#define AT_SENTRY_PRIO_shift 18
+/* address table dynamic entry port start address bit */
+#define AT_DENTRY_PORT_shift 28
+/* address table dynamic entry timestamp start address bit */
+#define AT_DENTRY_TIME_shift 18
+/* address table entry record type start address bit */
+#define AT_ENTRY_TYPE_shift 17
+/* address table entry record type bit: 1 static, 0 dynamic */
+#define AT_ENTRY_TYPE_STATIC 1
+#define AT_ENTRY_TYPE_DYNAMIC 0
+/* address table entry record valid start address bit */
+#define AT_ENTRY_VALID_shift 16
+#define AT_ENTRY_RECORD_VALID 1
+
+/* return block corresponding to the 8 bit hash value calculated */
+#define GET_BLOCK_PTR(hash) ((hash) << 3)
+
+/* time stamp storage mask */
+#define AT_TIMESTAMP_MASK GENMASK(27, 18)
+/* port number storage mask */
+#define AT_PORT_MASK GENMASK(31, 28)
+
+static inline int mtip_timedelta(unsigned int curr_time, int time)
+{
+	return (curr_time - time) & AT_DENTRY_TIMESTAMP_MASK;
+}
+
+/* get monotonic time for switching table entries from jiffies */
+static inline int mtip_get_time(void)
+{
+	return (jiffies_to_msecs(jiffies) / LEARNING_AGING_INTERVAL)
+		& AT_DENTRY_TIMESTAMP_MASK;
+}
+
+/* ------------------------------------------------------------------------- */
+/* Bit definitions and macros for MCF_ESW_PER */
+#define MCF_ESW_PER_TE0                        BIT(0)
+#define MCF_ESW_PER_TE1                        BIT(1)
+#define MCF_ESW_PER_TE2                        BIT(2)
+#define MCF_ESW_PER_RE0                        BIT(16)
+#define MCF_ESW_PER_RE1                        BIT(17)
+#define MCF_ESW_PER_RE2                        BIT(18)
+
+/* Bit definitions and macros for MCF_ESW_VLANV */
+#define MCF_ESW_VLANV_VV0                      BIT(0)
+#define MCF_ESW_VLANV_VV1                      BIT(1)
+#define MCF_ESW_VLANV_VV2                      BIT(2)
+#define MCF_ESW_VLANV_DU0                      BIT(16)
+#define MCF_ESW_VLANV_DU1                      BIT(17)
+#define MCF_ESW_VLANV_DU2                      BIT(18)
+
+/* Bit definitions and macros for MCF_ESW_DBCR */
+#define MCF_ESW_DBCR_P0                        BIT(0)
+#define MCF_ESW_DBCR_P1                        BIT(1)
+#define MCF_ESW_DBCR_P2                        BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_DMCR */
+#define MCF_ESW_DMCR_P0                        BIT(0)
+#define MCF_ESW_DMCR_P1                        BIT(1)
+#define MCF_ESW_DMCR_P2                        BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_BKLR */
+#define MCF_ESW_BKLR_BE0                       BIT(0)
+#define MCF_ESW_BKLR_BE1                       BIT(1)
+#define MCF_ESW_BKLR_BE2                       BIT(2)
+#define MCF_ESW_BKLR_LD0                       BIT(16)
+#define MCF_ESW_BKLR_LD1                       BIT(17)
+#define MCF_ESW_BKLR_LD2                       BIT(18)
+
+/* Bit definitions and macros for MCF_ESW_MODE */
+#define MCF_ESW_MODE_SW_RST                    BIT(0)
+#define MCF_ESW_MODE_SW_EN                     BIT(1)
+#define MCF_ESW_MODE_STOP                      BIT(7)
+#define MCF_ESW_MODE_CRC_TRAN                  BIT(8)
+#define MCF_ESW_MODE_P0CT                      BIT(9)
+#define MCF_ESW_MODE_STATRST                   BIT(31)
+
+/* Bit definitions and macros for MCF_ESW_VIMSEL */
+#define MCF_ESW_VIMSEL_IM0_MASK                GENMASK(1, 0)
+#define MCF_ESW_VIMSEL_IM1_MASK                GENMASK(3, 2)
+#define MCF_ESW_VIMSEL_IM2_MASK                GENMASK(5, 4)
+
+/* Bit definitions and macros for MCF_ESW_VOMSEL */
+#define MCF_ESW_VOMSEL_OM0_MASK                GENMASK(1, 0)
+#define MCF_ESW_VOMSEL_OM1_MASK                GENMASK(3, 2)
+#define MCF_ESW_VOMSEL_OM2_MASK                GENMASK(5, 4)
+
+/* Bit definitions and macros for MCF_ESW_VIMEN */
+#define MCF_ESW_VIMEN_EN0                      BIT(0)
+#define MCF_ESW_VIMEN_EN1                      BIT(1)
+#define MCF_ESW_VIMEN_EN2                      BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_MCR */
+#define MCF_ESW_MCR_PORT_MASK                  GENMASK(3, 0)
+#define MCF_ESW_MCR_MEN                        BIT(4)
+#define MCF_ESW_MCR_INGMAP                     BIT(5)
+#define MCF_ESW_MCR_EGMAP                      BIT(6)
+#define MCF_ESW_MCR_INGSA                      BIT(7)
+#define MCF_ESW_MCR_INGDA                      BIT(8)
+#define MCF_ESW_MCR_EGSA                       BIT(9)
+#define MCF_ESW_MCR_EGDA                       BIT(10)
+
+/* Bit definitions and macros for MCF_ESW_EGMAP */
+#define MCF_ESW_EGMAP_EG0                      BIT(0)
+#define MCF_ESW_EGMAP_EG1                      BIT(1)
+#define MCF_ESW_EGMAP_EG2                      BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_INGMAP */
+#define MCF_ESW_INGMAP_ING0                    BIT(0)
+#define MCF_ESW_INGMAP_ING1                    BIT(1)
+#define MCF_ESW_INGMAP_ING2                    BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_MMSR */
+#define MCF_ESW_MMSR_BUSY                      BIT(0)
+#define MCF_ESW_MMSR_NOCELL                    BIT(1)
+#define MCF_ESW_MMSR_MEMFULL                   BIT(2)
+#define MCF_ESW_MMSR_MFLATCH                   BIT(3)
+#define MCF_ESW_MMSR_DQ_GRNT                   BIT(6)
+#define MCF_ESW_MMSR_CELLS_AVAIL_MASK          GENMASK(23, 16)
+
+/* Bit definitions and macros for MCF_ESW_PCSR */
+#define MCF_ESW_PCSR_PC0                       BIT(0)
+#define MCF_ESW_PCSR_PC1                       BIT(1)
+#define MCF_ESW_PCSR_PC2                       BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_IOSR */
+#define MCF_ESW_IOSR_OR0                       BIT(0)
+#define MCF_ESW_IOSR_OR1                       BIT(1)
+#define MCF_ESW_IOSR_OR2                       BIT(2)
+
+/* Bit definitions and macros for MCF_ESW_P0BCT */
+#define MCF_ESW_P0BCT_THRESH_MASK              GENMASK(7, 0)
+
+/* Bit definitions and macros for MCF_ESW_P0FFEN */
+#define MCF_ESW_P0FFEN_FEN                     BIT(0)
+#define MCF_ESW_P0FFEN_FD_MASK                 GENMASK(3, 2)
+
+/* Bit definitions and macros for MCF_ESW_PID */
+#define MCF_ESW_PID_VLANID_MASK                GENMASK(15, 0)
+
+/* Bit definitions and macros for MCF_ESW_VRES */
+#define MCF_ESW_VRES_P0                        BIT(0)
+#define MCF_ESW_VRES_P1                        BIT(1)
+#define MCF_ESW_VRES_P2                        BIT(2)
+#define MCF_ESW_VRES_VLANID_MASK               GENMASK(14, 3)
+
+/* Bit definitions and macros for MCF_ESW_ISR */
+#define MCF_ESW_ISR_EBERR                      BIT(0)
+#define MCF_ESW_ISR_RXB                        BIT(1)
+#define MCF_ESW_ISR_RXF                        BIT(2)
+#define MCF_ESW_ISR_TXB                        BIT(3)
+#define MCF_ESW_ISR_TXF                        BIT(4)
+#define MCF_ESW_ISR_QM                         BIT(5)
+#define MCF_ESW_ISR_OD0                        BIT(6)
+#define MCF_ESW_ISR_OD1                        BIT(7)
+#define MCF_ESW_ISR_OD2                        BIT(8)
+#define MCF_ESW_ISR_LRN                        BIT(9)
+
+/* Bit definitions and macros for MCF_ESW_IMR */
+#define MCF_ESW_IMR_EBERR                      BIT(0)
+#define MCF_ESW_IMR_RXB                        BIT(1)
+#define MCF_ESW_IMR_RXF                        BIT(2)
+#define MCF_ESW_IMR_TXB                        BIT(3)
+#define MCF_ESW_IMR_TXF                        BIT(4)
+#define MCF_ESW_IMR_QM                         BIT(5)
+#define MCF_ESW_IMR_OD0                        BIT(6)
+#define MCF_ESW_IMR_OD1                        BIT(7)
+#define MCF_ESW_IMR_OD2                        BIT(8)
+#define MCF_ESW_IMR_LRN                        BIT(9)
+
+/* Bit definitions and macros for MCF_ESW_RDSR */
+#define MCF_ESW_RDSR_ADDRESS_MASK              GENMASK(31, 2)
+
+/* Bit definitions and macros for MCF_ESW_TDSR */
+#define MCF_ESW_TDSR_ADDRESS_MASK              GENMASK(31, 2)
+
+/* Bit definitions and macros for MCF_ESW_MRBR */
+#define MCF_ESW_MRBR_SIZE_MASK                 GENMASK(13, 4)
+
+/* Bit definitions and macros for MCF_ESW_RDAR */
+#define MCF_ESW_RDAR_R_DES_ACTIVE              BIT(24)
+
+/* Bit definitions and macros for MCF_ESW_TDAR */
+#define MCF_ESW_TDAR_X_DES_ACTIVE              BIT(24)
+
+/* Bit definitions and macros for MCF_ESW_LSR */
+#define MCF_ESW_LSR_DA                         BIT(0)
+
+/* QUIRKS */
+/* Controller needs driver to swap frame */
+#define FEC_QUIRK_SWAP_FRAME		BIT(1)
+/* The ENET Block Guide chapter for the iMX6SX (PELE) describes one issue:
+ * after ENET_ATCR[Capture] is set, some clock cycles must pass before the
+ * counter value is captured in the register clock domain.
+ * The wait is at least 6 clock cycles of the slower of the register clock
+ * and the 1588 clock. The 1588 ts_clk is fixed at 25 MHz and the register
+ * clock is 66 MHz, so the wait must be greater than 240 ns
+ * (40 ns * 6).
+ */
+#define FEC_QUIRK_BUG_CAPTURE		BIT(10)
+/* Controller has only one MDIO bus */
+#define FEC_QUIRK_SINGLE_MDIO		BIT(11)
+
+#define MTIP_PORT_FORWARDING_INIT 0xFF
+
+bool mtip_is_switch_netdev_port(const struct net_device *ndev);
+void mtip_clear_atable(struct switch_enet_private *fep);
+#endif /* __MTIP_L2SWITCH_H_ */
-- 
2.39.5
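
A note on the descriptor ring budget in mtipl2sw.h: the #if guard requires
the RX and TX descriptor rings together to fit in one page, which matches the
single PAGE_SIZE region released with dma_free_coherent() in the probe error
path. The sketch below works through that arithmetic in userspace C. It
assumes 4 KiB pages (PAGE_SIZE is architecture dependent), and every
sketch_/SKETCH_ name is a hypothetical stand-in, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE depends on the arch. */
#define SKETCH_PAGE_SIZE	4096u

/* Same values as MTIP_SWITCH_RX_PAGES, MTIP_SWITCH_RX_FRSIZE, TX_RING_SIZE. */
#define SKETCH_RX_PAGES		128u
#define SKETCH_RX_FRSIZE	2048u
#define SKETCH_RX_FRPPG		(SKETCH_PAGE_SIZE / SKETCH_RX_FRSIZE)
#define SKETCH_RX_RING		(SKETCH_RX_FRPPG * SKETCH_RX_PAGES)
#define SKETCH_TX_RING		256u

/* Userspace stand-in for struct cbd_t: 2 + 2 + 4 bytes. */
struct sketch_bd {
	uint16_t datlen;	/* data length */
	uint16_t sc;		/* control and status */
	uint32_t bufaddr;	/* buffer address */
};

/* The header hardcodes 8 as the descriptor size; check the stand-in agrees. */
static_assert(sizeof(struct sketch_bd) == 8, "descriptor must be 8 bytes");

/* Equivalent of the #if guard: both rings must fit in one page. */
static_assert((SKETCH_RX_RING + SKETCH_TX_RING) * sizeof(struct sketch_bd)
	      <= SKETCH_PAGE_SIZE, "descriptor rings do not fit in one page");

/* Total bytes occupied by the RX ring followed by the TX ring. */
static size_t sketch_ring_bytes(void)
{
	return (SKETCH_RX_RING + SKETCH_TX_RING) * sizeof(struct sketch_bd);
}
```

Under these assumptions the rings use the page exactly: (256 + 256) * 8 =
4096 bytes, so any increase of either ring size would trip the guard.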


^ permalink raw reply related	[flat|nested] 21+ messages in thread
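
For readers following the aging logic: mtip_get_time() and mtip_timedelta()
in the header above keep a 10-bit tick counter (AT_DENTRY_TIMESTAMP_MASK),
one tick per LEARNING_AGING_INTERVAL (100 ms), and compute ages modulo 1024.
A minimal userspace sketch of the same arithmetic follows. The sketch_ names
are hypothetical, and the millisecond value is passed in as a parameter
instead of being read from jiffies.

```c
#include <assert.h>

#define SKETCH_TS_MASK		0x3ffu	/* same bits as GENMASK(9, 0) */
#define SKETCH_INTERVAL_MS	100u	/* LEARNING_AGING_INTERVAL */

/* Convert a millisecond count into a 10-bit aging tick. */
static unsigned int sketch_get_time(unsigned int msecs)
{
	return (msecs / SKETCH_INTERVAL_MS) & SKETCH_TS_MASK;
}

/* Age in ticks of an entry stamped at 'then', correct across one wrap. */
static unsigned int sketch_timedelta(unsigned int curr, unsigned int then)
{
	/* Unsigned subtraction wraps, and the mask folds it into 0..1023. */
	return (curr - then) & SKETCH_TS_MASK;
}
```

An entry stamped at tick 1020 and checked at tick 5 is reported as 9 ticks
old, because the subtraction wraps before the mask is applied. The counter
itself wraps every 1024 ticks (102.4 s), so an age of 1024 ticks or more
aliases to a smaller value.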

* [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
  2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
  2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

This patch provides the content of the buffer management functions for the
MTIP L2 switch.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>
---
Changes for v14:
- New patch - created by excluding some code from large (i.e. v13
  and earlier) MTIP driver

Changes for v15 - v20:
- None

Changes for v21:
- Add check if fep->page[i] is not NULL before returning it to page pool

Changes for v22:
- Add fep->tx_bounce[i] = NULL; in the mtip_free_buffers

Changes for v23-v24:
- None
---
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 90 +++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
index 1d1005a42306..5b5229ee411d 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -892,11 +892,101 @@ static void mtip_get_drvinfo(struct net_device *dev,
 
 static void mtip_free_buffers(struct net_device *dev)
 {
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	int i;
+
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		if (!fep->page[i])
+			continue;
+
+		page_pool_put_full_page(fep->page_pool,
+					fep->page[i], false);
+		fep->page[i] = NULL;
+	}
+
+	page_pool_destroy(fep->page_pool);
+	fep->page_pool = NULL;
+
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		kfree(fep->tx_bounce[i]);
+		fep->tx_bounce[i] = NULL;
+	}
+}
+
+static int mtip_create_page_pool(struct switch_enet_private *fep, int size)
+{
+	struct page_pool_params pp_params = {
+		.order = 0,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.pool_size = size,
+		.nid = dev_to_node(&fep->pdev->dev),
+		.dev = &fep->pdev->dev,
+		.dma_dir = DMA_FROM_DEVICE,
+		.offset = 0,
+		.max_len = MTIP_SWITCH_RX_FRSIZE,
+	};
+	int ret = 0;
+
+	fep->page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(fep->page_pool)) {
+		ret = PTR_ERR(fep->page_pool);
+		fep->page_pool = NULL;
+	}
+
+	return ret;
 }
 
 static int mtip_alloc_buffers(struct net_device *dev)
 {
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	struct cbd_t *bdp;
+	struct page *page;
+	int i, ret;
+
+	ret = mtip_create_page_pool(fep, RX_RING_SIZE);
+	if (ret < 0) {
+		dev_err(&fep->pdev->dev, "Failed to create page pool\n");
+		return ret;
+	}
+
+	bdp = fep->rx_bd_base;
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		page = page_pool_dev_alloc_pages(fep->page_pool);
+		if (!page) {
+			dev_err(&fep->pdev->dev,
+				"Failed to allocate page for rx buffer\n");
+			goto err;
+		}
+
+		bdp->cbd_bufaddr = page_pool_get_dma_addr(page);
+		fep->page[i] = page;
+
+		bdp->cbd_sc = BD_ENET_RX_EMPTY;
+		bdp++;
+	}
+
+	mtip_set_last_buf_to_wrap(bdp);
+
+	bdp = fep->tx_bd_base;
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		fep->tx_bounce[i] = kmalloc(MTIP_SWITCH_TX_FRSIZE, GFP_KERNEL);
+		if (!fep->tx_bounce[i])
+			goto err;
+
+		bdp->cbd_sc = 0;
+		bdp->cbd_bufaddr = 0;
+		bdp++;
+	}
+
+	mtip_set_last_buf_to_wrap(bdp);
+
 	return 0;
+
+ err:
+	mtip_free_buffers(dev);
+	return -ENOMEM;
 }
 
 static int mtip_rx_napi(struct napi_struct *napi, int budget)
-- 
2.39.5



* [net-next v24 4/7] net: mtip: Add net_device_ops functions to the L2 switch driver
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
                   ` (2 preceding siblings ...)
  2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

This patch provides callbacks for struct net_device_ops for the MTIP
L2 switch.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>

---
Changes for v13:
- New patch - created by excluding some code from large (i.e. v12 and
  earlier) MTIP driver

Changes for v14:
- Add read memory barrier (rmb) before reading current descriptor
- Use proper locking primitives

Changes for v15:
- None

Changes for v16:
- Enable MTIP ports to support bridge offloading
- Use dev_err_ratelimited() instead of plain dev_err()
- Move skb storage and tx ring buffer modifications after
  dma mapping code.
- Do not increase tx_errors when frames are dropped after
  failed dma_mapping.
- Refactor the code for better readability
- Remove legacy call to netif_trans_update()
- Remove unneeded rmb() - synchronized data reads are already assured by
  coherent DMA allocation
- Replace spin_{un}lock() with _bh variant

Changes for v17:
- Add missing _bh() variant of spin_unlock
- Avoid reverse christmas tree in swap_buffer()
- Print error message after unlock
- Add DO_ONCE() and a separate function to print state of switch HW
- Remove dev->stats.tx_errors++

Changes for v18 - v19:
- None

Changes for v20:
- Perform data swap on SKB data only when it is copied to a separate
  buffer.
- Clean up the comment
- Stop both network interfaces' TX queues when no resources for
  transmission available (uDMA0 descriptors)
- Do not use fep->skb_cur and fep->tx_full
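
The v20 items above (stop the TX queues when uDMA0 descriptors run out,
without a separate tx_full flag) can be illustrated with a small userspace
sketch. This is only a model under stated assumptions: RING_SIZE, the bit
values and all names below are hypothetical stand-ins, not the driver's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4          /* hypothetical; the driver uses TX_RING_SIZE */
#define BD_READY  0x8000u    /* assumed: descriptor is owned by hardware */
#define BD_WRAP   0x2000u    /* assumed: last descriptor in the ring */

struct bd {
	uint16_t sc;         /* status/control word */
};

struct tx_ring {
	struct bd bd[RING_SIZE];
	struct bd *cur;      /* next descriptor to fill */
	struct bd *dirty;    /* oldest descriptor not yet reclaimed */
	bool stopped;        /* stands in for the stopped netdev queues */
};

static void ring_init(struct tx_ring *r)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		r->bd[i].sc = 0;
	r->bd[RING_SIZE - 1].sc = BD_WRAP;
	r->cur = r->dirty = r->bd;
	r->stopped = false;
}

/* Queue one frame; returns false when no descriptor is free. */
static bool ring_xmit(struct tx_ring *r)
{
	struct bd *bdp = r->cur;

	if (bdp->sc & BD_READY) {
		r->stopped = true;
		return false;
	}

	bdp->sc = (uint16_t)(bdp->sc | BD_READY);
	r->cur = (bdp->sc & BD_WRAP) ? r->bd : bdp + 1;

	/* cur catching up with dirty right after a fill means "full" */
	if (r->cur == r->dirty)
		r->stopped = true;

	return true;
}

/* Model hardware finishing the oldest frame, then reclaim its slot. */
static void ring_complete_one(struct tx_ring *r)
{
	struct bd *bdp = r->dirty;

	assert(bdp->sc & BD_READY);
	bdp->sc = (uint16_t)(bdp->sc & ~BD_READY);
	r->dirty = (bdp->sc & BD_WRAP) ? r->bd : bdp + 1;
	r->stopped = false;
}
```

Because cur == dirty holds for both an empty and a full ring, the stopped
state is what tells the two cases apart in this model.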

Changes for v21 - v22:
- None

Changes for v23:
- Move cancel_work_sync(&priv->tx_timeout_work); to where it is handled
- Remove dev->stats.tx_errors++; as errors are already noted in net stack

Changes for v24:
- Add struct mtip_ndev_priv *priv declaration to mtip_ndev_cleanup() - i.e.
  where it is referenced
- Add comment regarding the need for switch reset on TX timeout
- First update switch atable and, only when successful, update switch configuration
- Call INIT_WORK before the network device is registered - to avoid race conditions
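
Among the callbacks added by this patch, mtip_set_multicast_list() maps
each multicast address to one bit of a 64-bit group hash (the upper six
CRC bits select the bit). A minimal userspace sketch of that mapping
follows. It assumes that ether_crc_le() is the reflected CRC-32 seeded
with all ones and without the final inversion; the helper and struct
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 6   /* mirrors FEC_HASH_BITS from the diff */

/* Bitwise reflected CRC-32, seed 0xFFFFFFFF, no final inversion. */
static uint32_t crc32_le_noinv(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}

	return crc;
}

/* Hypothetical container for the two 32-bit hash registers. */
struct mc_hash {
	uint32_t high;
	uint32_t low;
};

/* The upper HASH_BITS bits of the CRC select one of 64 hash bits. */
static unsigned int mc_hash_index(const uint8_t mac[6])
{
	return crc32_le_noinv(mac, 6) >> (32 - HASH_BITS);
}

static void mc_hash_add(struct mc_hash *h, const uint8_t mac[6])
{
	unsigned int idx = mc_hash_index(mac);

	assert(idx < 64);
	if (idx > 31)
		h->high |= UINT32_C(1) << (idx - 32);
	else
		h->low |= UINT32_C(1) << idx;
}
```

The sketch shifts an unsigned constant so that setting bit 31 stays well
defined in portable C.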
---
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 296 ++++++++++++++++++
 1 file changed, 296 insertions(+)

diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
index 5b5229ee411d..7dfeda230e3f 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -43,6 +43,15 @@
 
 #include "mtipl2sw.h"
 
+static void swap_buffer(void *bufaddr, int len)
+{
+	unsigned int *buf = bufaddr;
+	int i;
+
+	for (i = 0; i < len; i += 4, buf++)
+		swab32s(buf);
+}
+
 /* Set the last buffer to wrap */
 static void mtip_set_last_buf_to_wrap(struct cbd_t *bdp)
 {
@@ -463,6 +472,120 @@ static void mtip_config_switch(struct switch_enet_private *fep)
 	       fep->hwp + ESW_IMR);
 }
 
+static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
+					struct net_device *dev, int port)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	unsigned short status;
+	unsigned int index;
+	struct cbd_t *bdp;
+	void *bufaddr;
+
+	spin_lock_bh(&fep->hw_lock);
+
+	if (!fep->link[0] && !fep->link[1]) {
+		/* Link is down or autonegotiation is in progress. */
+		mtip_netif_stop_queues(fep);
+		spin_unlock_bh(&fep->hw_lock);
+		return NETDEV_TX_BUSY;
+	}
+
+	/* Fill in a Tx ring entry */
+	bdp = fep->cur_tx;
+	status = bdp->cbd_sc;
+
+	if (status & BD_ENET_TX_READY) {
+		/* All transmit buffers are full. Bail out. */
+		mtip_netif_stop_queues(fep);
+		spin_unlock_bh(&fep->hw_lock);
+		dev_err_ratelimited(&fep->pdev->dev, "%s: tx queue full!\n",
+				    dev->name);
+		return NETDEV_TX_BUSY;
+	}
+
+	/* Clear all of the status flags */
+	status &= ~BD_ENET_TX_STATS;
+
+	/* Set buffer length and buffer pointer */
+	bufaddr = skb->data;
+	bdp->cbd_datlen = skb->len;
+
+	index = bdp - fep->tx_bd_base;
+	/* On some FEC implementations data must be aligned on
+	 * 4-byte boundaries. Use bounce buffers to copy data
+	 * and get it aligned.
+	 */
+	if ((unsigned long)bufaddr & MTIP_ALIGNMENT ||
+	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
+		memcpy(fep->tx_bounce[index], skb->data, skb->len);
+		bufaddr = fep->tx_bounce[index];
+
+		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
+			swap_buffer(bufaddr, skb->len);
+	}
+
+	/* Push the data cache so the CPM does not get stale memory
+	 * data.
+	 */
+	bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
+					  MTIP_SWITCH_TX_FRSIZE,
+					  DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(&fep->pdev->dev, bdp->cbd_bufaddr))) {
+		dev_err(&fep->pdev->dev,
+			"Failed to map descriptor tx buffer\n");
+		dev->stats.tx_dropped++;
+		dev_kfree_skb_any(skb);
+		goto err;
+	}
+
+	/* Save skb pointer. */
+	fep->tx_skbuff[index] = skb;
+
+	/* Send it on its way.  Tell FEC it's ready, interrupt when done,
+	 * it's the last BD of the frame, and to put the CRC on the end.
+	 */
+
+	status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR | BD_ENET_TX_LAST |
+		   BD_ENET_TX_TC);
+
+	/* Synchronize all descriptor writes */
+	wmb();
+	bdp->cbd_sc = status;
+
+	skb_tx_timestamp(skb);
+
+	dev->stats.tx_bytes += skb->len;
+	/* If this was the last BD in the ring,
+	 * start at the beginning again.
+	 */
+	if (status & BD_ENET_TX_WRAP)
+		bdp = fep->tx_bd_base;
+	else
+		bdp++;
+
+	fep->cur_tx = bdp;
+	/* When TX descriptors' ring buffer is full stop both interfaces */
+	if (fep->cur_tx == fep->dirty_tx)
+		mtip_netif_stop_queues(fep);
+
+	/* Trigger transmission start */
+	writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR);
+
+ err:
+	spin_unlock_bh(&fep->hw_lock);
+
+	return NETDEV_TX_OK;
+}
+
+static netdev_tx_t mtip_start_xmit(struct sk_buff *skb,
+				   struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+
+	return mtip_start_xmit_port(skb, dev, priv->portnum);
+}
+
 static void mtip_configure_enet_mii(struct switch_enet_private *fep, int port)
 {
 	struct phy_device *phydev = fep->phy_dev[port - 1];
@@ -617,6 +740,76 @@ static void mtip_switch_restart(struct net_device *dev, int duplex0,
 	mtip_config_switch(fep);
 }
 
+static void mtip_print_hw_state(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	struct cbd_t *bdp;
+	bool tx_full;
+	int i;
+
+	spin_lock_bh(&fep->hw_lock);
+	tx_full = fep->dirty_tx == fep->cur_tx &&
+		mtip_netif_queues_stopped(fep);
+
+	dev_info(&dev->dev, "%s: transmit timed out.\n", dev->name);
+	dev_info(&dev->dev,
+		 "Ring data: cur_tx 0x%p%s, dirty_tx 0x%p cur_rx: 0x%p\n",
+		 fep->cur_tx, tx_full ? " (full)" : "", fep->dirty_tx,
+		 fep->cur_rx);
+
+	bdp = fep->tx_bd_base;
+	dev_info(&dev->dev, " tx: %u buffers\n", TX_RING_SIZE);
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		dev_info(&dev->dev, "  0x%p: %04x %04x %08x\n",
+			 bdp, bdp->cbd_sc, bdp->cbd_datlen,
+			 (int)bdp->cbd_bufaddr);
+		bdp++;
+	}
+
+	bdp = fep->rx_bd_base;
+	dev_info(&dev->dev, " rx: %lu buffers\n", RX_RING_SIZE);
+	for (i = 0 ; i < RX_RING_SIZE; i++) {
+		dev_info(&dev->dev, "  0x%p: %04x %04x %08x\n",
+			 bdp, bdp->cbd_sc, bdp->cbd_datlen,
+			 (int)bdp->cbd_bufaddr);
+		bdp++;
+	}
+	spin_unlock_bh(&fep->hw_lock);
+}
+
+static void mtip_timeout(struct net_device *dev, unsigned int txqueue)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+
+	DO_ONCE(mtip_print_hw_state, dev);
+
+	/* Adjust switch registers to handle TX timeout - to be more
+	 * specific - restart is required.
+	 */
+	schedule_work(&priv->tx_timeout_work);
+}
+
+static void mtip_timeout_work(struct work_struct *work)
+{
+	struct mtip_ndev_priv *priv =
+		container_of(work, struct mtip_ndev_priv, tx_timeout_work);
+	struct switch_enet_private *fep = priv->fep;
+	struct net_device *dev = priv->dev;
+
+	rtnl_lock();
+	if (netif_device_present(dev) || netif_running(dev)) {
+		napi_disable(&fep->napi);
+		netif_tx_lock_bh(dev);
+		mtip_switch_restart(dev, fep->full_duplex[0],
+				    fep->full_duplex[1]);
+		netif_tx_wake_all_queues(dev);
+		netif_tx_unlock_bh(dev);
+		napi_enable(&fep->napi);
+	}
+	rtnl_unlock();
+}
+
 static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
 {
 	struct switch_enet_private *fep = ptr_fep;
@@ -1095,6 +1288,98 @@ static int mtip_close(struct net_device *dev)
 	return 0;
 }
 
+#define FEC_HASH_BITS	6		/* #bits in hash */
+static void mtip_set_multicast_list(struct net_device *dev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	unsigned int hash_high = 0, hash_low = 0, crc;
+	struct switch_enet_private *fep = priv->fep;
+	void __iomem *enet_addr = fep->enet_addr;
+	struct netdev_hw_addr *ha;
+	unsigned char hash;
+
+	if (priv->portnum == 2)
+		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
+
+	if (dev->flags & IFF_PROMISC) {
+		/* Promisc mode is required for switch - it is
+		 * already enabled during driver's probe.
+		 */
+		dev_dbg(&dev->dev, "%s: IFF_PROMISC\n", __func__);
+		return;
+	}
+
+	if (dev->flags & IFF_ALLMULTI) {
+		dev_dbg(&dev->dev, "%s: IFF_ALLMULTI\n", __func__);
+
+		/* Allow all multicast addresses */
+		writel(0xFFFFFFFF, enet_addr + MCF_FEC_GRP_HASH_TABLE_HIGH);
+		writel(0xFFFFFFFF, enet_addr + MCF_FEC_GRP_HASH_TABLE_LOW);
+
+		return;
+	}
+
+	netdev_for_each_mc_addr(ha, dev) {
+		/* Calculate crc32 value of mac address */
+		crc = ether_crc_le(dev->addr_len, ha->addr);
+
+		/* Only upper 6 bits (FEC_HASH_BITS) are used
+		 * which point to specific bit in the hash registers
+		 */
+		hash = (crc >> (32 - FEC_HASH_BITS)) & 0x3F;
+
+		if (hash > 31)
+			hash_high |= 1 << (hash - 32);
+		else
+			hash_low |= 1 << hash;
+	}
+
+	writel(hash_high, enet_addr + MCF_FEC_GRP_HASH_TABLE_HIGH);
+	writel(hash_low, enet_addr + MCF_FEC_GRP_HASH_TABLE_LOW);
+}
+
+static int mtip_set_mac_address(struct net_device *dev, void *p)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	struct switch_enet_private *fep = priv->fep;
+	void __iomem *enet_addr = fep->enet_addr;
+	struct sockaddr *addr = p;
+	int ret;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	ret = mtip_update_atable_static((unsigned char *)addr->sa_data,
+					7, 7, fep);
+	if (ret)
+		return ret;
+
+	eth_hw_addr_set(dev, addr->sa_data);
+
+	if (priv->portnum == 2)
+		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
+
+	writel(dev->dev_addr[3] | (dev->dev_addr[2] << 8) |
+	       (dev->dev_addr[1] << 16) | (dev->dev_addr[0] << 24),
+	       enet_addr + MCF_FEC_PALR);
+	writel((dev->dev_addr[5] << 16) | (dev->dev_addr[4] << 24),
+	       enet_addr + MCF_FEC_PAUR);
+
+	return 0;
+}
+
+static int mtip_get_port_parent_id(struct net_device *ndev,
+				   struct netdev_phys_item_id *ppid)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(ndev);
+	struct switch_enet_private *fep = priv->fep;
+
+	ppid->id_len = sizeof(fep->mac[0]);
+	memcpy(&ppid->id, &fep->mac[0], ppid->id_len);
+
+	return 0;
+}
+
 static const struct ethtool_ops mtip_ethtool_ops = {
 	.get_link_ksettings     = phy_ethtool_get_link_ksettings,
 	.set_link_ksettings     = phy_ethtool_set_link_ksettings,
@@ -1106,6 +1391,11 @@ static const struct ethtool_ops mtip_ethtool_ops = {
 static const struct net_device_ops mtip_netdev_ops = {
 	.ndo_open		= mtip_open,
 	.ndo_stop		= mtip_close,
+	.ndo_start_xmit	= mtip_start_xmit,
+	.ndo_set_rx_mode	= mtip_set_multicast_list,
+	.ndo_tx_timeout	= mtip_timeout,
+	.ndo_set_mac_address	= mtip_set_mac_address,
+	.ndo_get_port_parent_id	= mtip_get_port_parent_id,
 };
 
 bool mtip_is_switch_netdev_port(const struct net_device *ndev)
@@ -1158,10 +1448,14 @@ static int mtip_switch_dma_init(struct switch_enet_private *fep)
 
 static void mtip_ndev_cleanup(struct switch_enet_private *fep)
 {
+	struct mtip_ndev_priv *priv;
 	int i;
 
 	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
 		if (fep->ndev[i]) {
+			priv = netdev_priv(fep->ndev[i]);
+			cancel_work_sync(&priv->tx_timeout_work);
+
 			unregister_netdev(fep->ndev[i]);
 			free_netdev(fep->ndev[i]);
 			fep->ndev[i] = NULL;
@@ -1196,6 +1490,8 @@ static int mtip_ndev_init(struct switch_enet_private *fep,
 
 		mtip_setup_mac(fep->ndev[i]);
 
+		INIT_WORK(&priv->tx_timeout_work, mtip_timeout_work);
+
 		ret = register_netdev(fep->ndev[i]);
 		if (ret) {
 			dev_err(&fep->ndev[i]->dev,
-- 
2.39.5



* [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
                   ` (3 preceding siblings ...)
  2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
  2026-06-01 11:24 ` [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations Lukasz Majewski
  6 siblings, 2 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

This patch provides the mtip_switch_tx() and mtip_switch_rx() functions
for the MTIP L2 switch.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>
---
Changes for v13:
- New patch - created by excluding some code from large (i.e. v12 and
  earlier) MTIP driver

Changes for v14:
- Rewrite RX error handling code
- Remove } else { from if (unlikely(!skb)) { condition in mtip_switch_rx()
- Remove locking from RX path (done under NAPI API and similar to fec_main.c
  driver)
- Use net_prefetch() instead of prefetch()

Changes for v15:
- Use page_address() instead of __va()
- Remove the check if data is NOT null, as it cannot be (those values are
  assured to be allocated earlier for RX path).

Changes for v16:
- Disable RX interrupt when in switch RX function
- Set offload_fwd_mark when L2 offloading is enabled (fix broadcast flooding)
- Replace spin_{un}lock() with _bh variant

Changes for v17 - v18:
- None

Changes for v19:
- Pass the page with data to upper part of the network stack
- Use new page from page pool for new transfer
- Remove extra copy of the data
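
The v19 change above (hand the filled page to the stack and give the
descriptor a fresh one) depends on ordering: the replacement is allocated
first, so that on failure the frame is dropped and the slot keeps its
buffer. Below is a userspace sketch of that ordering only. It uses
malloc() in place of the page pool, and every name and size is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_SLOTS 2      /* hypothetical ring size */
#define BUF_SIZE 2048   /* hypothetical buffer size */

struct rx_ring {
	void *buf[RX_SLOTS];     /* one buffer per descriptor slot */
	unsigned long dropped;   /* frames dropped for lack of memory */
};

/* Allocator hook so that allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Returns the filled buffer, now owned by the caller, or NULL when the
 * frame had to be dropped. On failure the slot still holds its old
 * buffer, which the hardware can simply reuse.
 */
static void *rx_take_frame(struct rx_ring *r, int slot, alloc_fn alloc)
{
	void *fresh, *full;

	assert(slot >= 0 && slot < RX_SLOTS);

	fresh = alloc(BUF_SIZE);
	if (!fresh) {
		r->dropped++;
		return NULL;
	}

	full = r->buf[slot];
	r->buf[slot] = fresh;

	return full;
}

/* Always fails; used to exercise the drop path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Calling rx_take_frame(&ring, 0, malloc) swaps slot 0 to a new buffer;
passing failing_alloc instead leaves the slot untouched and only bumps
the drop counter.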

Changes for v20:
- Use dev_err_ratelimited() to not spam console
- Replace dev_consume_skb_irq() with dev_consume_skb_any()
- Use skb->dev to assign it to tx packet device (avoid assigning to
  napi->dev)
- Remove the need to export the port information
- Do not use fep->skb_dirty (calculate proper 'index' instead)
- Use information about stopped queues to determine if the driver can
  accept further packets for TX

Changes for v21 - v23:
- None

Changes for v24:
- Provide comment regarding atomicity when atable is read
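
For readers following the atable lookup in the diff below, here is a
small userspace sketch of how a MAC address is packed into the low/high
words and compared with an entry read from the table. The packing and
the bit 16 / bit 17 test are taken from
mtip_atable_get_entry_port_number(); the helper names are hypothetical,
and the port field extraction (AT_PORT_MASK) is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as tested in the diff; the names are hypothetical. */
#define ENTRY_BIT16 (UINT32_C(1) << 16)
#define ENTRY_BIT17 (UINT32_C(1) << 17)

struct mac_words {
	uint32_t lo;   /* MAC bytes 0..3, byte 0 in the lowest bits */
	uint32_t hi;   /* MAC bytes 4..5 in the lowest 16 bits */
};

static struct mac_words mac_pack(const uint8_t mac[6])
{
	struct mac_words w;

	assert(mac != NULL);
	w.lo = ((uint32_t)mac[3] << 24) | ((uint32_t)mac[2] << 16) |
	       ((uint32_t)mac[1] << 8) | mac[0];
	w.hi = ((uint32_t)mac[5] << 8) | mac[4];

	return w;
}

/* True when the entry words hold exactly this MAC address. */
static bool entry_matches(uint32_t read_lo, uint32_t read_hi,
			  const uint8_t mac[6])
{
	struct mac_words w = mac_pack(mac);

	return read_lo == w.lo && (read_hi & 0xFFFFu) == w.hi;
}

/* Same condition the diff uses before extracting the port field. */
static bool entry_port_usable(uint32_t read_hi)
{
	return (read_hi & ENTRY_BIT16) && !(read_hi & ENTRY_BIT17);
}
```

The casts to uint32_t before shifting keep the byte-to-word packing in
unsigned arithmetic, which is a choice of this sketch.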
---
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 261 +++++++++++++++++-
 1 file changed, 260 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
index 7dfeda230e3f..212207861e13 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -193,6 +193,10 @@ static void mtip_read_atable(struct switch_enet_private *fep, int index,
 	struct addr_table64b_entry __iomem *atable_base =
 		fep->hwentry->mtip_table64b_entry;
 
+	/* The switch HW guarantees that a read of the 64-bit atable value
+	 * is atomic when the order below is used - i.e. first read the
+	 * low 32 bits and then the high 32 bits.
+	 */
 	*read_lo = readl(&atable_base[index].lo);
 	*read_hi = readl(&atable_base[index].hi);
 }
@@ -246,6 +250,39 @@ struct mtip_port_info *mtip_portinfofifo_read(struct switch_enet_private *fep)
 	return info;
 }
 
+static void mtip_atable_get_entry_port_number(struct switch_enet_private *fep,
+					      unsigned char *mac_addr, u8 *port)
+{
+	int block_index, block_index_end, entry;
+	u32 mac_addr_lo, mac_addr_hi;
+	u32 read_lo, read_hi;
+
+	mac_addr_lo = (u32)((mac_addr[3] << 24) | (mac_addr[2] << 16) |
+			    (mac_addr[1] << 8) | mac_addr[0]);
+	mac_addr_hi = (u32)((mac_addr[5] << 8) | (mac_addr[4]));
+
+	block_index = GET_BLOCK_PTR(crc8_calc(mac_addr));
+	block_index_end = block_index + ATABLE_ENTRY_PER_SLOT;
+
+	/* now search all the entries in the selected block */
+	for (entry = block_index; entry < block_index_end; entry++) {
+		mtip_read_atable(fep, entry, &read_lo, &read_hi);
+		*port = MTIP_PORT_FORWARDING_INIT;
+
+		if (read_lo == mac_addr_lo &&
+		    ((read_hi & 0x0000FFFF) ==
+		     (mac_addr_hi & 0x0000FFFF))) {
+			/* found the correct address */
+			if ((read_hi & (1 << 16)) && (!(read_hi & (1 << 17))))
+				*port = FIELD_GET(AT_PORT_MASK, read_hi);
+			break;
+		}
+	}
+
+	dev_dbg(&fep->pdev->dev, "%s: MAC: %pM PORT: 0x%x\n", __func__,
+		mac_addr, *port);
+}
+
 /* Clear complete MAC Look Up Table */
 void mtip_clear_atable(struct switch_enet_private *fep)
 {
@@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
 
 static void mtip_switch_tx(struct switch_enet_private *fep)
 {
+	struct net_device *dev;
+	unsigned short status;
+	struct sk_buff *skb;
+	struct cbd_t *bdp;
+	int index;
+
+	spin_lock_bh(&fep->hw_lock);
+	bdp = fep->dirty_tx;
+
+	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
+		if (bdp == fep->cur_tx &&
+		    !mtip_netif_queues_stopped(fep))
+			break;
+
+		index = bdp - fep->tx_bd_base;
+		dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
+				 MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);
+		bdp->cbd_bufaddr = 0;
+		skb = fep->tx_skbuff[index];
+		dev = skb->dev;
+		/* Check for errors */
+		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
+				   BD_ENET_TX_RL | BD_ENET_TX_UN |
+				   BD_ENET_TX_CSL)) {
+			dev->stats.tx_errors++;
+			if (status & BD_ENET_TX_HB)  /* No heartbeat */
+				dev->stats.tx_heartbeat_errors++;
+			if (status & BD_ENET_TX_LC)  /* Late collision */
+				dev->stats.tx_window_errors++;
+			if (status & BD_ENET_TX_RL)  /* Retrans limit */
+				dev->stats.tx_aborted_errors++;
+			if (status & BD_ENET_TX_UN)  /* Underrun */
+				dev->stats.tx_fifo_errors++;
+			if (status & BD_ENET_TX_CSL) /* Carrier lost */
+				dev->stats.tx_carrier_errors++;
+		} else {
+			dev->stats.tx_packets++;
+		}
+
+		if (status & BD_ENET_TX_READY)
+			dev_err_ratelimited(&fep->pdev->dev,
+					    "xmit interrupt and TX_READY.\n");
+
+		/* Deferred means some collisions occurred during transmit,
+		 * but we eventually sent the packet OK.
+		 */
+		if (status & BD_ENET_TX_DEF)
+			dev->stats.collisions++;
+
+		/* Free the sk buffer associated with this last transmit */
+		dev_consume_skb_any(skb);
+		fep->tx_skbuff[index] = NULL;
+
+		/* Update pointer to next buffer descriptor to be transmitted */
+		if (status & BD_ENET_TX_WRAP)
+			bdp = fep->tx_bd_base;
+		else
+			bdp++;
+
+		/* Since we have freed up a buffer, the ring is no longer
+		 * full.
+		 */
+		if (fep->dirty_tx == fep->cur_tx &&
+		    mtip_netif_queues_stopped(fep))
+			mtip_netif_wake_queues(fep);
+	}
+	fep->dirty_tx = bdp;
+	spin_unlock_bh(&fep->hw_lock);
 }
 
+static int mtip_update_cbd(struct switch_enet_private *fep, struct cbd_t *bdp,
+			   int index)
+{
+	struct page *new_page;
+
+	new_page = page_pool_dev_alloc_pages(fep->page_pool);
+	if (unlikely(!new_page))
+		return -ENOMEM;
+
+	fep->page[index] = new_page;
+	bdp->cbd_bufaddr = page_pool_get_dma_addr(new_page);
+
+	return 0;
+}
+
+/* During a receive, the cur_rx points to the current incoming buffer.
+ * When we update through the ring, if the next incoming buffer has
+ * not been given to the system, we just set the empty indicator,
+ * effectively tossing the packet.
+ */
 static int mtip_switch_rx(struct net_device *dev, int budget)
 {
-	return -ENOMEM;
+	struct mtip_ndev_priv *priv = netdev_priv(dev);
+	u8 *data, rx_port = MTIP_PORT_FORWARDING_INIT;
+	struct switch_enet_private *fep = priv->fep;
+	unsigned short status, pkt_len;
+	struct net_device *pndev;
+	struct ethhdr *eth_hdr;
+	int pkt_received = 0;
+	struct sk_buff *skb;
+	struct cbd_t *bdp;
+	struct page *page;
+	int index;
+
+	/* First, grab all of the stats for the incoming packet.
+	 * These get messed up if we get called due to a busy condition.
+	 */
+	bdp = fep->cur_rx;
+
+	while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) {
+		if (pkt_received >= budget)
+			break;
+
+		pkt_received++;
+
+		writel(MCF_ESW_IMR_RXF, fep->hwp + ESW_ISR);
+		if (!fep->usage_count)
+			goto rx_processing_done;
+
+		status ^= BD_ENET_RX_LAST;
+		/* Check for errors. */
+		if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
+			      BD_ENET_RX_CR | BD_ENET_RX_OV | BD_ENET_RX_LAST |
+			      BD_ENET_RX_CL)) {
+			dev->stats.rx_errors++;
+			if (status & BD_ENET_RX_OV) {
+				/* FIFO overrun */
+				dev->stats.rx_fifo_errors++;
+				goto rx_processing_done;
+			}
+			if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH
+				      | BD_ENET_RX_LAST)) {
+				/* Frame too long or too short. */
+				dev->stats.rx_length_errors++;
+				if (status & BD_ENET_RX_LAST)
+					netdev_err(dev, "rcv is not +last\n");
+			}
+			if (status & BD_ENET_RX_CR)	/* CRC Error */
+				dev->stats.rx_crc_errors++;
+
+			/* Report late collisions as a frame error. */
+			if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
+				dev->stats.rx_frame_errors++;
+			goto rx_processing_done;
+		}
+
+		/* Get correct RX page */
+		index = bdp - fep->rx_bd_base;
+		page = fep->page[index];
+		/* Process the incoming frame */
+		pkt_len = bdp->cbd_datlen;
+
+		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
+					pkt_len, DMA_FROM_DEVICE);
+		net_prefetch(page_address(page));
+		data = page_address(page);
+
+		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
+			swap_buffer(data, pkt_len);
+
+		eth_hdr = (struct ethhdr *)data;
+		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
+						  &rx_port);
+		if (rx_port == MTIP_PORT_FORWARDING_INIT)
+			mtip_atable_dynamicms_learn_migration(fep,
+							      mtip_get_time(),
+							      eth_hdr->h_source,
+							      &rx_port);
+
+		if ((rx_port == 1 || rx_port == 2) && fep->ndev[rx_port - 1])
+			pndev = fep->ndev[rx_port - 1];
+		else
+			pndev = dev;
+
+		if (mtip_update_cbd(fep, bdp, index)) {
+			pndev->stats.rx_dropped++;
+			goto rx_processing_done;
+		}
+
+		/* The packet length includes FCS, but we don't want to
+		 * include that when passing upstream as it messes up
+		 * bridging applications.
+		 */
+		skb = build_skb(page_address(page), PAGE_SIZE);
+		if (unlikely(!skb)) {
+			page_pool_recycle_direct(fep->page_pool, page);
+			pndev->stats.rx_dropped++;
+
+			netdev_err_once(pndev, "build_skb failed!\n");
+			goto rx_processing_done;
+		}
+
+		skb_put(skb, pkt_len);      /* Make room */
+		skb_mark_for_recycle(skb);
+		skb->protocol = eth_type_trans(skb, pndev);
+		skb->offload_fwd_mark = fep->br_offload;
+		napi_gro_receive(&fep->napi, skb);
+
+		pndev->stats.rx_packets++;
+		pndev->stats.rx_bytes += pkt_len;
+
+ rx_processing_done:
+		/* Clear the status flags for this buffer */
+		status &= ~BD_ENET_RX_STATS;
+
+		/* Mark the buffer empty */
+		status |= BD_ENET_RX_EMPTY;
+		/* Make sure that updates to the descriptor are performed */
+		wmb();
+		bdp->cbd_sc = status;
+
+		/* Update BD pointer to next entry */
+		if (status & BD_ENET_RX_WRAP)
+			bdp = fep->rx_bd_base;
+		else
+			bdp++;
+
+		/* Doing this here will keep the FEC running while we process
+		 * incoming frames.  On a heavily loaded network, we should be
+		 * able to keep up at the expense of system resources.
+		 */
+		writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
+	} /* while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) */
+
+	fep->cur_rx = bdp;
+
+	return pkt_received;
 }
 
 static void mtip_adjust_link(struct net_device *dev)
-- 
2.39.5



* [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
                   ` (4 preceding siblings ...)
  2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  2026-06-01 11:24 ` [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations Lukasz Majewski
  6 siblings, 2 replies; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

This patch provides functions necessary for managing the L2 switch.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>

---
Changes for v13:
- New patch - created by excluding some code from large (i.e. v12 and
  earlier) MTIP driver

Changes for v14 - v21:
- None

Changes for v22:
- Update 'mode' check in mtip_vlan_input_process(). The comment was
  misleading, so it has been removed. ESW_VIMSEL accepts "Mode" 1 to 4,
  but it has allowed values from 0 to 3 (the IM fields' values)
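
The v22 range check (IM field values 0 to 3, ports 0 to 2) can be
sketched in userspace as below. The two-bits-per-port layout and all
names are assumptions made for illustration; the real masks are the
MCF_ESW_VIMSEL_IMx_MASK definitions. Note that the diff ORs the new mode
into ESW_VIMSEL, whereas this sketch clears the field first so that a
later call can lower the mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define IM_FIELD_WIDTH 2   /* assumed: two mode bits per port */

/* Mask of the (assumed) mode field for the given port. */
static uint32_t im_mask(int port)
{
	return UINT32_C(0x3) << (port * IM_FIELD_WIDTH);
}

/*
 * Validate port and mode, then replace that port's mode field in *reg.
 * Returns 0 on success or -EINVAL, leaving *reg untouched on error.
 */
static int vimsel_set_mode(uint32_t *reg, int port, int mode)
{
	assert(reg != NULL);

	if (mode < 0 || mode > 3)
		return -EINVAL;

	if (port < 0 || port > 2)
		return -EINVAL;

	*reg = (*reg & ~im_mask(port)) |
	       ((uint32_t)mode << (port * IM_FIELD_WIDTH));

	return 0;
}
```

With a plain uint32_t standing in for the register, setting port 1 to
mode 3 and then to mode 1 leaves only the value 1 in that field.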

Changes for v23:
- Update copyright notice

Changes for v24:
- None
---
 .../net/ethernet/freescale/mtipsw/Makefile    |   2 +-
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  |  31 ++
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |  23 +
 .../ethernet/freescale/mtipsw/mtipl2sw_mgnt.c | 442 ++++++++++++++++++
 4 files changed, 497 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c

diff --git a/drivers/net/ethernet/freescale/mtipsw/Makefile b/drivers/net/ethernet/freescale/mtipsw/Makefile
index bd8ffb30939a..a99aaf6ddfb2 100644
--- a/drivers/net/ethernet/freescale/mtipsw/Makefile
+++ b/drivers/net/ethernet/freescale/mtipsw/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_FEC_MTIP_L2SW) += nxp-mtipl2sw.o
-nxp-mtipl2sw-objs := mtipl2sw.o
+nxp-mtipl2sw-objs := mtipl2sw.o mtipl2sw_mgnt.o
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
index 212207861e13..76e503e128dd 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -505,8 +505,35 @@ static void mtip_config_switch(struct switch_enet_private *fep)
 
 	writel(0, fep->hwp + ESW_BKLR);
 
+	/* Do NOT disable learning */
+	mtip_port_learning_config(fep, 0, 0, 0);
+	mtip_port_learning_config(fep, 1, 0, 0);
+	mtip_port_learning_config(fep, 2, 0, 0);
+
+	/* Disable blocking */
+	mtip_port_blocking_config(fep, 0, 0);
+	mtip_port_blocking_config(fep, 1, 0);
+	mtip_port_blocking_config(fep, 2, 0);
+
 	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
 	       fep->hwp + ESW_IMR);
+
+	mtip_port_enable_config(fep, 0, 1, 1);
+	mtip_port_enable_config(fep, 1, 1, 1);
+	mtip_port_enable_config(fep, 2, 1, 1);
+
+	mtip_port_broadcast_config(fep, 0, 1);
+	mtip_port_broadcast_config(fep, 1, 1);
+	mtip_port_broadcast_config(fep, 2, 1);
+
+	/* Disable multicast receive on port 0 (MGNT) */
+	mtip_port_multicast_config(fep, 0, 0);
+	mtip_port_multicast_config(fep, 1, 1);
+	mtip_port_multicast_config(fep, 2, 1);
+
+	/* Setup VLANs to provide port separation */
+	if (!fep->br_offload)
+		mtip_switch_en_port_separation(fep);
 }
 
 static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
@@ -592,6 +619,10 @@ static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
+	/* For port separation - force sending via specified port */
+	if (!fep->br_offload && port != 0)
+		mtip_forced_forward(fep, port, 1);
+
 	dev->stats.tx_bytes += skb->len;
 	/* If this was the last BD in the ring,
 	 * start at the beginning again.
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
index 7fbad157ff83..72784df6c127 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
@@ -618,6 +618,29 @@ static inline int mtip_get_time(void)
 
 #define MTIP_PORT_FORWARDING_INIT 0xFF
 
+/* Switch Management functions */
+int mtip_vlan_input_process(struct switch_enet_private *fep,
+			    int port, int mode, unsigned short port_vlanid,
+			    int vlan_verify_en, int vlan_domain_num,
+			    int vlan_domain_port);
+int mtip_set_vlan_verification(struct switch_enet_private *fep, int port,
+			       int vlan_domain_verify_en,
+			       int vlan_discard_unknown_en);
+int mtip_port_multicast_config(struct switch_enet_private *fep, int port,
+			       bool enable);
+int mtip_vlan_output_process(struct switch_enet_private *fep, int port,
+			     int mode);
+void mtip_switch_en_port_separation(struct switch_enet_private *fep);
+void mtip_switch_dis_port_separation(struct switch_enet_private *fep);
+int mtip_port_broadcast_config(struct switch_enet_private *fep,
+			       int port, bool enable);
+int mtip_forced_forward(struct switch_enet_private *fep, int port, bool enable);
+int mtip_port_learning_config(struct switch_enet_private *fep, int port,
+			      bool disable, bool irq_adj);
+int mtip_port_blocking_config(struct switch_enet_private *fep, int port,
+			      bool enable);
 bool mtip_is_switch_netdev_port(const struct net_device *ndev);
+int mtip_port_enable_config(struct switch_enet_private *fep, int port,
+			    bool tx_en, bool rx_en);
 void mtip_clear_atable(struct switch_enet_private *fep);
 #endif /* __MTIP_L2SWITCH_H_ */
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
new file mode 100644
index 000000000000..4c07861a6a3e
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
@@ -0,0 +1,442 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  L2 switch Controller driver for MTIP block - switch MGNT
+ *
+ *  Copyright (C) 2026 NABLADEV Software Engineering GmbH
+ *  Lukasz Majewski <lukma@nabladev.com>
+ *
+ *  Based on a previous work by:
+ *
+ *  Copyright 2010-2012 Freescale Semiconductor, Inc.
+ *  Alison Wang (b18965@freescale.com)
+ *  Jason Jin (Jason.jin@freescale.com)
+ *
+ *  Copyright (C) 2010-2013 Freescale Semiconductor, Inc. All Rights Reserved.
+ *  Shrek Wu (B16972@freescale.com)
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+
+#include "mtipl2sw.h"
+
+int mtip_vlan_input_process(struct switch_enet_private *fep,
+			    int port, int mode, unsigned short port_vlanid,
+			    int vlan_verify_en, int vlan_domain_num,
+			    int vlan_domain_port)
+{
+	if (mode < 0 || mode > 3) {
+		dev_err(&fep->pdev->dev,
+			"%s: VLAN input processing mode (%d) not supported\n",
+			__func__, mode);
+		return -EINVAL;
+	}
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	if (vlan_verify_en == 1 &&
+	    (vlan_domain_num < 0 || vlan_domain_num > 32)) {
+		dev_err(&fep->pdev->dev, "%s: Domain out of range\n", __func__);
+		return -EINVAL;
+	}
+
+	writel(FIELD_PREP(MCF_ESW_PID_VLANID_MASK, port_vlanid),
+	       fep->hwp + ESW_PID(port));
+	if (port == 0) {
+		if (vlan_verify_en == 1)
+			writel(FIELD_PREP(MCF_ESW_VRES_VLANID_MASK,
+					  port_vlanid) | MCF_ESW_VRES_P0,
+			       fep->hwp + ESW_VRES(vlan_domain_num));
+
+		writel(readl(fep->hwp + ESW_VIMEN) | MCF_ESW_VIMEN_EN0,
+		       fep->hwp + ESW_VIMEN);
+		writel(readl(fep->hwp + ESW_VIMSEL) |
+		       FIELD_PREP(MCF_ESW_VIMSEL_IM0_MASK, mode),
+		       fep->hwp + ESW_VIMSEL);
+	} else if (port == 1) {
+		if (vlan_verify_en == 1)
+			writel(FIELD_PREP(MCF_ESW_VRES_VLANID_MASK,
+					  port_vlanid) | MCF_ESW_VRES_P1,
+			       fep->hwp + ESW_VRES(vlan_domain_num));
+
+		writel(readl(fep->hwp + ESW_VIMEN) | MCF_ESW_VIMEN_EN1,
+		       fep->hwp + ESW_VIMEN);
+		writel(readl(fep->hwp + ESW_VIMSEL) |
+		       FIELD_PREP(MCF_ESW_VIMSEL_IM1_MASK, mode),
+		       fep->hwp + ESW_VIMSEL);
+	} else if (port == 2) {
+		if (vlan_verify_en == 1)
+			writel(FIELD_PREP(MCF_ESW_VRES_VLANID_MASK,
+					  port_vlanid) | MCF_ESW_VRES_P2,
+			       fep->hwp + ESW_VRES(vlan_domain_num));
+
+		writel(readl(fep->hwp + ESW_VIMEN) | MCF_ESW_VIMEN_EN2,
+		       fep->hwp + ESW_VIMEN);
+		writel(readl(fep->hwp + ESW_VIMSEL) |
+		       FIELD_PREP(MCF_ESW_VIMSEL_IM2_MASK, mode),
+		       fep->hwp + ESW_VIMSEL);
+	}
+
+	return 0;
+}
+
+int mtip_vlan_output_process(struct switch_enet_private *fep, int port,
+			     int mode)
+{
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	if (port == 0) {
+		writel(readl(fep->hwp + ESW_VOMSEL) |
+		       FIELD_PREP(MCF_ESW_VOMSEL_OM0_MASK, mode),
+		       fep->hwp + ESW_VOMSEL);
+	} else if (port == 1) {
+		writel(readl(fep->hwp + ESW_VOMSEL) |
+		       FIELD_PREP(MCF_ESW_VOMSEL_OM1_MASK, mode),
+		       fep->hwp + ESW_VOMSEL);
+	} else if (port == 2) {
+		writel(readl(fep->hwp + ESW_VOMSEL) |
+		       FIELD_PREP(MCF_ESW_VOMSEL_OM2_MASK, mode),
+		       fep->hwp + ESW_VOMSEL);
+	}
+
+	return 0;
+}
+
+int mtip_set_vlan_verification(struct switch_enet_private *fep, int port,
+			       int vlan_domain_verify_en,
+			       int vlan_discard_unknown_en)
+{
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	if (vlan_domain_verify_en == 1) {
+		if (port == 0)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_VV0,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 1)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_VV1,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 2)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_VV2,
+			       fep->hwp + ESW_VLANV);
+	} else if (vlan_domain_verify_en == 0) {
+		if (port == 0)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_VV0,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 1)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_VV1,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 2)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_VV2,
+			       fep->hwp + ESW_VLANV);
+	}
+
+	if (vlan_discard_unknown_en == 1) {
+		if (port == 0)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_DU0,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 1)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_DU1,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 2)
+			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_DU2,
+			       fep->hwp + ESW_VLANV);
+	} else if (vlan_discard_unknown_en == 0) {
+		if (port == 0)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_DU0,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 1)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_DU1,
+			       fep->hwp + ESW_VLANV);
+		else if (port == 2)
+			writel(readl(fep->hwp + ESW_VLANV) & ~MCF_ESW_VLANV_DU2,
+			       fep->hwp + ESW_VLANV);
+	}
+
+	dev_dbg(&fep->pdev->dev, "%s: ESW_VLANV %#x\n", __func__,
+		readl(fep->hwp + ESW_VLANV));
+
+	return 0;
+}
+
+int mtip_port_multicast_config(struct switch_enet_private *fep,
+			       int port, bool enable)
+{
+	u32 reg = 0;
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	reg = readl(fep->hwp + ESW_DMCR);
+	if (enable) {
+		if (port == 0)
+			reg |= MCF_ESW_DMCR_P0;
+		else if (port == 1)
+			reg |= MCF_ESW_DMCR_P1;
+		else if (port == 2)
+			reg |= MCF_ESW_DMCR_P2;
+	} else {
+		if (port == 0)
+			reg &= ~MCF_ESW_DMCR_P0;
+		else if (port == 1)
+			reg &= ~MCF_ESW_DMCR_P1;
+		else if (port == 2)
+			reg &= ~MCF_ESW_DMCR_P2;
+	}
+
+	writel(reg, fep->hwp + ESW_DMCR);
+	return 0;
+}
+
+/* enable or disable port n tx or rx
+ * tx_en 0 disable port n tx
+ * tx_en 1 enable  port n tx
+ * rx_en 0 disable port n rx
+ * rx_en 1 enable  port n rx
+ */
+int mtip_port_enable_config(struct switch_enet_private *fep, int port,
+			    bool tx_en, bool rx_en)
+{
+	u32 reg = 0;
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	reg = readl(fep->hwp + ESW_PER);
+	if (tx_en) {
+		if (port == 0)
+			reg |= MCF_ESW_PER_TE0;
+		else if (port == 1)
+			reg |= MCF_ESW_PER_TE1;
+		else if (port == 2)
+			reg |= MCF_ESW_PER_TE2;
+	} else {
+		if (port == 0)
+			reg &= (~MCF_ESW_PER_TE0);
+		else if (port == 1)
+			reg &= (~MCF_ESW_PER_TE1);
+		else if (port == 2)
+			reg &= (~MCF_ESW_PER_TE2);
+	}
+
+	if (rx_en) {
+		if (port == 0)
+			reg |= MCF_ESW_PER_RE0;
+		else if (port == 1)
+			reg |= MCF_ESW_PER_RE1;
+		else if (port == 2)
+			reg |= MCF_ESW_PER_RE2;
+	} else {
+		if (port == 0)
+			reg &= (~MCF_ESW_PER_RE0);
+		else if (port == 1)
+			reg &= (~MCF_ESW_PER_RE1);
+		else if (port == 2)
+			reg &= (~MCF_ESW_PER_RE2);
+	}
+
+	writel(reg, fep->hwp + ESW_PER);
+	return 0;
+}
+
+void mtip_switch_en_port_separation(struct switch_enet_private *fep)
+{
+	u32 reg;
+
+	mtip_vlan_input_process(fep, 0, 3, 0x10, 1, 0, 0);
+	mtip_vlan_input_process(fep, 1, 3, 0x11, 1, 1, 0);
+	mtip_vlan_input_process(fep, 2, 3, 0x12, 1, 2, 0);
+
+	reg = readl(fep->hwp + ESW_VRES(0));
+	writel(reg | MCF_ESW_VRES_P1 | MCF_ESW_VRES_P2,
+	       fep->hwp + ESW_VRES(0));
+
+	reg = readl(fep->hwp + ESW_VRES(1));
+	writel(reg | MCF_ESW_VRES_P0, fep->hwp + ESW_VRES(1));
+
+	reg = readl(fep->hwp + ESW_VRES(2));
+	writel(reg | MCF_ESW_VRES_P0, fep->hwp + ESW_VRES(2));
+
+	dev_dbg(&fep->pdev->dev, "%s: VRES0: 0x%x\n",
+		__func__, readl(fep->hwp + ESW_VRES(0)));
+	dev_dbg(&fep->pdev->dev, "%s: VRES1: 0x%x\n", __func__,
+		readl(fep->hwp + ESW_VRES(1)));
+	dev_dbg(&fep->pdev->dev, "%s: VRES2: 0x%x\n", __func__,
+		readl(fep->hwp + ESW_VRES(2)));
+
+	mtip_set_vlan_verification(fep, 0, 1, 0);
+	mtip_set_vlan_verification(fep, 1, 1, 0);
+	mtip_set_vlan_verification(fep, 2, 1, 0);
+
+	mtip_vlan_output_process(fep, 0, 2);
+	mtip_vlan_output_process(fep, 1, 2);
+	mtip_vlan_output_process(fep, 2, 2);
+}
+
+void mtip_switch_dis_port_separation(struct switch_enet_private *fep)
+{
+	writel(0, fep->hwp + ESW_PID(0));
+	writel(0, fep->hwp + ESW_PID(1));
+	writel(0, fep->hwp + ESW_PID(2));
+
+	writel(0, fep->hwp + ESW_VRES(0));
+	writel(0, fep->hwp + ESW_VRES(1));
+	writel(0, fep->hwp + ESW_VRES(2));
+
+	writel(0, fep->hwp + ESW_VIMEN);
+	writel(0, fep->hwp + ESW_VIMSEL);
+	writel(0, fep->hwp + ESW_VLANV);
+	writel(0, fep->hwp + ESW_VOMSEL);
+}
+
+int mtip_port_broadcast_config(struct switch_enet_private *fep,
+			       int port, bool enable)
+{
+	u32 reg = 0;
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	reg = readl(fep->hwp + ESW_DBCR);
+	if (enable) {
+		if (port == 0)
+			reg |= MCF_ESW_DBCR_P0;
+		else if (port == 1)
+			reg |= MCF_ESW_DBCR_P1;
+		else if (port == 2)
+			reg |= MCF_ESW_DBCR_P2;
+	} else {
+		if (port == 0)
+			reg &= ~MCF_ESW_DBCR_P0;
+		else if (port == 1)
+			reg &= ~MCF_ESW_DBCR_P1;
+		else if (port == 2)
+			reg &= ~MCF_ESW_DBCR_P2;
+	}
+
+	writel(reg, fep->hwp + ESW_DBCR);
+	return 0;
+}
+
+/* The frame is forwarded to the forced destination ports.
+ * This only replaces the MAC lookup function;
+ * all other filtering (e.g. VLAN verification) acts as normal.
+ */
+int mtip_forced_forward(struct switch_enet_private *fep, int port, bool enable)
+{
+	u32 reg = 0;
+
+	if (port & ~GENMASK(1, 0)) {
+		dev_err(&fep->pdev->dev,
+			"%s: Forced forward for port(s): 0x%x not supported!\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	/* Enable Forced forwarding for port(s) */
+	reg |= FIELD_PREP(MCF_ESW_P0FFEN_FD_MASK, port & GENMASK(1, 0));
+
+	if (enable)
+		reg |= MCF_ESW_P0FFEN_FEN;
+	else
+		reg &= ~MCF_ESW_P0FFEN_FEN;
+
+	writel(reg, fep->hwp + ESW_P0FFEN);
+	return 0;
+}
+
+int mtip_port_learning_config(struct switch_enet_private *fep, int port,
+			      bool disable, bool irq_adj)
+{
+	u32 reg = 0;
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	reg = readl(fep->hwp + ESW_BKLR);
+	if (disable) {
+		if (irq_adj)
+			writel(readl(fep->hwp + ESW_IMR) & ~MCF_ESW_IMR_LRN,
+			       fep->hwp + ESW_IMR);
+
+		if (port == 0)
+			reg |= MCF_ESW_BKLR_LD0;
+		else if (port == 1)
+			reg |= MCF_ESW_BKLR_LD1;
+		else if (port == 2)
+			reg |= MCF_ESW_BKLR_LD2;
+	} else {
+		if (irq_adj)
+			writel(readl(fep->hwp + ESW_IMR) | MCF_ESW_IMR_LRN,
+			       fep->hwp + ESW_IMR);
+
+		if (port == 0)
+			reg &= ~MCF_ESW_BKLR_LD0;
+		else if (port == 1)
+			reg &= ~MCF_ESW_BKLR_LD1;
+		else if (port == 2)
+			reg &= ~MCF_ESW_BKLR_LD2;
+	}
+
+	writel(reg, fep->hwp + ESW_BKLR);
+	dev_dbg(&fep->pdev->dev, "%s ESW_BKLR %#x, ESW_IMR %#x\n", __func__,
+		readl(fep->hwp + ESW_BKLR), readl(fep->hwp + ESW_IMR));
+
+	return 0;
+}
+
+int mtip_port_blocking_config(struct switch_enet_private *fep, int port,
+			      bool enable)
+{
+	u32 reg = 0;
+
+	if (port < 0 || port > 2) {
+		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
+			__func__, port);
+		return -EINVAL;
+	}
+
+	reg = readl(fep->hwp + ESW_BKLR);
+	if (enable) {
+		if (port == 0)
+			reg |= MCF_ESW_BKLR_BE0;
+		else if (port == 1)
+			reg |= MCF_ESW_BKLR_BE1;
+		else if (port == 2)
+			reg |= MCF_ESW_BKLR_BE2;
+	} else {
+		if (port == 0)
+			reg &= ~MCF_ESW_BKLR_BE0;
+		else if (port == 1)
+			reg &= ~MCF_ESW_BKLR_BE1;
+		else if (port == 2)
+			reg &= ~MCF_ESW_BKLR_BE2;
+	}
+
+	writel(reg, fep->hwp + ESW_BKLR);
+	return 0;
+}
-- 
2.39.5



* [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations
  2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
                   ` (5 preceding siblings ...)
  2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
@ 2026-06-01 11:24 ` Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
  6 siblings, 1 reply; 21+ messages in thread
From: Lukasz Majewski @ 2026-06-01 11:24 UTC (permalink / raw)
  To: Andrew Lunn, davem, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Richard Cochran, Liam Girdwood, Nathan Chancellor, linux-kernel,
	netdev, Krzysztof Kozlowski, imx, linux-arm-kernel, Rob Herring,
	Lukasz Majewski

After this change, the MTIP L2 switch can be configured as an offloading
device for packet switching when a bridge is created on its interfaces.

Signed-off-by: Lukasz Majewski <lukasz.majewski@mailbox.org>
---

Changes for v13:
- New patch - created by splitting some code out of the large (i.e. v12 and
  earlier) MTIP driver

Changes for v14 - v15:
- None

Changes for v16:
- Enable MTIP ports to support bridge offloading

Changes for v17 - v20:
- None

Changes for v21:
- Avoid double call of notifier_from_errno() on -EOPNOTSUPP
- Rollback changes to driver state when switchdev_bridge_port_offload()
  fails

Changes for v22:
- Set br_members and master_dev only after a successful call of
  switchdev_bridge_port_offload()

Changes for v23:
- Update copyright notice
- Clear the br_offload flag and enable switch port separation

Changes for v24:
- None
---
 .../net/ethernet/freescale/mtipsw/Makefile    |   2 +-
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  |   9 +-
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |   2 +
 .../ethernet/freescale/mtipsw/mtipl2sw_br.c   | 138 ++++++++++++++++++
 4 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c

diff --git a/drivers/net/ethernet/freescale/mtipsw/Makefile b/drivers/net/ethernet/freescale/mtipsw/Makefile
index a99aaf6ddfb2..81e2b0e03e6c 100644
--- a/drivers/net/ethernet/freescale/mtipsw/Makefile
+++ b/drivers/net/ethernet/freescale/mtipsw/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_FEC_MTIP_L2SW) += nxp-mtipl2sw.o
-nxp-mtipl2sw-objs := mtipl2sw.o mtipl2sw_mgnt.o
+nxp-mtipl2sw-objs := mtipl2sw.o mtipl2sw_mgnt.o mtipl2sw_br.o
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
index 76e503e128dd..0a12d115b78b 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
@@ -1961,11 +1961,15 @@ static int mtip_sw_probe(struct platform_device *pdev)
 
 	mtip_clear_atable(fep);
 
+	ret = mtip_register_notifiers(fep);
+	if (ret)
+		goto of_free_references;
+
 	ret = mtip_switch_dma_init(fep);
 	if (ret) {
 		dev_err(&pdev->dev, "%s: ethernet switch init fail (%d)!\n",
 			__func__, ret);
-		goto of_free_references;
+		goto unregister_notifiers;
 	}
 
 	ret = mtip_mii_init(fep, pdev);
@@ -1997,6 +2001,8 @@ static int mtip_sw_probe(struct platform_device *pdev)
 			  fep->bd_dma);
 	fep->rx_bd_base = NULL;
 	fep->tx_bd_base = NULL;
+ unregister_notifiers:
+	mtip_unregister_notifiers(fep);
  of_free_references:
 	mtip_parse_of_cleanup(fep);
 
@@ -2009,6 +2015,7 @@ static void mtip_sw_remove(struct platform_device *pdev)
 
 	timer_delete_sync(&fep->timer_mgnt);
 
+	mtip_unregister_notifiers(fep);
 	mtip_ndev_cleanup(fep);
 
 	mtip_parse_of_cleanup(fep);
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
index 72784df6c127..49d8454f7cd1 100644
--- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
@@ -640,6 +640,8 @@ int mtip_port_learning_config(struct switch_enet_private *fep, int port,
 int mtip_port_blocking_config(struct switch_enet_private *fep, int port,
 			      bool enable);
 bool mtip_is_switch_netdev_port(const struct net_device *ndev);
+int mtip_register_notifiers(struct switch_enet_private *fep);
+void mtip_unregister_notifiers(struct switch_enet_private *fep);
 int mtip_port_enable_config(struct switch_enet_private *fep, int port,
 			    bool tx_en, bool rx_en);
 void mtip_clear_atable(struct switch_enet_private *fep);
diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
new file mode 100644
index 000000000000..c9d9151b0e81
--- /dev/null
+++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  L2 switch Controller driver for MTIP block - bridge network interface
+ *
+ *  Copyright (C) 2026 NABLADEV Software Engineering GmbH
+ *  Lukasz Majewski <lukma@nabladev.com>
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <net/switchdev.h>
+
+#include "mtipl2sw.h"
+
+static int mtip_ndev_port_link(struct net_device *ndev,
+			       struct net_device *br_ndev,
+			       struct netlink_ext_ack *extack)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(ndev), *other_priv;
+	struct switch_enet_private *fep = priv->fep;
+	struct net_device *other_ndev;
+	int err;
+
+	/* Check if one port of MTIP switch is already bridged */
+	if (fep->br_members && !fep->br_offload) {
+		/* Get the second bridge ndev */
+		other_ndev = fep->ndev[fep->br_members - 1];
+		other_priv = netdev_priv(other_ndev);
+		if (other_priv->master_dev != br_ndev) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "L2 offloading only possible for the same bridge!");
+			return -EOPNOTSUPP;
+		}
+
+		fep->br_offload = 1;
+		mtip_switch_dis_port_separation(fep);
+		mtip_clear_atable(fep);
+	}
+
+	err = switchdev_bridge_port_offload(ndev, ndev, NULL, NULL, NULL,
+					    false, extack);
+	if (err) {
+		dev_err(&ndev->dev, "can't offload bridge port %s [err: %d]\n",
+			ndev->name, err);
+
+		if (fep->br_members && fep->br_offload) {
+			fep->br_offload = 0;
+			mtip_switch_en_port_separation(fep);
+		}
+
+		return err;
+	}
+
+	if (!priv->master_dev)
+		priv->master_dev = br_ndev;
+
+	fep->br_members |= BIT(priv->portnum - 1);
+
+	dev_dbg(&ndev->dev,
+		"%s: ndev: %s br: %s fep: %p members: 0x%x offload: %d\n",
+		__func__, ndev->name,  br_ndev->name, fep, fep->br_members,
+		fep->br_offload);
+
+	return NOTIFY_DONE;
+}
+
+static void mtip_netdevice_port_unlink(struct net_device *ndev)
+{
+	struct mtip_ndev_priv *priv = netdev_priv(ndev);
+	struct switch_enet_private *fep = priv->fep;
+
+	dev_dbg(&ndev->dev, "%s: ndev: %s members: 0x%x\n", __func__,
+		ndev->name, fep->br_members);
+
+	switchdev_bridge_port_unoffload(ndev, NULL, NULL, NULL);
+
+	fep->br_members &= ~BIT(priv->portnum - 1);
+	priv->master_dev = NULL;
+
+	if (fep->br_members && fep->br_offload) {
+		fep->br_offload = 0;
+		mtip_switch_en_port_separation(fep);
+		mtip_clear_atable(fep);
+	}
+}
+
+/* netdev notifier */
+static int mtip_netdevice_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info = ptr;
+	struct netlink_ext_ack *extack;
+	int ret = NOTIFY_DONE;
+
+	if (!mtip_is_switch_netdev_port(ndev))
+		return NOTIFY_DONE;
+
+	extack = netdev_notifier_info_to_extack(&info->info);
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		if (!netif_is_bridge_master(info->upper_dev))
+			break;
+
+		if (info->linking)
+			ret = mtip_ndev_port_link(ndev, info->upper_dev,
+						  extack);
+		else
+			mtip_netdevice_port_unlink(ndev);
+
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return notifier_from_errno(ret);
+}
+
+static struct notifier_block mtip_netdevice_nb __read_mostly = {
+	.notifier_call = mtip_netdevice_event,
+};
+
+int mtip_register_notifiers(struct switch_enet_private *fep)
+{
+	int ret = register_netdevice_notifier(&mtip_netdevice_nb);
+
+	if (ret)
+		dev_err(&fep->pdev->dev, "can't register netdevice notifier\n");
+
+	return ret;
+}
+
+void mtip_unregister_notifiers(struct switch_enet_private *fep)
+{
+	unregister_netdevice_notifier(&mtip_netdevice_nb);
+}
-- 
2.39.5



* Re: [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description
  2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  0 siblings, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC (permalink / raw)
  To: Lukasz Majewski; +Cc: Frank.Li, imx

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] Schema Validation Bypass due to `additionalProperties: true`
--

commit 5d51965bfdbb5587449b5ea451099860ef23a8f0
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

dt-bindings: net: Add MTIP L2 switch description

This commit introduces the device tree binding for the MTIP L2 switch
found in some NXP SoCs like the imx287. It defines properties for ports,
MDIO, clocks, interrupts, and provides an example configuration.

> diff --git a/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml b/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
> new file mode 100644
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml

[ ... ]

> +  pinctrl-names: true
> +
> +  ethernet-ports:
> +    type: object
> +    additionalProperties: true

[Severity: Low]
Does setting additionalProperties to true here disable strict schema
validation for the container node?

Allowing additional properties means that undocumented properties and
arbitrary child nodes will pass without error. Should this be set to
unevaluatedProperties: false (or additionalProperties: false) to ensure
that only documented properties are permitted?

> +
> +    patternProperties:
> +      '^ethernet-port@[12]$':
> +        type: object
> +        additionalProperties: true

[Severity: Low]
Can this bypass strict property validation for the individual port nodes?

If validation is disabled here, typos like 'phy-mod' instead of 'phy-mode'
will silently pass the dt_binding_check validation.

> +        properties:
> +          reg:
> +            items:
> +              - enum: [1, 2]
> +            description: MTIP L2 switch port number

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=1


* Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
  2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC (permalink / raw)
  To: Lukasz Majewski; +Cc: Frank.Li, imx

Thank you for your contribution! Sashiko AI review found 12 potential issues to consider:
- [High] Premature cleanup of PHY device nodes leading to Use-After-Free and Double Free.
- [High] Missing mandatory `ndo_start_xmit` callback in `net_device_ops`.
- [High] Incorrect teardown of a self-rearming timer causes Use-After-Free.
- [High] Unaligned memory access and strict aliasing violation when reading MAC address.
- [High] Out-of-bounds access and NULL pointer dereference if device tree defines fewer than 2 ports.
- [High] Shared NAPI instance lifecycle is incorrectly tied to independent network devices, leading to Use-After-Free.
- [High] Hardware IRQ is requested before NAPI initialization is completed.
- [High] Hardware DMA and interrupts are not halted during device teardown, causing memory corruption.
- [High] Data race and double-frees during asynchronous hardware reset in `mtip_adjust_link()`.
- [High] TOCTOU race condition on the interrupt mask register causes interrupt storms.
- [High] Incorrect base address used during MII initialization corrupts the switch Multicast Control Register.
- [High] DMA descriptor rings are not reinitialized during hardware reset, causing UAF and double transmissions.
--

commit 9013be10c1b09f8737ae605cb41647830304da99
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

net: mtip: The L2 switch driver for imx287

This commit introduces support for the More Than IP L2 switch embedded in the
imx287 SoC. It allows offloading network traffic between the internal uDMA and
MAC-NET blocks.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> new file mode 100644
> index 0000000000000..1d1005a42306d
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> @@ -0,0 +1,1363 @@

[ ... ]

> +static void mtip_setup_mac(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	unsigned char *iap, mac_addr[ETH_ALEN];
> +
> +	/* Use MAC address from DTS */
> +	iap = &fep->mac[priv->portnum - 1][0];
> +
> +	/* Use MAC address set by bootloader */
> +	if (!is_valid_ether_addr(iap)) {
> +		*((__be32 *)&mac_addr[0]) =
> +			cpu_to_be32(readl(fep->enet_addr + MCF_FEC_PALR));
> +		*((__be16 *)&mac_addr[4]) =
> +			cpu_to_be16(readl(fep->enet_addr +
> +					  MCF_FEC_PAUR) >> 16);
> +		iap = &mac_addr[0];
> +	}

[Severity: High]
Does this direct casting of a byte array to a 32-bit and 16-bit pointer violate
strict aliasing and cause unaligned access faults on architectures that do not
support unaligned 32-bit stores? Using put_unaligned_be32() and
put_unaligned_be16() might be safer here.
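
For illustration only, here is a user-space sketch of the byte-wise stores
that put_unaligned_be32() and put_unaligned_be16() stand for. The helper
names (store_be32, store_be16, mac_from_regs) are made up for this example
and are not part of the patch. It assumes the register layout implied by the
quoted code: PALR holds MAC bytes 0..3 and the upper half of PAUR holds
bytes 4..5.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for put_unaligned_be32(): store the value one
 * byte at a time, most significant byte first, so that no aligned
 * 32-bit access to dst is ever required.
 */
static void store_be32(uint8_t *dst, uint32_t val)
{
	dst[0] = (uint8_t)(val >> 24);
	dst[1] = (uint8_t)(val >> 16);
	dst[2] = (uint8_t)(val >> 8);
	dst[3] = (uint8_t)val;
}

/* Hypothetical stand-in for put_unaligned_be16(). */
static void store_be16(uint8_t *dst, uint16_t val)
{
	dst[0] = (uint8_t)(val >> 8);
	dst[1] = (uint8_t)val;
}

/* Build a 6-byte MAC address from the two raw register values.
 * palr: assumed to carry MAC bytes 0..3.
 * paur: assumed to carry MAC bytes 4..5 in its upper 16 bits.
 */
static void mac_from_regs(uint8_t mac[6], uint32_t palr, uint32_t paur)
{
	store_be32(&mac[0], palr);
	store_be16(&mac[4], (uint16_t)(paur >> 16));
}
```

In the driver itself the two kernel helpers would presumably take the place
of the pointer casts directly, without any local wrappers.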

[ ... ]

> +static void mtip_mgnt_timer(struct timer_list *t)
> +{
> +	struct switch_enet_private *fep =
> +		container_of(t, struct switch_enet_private, timer_mgnt);
> +
> +	mtip_atable_dynamicms_learn_migration(fep, mtip_get_time(),
> +					      NULL, NULL);
> +	mod_timer(&fep->timer_mgnt,
> +		  jiffies + msecs_to_jiffies(LEARNING_AGING_INTERVAL));
> +}

[Severity: High]
Since the management timer callback unconditionally re-arms itself here, does
calling timer_delete_sync() in mtip_sw_remove() fail to permanently disarm
the timer? Should this be using timer_shutdown_sync() in the teardown path
to prevent a use-after-free if the callback fires after the structures are
freed?

[ ... ]

> +static void mtip_switch_restart(struct net_device *dev, int duplex0,
> +				int duplex1)
> +{

[ ... ]

> +	fep->cur_tx = fep->tx_bd_base;
> +	fep->cur_rx = fep->rx_bd_base;
> +	fep->dirty_tx = fep->cur_tx;
> +
> +	/* Reset SKB transmit buffers */
> +	for (i = 0; i <= TX_RING_MOD_MASK; i++) {
> +		if (fep->tx_skbuff[i]) {
> +			dev_kfree_skb_any(fep->tx_skbuff[i]);
> +			fep->tx_skbuff[i] = NULL;
> +		}
> +	}

[Severity: High]
When freeing the software tracking structures here, the hardware descriptor
status fields (cbd_sc) are left unmodified. Could the hardware immediately
re-transmit stale, previously READY descriptors upon reset, and subsequently
dereference the NULL tx_skbuff pointer upon completion?
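
As a rough user-space sketch of what reinitializing the ring could look
like (not the driver's actual code): the struct layout, the bit values
BD_TX_READY and BD_TX_WRAP, and the function name tx_ring_reset are all
assumptions made for this example. The idea is that every descriptor is
handed back to software before the hardware is restarted, with only the
wrap marker left on the last entry.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical status bits, chosen for illustration only. */
#define BD_TX_READY 0x8000u	/* descriptor is owned by the hardware */
#define BD_TX_WRAP  0x2000u	/* last descriptor, ring wraps here */

/* Hypothetical descriptor layout for this sketch. */
struct tx_bd {
	uint16_t status;
	uint16_t length;
	uint32_t bufaddr;
};

/* Clear every descriptor so that none is left marked READY, and keep
 * only the WRAP bit on the final entry. The caller is expected to have
 * released the associated buffers already.
 */
static void tx_ring_reset(struct tx_bd *ring, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		ring[i].status = (i == n - 1) ? BD_TX_WRAP : 0;
		ring[i].length = 0;
		ring[i].bufaddr = 0;
	}
}
```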

[ ... ]

> +static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
> +{
> +	struct switch_enet_private *fep = ptr_fep;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 int_events, int_imask;
> +
> +	/* Get the interrupt events that caused us to be here */
> +	int_events = readl(fep->hwp + ESW_ISR);
> +	writel(int_events, fep->hwp + ESW_ISR);
> +
> +	if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
> +		ret = IRQ_HANDLED;
> +		/* Disable the RX interrupt */
> +		if (napi_schedule_prep(&fep->napi)) {
> +			int_imask = readl(fep->hwp + ESW_IMR);
> +			int_imask &= ~MCF_ESW_IMR_RXF;
> +			writel(int_imask, fep->hwp + ESW_IMR);
> +			__napi_schedule(&fep->napi);
> +		}
> +	}
> +
> +	return ret;
> +}

[Severity: High]
Is there a TOCTOU race condition here if mtip_rx_napi() concurrently writes
MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF to re-enable interrupts? If mtip_rx_napi()
writes immediately after this readl(), could this read-modify-write clobber
that update and leave the RXF interrupt enabled, causing an interrupt storm
while NAPI is actively running?

[ ... ]

> +static void mtip_adjust_link(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	struct phy_device *phy_dev;
> +	int status_change = 0, idx;
> +
> +	idx = priv->portnum - 1;
> +	phy_dev = fep->phy_dev[idx];
> +
> +	/* Duplex link change */
> +	if (phy_dev->link && fep->full_duplex[idx] != phy_dev->duplex) {
> +		mtip_netif_stop_queues(fep);
> +		if (idx == 0)
> +			mtip_switch_restart(dev, phy_dev->duplex,
> +					    fep->full_duplex[!idx]);

[Severity: High]
Does initiating a full hardware reset here race asynchronously with NAPI
softirqs or active transmit paths? Without disabling NAPI via napi_disable()
or holding hardware locks, could this lead to SKB double-frees and ring
descriptor desynchronization?

[ ... ]

> +static int mtip_mii_init(struct switch_enet_private *fep,
> +			 struct platform_device *pdev)
> +{
> +	struct device_node *node;
> +	int err = -ENXIO;
> +
> +	/* Clear MMFR to avoid to generate MII event by writing MSCR.
> +	 * MII event generation condition:
> +	 * - writing MSCR:
> +	 *      - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
> +	 *        mscr_reg_data_in[7:0] != 0
> +	 * - writing MMFR:
> +	 *      - mscr[7:0]_not_zero
> +	 */
> +	writel(0, fep->hwp + MCF_FEC_MII_DATA);

[Severity: High]
Is the switch base address fep->hwp intentionally used here instead of the
MAC base address fep->enet_addr? Offset 0x040 in the switch block is the
Multicast Control Register (ESW_MCR). Could this corrupt the switch multicast
configuration while leaving the MAC's register uninitialized?

[ ... ]

> +static int mtip_close(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int idx = priv->portnum - 1;
> +
> +	fep->link[idx] = 0;
> +
> +	if (fep->phy_dev[idx]) {
> +		phy_stop(fep->phy_dev[idx]);
> +		netif_stop_queue(dev);
> +		phy_disconnect(fep->phy_dev[idx]);
> +		fep->phy_dev[idx] = NULL;
> +	}
> +
> +	if (fep->usage_count == 1) {
> +		napi_disable(&fep->napi);
> +		netif_napi_del(&fep->napi);
> +		mtip_free_buffers(dev);
> +		clk_disable(fep->clk_ipg);
> +	}

[Severity: High]
If ndev[0] is unregistered and freed first, usage_count will drop to 1 and NAPI
teardown will be skipped. Because fep->napi is shared and was registered
against the first device, does this leave the active NAPI instance with a
dangling pointer to the freed ndev[0], resulting in a use-after-free when
ndev[1] triggers an interrupt?

[ ... ]

> +static const struct net_device_ops mtip_netdev_ops = {
> +	.ndo_open		= mtip_open,
> +	.ndo_stop		= mtip_close,
> +};

[Severity: High]
The mandatory ndo_start_xmit callback is missing from this definition. Will any
attempt to transmit packets cause an unconditional kernel panic in the core
networking stack? (Note: Although this might be fixed in a later patch in the
series, introducing it here breaks bisectability.)

[ ... ]

> +static int mtip_ndev_init(struct switch_enet_private *fep,
> +			  struct platform_device *pdev)
> +{
> +	struct mtip_ndev_priv *priv;
> +	int i, ret = 0;
> +
> +	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
> +		fep->ndev[i] = alloc_netdev(sizeof(struct mtip_ndev_priv),
> +					    fep->ndev_name[i], NET_NAME_USER,
> +					    ether_setup);

[Severity: High]
If the device tree specifies fewer than 2 ethernet ports, fep->ndev_name[1]
could remain NULL. Because the loop unconditionally runs SWITCH_EPORT_NUMBER
times, will passing a NULL name string to alloc_netdev() cause a NULL pointer
dereference in strscpy()?
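
One possible shape for such a guard, shown as a user-space sketch: the
function name check_port_names is hypothetical, and SWITCH_EPORT_NUMBER is
assumed to be 2 as in the series. The point is only that every name is
validated before any allocation is attempted.

```c
#include <errno.h>
#include <stddef.h>

#define SWITCH_EPORT_NUMBER 2	/* assumed value for this sketch */

/* Return 0 when every port has a non-empty name, -EINVAL otherwise.
 * A caller could run this before the allocation loop and bail out
 * early instead of passing a NULL name further down.
 */
static int check_port_names(const char *const names[SWITCH_EPORT_NUMBER])
{
	for (size_t i = 0; i < SWITCH_EPORT_NUMBER; i++) {
		if (!names[i] || names[i][0] == '\0')
			return -EINVAL;
	}

	return 0;
}
```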

[ ... ]

> +static int mtip_parse_of(struct switch_enet_private *fep,
> +			 struct device_node *np)
> +{
> +	struct device_node *p;
> +	unsigned int port_num;
> +	int ret = 0;

[ ... ]

> +		fep->phy_np[port_num - 1] = of_parse_phandle(port,
> +							     "phy-handle", 0);
> +		if (!fep->phy_np[port_num - 1]) {
> +			dev_err(&fep->pdev->dev,
> +				"%s: PHY handle not defined!\n", __func__);
> +			ret = -ENODEV;
> +			goto of_get_err;
> +		}
> +	}
> +
> + of_get_err:
> +	mtip_parse_of_cleanup(fep);
> +	of_node_put(p);
> +
> +	return ret;
> +}

[Severity: High]
Is there a missing return 0; before the of_get_err label? It appears the
successful path falls through into mtip_parse_of_cleanup(), which decrements
the reference count of the PHY device nodes without clearing the pointers,
ultimately leading to a use-after-free during PHY probe.
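
The control flow in question can be modelled in userspace. This is a minimal
sketch, not the driver's code: node_get() and node_put() are hypothetical
stand-ins for of_parse_phandle() and of_node_put(), and parse_ports() is an
invented name. The early return keeps the references on success, while the
error label releases only what was actually taken:

```c
#include <errno.h>
#include <stddef.h>

struct node {
	int refs;
};

/* Hypothetical stand-in for of_parse_phandle(): takes a reference. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refs++;
	return n;
}

/* Hypothetical stand-in for of_node_put(): drops a reference. */
static void node_put(struct node *n)
{
	if (n)
		n->refs--;
}

static int parse_ports(struct node *phy[2], struct node *src[2])
{
	int i;

	for (i = 0; i < 2; i++) {
		phy[i] = node_get(src[i]);
		if (!phy[i])
			goto err;
	}

	/* Success: return before the error label so references stay held. */
	return 0;

err:
	/* Failure: drop what was taken and clear the stale pointers. */
	while (--i >= 0) {
		node_put(phy[i]);
		phy[i] = NULL;
	}
	return -ENODEV;
}
```

With this shape the success path leaves each reference count at one, and a
failure on the second port releases the first port's reference again.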

[ ... ]

> +static int mtip_sw_probe(struct platform_device *pdev)
> +{

[ ... ]

> +	ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
> +			       dev_name(&pdev->dev), fep);
> +	if (ret) {
> +		mtip_parse_of_cleanup(fep);
> +		return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
> +	}

[Severity: High]
Is it safe to request the IRQ here before fep->napi is fully initialized in
mtip_open()? If an interrupt is pending or fires immediately, could
mtip_interrupt() trigger napi_schedule_prep() on uninitialized memory and
crash the system?

[ ... ]

> +static void mtip_sw_remove(struct platform_device *pdev)
> +{
> +	struct switch_enet_private *fep = platform_get_drvdata(pdev);
> +
> +	timer_delete_sync(&fep->timer_mgnt);
> +
> +	mtip_ndev_cleanup(fep);
> +
> +	mtip_parse_of_cleanup(fep);
> +	mtip_mii_remove(fep);
> +
> +	dma_free_coherent(&fep->pdev->dev, PAGE_SIZE, fep->rx_bd_base,
> +			  fep->bd_dma);

[Severity: High]
Are the hardware DMA engine and interrupts halted before the DMA rings are
freed here? Because the switch hardware remains active, could incoming
packets cause the DMA engine to write into these freed pages, corrupting
kernel memory?

> +	fep->rx_bd_base = NULL;
> +	fep->tx_bd_base = NULL;
> +
> +	platform_set_drvdata(pdev, NULL);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=2

* Re: [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC
  To: Lukasz Majewski; +Cc: imx, Frank.Li

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] mtip_free_buffers() is invoked unconditionally on the mtip_open() error path, causing Use-After-Free for shared ring buffers when usage_count > 0.
- [High] mtip_free_buffers() frees DMA-mapped buffers without ensuring the hardware MAC/DMA engine is stopped.
- [Medium] Pending transmit sk_buffs are leaked in mtip_free_buffers().
--

commit 4b7d8928772619aad35c821850421906a5fec249
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>
Date:   Mon Jun 1 11:24:32 2026 +0000

net: mtip: Add buffers management functions to the L2 switch driver

This patch provides buffers management functions' content for the MTIP
L2 switch driver. It adds implementations for allocating and freeing
shared ring buffers and page pools.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 1d1005a42306d..5b5229ee411df 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> @@ -892,11 +892,101 @@ static void mtip_get_drvinfo(struct net_device *dev,
>  
>  static void mtip_free_buffers(struct net_device *dev)
>  {
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int i;
> +

[Severity: High]
Does this function ensure the hardware MAC and DMA engines are stopped
before freeing the buffers?

Looking at its caller in mtip_close(), mtip_free_buffers() is invoked when
fep->usage_count == 1. However, mtip_close() does not appear to explicitly
stop the MAC, disable the DMA engine, or reset the hardware before calling
this. It only disables the clock after mtip_free_buffers() returns:

	if (fep->usage_count == 1) {
		napi_disable(&fep->napi);
		netif_napi_del(&fep->napi);
		mtip_free_buffers(dev);
		clk_disable(fep->clk_ipg);
	}

If the hardware is still processing queued TX packets or receiving RX packets,
could it continue to perform DMA operations on the memory that has just been
freed and reallocated to the system?

[Severity: Critical]
Is it safe for the mtip_open() error path to call this function
unconditionally?

When a subsequent port is brought up (fep->usage_count > 0) and
mtip_mii_probe() fails, the error path goes to mtip_mii_probe_err and calls
mtip_free_buffers(dev) regardless of the usage count:

 mtip_mii_probe_err:
	mtip_free_buffers(dev);

Since these buffers are shared across all switch ports, won't this destroy
the page pool and free the TX bounce buffers while the first port's NAPI
and hardware DMA are still actively using them?
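
A rough model of the guard this question implies, under the assumption that
usage_count is only incremented after a port has opened successfully. The
struct shared_rings type and open_error_cleanup() are hypothetical names used
for illustration; they are not taken from the patch:

```c
#include <stdbool.h>

struct shared_rings {
	int usage_count;	/* ports that already completed open */
	bool buffers_live;	/* shared RX pages and TX bounce buffers */
};

/* Error path of a port open that failed before usage_count was bumped. */
static void open_error_cleanup(struct shared_rings *s)
{
	/* Only the first opener allocated the rings, so only it frees them. */
	if (s->usage_count == 0)
		s->buffers_live = false;
}
```

If another port is already up (usage_count == 1), the failing open leaves the
shared buffers untouched.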

> +	for (i = 0; i < RX_RING_SIZE; i++) {
> +		if (!fep->page[i])
> +			continue;
> +
> +		page_pool_put_full_page(fep->page_pool,
> +					fep->page[i], false);
> +		fep->page[i] = NULL;
> +	}
> +
> +	page_pool_destroy(fep->page_pool);
> +	fep->page_pool = NULL;
> +
> +	for (i = 0; i < TX_RING_SIZE; i++) {
> +		kfree(fep->tx_bounce[i]);
> +		fep->tx_bounce[i] = NULL;
> +	}

[Severity: Medium]
Are pending transmit sk_buffs leaked here?

This cleanup correctly frees the fep->tx_bounce buffers, but it does not
seem to iterate over the fep->tx_skbuff array to free any pending socket
buffers that were queued for transmission. Will any sk_buff still in
tx_skbuff[i] when the interface is closed be permanently leaked?

> +}

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=3

* Re: [net-next v24 4/7] net: mtip: Add net_device_ops functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC
  To: Lukasz Majewski; +Cc: Frank.Li, imx

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [Critical] Unchecked memcpy into a fixed-size bounce buffer causes a heap buffer overflow if the SKB exceeds the buffer size.
- [High] Accessing an SKB after handing it to the hardware transmission ring causes a use-after-free.
- [High] The watchdog timeout work logic evaluates to true when the device is closed, leading to hardware access crashes.
- [High] Missing hardware lock during `mtip_switch_restart()` causes a data race with concurrent transmissions.
- [High] Mapping a fixed maximum size instead of the actual SKB length causes DMA out-of-bounds reads.
- [High] Incorrect teardown ordering in `mtip_ndev_cleanup` causes a Use-After-Free race with the TX timeout work.
- [Medium] The TX path maps buffers for DMA but does not unmap them during a timeout restart, causing a DMA mapping leak.
--

commit 472dd23af31f2a6b37da050f58cdf8ed66693ebc
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

net: mtip: Add net_device_ops functions to the L2 switch driver

This patch provides callbacks for struct net_device_ops for MTIP
L2 switch.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 5b5229ee411df..7dfeda230e3fb 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
[ ... ]
> @@ -463,6 +472,120 @@ static void mtip_config_switch(struct switch_enet_private *fep)
[ ... ]
> +	/* On some FEC implementations data must be aligned on
> +	 * 4-byte boundaries. Use bounce buffers to copy data
> +	 * and get it aligned.
> +	 */
> +	if ((unsigned long)bufaddr & MTIP_ALIGNMENT ||
> +	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> +		memcpy(fep->tx_bounce[index], skb->data, skb->len);

[Severity: Critical]
Does this code overflow the bounce buffer? The bounce buffer has a fixed size,
but there appears to be no bounds check to ensure the skb length actually
fits. A maliciously constructed oversized packet could cause a heap buffer
overflow.
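
A minimal sketch of the kind of length check being asked about. BOUNCE_SIZE
and copy_to_bounce() are hypothetical; the real limit would be whatever size
the driver allocates for fep->tx_bounce[] entries:

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 2048	/* hypothetical; stands in for the real buffer size */

static int copy_to_bounce(unsigned char *bounce, const unsigned char *data,
			  size_t len)
{
	if (len > BOUNCE_SIZE)
		return -1;	/* caller would drop the frame */

	memcpy(bounce, data, len);
	return 0;
}
```

In the transmit path a failed check would typically mean dropping the frame
and counting it in the TX drop statistics rather than copying it.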

> +		bufaddr = fep->tx_bounce[index];
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(bufaddr, skb->len);
> +	}
> +
> +	/* Push the data cache so the CPM does not get stale memory
> +	 * data.
> +	 */
> +	bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
> +					  MTIP_SWITCH_TX_FRSIZE,
> +					  DMA_TO_DEVICE);

[Severity: High]
Is it safe to map a fixed maximum size instead of the actual packet length?
Since the underlying memory allocated for the packet may be smaller than
MTIP_SWITCH_TX_FRSIZE, this can force the DMA API to flush memory beyond the
bounds of the allocation, potentially triggering an out-of-bounds read or
IOMMU faults.

> +	if (unlikely(dma_mapping_error(&fep->pdev->dev, bdp->cbd_bufaddr))) {
[ ... ]
> +	/* Send it on its way.  Tell FEC it's ready, interrupt when done,
> +	 * it's the last BD of the frame, and to put the CRC on the end.
> +	 */
> +
> +	status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR | BD_ENET_TX_LAST |
> +		   BD_ENET_TX_TC);
> +
> +	/* Synchronize all descriptor writes */
> +	wmb();
> +	bdp->cbd_sc = status;
> +
> +	skb_tx_timestamp(skb);
> +
> +	dev->stats.tx_bytes += skb->len;

[Severity: High]
Can this sequence cause a use-after-free? Once wmb() has been issued and the
descriptor status is written, ownership of the buffer passes to the hardware.
The transmit completion interrupt could then free the packet before
skb_tx_timestamp() and skb->len are evaluated here.

[ ... ]
> +static void mtip_timeout_work(struct work_struct *work)
> +{
> +	struct mtip_ndev_priv *priv =
> +		container_of(work, struct mtip_ndev_priv, tx_timeout_work);
> +	struct switch_enet_private *fep = priv->fep;
> +	struct net_device *dev = priv->dev;
> +
> +	rtnl_lock();
> +	if (netif_device_present(dev) || netif_running(dev)) {

[Severity: High]
Is this logical condition correct? The netif_device_present check evaluates to
true as long as the device is registered, even if it is administratively down.
Writing to the hardware registers via mtip_switch_restart while the device is
closed and the IPG clock is disabled could trigger an imprecise external abort
and crash the system.

> +		napi_disable(&fep->napi);
> +		netif_tx_lock_bh(dev);
> +		mtip_switch_restart(dev, fep->full_duplex[0],
> +				    fep->full_duplex[1]);

[Severity: High]
Does this restart logic cause a data race with concurrent transmissions on the
other port? The driver manages two network ports sharing a single hardware
ring. mtip_switch_restart resets the shared transmission ring state, but only
the tx lock for the specific port that timed out is acquired here, rather than
the shared hardware lock.

[Severity: Medium]
Does mtip_switch_restart leak DMA mappings during a timeout recovery? The
transmission path maps buffers for DMA, but the restart routine appears to
simply free the transmit socket buffers without calling the corresponding unmap
function, potentially leaking IOVA space or bounce buffers on every timeout.

> +		netif_tx_wake_all_queues(dev);
> +		netif_tx_unlock_bh(dev);
> +		napi_enable(&fep->napi);
> +	}
> +	rtnl_unlock();
> +}
[ ... ]
> @@ -1158,10 +1448,14 @@ static int mtip_switch_dma_init(struct switch_enet_private *fep)
>  
>  static void mtip_ndev_cleanup(struct switch_enet_private *fep)
>  {
> +	struct mtip_ndev_priv *priv;
>  	int i;
>  
>  	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
>  		if (fep->ndev[i]) {
> +			priv = netdev_priv(fep->ndev[i]);
> +			cancel_work_sync(&priv->tx_timeout_work);
> +
>  			unregister_netdev(fep->ndev[i]);

[Severity: High]
Can this result in a use-after-free race with the timeout work? Calling
unregister_netdev after cancel_work_sync means the networking watchdog is
still active and could fire immediately after cancel_work_sync returns. If the
watchdog queues the work item again, the work could execute after the device
structures have been freed.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=4

* Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC
  To: Lukasz Majewski; +Cc: Frank.Li, imx

Thank you for your contribution! Sashiko AI review found 9 potential issue(s) to consider:
- [Critical] Flawed error handling for multi-descriptor packets causes fragmented packet injection.
- [High] NULL pointer dereference in mtip_switch_tx() when queues are stopped and ring is empty.
- [High] Missing dma_sync_single_for_device() on RX error path causes data corruption for subsequent packets.
- [High] `swap_buffer()` corrupts the tail of packets whose length is not a multiple of 4 bytes.
- [High] Missing dma_rmb() after reading RX descriptor ownership allows speculative execution to read stale data.
- [High] The TX completion path unmaps DMA buffers using a hardcoded frame size instead of the original mapped length.
- [Medium] The Ethernet FCS is incorrectly included in the SKB payload, breaking bridging applications.
- [Low] Unreachable code: The TX completion check for `BD_ENET_TX_READY` can never be true.
- [Low] Received SKBs are built with zero headroom, leading to inefficient processing.
--

commit be1b7f747ef200958ab836f16e9cf7ce2caf1583
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver

This patch provides mtip_switch_tx and mtip_switch_rx functions
code for MTIP L2 switch.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 7dfeda230e3fb..212207861e13e 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
[ ... ]
> @@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
>  
>  static void mtip_switch_tx(struct switch_enet_private *fep)
>  {
> +	struct net_device *dev;
> +	unsigned short status;
> +	struct sk_buff *skb;
> +	struct cbd_t *bdp;
> +	int index;
> +
> +	spin_lock_bh(&fep->hw_lock);
> +	bdp = fep->dirty_tx;
> +
> +	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> +		if (bdp == fep->cur_tx &&
> +		    !mtip_netif_queues_stopped(fep))
> +			break;

[Severity: High]
What happens if the queues are stopped and the ring is empty?

If the ring is empty (bdp == fep->cur_tx) but the queues are stopped
(e.g. link down), this break condition evaluates to false. Would this
cause the loop to process an empty, unsubmitted descriptor, leading to a
NULL pointer dereference when accessing fep->tx_skbuff[index] and
skb->dev below?

> +
> +		index = bdp - fep->tx_bd_base;
> +		dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
> +				 MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);

[Severity: High]
Does this unmap call violate the DMA API by using a hardcoded length?

The DMA API mandates that the unmap size must exactly match the length
passed to dma_map_single(). Since outgoing network packets have variable
lengths, could using the hardcoded MTIP_SWITCH_TX_FRSIZE maximum size
cause DMA API warnings, IOMMU faults, or memory corruption?
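
If the map side were changed to use the actual frame length, the completion
path would need that same value at unmap time. One common approach is to
store the length per ring slot. The sketch below is a userspace model only:
struct tx_slot, slot_map() and slot_unmap() are hypothetical stand-ins for
the bookkeeping around dma_map_single() and dma_unmap_single():

```c
#include <stddef.h>

struct tx_slot {
	size_t mapped_len;	/* length handed to the map call */
};

static size_t live_mapped_bytes;	/* bookkeeping for the sketch only */

/* Hypothetical stand-in for dma_map_single(): remember the length used. */
static void slot_map(struct tx_slot *slot, size_t len)
{
	slot->mapped_len = len;
	live_mapped_bytes += len;
}

/* Hypothetical stand-in for dma_unmap_single(): reuse the stored length. */
static void slot_unmap(struct tx_slot *slot)
{
	live_mapped_bytes -= slot->mapped_len;
	slot->mapped_len = 0;
}
```

Because each slot carries its own length, map and unmap stay paired for
frames of any size.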

> +		bdp->cbd_bufaddr = 0;
> +		skb = fep->tx_skbuff[index];
> +		dev = skb->dev;
> +		/* Check for errors */
> +		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
> +				   BD_ENET_TX_RL | BD_ENET_TX_UN |
> +				   BD_ENET_TX_CSL)) {
[ ... ]
> +		} else {
> +			dev->stats.tx_packets++;
> +		}
> +
> +		if (status & BD_ENET_TX_READY)
> +			dev_err_ratelimited(&fep->pdev->dev,
> +					    "xmit interrupt and TX_READY.\n");

[Severity: Low]
Can this condition ever be true?

The while loop entry condition guarantees that BD_ENET_TX_READY is clear:
while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0)

Could this check be unreachable dead code, indicating a potential logic
flaw in the state handling?

[ ... ]
> +static int mtip_switch_rx(struct net_device *dev, int budget)
>  {
> -	return -ENOMEM;
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	u8 *data, rx_port = MTIP_PORT_FORWARDING_INIT;
> +	struct switch_enet_private *fep = priv->fep;
> +	unsigned short status, pkt_len;
> +	struct net_device *pndev;
> +	struct ethhdr *eth_hdr;
> +	int pkt_received = 0;
> +	struct sk_buff *skb;
> +	struct cbd_t *bdp;
> +	struct page *page;
> +	int index;
> +
> +	/* First, grab all of the stats for the incoming packet.
> +	 * These get messed up if we get called due to a busy condition.
> +	 */
> +	bdp = fep->cur_rx;
> +
> +	while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) {

[Severity: High]
Is a read memory barrier needed here?

The loop reads the descriptor status to verify that the hardware has
yielded ownership. Without a dma_rmb() executed after this check, could
the CPU speculatively reorder subsequent reads of bdp->cbd_datlen and
the packet payload to occur before the ownership status change is
globally visible, leading to stale data being processed?

> +		if (pkt_received >= budget)
> +			break;
> +
> +		pkt_received++;
> +
> +		writel(MCF_ESW_IMR_RXF, fep->hwp + ESW_ISR);
> +		if (!fep->usage_count)
> +			goto rx_processing_done;
> +
> +		status ^= BD_ENET_RX_LAST;
> +		/* Check for errors. */
> +		if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
> +			      BD_ENET_RX_CR | BD_ENET_RX_OV | BD_ENET_RX_LAST |
> +			      BD_ENET_RX_CL)) {

[Severity: Critical]
Does this logic safely handle multi-descriptor packets?

The XOR logic (status ^= BD_ENET_RX_LAST) treats any descriptor without
the last-fragment bit as an error. The error handling path then yields the
descriptor back to the hardware but does not drain the remaining descriptors
of the packet.

When the final descriptor of the fragmented packet is received, it will
pass the error check, and the driver might incorrectly parse the middle or
end of the fragmented packet as a brand new Ethernet frame.
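
One possible way to resynchronise, sketched as a userspace model: once a
descriptor without the last-fragment bit is seen, keep dropping until the
descriptor that carries it has been consumed. RX_LAST is an illustrative bit
value and count_delivered() is a hypothetical helper, not driver code:

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_LAST 0x0800	/* illustrative bit value for the sketch */

/* Count frames that would be passed up from a run of descriptor statuses. */
static int count_delivered(const unsigned short *status, size_t n)
{
	bool dropping = false;
	int delivered = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		bool last = (status[i] & RX_LAST) != 0;

		if (dropping) {
			/* Tail of a fragmented frame: discard, then resync. */
			if (last)
				dropping = false;
			continue;
		}
		if (!last) {
			/* First fragment of a multi-descriptor frame. */
			dropping = true;
			continue;
		}
		delivered++;
	}
	return delivered;
}
```

For the sequence LAST, 0, 0, LAST, LAST only the first and final descriptors
count as frames; the three in between belong to one dropped packet.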

[ ... ]
> +		/* Get correct RX page */
> +		index = bdp - fep->rx_bd_base;
> +		page = fep->page[index];
> +		/* Process the incoming frame */
> +		pkt_len = bdp->cbd_datlen;
> +
> +		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> +					pkt_len, DMA_FROM_DEVICE);
> +		net_prefetch(page_address(page));
> +		data = page_address(page);
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);

[Severity: High]
Does swap_buffer() safely handle packet lengths that are not a multiple of 4?

Looking at the implementation:

static void swap_buffer(void *bufaddr, int len)
{
        unsigned int *buf = bufaddr;
        int i;

        for (i = 0; i < len; i += 4, buf++)
                swab32s(buf);
}

It appears the loop condition iterates strictly by 4. If the packet length
is not an exact multiple of 4 bytes, could the final iteration overshoot
the valid packet data, pulling uninitialized garbage bytes into the tail
of the packet?
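
The effect can be reproduced in userspace. This sketch models each swab32s()
call as a byte reversal within a 4-byte word, which is an assumption made to
keep the example independent of host endianness; swap_span() and swap_words()
are hypothetical names:

```c
#include <stddef.h>

/* Bytes touched by a loop that advances in 4-byte steps while i < len. */
static size_t swap_span(size_t len)
{
	return (len + 3) / 4 * 4;
}

/* Byte-wise model of calling swab32s() on every word, as the loop does. */
static void swap_words(unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i += 4) {
		unsigned char t0 = buf[i], t1 = buf[i + 1];

		buf[i] = buf[i + 3];
		buf[i + 1] = buf[i + 2];
		buf[i + 2] = t1;
		buf[i + 3] = t0;
	}
}
```

For a 6-byte length the loop touches 8 bytes, so the two bytes that follow
the packet end up at offsets 4 and 5 of the packet data.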

> +
> +		eth_hdr = (struct ethhdr *)data;
> +		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> +						  &rx_port);
[ ... ]
> +		if (mtip_update_cbd(fep, bdp, index)) {
> +			pndev->stats.rx_dropped++;
> +			goto rx_processing_done;
> +		}

[Severity: High]
If this allocation fails, does the driver return a dirty buffer back to the
hardware?

When swap_buffer() modifies the buffer in-place, the CPU cache is dirtied.
If we take this error path, the buffer is yielded back to the hardware
(via rx_processing_done) without calling dma_sync_single_for_device() to
transfer ownership. Could the dirty CPU cache lines later be asynchronously
evicted to RAM, overwriting new packet data being DMA'd by the hardware?

> +
> +		/* The packet length includes FCS, but we don't want to
> +		 * include that when passing upstream as it messes up
> +		 * bridging applications.
> +		 */
> +		skb = build_skb(page_address(page), PAGE_SIZE);

[Severity: Low]
Does this allocation leave zero headroom for the SKB?

Incoming packets are DMA'd to offset 0 of the allocated page. By building
the SKB directly at page_address(page), the SKB is created with
skb->head == skb->data. While the networking core will safely reallocate
headers if necessary, failing to reserve standard headroom (like NET_SKB_PAD)
can cause unnecessary performance degradation.

> +		if (unlikely(!skb)) {
> +			page_pool_recycle_direct(fep->page_pool, page);
> +			pndev->stats.rx_dropped++;
> +
> +			netdev_err_once(pndev, "build_skb failed!\n");
> +			goto rx_processing_done;
> +		}
> +
> +		skb_put(skb, pkt_len);      /* Make room */

[Severity: Medium]
Does this skb_put() call incorrectly include the FCS in the SKB payload?

The comment immediately above states the intention to exclude the FCS
because it messes up bridging applications. However, this calls
skb_put() using the full pkt_len without subtracting ETH_FCS_LEN (4 bytes).

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=5

* Re: [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations
  2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC
  To: Lukasz Majewski; +Cc: imx, Frank.Li

Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:

New issues:
- [High] Global steering register race with shared DMA transmission ring causes packets to be sent to the wrong port.
- [High] Bitwise OR operations on hardware registers without clearing field masks cause invalid modes.
- [High] Disabling multicast reception on the CPU management port breaks IPv6 and essential protocols.
- [High] Off-by-one error in VLAN domain verification check allows out-of-bounds register writes.
- [High] VLAN input processing helper unconditionally overwrites the `ESW_VRES` register, clobbering other ports' memberships.
- [High] Unsynchronized Read-Modify-Write (RMW) on shared hardware registers in management functions.
- [Medium] Per-port learning configuration incorrectly toggles a global interrupt mask and conflicts with NAPI polling.

Pre-existing issues:
- [High] Use-After-Free of SKB after transferring ownership to DMA hardware.
- [High] Double free and UAF of transmit SKBs due to lockless freeing in `mtip_switch_restart`.
- [High] Concurrent PHY adjustments trigger unsynchronized global switch resets.
- [High] Out-of-bounds DMA Mapping of `skb->data`.
--

commit 18b1fc23ec60b0ca391ad794f056d1c911c2b51b
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

net: mtip: Extend the L2 switch driver with management operations

This patch provides functions necessary for managing the L2 switch.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> @@ -505,8 +505,35 @@ static void mtip_config_switch(struct switch_enet_private *fep)
>  
>  	writel(0, fep->hwp + ESW_BKLR);

[ ... ]

> +	mtip_port_broadcast_config(fep, 2, 1);
> +
> +	/* Disable multicast receive on port 0 (MGNT) */
> +	mtip_port_multicast_config(fep, 0, 0);

[Severity: High]
Does disabling multicast receive on port 0 break IPv6 communication?

The Linux network stack depends on multicast reception, particularly for
IPv6 (Neighbor Discovery, Router Advertisements, MLD) and for services such
as mDNS. Dropping all multicast traffic in hardware for the CPU management
port might disrupt these protocols.

> +	mtip_port_multicast_config(fep, 1, 1);
> +	mtip_port_multicast_config(fep, 2, 1);
> +
> +	/* Setup VLANs to provide port separation */
> +	if (!fep->br_offload)
> +		mtip_switch_en_port_separation(fep);
>  }

[ ... ]

> @@ -592,6 +619,10 @@ static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
>  
>  	skb_tx_timestamp(skb);

[Severity: High]
This is a pre-existing issue, but does accessing the skb fields here cause a
use-after-free?

Ownership of the SKB is transferred to the hardware by setting
bdp->cbd_sc = status just above this. The hardware can immediately transmit
the packet and issue a completion interrupt which frees the SKB before
skb_tx_timestamp() or skb->len are evaluated.

> +	/* For port separation - force sending via specified port */
> +	if (!fep->br_offload && port != 0)
> +		mtip_forced_forward(fep, port, 1);

[Severity: High]
Is there a race condition here with the shared DMA transmission ring?

The driver steers individual packets to a specific port by updating the global
hardware register ESW_P0FFEN via mtip_forced_forward() immediately before
triggering the DMA transmission. Since the hardware processes the DMA ring
asynchronously, modifying this global register dynamically might forcefully
redirect previously enqueued but not-yet-transmitted packets to the new port.

Also, since forced forwarding is never disabled (a call with enable=0 is
missing), will all subsequent bridge or broadcast traffic originating from the
CPU be misrouted to the last specified port?

>  	dev->stats.tx_bytes += skb->len;

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
> @@ -0,0 +1,442 @@

[ ... ]

> +int mtip_vlan_input_process(struct switch_enet_private *fep,
> +			    int port, int mode, unsigned short port_vlanid,
> +			    int vlan_verify_en, int vlan_domain_num,
> +			    int vlan_domain_port)
> +{

[ ... ]

> +	if (vlan_verify_en == 1 &&
> +	    (vlan_domain_num < 0 || vlan_domain_num > 32)) {

[Severity: High]
Does this bounds check permit the index to be exactly 32?

Since the VLAN resolution table consists of 32 entries (indexed 0 to 31),
calculating the offset for index 32 might overwrite the adjacent ESW_DISCN
register and cause memory corruption.
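
A sketch of the half-open range check, assuming the table really has 32
entries indexed 0 to 31 as described above. VLAN_DOMAINS and
vlan_domain_valid() are hypothetical names:

```c
#include <stdbool.h>

#define VLAN_DOMAINS 32	/* entries 0..31, per the description above */

static bool vlan_domain_valid(int domain)
{
	return domain >= 0 && domain < VLAN_DOMAINS;
}
```

Index 31 passes and index 32 is rejected, which is the difference between
`> 32` and `>= 32` in the quoted condition.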

> +		dev_err(&fep->pdev->dev, "%s: Domain out of range\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	writel(FIELD_PREP(MCF_ESW_PID_VLANID_MASK, port_vlanid),
> +	       fep->hwp + ESW_PID(port));
> +	if (port == 0) {
> +		if (vlan_verify_en == 1)
> +			writel(FIELD_PREP(MCF_ESW_VRES_VLANID_MASK,
> +					  port_vlanid) | MCF_ESW_VRES_P0,
> +			       fep->hwp + ESW_VRES(vlan_domain_num));

[Severity: High]
Does writing directly to ESW_VRES(vlan_domain_num) using an absolute writel()
clobber other ports' memberships?

Because ESW_VRES is a bitmap representing all ports in the domain, this might
destroy the membership of any other port previously assigned to that same
VLAN domain.

> +		writel(readl(fep->hwp + ESW_VIMEN) | MCF_ESW_VIMEN_EN0,
> +		       fep->hwp + ESW_VIMEN);
> +		writel(readl(fep->hwp + ESW_VIMSEL) |
> +		       FIELD_PREP(MCF_ESW_VIMSEL_IM0_MASK, mode),
> +		       fep->hwp + ESW_VIMSEL);

[Severity: High]
Does this read-modify-write pattern clear the previous bits in the target
mask before the new mode is applied?

If the register previously held a mode with conflicting bits, the bitwise OR
will combine them (e.g., updating mode 3 to mode 1 leaves the register at 3),
placing the hardware in an invalid configuration state.
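
The usual clear-then-set form of a field update, sketched in plain C. The
mask and shift values used with it here are illustrative and are not the real
MCF_ESW_VIMSEL_IM0_MASK layout; field_update() is a hypothetical helper:

```c
#include <stdint.h>

/* Clear the field selected by mask, then insert val at the given shift. */
static uint32_t field_update(uint32_t reg, uint32_t mask, unsigned int shift,
			     uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}
```

With a 2-bit field at bit 0, plain OR turns 3 into 3 when mode 1 is
requested, while field_update(3, 0x3, 0, 1) yields 1. In kernel code the same
effect is normally written as `reg &= ~MASK; reg |= FIELD_PREP(MASK, mode);`.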

[ ... ]

> +int mtip_vlan_output_process(struct switch_enet_private *fep, int port,
> +			     int mode)
> +{

[ ... ]

> +	if (port == 0) {
> +		writel(readl(fep->hwp + ESW_VOMSEL) |
> +		       FIELD_PREP(MCF_ESW_VOMSEL_OM0_MASK, mode),
> +		       fep->hwp + ESW_VOMSEL);

[Severity: High]
Does this read-modify-write pattern clear the previous bits in the target
mask before the new mode is applied?

If the register previously held a mode with conflicting bits, the bitwise OR
will combine them, placing the hardware in an invalid configuration state.

[ ... ]

> +int mtip_port_enable_config(struct switch_enet_private *fep, int port,
> +			    bool tx_en, bool rx_en)
> +{
> +	u32 reg = 0;

[ ... ]

> +	reg = readl(fep->hwp + ESW_PER);

[Severity: High]
Are these read-modify-write operations on shared global hardware registers
synchronized?

Since these functions do not acquire fep->hw_lock to serialize access,
concurrent calls from ethtool, sysfs, or NAPI might overlap, tearing the
state and corrupting the hardware registers.

[ ... ]

> +int mtip_port_learning_config(struct switch_enet_private *fep, int port,
> +			      bool disable, bool irq_adj)
> +{
> +	u32 reg = 0;

[ ... ]

> +	reg = readl(fep->hwp + ESW_BKLR);
> +	if (disable) {
> +		if (irq_adj)
> +			writel(readl(fep->hwp + ESW_IMR) & ~MCF_ESW_IMR_LRN,
> +			       fep->hwp + ESW_IMR);

[Severity: Medium]
Does disabling learning on a single port clear MCF_ESW_IMR_LRN in the global
ESW_IMR register?

This might suppress learning interrupts for all other ports. Also, the
pre-existing NAPI poll routine mtip_rx_napi() unconditionally overwrites
ESW_IMR with hardcoded values, meaning this configuration might be
immediately discarded upon the next packet reception.

Additional pre-existing issues found in the codebase:

[Severity: High]
This is a pre-existing issue, but does lockless freeing of SKBs cause a
double free and use-after-free here?

drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c:mtip_switch_restart() {
...
    for (i = 0; i <= TX_RING_MOD_MASK; i++) {
        if (fep->tx_skbuff[i]) {
            dev_kfree_skb_any(fep->tx_skbuff[i]);
            fep->tx_skbuff[i] = NULL;
        }
    }
...
}

The phylib state machine triggers mtip_adjust_link() from process context,
which unconditionally calls mtip_switch_restart() and locklessly frees all
tx_skbuff entries. If the NAPI softirq mtip_switch_tx() is concurrently
processing TX completions, both threads can simultaneously retrieve and free
the same SKB pointers.

[Severity: High]
This is a pre-existing issue, but do concurrent PHY adjustments trigger
unsynchronized global switch resets?

drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c:mtip_adjust_link() {
...
}

The phylib core triggers mtip_adjust_link() for each port's PHY
independently. Because mtip_adjust_link() invokes mtip_switch_restart()
without holding a global lock spanning all ports, simultaneous link state
changes on multiple ports will trigger concurrent MCF_ESW_MODE_SW_RST resets
and overlapping re-initialization of the shared hardware rings.

[Severity: High]
This is a pre-existing issue, but does this mapping read out-of-bounds?

drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c:mtip_start_xmit_port() {
...
    bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
                                      MTIP_SWITCH_TX_FRSIZE,
                                      DMA_TO_DEVICE);
...
}

When preparing a transmission descriptor without a bounce buffer,
dma_map_single() is called with a fixed length of MTIP_SWITCH_TX_FRSIZE
instead of the packet's actual length skb->len. Since networking packets are
generally much smaller than the maximum frame size, this causes the DMA
mapping to read well beyond the end of the allocated SKB buffer, potentially
triggering IOMMU faults.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=6

* Re: [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations
  2026-06-01 11:24 ` [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations Lukasz Majewski
@ 2026-06-02 11:32   ` sashiko-bot
  0 siblings, 0 replies; 21+ messages in thread
From: sashiko-bot @ 2026-06-02 11:32 UTC (permalink / raw)
  To: Lukasz Majewski; +Cc: imx, Frank.Li

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Global static netdevice notifier block registration breaks multi-instance support.
- [High] Returning -EOPNOTSUPP on bridge mismatch aborts software bridging entirely.
- [Medium] Missing switchdev unoffload due to premature notifier unregistration in remove path.

Pre-existing issues:
- [High] Use-After-Free / NULL pointer dereference risk due to devm_request_irq outliving netdev destruction.
- [High] Lockless switch restart triggered by PHY state machine causes double-frees and hardware races.
--

commit 11e801c2e83fa88f60f9277fdccfb7c571bbd65f
Author: Lukasz Majewski <lukasz.majewski@mailbox.org>

net: mtip: Extend the L2 switch driver for imx287 with bridge operations

This commit extends the MTIP L2 switch driver to support bridge operations.
It allows configuring the switch as an offloading device for packet switching
when a bridge is created on its interfaces.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 76e503e128ddb..0a12d115b78b2 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> @@ -2009,6 +2015,7 @@ static void mtip_sw_remove(struct platform_device *pdev)
>  
>  	timer_delete_sync(&fep->timer_mgnt);
>  
> +	mtip_unregister_notifiers(fep);
>  	mtip_ndev_cleanup(fep);

[Severity: Medium]
Will calling mtip_unregister_notifiers() before mtip_ndev_cleanup() miss
the automatic switchdev unoffload?

When mtip_ndev_cleanup() unregisters the netdevices, the network stack will
automatically detach them from their upper bridges and generate
NETDEV_CHANGEUPPER unlink events. Since the notifier is already unregistered
at this point, won't the driver miss these events and fail to call
switchdev_bridge_port_unoffload()? This appears to leave the bridge's
offload_count for the port never decremented and leaks the switchdev hardware
domain object.

[Severity: High]
This isn't a bug introduced by this patch, but does the devm_request_irq()
allocation outlive the netdevice destruction here?

The driver allocates its hardware interrupt using devm_request_irq(), which
means it will only be freed after mtip_sw_remove() returns. However,
mtip_ndev_cleanup() unregisters and frees the fep->ndev structures before
the interrupt is masked or freed. If an interrupt fires during or after this
cleanup, won't mtip_interrupt() execute, schedule NAPI, and cause the NAPI
poll routine to dereference the freed fep->ndev pointers?

[ ... ]

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
> new file mode 100644
> index 0000000000000..c9d9151b0e81d
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c

[ ... ]

> +static int mtip_ndev_port_link(struct net_device *ndev,
> +			       struct net_device *br_ndev,
> +			       struct netlink_ext_ack *extack)
> +{

[ ... ]

> +	if (fep->br_members && !fep->br_offload) {
> +		/* Get the second bridge ndev */
> +		other_ndev = fep->ndev[fep->br_members - 1];
> +		other_priv = netdev_priv(other_ndev);
> +		if (other_priv->master_dev != br_ndev) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "L2 offloading only possible for the same bridge!");
> +			return -EOPNOTSUPP;
> +		}

[Severity: High]
Does returning -EOPNOTSUPP here prevent the kernel from falling back to
software bridging?

When hardware offloading is not possible because the ports belong to different
bridges, returning -EOPNOTSUPP directly from the NETDEV_CHANGEUPPER notifier
callback translates to NOTIFY_STOP_MASK. Doesn't this cause
__netdev_upper_dev_link() to completely abort adding the interface to the
bridge? Should this return NOTIFY_DONE or 0 instead to allow graceful
fallback to software bridging?
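
To make the return-value question concrete, here is a minimal userspace
sketch of how an errno is folded into a notifier return value. The macro
values and both helpers are re-created from memory of
include/linux/notifier.h and are an assumption to be checked against the
tree; nothing below is taken from the patch.

```c
#include <errno.h>

/* Userspace re-creation (an assumption, not copied from the patch) of the
 * notifier return-value encoding in include/linux/notifier.h.
 */
#define NOTIFY_DONE		0x0000	/* don't care */
#define NOTIFY_OK		0x0001	/* suits me */
#define NOTIFY_STOP_MASK	0x8000	/* don't call further notifiers */

/* Encode 0 or a negative errno into a notifier return value. */
static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);

	return NOTIFY_OK;
}

/* Recover the negative errno (or 0) from a notifier return value. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

In this model any non-zero errno sets the stop bit, so the caller sees a
failure, whereas NOTIFY_DONE decodes to 0 and lets the operation proceed.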

[ ... ]

> +static struct notifier_block mtip_netdevice_nb __read_mostly = {
> +	.notifier_call = mtip_netdevice_event,
> +};
> +
> +int mtip_register_notifiers(struct switch_enet_private *fep)
> +{
> +	int ret = register_netdevice_notifier(&mtip_netdevice_nb);

[Severity: High]
Does registering a single global static notifier block break multiple device
instances?

If a second device instance is probed, calling register_netdevice_notifier()
with the same mtip_netdevice_nb block will return -EEXIST and trigger a
kernel warning. Furthermore, if any instance is unbound, calling
unregister_netdevice_notifier() will remove the global block, silently
breaking switchdev bridge offloading for all remaining active instances.
Should the notifier_block be embedded within the per-device private
structure instead?
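
A minimal userspace sketch of the per-instance layout follows. The struct
and function names (switch_priv, switch_netdevice_event, switch_priv_init)
are hypothetical stand-ins, and notifier_block is reduced to the one field
needed here; the point is only that the callback can recover its own
instance from an embedded block.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel type, reduced to one field. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long event,
			     void *ptr);
};

/* Hypothetical private structure with one notifier block per instance. */
struct switch_priv {
	int instance_id;
	unsigned long events_seen;
	struct notifier_block nb;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The callback recovers its own instance from the embedded block. */
static int switch_netdevice_event(struct notifier_block *nb,
				  unsigned long event, void *ptr)
{
	struct switch_priv *priv = container_of(nb, struct switch_priv, nb);

	(void)event;
	(void)ptr;
	priv->events_seen++;
	return priv->instance_id;
}

static void switch_priv_init(struct switch_priv *priv, int id)
{
	priv->instance_id = id;
	priv->events_seen = 0;
	priv->nb.notifier_call = switch_netdevice_event;
}
```

With this shape each instance would register and unregister its own
block, so unbinding one device leaves the others' callbacks in place.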

[ ... ]

[Severity: High]
This is a pre-existing issue, but does mtip_adjust_link() race with the TX
hotpath?

The phylib state machine calls mtip_adjust_link() from a workqueue without
holding fep->hw_lock or rtnl_lock. In response to link changes, it calls
mtip_switch_restart(), which resets the switch and calls
dev_kfree_skb_any() on fep->tx_skbuff entries locklessly:

mtip_switch_restart() {
    ...
    dev_kfree_skb_any(fep->tx_skbuff[i]);
    ...
}

Because this runs concurrently with mtip_start_xmit_port() and
mtip_switch_tx(), which modify the same buffers and rings under hw_lock,
won't this cause SKB double-frees, ring descriptor corruption, and hardware
state inconsistencies?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601112437.2216043-1-lukasz.majewski@mailbox.org?part=7

* Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
  2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
@ 2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: The L2 switch driver for imx287

This patch adds a new driver for the MoreThanIP L2 switch embedded in
the i.MX287 SoC.  The two-port switch sits between uDMA[01] and the
MAC-NET[01] blocks and can be used as an alternative to the existing
FEC driver.

A few questions inline below.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> new file mode 100644
> index 000000000000..1d1005a42306
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> +static void mtip_switch_restart(struct net_device *dev, int duplex0,
> +				int duplex1)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int i;
> +
> +	 /* Perform a reset. We should wait for this. */
> +	writel(MCF_ESW_MODE_SW_RST, fep->hwp + ESW_MODE);

[High]
Should this routine hold a lock?  fep->hw_lock is declared and
initialised in mtip_sw_probe() with the comment "/* Lock for HW
configuration */" but neither mtip_switch_restart() nor mtip_adjust_link()
acquires it.

The two ports each get their own phy_device via of_phy_connect(), and
phylib serialises adjust_link callbacks per phy_device, not across them.
If both ports' state machines fire mtip_adjust_link() at the same time,
both CPUs end up inside mtip_switch_restart() in parallel, racing on
ESW_MODE_SW_RST, ESW_BMPC, ESW_ISR, ESW_RDSR/ESW_TDSR, fep->cur_tx /
cur_rx / dirty_tx, and fep->full_duplex[].

> +	/* Reset SKB transmit buffers */
> +	for (i = 0; i <= TX_RING_MOD_MASK; i++) {
> +		if (fep->tx_skbuff[i]) {
> +			dev_kfree_skb_any(fep->tx_skbuff[i]);
> +			fep->tx_skbuff[i] = NULL;
> +		}
> +	}

[High]
Can two concurrent invocations of this loop double-free a tx_skbuff[i]
entry?  Both CPUs read the same non-NULL pointer, both call
dev_kfree_skb_any() on it, and only then race on the NULL store.
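
As a rough userspace model of the fix being asked about, the sketch below
does the read, free and NULL store for each slot inside one critical
section. The pthread mutex only stands in for whatever lock the driver
would take (fep->hw_lock is the obvious candidate); sw_model, its fields
and the ring size are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define TX_RING_SIZE 16	/* illustrative size, not the driver's value */

struct sw_model {
	pthread_mutex_t hw_lock;	/* stands in for fep->hw_lock */
	void *tx_buf[TX_RING_SIZE];	/* stands in for fep->tx_skbuff[] */
	unsigned int frees;		/* number of buffers released */
};

static void sw_model_init(struct sw_model *sw)
{
	int i;

	pthread_mutex_init(&sw->hw_lock, NULL);
	sw->frees = 0;
	for (i = 0; i < TX_RING_SIZE; i++)
		sw->tx_buf[i] = malloc(64);
}

/* Read, free and clear each slot under the lock, so a second caller can
 * only ever observe NULL slots and never frees a buffer twice.
 */
static void sw_model_restart(struct sw_model *sw)
{
	int i;

	pthread_mutex_lock(&sw->hw_lock);
	for (i = 0; i < TX_RING_SIZE; i++) {
		if (sw->tx_buf[i]) {
			free(sw->tx_buf[i]);
			sw->tx_buf[i] = NULL;
			sw->frees++;
		}
	}
	pthread_mutex_unlock(&sw->hw_lock);
}
```

Calling sw_model_restart() a second time finds every slot already NULL,
which is the property the unlocked loop above cannot guarantee when two
callers overlap.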

> +
> +	fep->full_duplex[0] = duplex0;
> +	fep->full_duplex[1] = duplex1;
> +
> +	mtip_configure_enet_mii(fep, 1);
> +	mtip_configure_enet_mii(fep, 2);
> +
> +	/* And last, enable the transmit and receive processing */
> +	writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
> +
> +	/* Enable interrupts we wish to service */
> +	writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
> +	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
> +	       fep->hwp + ESW_IMR);
> +
> +	mtip_config_switch(fep);
> +}
> +
> +static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
> +{
> +	struct switch_enet_private *fep = ptr_fep;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 int_events, int_imask;
> +
> +	/* Get the interrupt events that caused us to be here */
> +	int_events = readl(fep->hwp + ESW_ISR);
> +	writel(int_events, fep->hwp + ESW_ISR);
> +
> +	if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
> +		ret = IRQ_HANDLED;
> +		/* Disable the RX interrupt */
> +		if (napi_schedule_prep(&fep->napi)) {
> +			int_imask = readl(fep->hwp + ESW_IMR);
> +			int_imask &= ~MCF_ESW_IMR_RXF;
> +			writel(int_imask, fep->hwp + ESW_IMR);
> +			__napi_schedule(&fep->napi);
> +		}
> +	}
> +
> +	return ret;
> +}

[Medium]
Should this also clear MCF_ESW_IMR_TXF here?  The condition above
schedules NAPI on either RXF or TXF, and the NAPI poll re-enables both
bits together:

	writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
	       fep->hwp + ESW_IMR);

With only RXF masked here, every TX completion while NAPI is running
re-asserts the IRQ line; mtip_interrupt() runs again, napi_schedule_prep()
returns false because NAPI is already scheduled, and the handler returns
IRQ_HANDLED having done nothing useful.  Once mtip_switch_tx() is filled
in by the later patch in the series, would this turn into a sustained
IRQ storm on TX-busy workloads?

[ ... ]

> +static int mtip_mii_init(struct switch_enet_private *fep,
> +			 struct platform_device *pdev)
> +{
> +	struct device_node *node;
> +	int err = -ENXIO;
> +
> +	/* Clear MMFR to avoid to generate MII event by writing MSCR.
> +	 * MII event generation condition:
> +	 * - writing MSCR:
> +	 *      - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
> +	 *        mscr_reg_data_in[7:0] != 0
> +	 * - writing MMFR:
> +	 *      - mscr[7:0]_not_zero
> +	 */
> +	writel(0, fep->hwp + MCF_FEC_MII_DATA);
> +	/* Clear any pending transaction complete indication */
> +	writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);

[Medium]
Is the base address for the MMFR clear correct here?

fep->hwp is the switch base (fep->enet_addr + ENET_SWI_PHYS_ADDR_OFFSET,
i.e. enet_addr + 0x8000), while MCF_FEC_MII_DATA (0x040) is an offset
within the MAC register space.  At offset 0x040 from fep->hwp the header
defines:

	#define ESW_MCR             (0x040)

So this writel() goes to the switch's Mirror Configuration Register
rather than to MMFR.  The very next line correctly uses fep->enet_addr +
MCF_FEC_EIR for the EIR clear, and mtip_mdio_read()/mtip_mdio_write()
also use fep->enet_addr + MCF_FEC_MII_DATA for the actual MDIO accesses.
Should this writel() also use fep->enet_addr?
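
The address arithmetic can be checked with a small userspace sketch. The
three offsets are the ones quoted above; the helper names are hypothetical
and enet_addr is treated as a plain integer rather than an __iomem
pointer.

```c
#include <stdint.h>

/* Offsets quoted in the review above. */
#define ENET_SWI_PHYS_ADDR_OFFSET	0x8000
#define MCF_FEC_MII_DATA		0x040	/* MMFR, MAC register space */
#define ESW_MCR				0x040	/* switch register space */

/* Address reached by the writel() as posted: switch base + MAC offset. */
static uintptr_t mmfr_clear_as_posted(uintptr_t enet_addr)
{
	uintptr_t hwp = enet_addr + ENET_SWI_PHYS_ADDR_OFFSET;

	return hwp + MCF_FEC_MII_DATA;
}

/* Address of MMFR proper: MAC base + MAC offset. */
static uintptr_t mmfr_clear_intended(uintptr_t enet_addr)
{
	return enet_addr + MCF_FEC_MII_DATA;
}

/* Address of the switch Mirror Configuration Register. */
static uintptr_t esw_mcr_addr(uintptr_t enet_addr)
{
	return enet_addr + ENET_SWI_PHYS_ADDR_OFFSET + ESW_MCR;
}
```

For any base, mmfr_clear_as_posted() equals esw_mcr_addr() and lies 0x8000
above mmfr_clear_intended().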

[ ... ]

> +static int mtip_open(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int ret, port_idx = priv->portnum - 1;
> +
> +	if (fep->usage_count == 0) {
> +		ret = clk_enable(fep->clk_ipg);

[ ... ]

> +	ret = mtip_mii_probe(dev);
> +	if (ret)
> +		goto mtip_mii_probe_err;
> +
> +	phy_start(fep->phy_dev[port_idx]);
> +
> +	if (fep->usage_count == 0) {
> +		napi_enable(&fep->napi);
> +		mtip_switch_restart(dev, 1, 1);
> +
> +		netif_start_queue(dev);
> +	}

[High]
Can phy_start() race with the mtip_switch_restart() call below it?
Once phy_start() returns, the phylib state machine workqueue may invoke
mtip_adjust_link() on another CPU, which calls mtip_switch_restart() on
link-up.  That can run concurrently with the mtip_switch_restart(dev,
1, 1) here, and both paths walk fep->tx_skbuff[] freeing entries and
reprogram switch HW registers without synchronisation.

Would it be safer to start the PHY only after the switch HW has been
configured (and to take a lock around mtip_switch_restart())?

[ ... ]

> +static int mtip_parse_of(struct switch_enet_private *fep,
> +			 struct device_node *np)
> +{
> +	struct device_node *p;
> +	unsigned int port_num;
> +	int ret = 0;
> +
> +	p = of_get_child_by_name(np, "ethernet-ports");
> +
> +	for_each_available_child_of_node_scoped(p, port) {
> +		if (of_property_read_u32(port, "reg", &port_num))
> +			continue;

[Medium]
If of_get_child_by_name() returns NULL (no "ethernet-ports" node), the
for_each_available_child_of_node_scoped() loop is skipped entirely, ret
stays 0, and mtip_parse_of() returns success with no ports parsed.
Similarly, if every child fails the "reg" read above, the continue keeps
ret at 0.

After such a "success" return, mtip_ndev_init() calls
alloc_netdev(sizeof(struct mtip_ndev_priv), fep->ndev_name[i],
NET_NAME_USER, ether_setup) with fep->ndev_name[i] still NULL.
alloc_netdev_mqs() then calls strscpy_pad() on a NULL name and oopses.

Should missing "ethernet-ports", missing "reg", or having no parsed
ports be treated as errors here?

[ ... ]

> +		fep->phy_np[port_num - 1] = of_parse_phandle(port,
> +							     "phy-handle", 0);
> +		if (!fep->phy_np[port_num - 1]) {
> +			dev_err(&fep->pdev->dev,
> +				"%s: PHY handle not defined!\n", __func__);
> +			ret = -ENODEV;
> +			goto of_get_err;
> +		}
> +	}
> +
> + of_get_err:
> +	mtip_parse_of_cleanup(fep);
> +	of_node_put(p);
> +
> +	return ret;
> +}

[High]
Does the success path fall through into of_get_err: unconditionally?

There is no return ret; (or success label) before the of_get_err label,
so on a successful parse mtip_parse_of_cleanup(fep) runs and calls
of_node_put() on every fep->phy_np[i] just acquired by
of_parse_phandle().  mtip_parse_of_cleanup() also does not NULL the
slots:

	static void mtip_parse_of_cleanup(struct switch_enet_private *fep)
	{
		int i;

		for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
			if (fep->phy_np[i])
				of_node_put(fep->phy_np[i]);
	}

After probe, mtip_open()->mtip_mii_probe() then passes the same
fep->phy_np[i] pointers to of_phy_connect() — does this dereference an
OF node the driver no longer holds a reference on?

The error and remove paths (the of_free_references label, the explicit
cleanup after devm_request_irq() failure, and mtip_sw_remove() ->
mtip_parse_of_cleanup()) call mtip_parse_of_cleanup() again on the same
non-NULL pointers, which would underflow the OF refcount on each.

Should there be a return ret; before of_get_err: and a NULL-out of the
slot in mtip_parse_of_cleanup() after each of_node_put()?
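
A userspace model of both suggestions, with plain integer refcounts in
place of OF nodes, might look like the sketch below. node_model,
parse_model and the helper names are hypothetical; a NULL entry in
nodes[] models a missing "phy-handle".

```c
#include <errno.h>
#include <stddef.h>

#define NPORTS 2	/* stands in for SWITCH_EPORT_NUMBER */

struct node_model {
	int refcount;
};

struct parse_model {
	struct node_model *phy_np[NPORTS];
};

static struct node_model *node_get(struct node_model *n)
{
	n->refcount++;
	return n;
}

static void node_put(struct node_model *n)
{
	n->refcount--;
}

/* Drop every held reference and clear the slot, so a repeated call is a
 * no-op rather than a refcount underflow.
 */
static void parse_cleanup(struct parse_model *p)
{
	int i;

	for (i = 0; i < NPORTS; i++) {
		if (p->phy_np[i]) {
			node_put(p->phy_np[i]);
			p->phy_np[i] = NULL;
		}
	}
}

/* Take one reference per port. Only the failure path reaches the cleanup
 * label; success returns first and keeps the references.
 */
static int parse_ports(struct parse_model *p, struct node_model *nodes[NPORTS])
{
	int i, ret;

	for (i = 0; i < NPORTS; i++) {
		if (!nodes[i]) {
			ret = -ENODEV;
			goto err;
		}
		p->phy_np[i] = node_get(nodes[i]);
	}

	return 0;

err:
	parse_cleanup(p);
	return ret;
}
```

In this model a successful parse leaves each node with one extra
reference, and any number of later parse_cleanup() calls drops it exactly
once.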

[ ... ]

> +static int mtip_sw_probe(struct platform_device *pdev)
> +{
[ ... ]
> +	ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
> +			       dev_name(&pdev->dev), fep);
> +	if (ret) {
> +		mtip_parse_of_cleanup(fep);
> +		return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
> +	}

[Medium]
Can a stale IRQ here dereference an uninitialised napi struct?

netif_napi_add(dev, &fep->napi, mtip_rx_napi) is only called from
mtip_open(), so between this devm_request_irq() and the first ndo_open
fep->napi is zero-initialised (no poll, no dev, no state).  Probe also
doesn't write 0 to ESW_IMR before requesting the IRQ — it relies on the
register's reset value.

If the bootloader or a warm reset left the switch with IMR set and
events pending in ESW_ISR, mtip_interrupt() can fire immediately on
enable, see RXF/TXF, call napi_schedule_prep(&fep->napi) (which sets
NAPIF_STATE_SCHED unconditionally and returns true), and queue the
zero-initialised napi struct.  net_rx_action() would then call n->poll,
which is still NULL.

Would it help to either mask all interrupts (writel(0, fep->hwp +
ESW_IMR)) before request_irq, or defer request_irq until after
netif_napi_add()?
-- 
pw-bot: cr

* Re: [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
@ 2026-06-05  2:18   ` Jakub Kicinski
  2026-06-09  8:37     ` Łukasz Majewski
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add buffers management functions to the L2 switch driver

This patch fills in mtip_alloc_buffers() and mtip_free_buffers() for the
MTIP L2 switch driver, replacing the previously empty stubs with real
page_pool setup/teardown and tx_bounce buffer allocation/free.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 1d1005a42306..5b5229ee411d 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> @@ -892,11 +892,101 @@ static void mtip_get_drvinfo(struct net_device *dev,
>  
>  static void mtip_free_buffers(struct net_device *dev)
>  {
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int i;
> +
> +	for (i = 0; i < RX_RING_SIZE; i++) {
> +		if (!fep->page[i])
> +			continue;
> +
> +		page_pool_put_full_page(fep->page_pool,
> +					fep->page[i], false);
> +		fep->page[i] = NULL;
> +	}
> +
> +	page_pool_destroy(fep->page_pool);
> +	fep->page_pool = NULL;
> +
> +	for (i = 0; i < TX_RING_SIZE; i++) {
> +		kfree(fep->tx_bounce[i]);
> +		fep->tx_bounce[i] = NULL;
> +	}
> +}

[High]
Now that mtip_free_buffers() is no longer an empty stub, can the
mtip_open() error path tear down buffers that another active port is
still using?

Looking at mtip_open():

	if (fep->usage_count == 0) {
		...
		ret = mtip_alloc_buffers(dev);
		if (ret)
			goto mtip_alloc_buffers_err;
	}
	...
	ret = mtip_mii_probe(dev);
	if (ret)
		goto mtip_mii_probe_err;
	...
 mtip_mii_probe_err:
	mtip_free_buffers(dev);
 mtip_alloc_buffers_err:
	if (fep->usage_count == 0) {
		netif_napi_del(&fep->napi);
		clk_disable(fep->clk_ipg);
	}
	return ret;

mtip_alloc_buffers() only runs when fep->usage_count == 0, so the
page_pool, fep->page[] and fep->tx_bounce[] are shared across switch
ports and only allocated on first open. mtip_close() reflects this and
only calls mtip_free_buffers() when fep->usage_count == 1.

When a second port is opened (fep->usage_count > 0) and mtip_mii_probe()
fails, the mtip_mii_probe_err label still calls mtip_free_buffers()
unconditionally, which destroys the page_pool and frees the tx_bounce
buffers that the first, still-active port is using.

Should the mtip_mii_probe_err teardown also be guarded by
fep->usage_count == 0 (or the labels reordered) so buffers are only
freed for the last user, similar to mtip_close()?

After this change the previously harmless call becomes a real teardown,
so the surviving port's NAPI path (mtip_switch_rx() and the later
mtip_update_cbd() refill) would dereference a freed fep->page_pool and
NULL fep->page[] / fep->tx_bounce[] entries whenever the second port's
PHY probe fails while the first port is up.
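
The guarded teardown can be modelled in userspace as below. shared_model
and its fields are hypothetical, a single flag stands in for the
page_pool, page[] and tx_bounce[] allocations, and the model assumes
usage_count is incremented only once an open has fully succeeded.

```c
#include <stdbool.h>

struct shared_model {
	int usage_count;	/* ports currently open */
	bool buffers_allocated;	/* stands in for the shared buffers */
};

/* Only the first opener allocates, and only the first opener undoes that
 * allocation if a later step fails. phy_probe_ok models the result of the
 * per-port PHY probe.
 */
static int port_open(struct shared_model *sw, bool phy_probe_ok)
{
	bool first = (sw->usage_count == 0);

	if (first)
		sw->buffers_allocated = true;

	if (!phy_probe_ok) {
		if (first)
			sw->buffers_allocated = false;
		return -1;
	}

	sw->usage_count++;
	return 0;
}

/* Mirror of the close path described above: the last user frees. */
static void port_close(struct shared_model *sw)
{
	if (sw->usage_count == 1)
		sw->buffers_allocated = false;
	sw->usage_count--;
}
```

With the guard in place, a failed open of the second port leaves the
first port's buffers untouched, while a failed first open still unwinds
its own allocation.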

>  
> +static int mtip_create_page_pool(struct switch_enet_private *fep, int size)
> +{
[ ... ]
> +}
>  
>  static int mtip_alloc_buffers(struct net_device *dev)
>  {
[ ... ]
>  	return 0;
> +
> + err:
> +	mtip_free_buffers(dev);
> +	return -ENOMEM;
>  }
>

* Re: [net-next v24 4/7] net: mtip: Add net_device_ops functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
@ 2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add net_device_ops functions to the L2 switch driver

This patch adds the ndo_start_xmit, ndo_set_rx_mode, ndo_tx_timeout,
ndo_set_mac_address, and ndo_get_port_parent_id callbacks to the MTIP
L2 switch driver, along with helpers and a tx_timeout work item.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 5b5229ee411d..7dfeda230e3f 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> @@ -617,6 +740,76 @@ static void mtip_switch_restart(struct net_device *dev, int duplex0,
>  	mtip_config_switch(fep);
>  }
>  
> +static void mtip_print_hw_state(struct net_device *dev)
> +{

[ ... ]

> +static void mtip_timeout_work(struct work_struct *work)
> +{
> +	struct mtip_ndev_priv *priv =
> +		container_of(work, struct mtip_ndev_priv, tx_timeout_work);
> +	struct switch_enet_private *fep = priv->fep;
> +	struct net_device *dev = priv->dev;
> +
> +	rtnl_lock();
> +	if (netif_device_present(dev) || netif_running(dev)) {
> +		napi_disable(&fep->napi);
> +		netif_tx_lock_bh(dev);
> +		mtip_switch_restart(dev, fep->full_duplex[0],
> +				    fep->full_duplex[1]);
> +		netif_tx_wake_all_queues(dev);
> +		netif_tx_unlock_bh(dev);
> +		napi_enable(&fep->napi);
> +	}
> +	rtnl_unlock();
> +}

[High]
Should the guard here be a logical AND rather than OR?

	if (netif_device_present(dev) || netif_running(dev)) {

netif_device_present() returns true unless netif_device_detach() has been
called, and this driver never calls netif_device_detach().  So the OR
collapses to "always true" once the device has been registered, and the
work proceeds even after mtip_close().

mtip_close() does this on the last close:

	if (fep->usage_count == 1) {
		napi_disable(&fep->napi);
		netif_napi_del(&fep->napi);
		mtip_free_buffers(dev);
		clk_disable(fep->clk_ipg);
	}

After that, can mtip_timeout_work() running due to a previously scheduled
watchdog still call napi_disable(&fep->napi) / napi_enable(&fep->napi) on
a netif_napi_del()'d napi, and call mtip_switch_restart() which writes
hardware registers after clk_ipg has been disabled?

Would gating the work on fep->usage_count (and/or cancel_work_sync() in
mtip_close()) be more correct than the present netif_device_present() ||
netif_running() check?
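
The difference between the two guards is easiest to see for a port that
has been closed but never detached. The sketch below is a userspace model
with hypothetical names; the two booleans stand in for the results of
netif_device_present() and netif_running().

```c
#include <stdbool.h>

struct netdev_model {
	bool present;	/* cleared only by a detach, which never happens here */
	bool running;	/* true between open and close */
};

/* Guard as posted: true whenever either condition holds. */
static bool may_restart_or(const struct netdev_model *d)
{
	return d->present || d->running;
}

/* Guard with a logical AND: both conditions must hold. */
static bool may_restart_and(const struct netdev_model *d)
{
	return d->present && d->running;
}
```

For a closed port (present, not running) the OR form still allows the
restart, while the AND form rejects it.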

[ ... ]

> @@ -1095,6 +1288,98 @@ static int mtip_close(struct net_device *dev)
>  	return 0;
>  }
>  
> +#define FEC_HASH_BITS	6		/* #bits in hash */

[ ... ]

> +static int mtip_set_mac_address(struct net_device *dev, void *p)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	void __iomem *enet_addr = fep->enet_addr;
> +	struct sockaddr *addr = p;
> +	int ret;
> +
> +	if (!is_valid_ether_addr(addr->sa_data))
> +		return -EADDRNOTAVAIL;
> +
> +	ret = mtip_update_atable_static((unsigned char *)addr->sa_data,
> +					7, 7, fep);
> +	if (ret)
> +		return ret;

[Medium]
Is the previously-programmed MAC ever removed from the switch's static
address table?  mtip_update_atable_static() either updates a matching
entry in place or fills the first empty slot in an 8-entry block:

	for (entry = block_index; entry < index_end; entry++) {
		mtip_read_atable(fep, entry, &read_lo, &read_hi);
		if (read_lo == write_lo &&
		    ((read_hi & 0x0000FFFF) ==
		     (write_hi & 0x0000FFFF))) {
			mtip_write_atable(fep, entry, write_lo, write_hi);
			...
			return 0;
		} else if (!(read_hi & (1 << 16))) {
			mtip_write_atable(fep, entry, write_lo, write_hi);
			...
			return 0;
		}
	}
	...
	return -ENOSPC;

After 8 distinct MAC changes whose CRC8 maps to the same block, will
further mtip_set_mac_address() calls return -ENOSPC, and will the stale
MACs continue to be honored as static entries with port mask 7 and
priority 7?
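
The exhaustion scenario can be reproduced with a userspace model of one
8-entry block. The match and free-slot tests follow the loop quoted above;
the entry layout, the assumption that every MAC hashes to this one block,
and block_remove_static() are hypothetical and not part of the patch.

```c
#include <errno.h>
#include <stdint.h>

#define BLOCK_ENTRIES	8		/* entries per hash block */
#define ENTRY_VALID	(1u << 16)	/* bit tested as "slot in use" above */

struct atable_entry {
	uint32_t lo;	/* first four MAC bytes in this model */
	uint32_t hi;	/* last two MAC bytes plus flag bits */
};

/* Update a matching entry in place, else take the first free slot. */
static int block_update_static(struct atable_entry *blk,
			       uint32_t write_lo, uint32_t write_hi)
{
	int entry;

	for (entry = 0; entry < BLOCK_ENTRIES; entry++) {
		uint32_t read_lo = blk[entry].lo;
		uint32_t read_hi = blk[entry].hi;

		if ((read_lo == write_lo &&
		     (read_hi & 0xFFFF) == (write_hi & 0xFFFF)) ||
		    !(read_hi & ENTRY_VALID)) {
			blk[entry].lo = write_lo;
			blk[entry].hi = write_hi | ENTRY_VALID;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Hypothetical helper (not in the patch): invalidate the entry for a MAC. */
static void block_remove_static(struct atable_entry *blk,
				uint32_t lo, uint32_t hi)
{
	int entry;

	for (entry = 0; entry < BLOCK_ENTRIES; entry++) {
		if ((blk[entry].hi & ENTRY_VALID) && blk[entry].lo == lo &&
		    (blk[entry].hi & 0xFFFF) == (hi & 0xFFFF)) {
			blk[entry].lo = 0;
			blk[entry].hi = 0;
		}
	}
}
```

In this model the ninth distinct address fails with -ENOSPC until an older
entry is invalidated, which is what removing the previous MAC on each
change would achieve.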

> +
> +	eth_hw_addr_set(dev, addr->sa_data);
> +
> +	if (priv->portnum == 2)
> +		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
> +
> +	writel(dev->dev_addr[3] | (dev->dev_addr[2] << 8) |
> +	       (dev->dev_addr[1] << 16) | (dev->dev_addr[0] << 24),
> +	       enet_addr + MCF_FEC_PALR);
> +	writel((dev->dev_addr[5] << 16) | (dev->dev_addr[4] << 24),
> +	       enet_addr + MCF_FEC_PAUR);
> +
> +	return 0;
> +}

[Medium]
Should fep->mac[priv->portnum - 1] be updated here as well?  The driver
caches the MAC in fep->mac[] and mtip_get_port_parent_id() reads
fep->mac[0] as the parent identifier.  After a userspace MAC change
fep->mac[] stays at its boot/DT value, so ndo_get_port_parent_id keeps
returning the old value while dev->dev_addr and the hardware registers
have moved on.

[Medium]
Does this sequence need to disable the controller before reprogramming
PALR/PAUR?  mtip_configure_enet_mii() programs ECR to MAGIC_ENA only
(no ETHER_EN), then writes PALR/PAUR, then enables ETHER_EN at the end:

	writel(MCF_FEC_ECR_MAGIC_ENA, enet_addr + MCF_FEC_ECR);
	...
	writel(... , enet_addr + MCF_FEC_PALR);
	writel(... , enet_addr + MCF_FEC_PAUR);
	...
	writel(readl(enet_addr + MCF_FEC_ECR) | MCF_FEC_ECR_ETHER_EN,
	       enet_addr + MCF_FEC_ECR);

mtip_set_mac_address() writes PALR and PAUR while ETHER_EN can be set
and traffic is being matched, and it does not take fep->hw_lock either,
so it can race with mtip_configure_enet_mii() / mtip_switch_restart()
that touch the same register block.  Is that intended?

> +
> +static int mtip_get_port_parent_id(struct net_device *ndev,
> +				   struct netdev_phys_item_id *ppid)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(ndev);
> +	struct switch_enet_private *fep = priv->fep;
> +
> +	ppid->id_len = sizeof(fep->mac[0]);
> +	memcpy(&ppid->id, &fep->mac[0], ppid->id_len);
> +
> +	return 0;
> +}

[Medium]
Is fep->mac[0] a suitable choice for the switch parent id?  fep->mac[]
is populated only from DT via of_get_mac_address() in mtip_parse_of();
when DT does not supply a mac-address for port 1, fep->mac[0] stays
zeroed (it is kzalloc'd in probe), since mtip_setup_mac()'s bootloader
and random-MAC fallbacks update dev->dev_addr only and never write back
into fep->mac[].

If two MTIP switches both lack a DT MAC for port 1, both will report
ppid = 00:00:00:00:00:00, which switchdev/bridge code will treat as the
same switch (offload_fwd_mark, bridge offload decisions).

Would a switch-wide identifier (a register-derived id, the platform
of_node phandle, etc.) be more appropriate than a port MAC?

[ ... ]

> @@ -1158,10 +1448,14 @@ static int mtip_switch_dma_init(struct switch_enet_private *fep)
>  
>  static void mtip_ndev_cleanup(struct switch_enet_private *fep)
>  {
> +	struct mtip_ndev_priv *priv;
>  	int i;
>  
>  	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
>  		if (fep->ndev[i]) {
> +			priv = netdev_priv(fep->ndev[i]);
> +			cancel_work_sync(&priv->tx_timeout_work);
> +
>  			unregister_netdev(fep->ndev[i]);
>  			free_netdev(fep->ndev[i]);
>  			fep->ndev[i] = NULL;

[High]
Is the order here safe against the netdev TX watchdog?  The watchdog
timer is only stopped inside unregister_netdev() (via dev_deactivate),
so between cancel_work_sync() returning and unregister_netdev() actually
disarming the watchdog the timer can still fire mtip_timeout(), which
calls schedule_work(&priv->tx_timeout_work) and re-queues the work that
was just drained.

After unregister_netdev() and free_netdev() complete, priv (which embeds
tx_timeout_work) is freed, and when the worker thread later picks up
the queued work it will dereference a freed priv/fep.

Would the sequence

	unregister_netdev(fep->ndev[i]);
	cancel_work_sync(&priv->tx_timeout_work);
	free_netdev(fep->ndev[i]);

be safer, since unregister_netdev() quiesces all sources of new work
before cancel_work_sync() drains what remains?  The same concern applies
to the rollback path in mtip_ndev_init() for ports whose watchdog is
already active.

* Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
  2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
@ 2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver

This patch fills in the previously stubbed mtip_switch_tx() and
mtip_switch_rx() functions for the MTIP L2 switch driver, adds a small
helper mtip_atable_get_entry_port_number() that looks a MAC up in the
hardware address table, and a mtip_update_cbd() helper used to refill RX
buffer descriptors from the page_pool.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 7dfeda230e3f..212207861e13 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> @@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
>  
>  static void mtip_switch_tx(struct switch_enet_private *fep)
>  {
> +	struct net_device *dev;
> +	unsigned short status;
> +	struct sk_buff *skb;
> +	struct cbd_t *bdp;
> +	int index;
> +
> +	spin_lock_bh(&fep->hw_lock);
> +	bdp = fep->dirty_tx;
> +
> +	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> +		if (bdp == fep->cur_tx &&
> +		    !mtip_netif_queues_stopped(fep))
> +			break;
> +
> +		index = bdp - fep->tx_bd_base;
> +		dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
> +				 MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);
> +		bdp->cbd_bufaddr = 0;
> +		skb = fep->tx_skbuff[index];
> +		dev = skb->dev;

[High]
Can mtip_switch_tx() race with mtip_adjust_link() and dereference a NULL
skb here?

mtip_adjust_link() runs in process context from phylib's state machine
and calls:

    mtip_netif_stop_queues(fep);
    mtip_switch_restart(...);
    if (mtip_netif_queues_stopped(fep))
            mtip_netif_wake_queues(fep);

mtip_switch_restart() does not appear to take fep->hw_lock, does not
disable NAPI, clears tx_skbuff[i] = NULL and resets cur_tx = dirty_tx
= base, and at the end re-enables RX/TX interrupts via ESW_IMR.

While this is in flight, both per-port queues are XOFF so
mtip_netif_queues_stopped(fep) returns true. A concurrent RX interrupt
on another CPU can schedule NAPI, which calls mtip_switch_tx() with
this state:

    bdp = fep->dirty_tx;             /* == base after restart */
    while ((bdp->cbd_sc & BD_ENET_TX_READY) == 0) {
            if (bdp == fep->cur_tx && !mtip_netif_queues_stopped(fep))
                    break;           /* not taken: queues stopped */
            ...
            skb = fep->tx_skbuff[index];   /* NULL */
            dev = skb->dev;                /* NULL deref */

mtip_switch_restart() does not reset cbd_sc, so cbd_sc & BD_ENET_TX_READY
can still be 0 from prior TX activity, which lets the loop body run.
dma_unmap_single() would also be called with the stale cbd_bufaddr.

The mtip_timeout_work() path wraps the same restart in napi_disable() /
netif_tx_lock_bh(); should the adjust_link path do the same?

> +		/* Check for errors */
> +		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
> +				   BD_ENET_TX_RL | BD_ENET_TX_UN |
> +				   BD_ENET_TX_CSL)) {
> +			dev->stats.tx_errors++;
> +			if (status & BD_ENET_TX_HB)  /* No heartbeat */
> +				dev->stats.tx_heartbeat_errors++;
> +			if (status & BD_ENET_TX_LC)  /* Late collision */
> +				dev->stats.tx_window_errors++;
> +			if (status & BD_ENET_TX_RL)  /* Retrans limit */
> +				dev->stats.tx_aborted_errors++;
> +			if (status & BD_ENET_TX_UN)  /* Underrun */
> +				dev->stats.tx_fifo_errors++;
> +			if (status & BD_ENET_TX_CSL) /* Carrier lost */
> +				dev->stats.tx_carrier_errors++;
> +		} else {
> +			dev->stats.tx_packets++;
> +		}
> +
> +		if (status & BD_ENET_TX_READY)
> +			dev_err_ratelimited(&fep->pdev->dev,
> +					    "xmit interrupt and TX_READY.\n");

[Low]
Can this dev_err_ratelimited() ever fire?

The enclosing loop is:

    while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {

status is only assigned in the loop header and is not re-read inside the
body, so by construction status & BD_ENET_TX_READY is always 0 when this
diagnostic is reached. Was the intent to re-read bdp->cbd_sc, or is this
leftover scaffolding that should just be removed?

> +
> +		/* Deferred means some collisions occurred during transmit,
> +		 * but we eventually sent the packet OK.
> +		 */
> +		if (status & BD_ENET_TX_DEF)
> +			dev->stats.collisions++;
> +
> +		/* Free the sk buffer associated with this last transmit */
> +		dev_consume_skb_any(skb);
> +		fep->tx_skbuff[index] = NULL;

[ ... ]

> +	}
> +	fep->dirty_tx = bdp;
> +	spin_unlock_bh(&fep->hw_lock);
>  }
>  
> +static int mtip_update_cbd(struct switch_enet_private *fep, struct cbd_t *bdp,
> +			   int index)
> +{

[ ... ]

> +}
> +

[ ... ]

>  static int mtip_switch_rx(struct net_device *dev, int budget)
>  {
> -	return -ENOMEM;
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);

[Low]
After this patch mtip_switch_rx() only ever returns pkt_received, which
is non-negative; the page_pool allocation failure path does
pndev->stats.rx_dropped++ and goto rx_processing_done rather than
returning -ENOMEM.

The earlier patch in the series left mtip_rx_napi() with:

    pkts = mtip_switch_rx(napi->dev, budget);
    if (pkts == -ENOMEM) {
            napi_complete(napi);
            writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF, fep->hwp + ESW_IMR);
            return 0;
    }

Is that branch now unreachable, and should it be removed so the contract
between mtip_switch_rx() and its caller stays consistent?

[ ... ]

> +
> +		/* Get correct RX page */
> +		index = bdp - fep->rx_bd_base;
> +		page = fep->page[index];
> +		/* Process the incoming frame */
> +		pkt_len = bdp->cbd_datlen;
> +
> +		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> +					pkt_len, DMA_FROM_DEVICE);
> +		net_prefetch(page_address(page));
> +		data = page_address(page);
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);
> +
> +		eth_hdr = (struct ethhdr *)data;
> +		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> +						  &rx_port);

[ ... ]

> +
> +		if (mtip_update_cbd(fep, bdp, index)) {
> +			pndev->stats.rx_dropped++;
> +			goto rx_processing_done;
> +		}
> +
> +		/* The packet length includes FCS, but we don't want to
> +		 * include that when passing upstream as it messes up
> +		 * bridging applications.
> +		 */
> +		skb = build_skb(page_address(page), PAGE_SIZE);
> +		if (unlikely(!skb)) {
> +			page_pool_recycle_direct(fep->page_pool, page);
> +			pndev->stats.rx_dropped++;
> +
> +			netdev_err_once(pndev, "build_skb failed!\n");
> +			goto rx_processing_done;
> +		}
> +
> +		skb_put(skb, pkt_len);      /* Make room */
> +		skb_mark_for_recycle(skb);
> +		skb->protocol = eth_type_trans(skb, pndev);
> +		skb->offload_fwd_mark = fep->br_offload;
> +		napi_gro_receive(&fep->napi, skb);
> +
> +		pndev->stats.rx_packets++;
> +		pndev->stats.rx_bytes += pkt_len;

[High]
Is the FCS actually being stripped here?

The comment above build_skb() states:

    /* The packet length includes FCS, but we don't want to
     * include that when passing upstream as it messes up
     * bridging applications.
     */

but skb_put() and the rx_bytes accounting both use pkt_len unmodified:

    skb_put(skb, pkt_len);      /* Make room */
    ...
    pndev->stats.rx_bytes += pkt_len;

The equivalent fec_enet_rx_queue_napi() in fec_main.c does
pkt_len - sub_len (with sub_len = 4 + fep->rx_shift). Combined with
mtip_enet_init() programming MCF_FEC_RCR_CRC_FWD in the RCR register so
the CRC is forwarded into the RX buffer, every skb handed to
napi_gro_receive() ends up with 4 trailing FCS bytes and rx_bytes is
overcounted by 4 per frame. Should pkt_len be reduced by 4 (or by
4 + fep->rx_shift) before skb_put() and the stats update?
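
A minimal standalone sketch of the length adjustment being asked about,
written as plain C rather than driver code. The helper name
mtip_rx_payload_len() is hypothetical, FCS_LEN mirrors the kernel's
ETH_FCS_LEN value of 4, and the rx_shift argument only models the
fec_main.c convention mentioned above; whether the MTIP path needs it is
an open question:

```c
#include <assert.h>
#include <stddef.h>

#define FCS_LEN 4 /* same value as the kernel's ETH_FCS_LEN */

/* Hypothetical helper: length to hand to skb_put() and to add to rx_bytes
 * once the trailing FCS (and an optional RX alignment shift) is excluded.
 * Returns 0 for frames too short to carry the trailer at all.
 */
static inline size_t mtip_rx_payload_len(size_t pkt_len, size_t rx_shift)
{
	size_t sub_len = FCS_LEN + rx_shift;

	if (pkt_len <= sub_len)
		return 0;

	return pkt_len - sub_len;
}
```

With such a helper, both skb_put() and the rx_bytes update would use the
same reduced length, so the two cannot drift apart.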

[ ... ]



* Re: [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations
  2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
  2026-06-02 11:32   ` sashiko-bot
@ 2026-06-05  2:18   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Extend the L2 switch driver with management operations

This patch adds switch management helpers (port enable, blocking,
learning, broadcast, multicast, VLAN input/output, VLAN verification,
forced forwarding, port separation) and wires them into the switch
configuration and TX paths.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 212207861e13..76e503e128dd 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> @@ -505,8 +505,35 @@ static void mtip_config_switch(struct switch_enet_private *fep)
>  
>  	writel(0, fep->hwp + ESW_BKLR);
>  
> +	/* Do NOT disable learning */
> +	mtip_port_learning_config(fep, 0, 0, 0);
> +	mtip_port_learning_config(fep, 1, 0, 0);
> +	mtip_port_learning_config(fep, 2, 0, 0);
> +
> +	/* Disable blocking */
> +	mtip_port_blocking_config(fep, 0, 0);
> +	mtip_port_blocking_config(fep, 1, 0);
> +	mtip_port_blocking_config(fep, 2, 0);
> +
>  	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
>  	       fep->hwp + ESW_IMR);
> +
> +	mtip_port_enable_config(fep, 0, 1, 1);
> +	mtip_port_enable_config(fep, 1, 1, 1);
> +	mtip_port_enable_config(fep, 2, 1, 1);
> +
> +	mtip_port_broadcast_config(fep, 0, 1);
> +	mtip_port_broadcast_config(fep, 1, 1);
> +	mtip_port_broadcast_config(fep, 2, 1);
> +
> +	/* Disable multicast receive on port 0 (MGNT) */
> +	mtip_port_multicast_config(fep, 0, 0);
> +	mtip_port_multicast_config(fep, 1, 1);
> +	mtip_port_multicast_config(fep, 2, 1);

[Low]
mtip_port_learning_config(), mtip_port_blocking_config(),
mtip_port_enable_config(), mtip_port_broadcast_config() and
mtip_port_multicast_config() all return int, but every return value here
is discarded and mtip_config_switch() itself stays void.

The arguments are compile-time constants today so validation cannot
fail, but if any of these helpers later acquire a real failure mode (for
example, hardware probing or capability gating) the switch would be
silently left in a partially configured state.  Should the return
values be checked, or should these helpers be void?

> +
> +	/* Setup VLANs to provide port separation */
> +	if (!fep->br_offload)
> +		mtip_switch_en_port_separation(fep);
>  }
>  
>  static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
> @@ -592,6 +619,10 @@ static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
>  
>  	skb_tx_timestamp(skb);
>  
> +	/* For port separation - force sending via specified port */
> +	if (!fep->br_offload && port != 0)
> +		mtip_forced_forward(fep, port, 1);
> +

[High]
This is the per-packet TX path, and mtip_forced_forward() is called for
every non-management transmit.  A few questions about that call:

The helper passes the netdev port index (1 or 2) in as the FD value, but
its body encodes that argument as a bitmap into MCF_ESW_P0FFEN_FD_MASK:

	reg |= FIELD_PREP(MCF_ESW_P0FFEN_FD_MASK, port & GENMASK(1, 0));

For port == 1 the FD field is 0b01, for port == 2 it is 0b10.  If FD is
a destination-port bitmap, does this mean port == 1 forwards to MGNT
port 0 and port == 2 forwards to the first user port?  Is the caller
passing the right value?

The function builds the register value from a fresh u32 reg = 0 and
unconditionally writes ESW_P0FFEN, so the else branch that masks off
MCF_ESW_P0FFEN_FEN never runs in practice (reg already has FEN clear).
Once this path runs once with enable == 1, who clears the FEN bit?

mtip_switch_dis_port_separation() resets PID/VRES/VIMEN/VIMSEL/VLANV/
VOMSEL but not ESW_P0FFEN.  Is forced-forward state expected to remain
asserted across reconfiguration?

>  	dev->stats.tx_bytes += skb->len;
>  	/* If this was the last BD in the ring,
>  	 * start at the beginning again.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c
> new file mode 100644
> index 000000000000..4c07861a6a3e
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c

[ ... ]

> +int mtip_vlan_input_process(struct switch_enet_private *fep,
> +			    int port, int mode, unsigned short port_vlanid,
> +			    int vlan_verify_en, int vlan_domain_num,
> +			    int vlan_domain_port)
> +{
> +	if (mode < 0 || mode > 3) {
> +		dev_err(&fep->pdev->dev,
> +			"%s: VLAN input processing mode (%d) not supported\n",
> +			__func__, mode);
> +		return -EINVAL;
> +	}
> +
> +	if (port < 0 || port > 2) {
> +		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	if (vlan_verify_en == 1 &&
> +	    (vlan_domain_num < 0 || vlan_domain_num > 32)) {
> +		dev_err(&fep->pdev->dev, "%s: Domain out of range\n", __func__);
> +		return -EINVAL;
> +	}

[Medium]
Is the upper bound here off by one?  ESW_VRES_BASE is 0x280 and
ESW_VRES(x) = 0x280 + 4*x, so vlan_domain_num == 32 produces offset
0x300, which is the ESW_DISCN register.  A call with
vlan_domain_num == 32 would write a VRES bitmap value into the
discard-stats register.  Should the test be > 31 (or >= 32)?
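
The offset arithmetic can be checked with a small standalone sketch. The
base address and the ESW_DISCN offset are taken from the text above;
ESW_VRES_COUNT and both helper names are assumptions made for
illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESW_VRES_BASE  0x280u /* from the review text above */
#define ESW_DISCN      0x300u /* register that follows the VRES array */
#define ESW_VRES_COUNT 32u    /* assumed number of VLAN domain entries */

/* Byte offset of the VRES entry for a given VLAN domain. */
static inline uint32_t esw_vres_offset(unsigned int domain)
{
	return ESW_VRES_BASE + 4u * domain;
}

/* Bound check that keeps every accepted domain inside the VRES array. */
static inline bool vlan_domain_valid(int domain)
{
	return domain >= 0 && (unsigned int)domain < ESW_VRES_COUNT;
}
```

Domain 31 maps to 0x2fc, the last VRES slot, while domain 32 maps to
0x300, which is why the check has to reject it.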

[Low]
The vlan_domain_port parameter is part of the function signature and
the public header but never referenced in the body, and all in-tree
callers pass 0.  Is it intended to control which VRES port bit is set
(currently hardcoded from port), or should the parameter be removed
until it has a use?

> +
> +	writel(FIELD_PREP(MCF_ESW_PID_VLANID_MASK, port_vlanid),
> +	       fep->hwp + ESW_PID(port));
> +	if (port == 0) {
> +		if (vlan_verify_en == 1)
> +			writel(FIELD_PREP(MCF_ESW_VRES_VLANID_MASK,
> +					  port_vlanid) | MCF_ESW_VRES_P0,
> +			       fep->hwp + ESW_VRES(vlan_domain_num));
> +
> +		writel(readl(fep->hwp + ESW_VIMEN) | MCF_ESW_VIMEN_EN0,
> +		       fep->hwp + ESW_VIMEN);
> +		writel(readl(fep->hwp + ESW_VIMSEL) |
> +		       FIELD_PREP(MCF_ESW_VIMSEL_IM0_MASK, mode),
> +		       fep->hwp + ESW_VIMSEL);

[Medium]
VIMSEL IMx and VOMSEL OMx are 2-bit fields, but the update here is
read | FIELD_PREP(MASK, mode) without first clearing the field.  Once a
non-zero mode has been programmed for a port, can a subsequent
reprogramming actually replace it?

For example, programming port 0 first with mode 1 (binary 01) and then
with mode 2 (binary 10) would yield 01 | 10 = 11 in IM0.

mtip_config_switch() runs from the switch-restart path
(mtip_open(), mtip_timeout_work(), mtip_adjust_link()) and does not
zero VIMSEL/VOMSEL beforehand, so a restart with a different mode would
hit this.  Should the helpers do reg &= ~MASK; reg |= FIELD_PREP(...)
as the sibling helpers (e.g. mtip_port_enable_config) do?
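
The difference between the two update styles can be shown on a plain
variable standing in for the register. This is a sketch under
assumptions: the IM0 field position (bits 1:0) and both helper names are
hypothetical, and the driver itself would use FIELD_PREP() with the real
mask:

```c
#include <assert.h>
#include <stdint.h>

#define IM0_SHIFT 0u
#define IM0_MASK  (0x3u << IM0_SHIFT) /* assumed position of the 2-bit field */

/* Accumulating update, as in the patch: previously set bits survive. */
static inline uint32_t im0_set_or_only(uint32_t reg, uint32_t mode)
{
	return reg | ((mode << IM0_SHIFT) & IM0_MASK);
}

/* Clear-then-set update: the field is replaced, other bits are kept. */
static inline uint32_t im0_set_replace(uint32_t reg, uint32_t mode)
{
	reg &= ~IM0_MASK;
	reg |= (mode << IM0_SHIFT) & IM0_MASK;
	return reg;
}
```

Programming mode 1 and then mode 2 leaves 3 in the field with the first
helper and 2 with the second.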

> +	} else if (port == 1) {

[ ... ]

> +int mtip_vlan_output_process(struct switch_enet_private *fep, int port,
> +			     int mode)
> +{
> +	if (port < 0 || port > 2) {
> +		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	if (port == 0) {
> +		writel(readl(fep->hwp + ESW_VOMSEL) |
> +		       FIELD_PREP(MCF_ESW_VOMSEL_OM0_MASK, mode),
> +		       fep->hwp + ESW_VOMSEL);

[Medium]
The same accumulating-OR pattern is used here, and there is no
validation of mode (the input variant has it).  Should mode be range
checked, and should the OMx field be cleared before being rewritten?

> +	} else if (port == 1) {

[ ... ]

> +int mtip_set_vlan_verification(struct switch_enet_private *fep, int port,
> +			       int vlan_domain_verify_en,
> +			       int vlan_discard_unknown_en)
> +{
> +	if (port < 0 || port > 2) {
> +		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported!\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	if (vlan_domain_verify_en == 1) {

[Low]
Both vlan_domain_verify_en and vlan_discard_unknown_en are tested for
exactly 1 and exactly 0; any other value silently performs no register
update yet the function still returns 0.

Would it be cleaner to declare these parameters bool (as several other
helpers in this file do) or to validate them and return -EINVAL on out
of range input?

> +		if (port == 0)
> +			writel(readl(fep->hwp + ESW_VLANV) | MCF_ESW_VLANV_VV0,
> +			       fep->hwp + ESW_VLANV);

[ ... ]

> +int mtip_port_multicast_config(struct switch_enet_private *fep,
> +			       int port, bool enable)
> +{
> +	u32 reg = 0;
> +
> +	if (port < 0 || port > 2) {
> +		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	reg = readl(fep->hwp + ESW_DMCR);

[Medium]
A general question that applies to every helper added in this file:
each one performs a read-modify-write on a shared MMIO register
(ESW_BKLR, ESW_PER, ESW_DBCR, ESW_DMCR, ESW_VLANV, ESW_VIMEN,
ESW_VIMSEL, ESW_VOMSEL, ESW_VRES, ESW_IMR) without taking any lock and
without a documented locking precondition, and the helpers are exported
via mtipl2sw.h.

fep->hw_lock already exists for HW configuration but the helpers
neither take it nor call lockdep_assert_held() on it.
mtip_config_switch() itself issues three independent RMWs on the same
register for each helper (one per port).

Today the only caller is mtip_config_switch() during switch restart, so
the race is dormant.  If a future caller (sysfs/netlink/bridge offload
hook) appears, every shared-register update becomes a race.  Should
these helpers take fep->hw_lock internally, or assert it via
lockdep_assert_held()?

[ ... ]

> +int mtip_forced_forward(struct switch_enet_private *fep, int port, bool enable)
> +{
> +	u32 reg = 0;
> +
> +	if (port & ~GENMASK(1, 0)) {
> +		dev_err(&fep->pdev->dev,
> +			"%s: Forced forward for port(s): 0x%x not supported!\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	/* Enable Forced forwarding for port(s) */
> +	reg |= FIELD_PREP(MCF_ESW_P0FFEN_FD_MASK, port & GENMASK(1, 0));
> +
> +	if (enable)
> +		reg |= MCF_ESW_P0FFEN_FEN;
> +	else
> +		reg &= ~MCF_ESW_P0FFEN_FEN;
> +
> +	writel(reg, fep->hwp + ESW_P0FFEN);
> +	return 0;
> +}

[High]
Two more notes here in addition to the call-site question above:

The validation accepts port values 0..3 via port & ~GENMASK(1, 0) while
every sibling helper in this file rejects port > 2.  Which convention
is intended for the API?

The function unconditionally overwrites ESW_P0FFEN starting from
u32 reg = 0, instead of doing the readl/modify/writel that every other
helper here does.  Is that intentional, and if so what about other bits
that may live in this register?
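
For the first note, the two validation conventions can be compared side
by side in a standalone sketch. PORT_FIELD_MASK stands in for
GENMASK(1, 0); MTIP_MAX_PORT and both function names are hypothetical
and only serve the comparison:

```c
#include <assert.h>
#include <stdbool.h>

#define PORT_FIELD_MASK 0x3 /* GENMASK(1, 0) in the patch */
#define MTIP_MAX_PORT   2   /* highest index accepted by the sibling helpers */

/* Check used by mtip_forced_forward(): accepts 0..3. */
static inline bool port_ok_mask(int port)
{
	return !(port & ~PORT_FIELD_MASK);
}

/* Check used by the other helpers in the file: accepts 0..2. */
static inline bool port_ok_range(int port)
{
	return port >= 0 && port <= MTIP_MAX_PORT;
}
```

The value 3 is the only input on which the two checks disagree, which is
the case the API convention needs to settle.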

> +
> +int mtip_port_learning_config(struct switch_enet_private *fep, int port,
> +			      bool disable, bool irq_adj)
> +{
> +	u32 reg = 0;
> +
> +	if (port < 0 || port > 2) {
> +		dev_err(&fep->pdev->dev, "%s: Port (%d) not supported\n",
> +			__func__, port);
> +		return -EINVAL;
> +	}
> +
> +	reg = readl(fep->hwp + ESW_BKLR);
> +	if (disable) {
> +		if (irq_adj)
> +			writel(readl(fep->hwp + ESW_IMR) & ~MCF_ESW_IMR_LRN,
> +			       fep->hwp + ESW_IMR);

[Medium]
Can this RMW on ESW_IMR race with mtip_interrupt()?  The hardirq
handler does its own RMW on the same register with no lock:

	int_imask = readl(fep->hwp + ESW_IMR);
	int_imask &= ~MCF_ESW_IMR_RXF;
	writel(int_imask, fep->hwp + ESW_IMR);

If the hardirq fires between this helper's readl() and writel(), the
interrupt handler's RXF mask clear can be lost on writeback,
re-enabling RXF while NAPI is supposed to be servicing it.

Today mtip_config_switch() calls this helper with irq_adj = 0 so the
ESW_IMR side is dormant, but the function is exported for the explicit
purpose of toggling the LRN mask, so the documented use of the API
trips the race.  Should this take fep->hw_lock with spin_lock_irqsave()
(and have mtip_interrupt() take the same lock around its RMW)?
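
The interleaving can be replayed deterministically on a plain variable
standing in for ESW_IMR. This is an illustration only: the bit positions
of IMR_RXF and IMR_LRN and the function name are hypothetical, and no
real concurrency is involved:

```c
#include <assert.h>
#include <stdint.h>

#define IMR_RXF (1u << 0) /* hypothetical bit positions, illustration only */
#define IMR_LRN (1u << 1)

/* Replays the sequence described above and returns the final mask value. */
static inline uint32_t imr_lost_update_replay(uint32_t imr)
{
	uint32_t helper_copy, irq_copy;

	helper_copy = imr;       /* helper: readl(ESW_IMR) */

	irq_copy = imr;          /* hardirq: readl(ESW_IMR) */
	irq_copy &= ~IMR_RXF;    /* hardirq: mask RXF while NAPI runs */
	imr = irq_copy;          /* hardirq: writel() */

	helper_copy &= ~IMR_LRN; /* helper: mask LRN on its stale copy */
	imr = helper_copy;       /* helper: writel() undoes the hardirq update */

	return imr;
}
```

Starting with both bits set, the replay ends with RXF set again even
though the interrupt handler cleared it, which is the lost update in
question.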

> +
[ ... ]


* Re: [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver
  2026-06-05  2:18   ` Jakub Kicinski
@ 2026-06-09  8:37     ` Łukasz Majewski
  0 siblings, 0 replies; 21+ messages in thread
From: Łukasz Majewski @ 2026-06-09  8:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, richardcochran, lgirdwood,
	nathan, linux-kernel, netdev, krzk+dt, imx, linux-arm-kernel,
	robh, Andrew Lunn

Hi Jakub,

> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: mtip: Add buffers management functions to the L2 switch driver
> 
> This patch fills in mtip_alloc_buffers() and mtip_free_buffers() for
> the MTIP L2 switch driver, replacing the previously empty stubs with
> real page_pool setup/teardown and tx_bounce buffer allocation/free.
> 
> > diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> > index 1d1005a42306..5b5229ee411d 100644
> > --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> > +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> > @@ -892,11 +892,101 @@ static void mtip_get_drvinfo(struct net_device *dev,
> >  static void mtip_free_buffers(struct net_device *dev)
> >  {
> > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +	int i;
> > +
> > +	for (i = 0; i < RX_RING_SIZE; i++) {
> > +		if (!fep->page[i])
> > +			continue;
> > +
> > +		page_pool_put_full_page(fep->page_pool,
> > +					fep->page[i], false);
> > +		fep->page[i] = NULL;
> > +	}
> > +
> > +	page_pool_destroy(fep->page_pool);
> > +	fep->page_pool = NULL;
> > +
> > +	for (i = 0; i < TX_RING_SIZE; i++) {
> > +		kfree(fep->tx_bounce[i]);
> > +		fep->tx_bounce[i] = NULL;
> > +	}
> > +}  
> 
> [High]
> Now that mtip_free_buffers() is no longer an empty stub, can the
> mtip_open() error path tear down buffers that another active port is
> still using?
> 
> Looking at mtip_open():
> 
> 	if (fep->usage_count == 0) {
> 		...
> 		ret = mtip_alloc_buffers(dev);
> 		if (ret)
> 			goto mtip_alloc_buffers_err;
> 	}
> 	...
> 	ret = mtip_mii_probe(dev);
> 	if (ret)
> 		goto mtip_mii_probe_err;
> 	...
>  mtip_mii_probe_err:
> 	mtip_free_buffers(dev);
>  mtip_alloc_buffers_err:
> 	if (fep->usage_count == 0) {
> 		netif_napi_del(&fep->napi);
> 		clk_disable(fep->clk_ipg);
> 	}
> 	return ret;
> 
> mtip_alloc_buffers() only runs when fep->usage_count == 0, so the
> page_pool, fep->page[] and fep->tx_bounce[] are shared across switch
> ports and only allocated on first open. mtip_close() reflects this and
> only calls mtip_free_buffers() when fep->usage_count == 1.
> 
> When a second port is opened (fep->usage_count > 0) and
> mtip_mii_probe() fails, the mtip_mii_probe_err label still calls
> mtip_free_buffers() unconditionally, which destroys the page_pool and
> frees the tx_bounce buffers that the first, still-active port is
> using.
> 
> Should the mtip_mii_probe_err teardown also be guarded by
> fep->usage_count == 0 (or the labels reordered) so buffers are only
> freed for the last user, similar to mtip_close()?
> 
> After this change the previously harmless call becomes a real
> teardown, so the surviving port's NAPI path (mtip_switch_rx() and the
> later mtip_update_cbd() refill) would dereference a freed
> fep->page_pool and NULL fep->page[] / fep->tx_bounce[] entries
> whenever the second port's PHY probe fails while the first port is up.
> 
> >  
> > +static int mtip_create_page_pool(struct switch_enet_private *fep, int size)
> > +{
> [ ... ]
> > +}
> >  
> >  static int mtip_alloc_buffers(struct net_device *dev)
> >  {  
> [ ... ]
> >  	return 0;
> > +
> > + err:
> > +	mtip_free_buffers(dev);
> > +	return -ENOMEM;
> >  }
> >  

Thank you for sending the AI review.

Please find my comments:

- I've gone through them and yes, a few real issues were found.
  However, IMHO those are not critical.

- Some issues are not valid at all, like the concern about what happens
  when the switch runs on a multiprocessor chip - both imx287 and vf610
  are single-core chips.

- It seems like the AI review is "progressing" - i.e. no issues were
  found with v23 for this particular patch. I guess that the new Orc
  Mode from [1] has found them.

- I guess that a different AI agent would generate a different review.

- The aforementioned issues could be addressed with follow-up fix
  patches - dragging along (and preparing for upstream) a patch set of
  ~3400 LOC (which is rejected by a non-deterministic AI review) takes
  too many resources now.


To sum up:
----------

The MTIP driver for the v6.6 kernel (YPRR Scarthgap), with and without
PREEMPT_RT, for vf610 and imx287, as well as v24 for net-next, can be
found at [2].


Links:

[1] - https://netdev-ai.bots.linux.dev/ai-local.html
[2] - https://github.com/lmajewski/linux-imx28-l2switch/branches



-- 
Best regards,

Łukasz Majewski


end of thread, other threads:[~2026-06-09  8:37 UTC | newest]

Thread overview: 21+ messages
2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-09  8:37     ` Łukasz Majewski
2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
