Netdev List
* [net-next v8 0/3] Add RSS and LRO support
@ 2026-05-09 19:09 Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Frank Wunderlich @ 2026-05-09 19:09 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King
  Cc: Frank Wunderlich, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Mason Chang, Daniel Golle

From: Frank Wunderlich <frank-w@public-files.de>

This series adds RSS and LRO hardware acceleration for terminating
traffic on MT798x.

The patches are ported from the MediaTek SDK:
- https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-08-mtk_eth_soc-add-register-definitions-for-rss-lro-reg.patch
- https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-09-mtk_eth_soc-add-rss-support.patch
- https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/master/files/target/linux/mediatek/patches-6.12/999-eth-10-mtk_eth_soc-add-hw-lro-support.patch
with additional fixes

changes:
  v8:
  - fix more sparse errors; all lines touched by this series should now be covered

  v7:
  - fix u32 vs. be32 reported by patchwork check
  - add L4 PSH check
    https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/7521c42b0bd5be20d52e20b110daea8c756fc069%5E%21/#F1
  - Add HW LRO max 4-depth VLAN support including switch special tag.
    https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/35490cec6a2e5982532935fb0a1c884f7c4efdb0%5E%21/#F2

  v6:
    - no RFC
    - rebase on net-next (7.1-rc1)
    - drop unused MTK_CTRL_DW0_SDL_MASK
    - e33bd8dd7f1f ("net: mediatek: convert to use .get_rx_ring_count") moved
      ETHTOOL_GRXRINGS handling from mtk_get_rxnfc to mtk_get_rx_ring_count;
      moved this series' changes to the new function as well
    - fix checkpatch warnings "Macro argument '...' may be better as
      '(...)' to avoid precedence issues"
  v5:
    - fix overlong lines reported by checkpatch after the macro changes
  v4:
    - drop unrelated file
    - RSS changes suggested by Andrew
    - fix MTK_HW_LRO_RING_NUM macro (add eth)
    - fix MTK_LRO_CTRL_DW[123]_CFG (add reg_map param)
    - fix MTK_RX_DONE_INT (add eth param)
    - fix LRO reverse christmas tree ordering and LRO params as suggested
      by Andrew
    - drop mtk_hwlro_stats_ebl and unused IS_HW_LRO_RING (only used in
      proprietary debugfs)
  v3:
    - re-added the change dropped in v2 because it is required to get
      RSS working on mt7986
    - changes requested by Jakub
    - reworked cover letter (dropped configuration instructions)
    - name all PDMA-IRQ the same way
    - retested on
      - BPI-R3/mt7986 (RSS needs to be enabled)
      - BPI-R4/mt7988
      - BPI-R64/mt7622 and BPI-R2/mt7623 to verify network functionality
        is not broken

  v2:
    - drop wrong change (MTK_CDMP_IG_CTRL is only netsys v1)
    - Fix immutable string IRQ setup (thx to Emilia Schotte)
    - drop links to 6.6 patches/commits in the SDK from comments

Mason Chang (3):
  net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO
  net: ethernet: mtk_eth_soc: Add RSS support
  net: ethernet: mtk_eth_soc: Add LRO support

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 828 ++++++++++++++++----
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 177 +++--
 2 files changed, 792 insertions(+), 213 deletions(-)

-- 
2.43.0



* [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO
  2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
@ 2026-05-09 19:09 ` Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
  2 siblings, 0 replies; 8+ messages in thread
From: Frank Wunderlich @ 2026-05-09 19:09 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King
  Cc: Frank Wunderlich, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Mason Chang, Daniel Golle

From: Mason Chang <mason-cw.chang@mediatek.com>

Add register definitions for Receive Side Scaling (RSS) and Large
Receive Offload (LRO) support.

Signed-off-by: Mason Chang <mason-cw.chang@mediatek.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 23 +++++++++++++++
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 32 +++++++++++++++------
 2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 8d225bc9f063..d25e0b96c26e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -50,13 +50,18 @@ static const struct mtk_reg_map mtk_reg_map = {
 		.rx_ptr		= 0x0900,
 		.rx_cnt_cfg	= 0x0904,
 		.pcrx_ptr	= 0x0908,
+		.lro_ctrl_dw0   = 0x0980,
 		.glo_cfg	= 0x0a04,
 		.rst_idx	= 0x0a08,
 		.delay_irq	= 0x0a0c,
 		.irq_status	= 0x0a20,
 		.irq_mask	= 0x0a28,
 		.adma_rx_dbg0	= 0x0a38,
+		.lro_alt_score_delta	= 0x0a4c,
 		.int_grp	= 0x0a50,
+		.lro_rx1_dly_int	= 0x0a70,
+		.lro_ring_dip_dw0	= 0x0b04,
+		.lro_ring_ctrl_dw1	= 0x0b28,
 	},
 	.qdma = {
 		.qtx_cfg	= 0x1800,
@@ -113,6 +118,7 @@ static const struct mtk_reg_map mt7986_reg_map = {
 	.tx_irq_mask		= 0x461c,
 	.tx_irq_status		= 0x4618,
 	.pdma = {
+		.rss_glo_cfg    = 0x2800,
 		.rx_ptr		= 0x4100,
 		.rx_cnt_cfg	= 0x4104,
 		.pcrx_ptr	= 0x4108,
@@ -123,6 +129,12 @@ static const struct mtk_reg_map mt7986_reg_map = {
 		.irq_mask	= 0x4228,
 		.adma_rx_dbg0	= 0x4238,
 		.int_grp	= 0x4250,
+		.int_grp3	= 0x422c,
+		.lro_ctrl_dw0	= 0x4180,
+		.lro_alt_score_delta	= 0x424c,
+		.lro_rx1_dly_int	= 0x4270,
+		.lro_ring_dip_dw0	= 0x4304,
+		.lro_ring_ctrl_dw1	= 0x4328,
 	},
 	.qdma = {
 		.qtx_cfg	= 0x4400,
@@ -170,10 +182,21 @@ static const struct mtk_reg_map mt7988_reg_map = {
 		.glo_cfg	= 0x6a04,
 		.rst_idx	= 0x6a08,
 		.delay_irq	= 0x6a0c,
+		.rx_cfg		= 0x6a10,
 		.irq_status	= 0x6a20,
 		.irq_mask	= 0x6a28,
 		.adma_rx_dbg0	= 0x6a38,
 		.int_grp	= 0x6a50,
+		.int_grp3	= 0x6a58,
+		.tx_delay_irq	= 0x6ab0,
+		.rx_delay_irq	= 0x6ac0,
+		.lro_ctrl_dw0	= 0x6c08,
+		.lro_alt_score_delta	= 0x6c1c,
+		.lro_ring_dip_dw0	= 0x6c14,
+		.lro_ring_ctrl_dw1	= 0x6c38,
+		.lro_alt_dbg	= 0x6c40,
+		.lro_alt_dbg_data	= 0x6c44,
+		.rss_glo_cfg	= 0x7000,
 	},
 	.qdma = {
 		.qtx_cfg	= 0x4400,
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 0168e2fbc619..334625814b97 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -1143,16 +1143,30 @@ struct mtk_reg_map {
 	u32	tx_irq_mask;
 	u32	tx_irq_status;
 	struct {
-		u32	rx_ptr;		/* rx base pointer */
-		u32	rx_cnt_cfg;	/* rx max count configuration */
-		u32	pcrx_ptr;	/* rx cpu pointer */
-		u32	glo_cfg;	/* global configuration */
-		u32	rst_idx;	/* reset index */
-		u32	delay_irq;	/* delay interrupt */
-		u32	irq_status;	/* interrupt status */
-		u32	irq_mask;	/* interrupt mask */
+		u32	rx_ptr;			/* rx base pointer */
+		u32	rx_cnt_cfg;		/* rx max count configuration */
+		u32	pcrx_ptr;		/* rx cpu pointer */
+		u32	pdrx_ptr;		/* rx dma pointer */
+		u32	glo_cfg;		/* global configuration */
+		u32	rst_idx;		/* reset index */
+		u32	rx_cfg;			/* rx dma configuration */
+		u32	delay_irq;		/* delay interrupt */
+		u32	irq_status;		/* interrupt status */
+		u32	irq_mask;		/* interrupt mask */
 		u32	adma_rx_dbg0;
-		u32	int_grp;
+		u32	int_grp;		/* interrupt group1 */
+		u32	int_grp3;		/* interrupt group3 */
+		u32	tx_delay_irq;		/* tx delay interrupt */
+		u32	rx_delay_irq;		/* rx delay interrupt */
+		u32	lro_ctrl_dw0;		/* lro ctrl dword0 */
+		u32	lro_alt_score_delta;	/* lro auto-learn score delta */
+		u32	lro_rx1_dly_int;	/* lro rx ring1 delay interrupt */
+		u32	lro_ring_dip_dw0;	/* lro ring dip dword0 */
+		u32	lro_ring_ctrl_dw1;	/* lro ring ctrl dword1 */
+		u32	lro_alt_dbg;		/* lro auto-learn debug */
+		u32	lro_alt_dbg_data;	/* lro auto-learn debug data */
+		u32	rss_glo_cfg;		/* rss global configuration */
+
 	} pdma;
 	struct {
 		u32	qtx_cfg;	/* tx queue configuration */
-- 
2.43.0



* [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support
  2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
@ 2026-05-09 19:09 ` Frank Wunderlich
  2026-05-14  1:52   ` Jakub Kicinski
  2026-05-14  1:56   ` Jakub Kicinski
  2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
  2 siblings, 2 replies; 8+ messages in thread
From: Frank Wunderlich @ 2026-05-09 19:09 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King
  Cc: Frank Wunderlich, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Mason Chang, Daniel Golle

From: Mason Chang <mason-cw.chang@mediatek.com>

Add support for Receive Side Scaling.

SMP affinity can be adjusted with the following command:
echo [CPU bitmap num] > /proc/irq/[virtual IRQ ID]/smp_affinity
With interrupts evenly assigned to 4 CPUs, we were able to measure
an RX throughput of 7.3 Gbit/s using iperf3 on the MT7988. Further
optimizations will be carried out in the future.

The test commands were as follows:
PC: iperf3 -c [IP] -P 10
DUT: iperf3 -s

The entire indirection table can be thought of as 128 buckets; with
the ethtool command we can mark which RX ring the packets hashing
into each bucket are sent to.

Show the RSS indirection table and RSS hash key:
ethtool -x [interface]
Distribute RSS RX ring weights uniformly:
ethtool --set-rxfh-indir [interface] equal [ring num]
Set non-uniform RSS RX ring weights:
ethtool --set-rxfh-indir [interface] weight [ring0 weight]
[ring1 weight] [ring2 weight] [ring3 weight]

Signed-off-by: Mason Chang <mason-cw.chang@mediatek.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
---
v6:
- e33bd8dd7f1f ("net: mediatek: convert to use .get_rx_ring_count") moved
  ETHTOOL_GRXRINGS handling from mtk_get_rxnfc to mtk_get_rx_ring_count;
  moved this series' changes to the new function as well
- fix checkpatch warnings "Macro argument '...' may be better as
  '(...)' to avoid precedence issues"

v5:
- fix too long line reported by checkpatch
  MTK_RSS_HASH_KEY_DW
  MTK_RSS_INDR_TABLE_DW
  MTK_LRO_CTRL_DW[123]_CFG

v4:
- drop unrelated file
- RSS changes suggested by Andrew
  - fix MTK_HW_LRO_RING_NUM macro (add eth)
  - fix MTK_LRO_CTRL_DW[123]_CFG (add reg_map param)
  - fix MTK_RX_DONE_INT (add eth param)

v3:
- changes requested by Jakub
- re-added RSS fix for mt7986
- name all PDMA-IRQ the same way

v2:
- drop wrong change (MTK_CDMP_IG_CTRL is only netsys v1)
- Fix immutable string IRQ setup (thx to Emilia Schotte)
- drop link to no-longer-existing 6.6 patch in comment
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 544 +++++++++++++++-----
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |  98 +++-
 2 files changed, 496 insertions(+), 146 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index d25e0b96c26e..908fd88287ac 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1297,6 +1297,7 @@ static bool mtk_rx_get_desc(struct mtk_eth *eth, struct mtk_rx_dma_v2 *rxd,
 	if (mtk_is_netsys_v3_or_greater(eth)) {
 		rxd->rxd5 = READ_ONCE(dma_rxd->rxd5);
 		rxd->rxd6 = READ_ONCE(dma_rxd->rxd6);
+		rxd->rxd7 = READ_ONCE(dma_rxd->rxd7);
 	}
 
 	return true;
@@ -1864,47 +1865,9 @@ static netdev_tx_t mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
-static struct mtk_rx_ring *mtk_get_rx_ring(struct mtk_eth *eth)
+static void mtk_update_rx_cpu_idx(struct mtk_eth *eth, struct mtk_rx_ring *ring)
 {
-	int i;
-	struct mtk_rx_ring *ring;
-	int idx;
-
-	if (!eth->hwlro)
-		return &eth->rx_ring[0];
-
-	for (i = 0; i < MTK_MAX_RX_RING_NUM; i++) {
-		struct mtk_rx_dma *rxd;
-
-		ring = &eth->rx_ring[i];
-		idx = NEXT_DESP_IDX(ring->calc_idx, ring->dma_size);
-		rxd = ring->dma + idx * eth->soc->rx.desc_size;
-		if (rxd->rxd2 & RX_DMA_DONE) {
-			ring->calc_idx_update = true;
-			return ring;
-		}
-	}
-
-	return NULL;
-}
-
-static void mtk_update_rx_cpu_idx(struct mtk_eth *eth)
-{
-	struct mtk_rx_ring *ring;
-	int i;
-
-	if (!eth->hwlro) {
-		ring = &eth->rx_ring[0];
-		mtk_w32(eth, ring->calc_idx, ring->crx_idx_reg);
-	} else {
-		for (i = 0; i < MTK_MAX_RX_RING_NUM; i++) {
-			ring = &eth->rx_ring[i];
-			if (ring->calc_idx_update) {
-				ring->calc_idx_update = false;
-				mtk_w32(eth, ring->calc_idx, ring->crx_idx_reg);
-			}
-		}
-	}
+	mtk_w32(eth, ring->calc_idx, ring->crx_idx_reg);
 }
 
 static bool mtk_page_pool_enabled(struct mtk_eth *eth)
@@ -1935,7 +1898,7 @@ static struct page_pool *mtk_create_page_pool(struct mtk_eth *eth,
 		return pp;
 
 	err = __xdp_rxq_info_reg(xdp_q, eth->dummy_dev, id,
-				 eth->rx_napi.napi_id, PAGE_SIZE);
+				 eth->rx_napi[id].napi.napi_id, PAGE_SIZE);
 	if (err < 0)
 		goto err_free_pp;
 
@@ -2224,7 +2187,8 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		       struct mtk_eth *eth)
 {
 	struct dim_sample dim_sample = {};
-	struct mtk_rx_ring *ring;
+	struct mtk_napi *rx_napi = container_of(napi, struct mtk_napi, napi);
+	struct mtk_rx_ring *ring = rx_napi->rx_ring;
 	bool xdp_flush = false;
 	int idx;
 	struct sk_buff *skb;
@@ -2235,16 +2199,15 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 	dma_addr_t dma_addr = DMA_MAPPING_ERROR;
 	int ppe_idx = 0;
 
+	if (unlikely(!ring))
+		goto rx_done;
+
 	while (done < budget) {
 		unsigned int pktlen, *rxdcsum;
 		struct net_device *netdev;
 		u32 hash, reason;
 		int mac = 0;
 
-		ring = mtk_get_rx_ring(eth);
-		if (unlikely(!ring))
-			goto rx_done;
-
 		idx = NEXT_DESP_IDX(ring->calc_idx, ring->dma_size);
 		rxd = ring->dma + idx * eth->soc->rx.desc_size;
 		data = ring->data[idx];
@@ -2436,7 +2399,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		 * we continue
 		 */
 		wmb();
-		mtk_update_rx_cpu_idx(eth);
+		mtk_update_rx_cpu_idx(eth, ring);
 	}
 
 	eth->rx_packets += done;
@@ -2645,7 +2608,9 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
 
 static int mtk_napi_rx(struct napi_struct *napi, int budget)
 {
-	struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
+	struct mtk_napi *rx_napi = container_of(napi, struct mtk_napi, napi);
+	struct mtk_eth *eth = rx_napi->eth;
+	struct mtk_rx_ring *ring = rx_napi->rx_ring;
 	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	int rx_done_total = 0;
 
@@ -2654,7 +2619,7 @@ static int mtk_napi_rx(struct napi_struct *napi, int budget)
 	do {
 		int rx_done;
 
-		mtk_w32(eth, eth->soc->rx.irq_done_mask,
+		mtk_w32(eth, MTK_RX_DONE_INT(eth, ring->ring_no),
 			reg_map->pdma.irq_status);
 		rx_done = mtk_poll_rx(napi, budget - rx_done_total, eth);
 		rx_done_total += rx_done;
@@ -2670,10 +2635,10 @@ static int mtk_napi_rx(struct napi_struct *napi, int budget)
 			return budget;
 
 	} while (mtk_r32(eth, reg_map->pdma.irq_status) &
-		 eth->soc->rx.irq_done_mask);
+		 MTK_RX_DONE_INT(eth, ring->ring_no));
 
 	if (napi_complete_done(napi, rx_done_total))
-		mtk_rx_irq_enable(eth, eth->soc->rx.irq_done_mask);
+		mtk_rx_irq_enable(eth, MTK_RX_DONE_INT(eth, ring->ring_no));
 
 	return rx_done_total;
 }
@@ -2918,6 +2883,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
 	else
 		ring->crx_idx_reg = reg_map->pdma.pcrx_ptr +
 				    ring_no * MTK_QRX_OFFSET;
+	ring->ring_no = ring_no;
 	/* make sure that all changes to the dma ring are flushed before we
 	 * continue
 	 */
@@ -2986,6 +2952,7 @@ static void mtk_rx_clean(struct mtk_eth *eth, struct mtk_rx_ring *ring, bool in_
 
 static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 {
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	int i;
 	u32 ring_ctrl_dw1 = 0, ring_ctrl_dw2 = 0, ring_ctrl_dw3 = 0;
 	u32 lro_ctrl_dw0 = 0, lro_ctrl_dw3 = 0;
@@ -3008,9 +2975,9 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 	ring_ctrl_dw3 |= MTK_RING_MAX_AGG_CNT_H;
 
 	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
-		mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(i));
-		mtk_w32(eth, ring_ctrl_dw2, MTK_LRO_CTRL_DW2_CFG(i));
-		mtk_w32(eth, ring_ctrl_dw3, MTK_LRO_CTRL_DW3_CFG(i));
+		mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(reg_map, i));
+		mtk_w32(eth, ring_ctrl_dw2, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
+		mtk_w32(eth, ring_ctrl_dw3, MTK_LRO_CTRL_DW3_CFG(reg_map, i));
 	}
 
 	/* IPv4 checksum update enable */
@@ -3046,6 +3013,7 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 
 static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
 {
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	int i;
 	u32 val;
 
@@ -3064,7 +3032,7 @@ static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
 
 	/* invalidate lro rings */
 	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++)
-		mtk_w32(eth, 0, MTK_LRO_CTRL_DW2_CFG(i));
+		mtk_w32(eth, 0, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
 
 	/* disable HW LRO */
 	mtk_w32(eth, 0, MTK_PDMA_LRO_CTRL_DW0);
@@ -3072,27 +3040,29 @@ static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
 
 static void mtk_hwlro_val_ipaddr(struct mtk_eth *eth, int idx, __be32 ip)
 {
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	u32 reg_val;
 
-	reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(idx));
+	reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
 	/* invalidate the IP setting */
-	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
 	mtk_w32(eth, ip, MTK_LRO_DIP_DW0_CFG(idx));
 
 	/* validate the IP setting */
-	mtk_w32(eth, (reg_val | MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+	mtk_w32(eth, (reg_val | MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 }
 
 static void mtk_hwlro_inval_ipaddr(struct mtk_eth *eth, int idx)
 {
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	u32 reg_val;
 
-	reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(idx));
+	reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
 	/* invalidate the IP setting */
-	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
 	mtk_w32(eth, 0, MTK_LRO_DIP_DW0_CFG(idx));
 }
@@ -3222,6 +3192,105 @@ static int mtk_hwlro_get_fdir_all(struct net_device *dev,
 	return 0;
 }
 
+static u32 mtk_rss_indr_table(struct mtk_rss_params *rss_params, int index)
+{
+	u32 val = 0;
+	int i;
+
+	for (i = 16 * index; i < 16 * index + 16; i++)
+		val |= (rss_params->indirection_table[i] << (2 * (i % 16)));
+
+	return val;
+}
+
+static int mtk_rss_init(struct mtk_eth *eth)
+{
+	const struct mtk_soc_data *soc = eth->soc;
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+	struct mtk_rss_params *rss_params = &eth->rss_params;
+	u32 val;
+	int i;
+
+	netdev_rss_key_fill(rss_params->hash_key, MTK_RSS_HASH_KEYSIZE);
+
+	for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
+		rss_params->indirection_table[i] = ethtool_rxfh_indir_default(i, eth->soc->rss_num);
+
+	if (soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
+		/* Set RSS rings to PSE modes */
+		for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
+			val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
+			val |= MTK_RING_PSE_MODE;
+			mtk_w32(eth, val, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
+		}
+
+		/* Enable non-lro multiple rx */
+		val = mtk_r32(eth, reg_map->pdma.lro_ctrl_dw0);
+		val |= MTK_NON_LRO_MULTI_EN;
+		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
+
+		/* Enable RSS delay interrupt support */
+		val |= MTK_LRO_DLY_INT_EN;
+		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
+	}
+
+	/* Hash Type */
+	val = mtk_r32(eth, reg_map->pdma.rss_glo_cfg);
+	val |= MTK_RSS_IPV4_STATIC_HASH;
+	val |= MTK_RSS_IPV6_STATIC_HASH;
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Hash Key */
+	for (i = 0; i < MTK_RSS_HASH_KEYSIZE / sizeof(u32); i++)
+		mtk_w32(eth, rss_params->hash_key[i], MTK_RSS_HASH_KEY_DW(reg_map, i));
+
+	/* Select the size of indirection table */
+	for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE / 16; i++)
+		mtk_w32(eth, mtk_rss_indr_table(rss_params, i),
+			MTK_RSS_INDR_TABLE_DW(reg_map, i));
+
+	/* Pause */
+	val |= MTK_RSS_CFG_REQ;
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Enable RSS */
+	val |= MTK_RSS_EN;
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Release pause */
+	val &= ~(MTK_RSS_CFG_REQ);
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Set perRSS GRP INT */
+	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)),
+		MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)), reg_map->pdma.int_grp);
+	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(2)),
+		MTK_RX_DONE_INT(eth, MTK_RSS_RING(2)), reg_map->pdma.int_grp + 0x4);
+	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(3)),
+		MTK_RX_DONE_INT(eth, MTK_RSS_RING(3)), reg_map->pdma.int_grp3);
+
+	return 0;
+}
+
+static void mtk_rss_uninit(struct mtk_eth *eth)
+{
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+	u32 val;
+
+	/* Pause */
+	val = mtk_r32(eth, reg_map->pdma.rss_glo_cfg);
+	val |= MTK_RSS_CFG_REQ;
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Disable RSS */
+	val &= ~(MTK_RSS_EN);
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+
+	/* Release pause */
+	val &= ~(MTK_RSS_CFG_REQ);
+	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
+}
+
 static netdev_features_t mtk_fix_features(struct net_device *dev,
 					  netdev_features_t features)
 {
@@ -3312,6 +3381,17 @@ static int mtk_dma_init(struct mtk_eth *eth)
 			return err;
 	}
 
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
+			err = mtk_rx_alloc(eth, MTK_RSS_RING(i), MTK_RX_FLAGS_NORMAL);
+			if (err)
+				return err;
+		}
+		err = mtk_rss_init(eth);
+		if (err)
+			return err;
+	}
+
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
 		/* Enable random early drop and set drop threshold
 		 * automatically
@@ -3358,6 +3438,12 @@ static void mtk_dma_free(struct mtk_eth *eth)
 			mtk_rx_clean(eth, &eth->rx_ring[i], false);
 	}
 
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		mtk_rss_uninit(eth);
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++)
+			mtk_rx_clean(eth, &eth->rx_ring[MTK_RSS_RING(i)], true);
+	}
+
 	for (i = 0; i < DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH); i++) {
 		kfree(eth->scratch_head[i]);
 		eth->scratch_head[i] = NULL;
@@ -3390,23 +3476,23 @@ static void mtk_tx_timeout(struct net_device *dev, unsigned int txqueue)
 	schedule_work(&eth->pending_work);
 }
 
-static int mtk_get_irqs(struct platform_device *pdev, struct mtk_eth *eth)
+static int mtk_get_irqs_fe(struct platform_device *pdev, struct mtk_eth *eth)
 {
 	int i;
 
 	/* future SoCs beginning with MT7988 should use named IRQs in dts */
-	eth->irq[MTK_FE_IRQ_TX] = platform_get_irq_byname_optional(pdev, "fe1");
-	eth->irq[MTK_FE_IRQ_RX] = platform_get_irq_byname_optional(pdev, "fe2");
-	if (eth->irq[MTK_FE_IRQ_TX] >= 0 && eth->irq[MTK_FE_IRQ_RX] >= 0)
+	eth->irq_fe[MTK_FE_IRQ_TX] = platform_get_irq_byname_optional(pdev, "fe1");
+	eth->irq_fe[MTK_FE_IRQ_RX] = platform_get_irq_byname_optional(pdev, "fe2");
+	if (eth->irq_fe[MTK_FE_IRQ_TX] >= 0 && eth->irq_fe[MTK_FE_IRQ_RX] >= 0)
 		return 0;
 
 	/* only use legacy mode if platform_get_irq_byname_optional returned -ENXIO */
-	if (eth->irq[MTK_FE_IRQ_TX] != -ENXIO)
-		return dev_err_probe(&pdev->dev, eth->irq[MTK_FE_IRQ_TX],
+	if (eth->irq_fe[MTK_FE_IRQ_TX] != -ENXIO)
+		return dev_err_probe(&pdev->dev, eth->irq_fe[MTK_FE_IRQ_TX],
 				     "Error requesting FE TX IRQ\n");
 
-	if (eth->irq[MTK_FE_IRQ_RX] != -ENXIO)
-		return dev_err_probe(&pdev->dev, eth->irq[MTK_FE_IRQ_RX],
+	if (eth->irq_fe[MTK_FE_IRQ_RX] != -ENXIO)
+		return dev_err_probe(&pdev->dev, eth->irq_fe[MTK_FE_IRQ_RX],
 				     "Error requesting FE RX IRQ\n");
 
 	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SHARED_INT))
@@ -3421,14 +3507,14 @@ static int mtk_get_irqs(struct platform_device *pdev, struct mtk_eth *eth)
 	for (i = 0; i < MTK_FE_IRQ_NUM; i++) {
 		if (MTK_HAS_CAPS(eth->soc->caps, MTK_SHARED_INT)) {
 			if (i == MTK_FE_IRQ_SHARED)
-				eth->irq[MTK_FE_IRQ_SHARED] = platform_get_irq(pdev, i);
+				eth->irq_fe[MTK_FE_IRQ_SHARED] = platform_get_irq(pdev, i);
 			else
-				eth->irq[i] = eth->irq[MTK_FE_IRQ_SHARED];
+				eth->irq_fe[i] = eth->irq_fe[MTK_FE_IRQ_SHARED];
 		} else {
-			eth->irq[i] = platform_get_irq(pdev, i + 1);
+			eth->irq_fe[i] = platform_get_irq(pdev, i + 1);
 		}
 
-		if (eth->irq[i] < 0) {
+		if (eth->irq_fe[i] < 0) {
 			dev_err(&pdev->dev, "no IRQ%d resource found\n", i);
 			return -ENXIO;
 		}
@@ -3437,14 +3523,36 @@ static int mtk_get_irqs(struct platform_device *pdev, struct mtk_eth *eth)
 	return 0;
 }
 
-static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
+static int mtk_get_irqs_pdma(struct platform_device *pdev, struct mtk_eth *eth)
 {
-	struct mtk_eth *eth = _eth;
+	char rxring[] = "pdma0";
+	int i;
+
+	for (i = 0; i < MTK_PDMA_IRQ_NUM; i++) {
+		rxring[4] = '0' + i;
+		eth->irq_pdma[i] = platform_get_irq_byname(pdev, rxring);
+		if (eth->irq_pdma[i] < 0)
+			return eth->irq_pdma[i];
+	}
+
+	return 0;
+}
+
+static irqreturn_t mtk_handle_irq_rx(int irq, void *priv)
+{
+	struct mtk_napi *rx_napi = priv;
+	struct mtk_eth *eth = rx_napi->eth;
+	struct mtk_rx_ring *ring = rx_napi->rx_ring;
 
 	eth->rx_events++;
-	if (likely(napi_schedule_prep(&eth->rx_napi))) {
-		mtk_rx_irq_disable(eth, eth->soc->rx.irq_done_mask);
-		__napi_schedule(&eth->rx_napi);
+	if (unlikely(!(mtk_r32(eth, eth->soc->reg_map->pdma.irq_status) &
+		       mtk_r32(eth, eth->soc->reg_map->pdma.irq_mask) &
+		       MTK_RX_DONE_INT(eth, ring->ring_no))))
+		return IRQ_NONE;
+
+	if (likely(napi_schedule_prep(&rx_napi->napi))) {
+		mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, ring->ring_no));
+		__napi_schedule(&rx_napi->napi);
 	}
 
 	return IRQ_HANDLED;
@@ -3469,10 +3577,10 @@ static irqreturn_t mtk_handle_irq(int irq, void *_eth)
 	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 
 	if (mtk_r32(eth, reg_map->pdma.irq_mask) &
-	    eth->soc->rx.irq_done_mask) {
+	    MTK_RX_DONE_INT(eth, 0)) {
 		if (mtk_r32(eth, reg_map->pdma.irq_status) &
-		    eth->soc->rx.irq_done_mask)
-			mtk_handle_irq_rx(irq, _eth);
+		    MTK_RX_DONE_INT(eth, 0))
+			mtk_handle_irq_rx(irq, &eth->rx_napi[0]);
 	}
 	if (mtk_r32(eth, reg_map->tx_irq_mask) & MTK_TX_DONE_INT) {
 		if (mtk_r32(eth, reg_map->tx_irq_status) & MTK_TX_DONE_INT)
@@ -3489,10 +3597,10 @@ static void mtk_poll_controller(struct net_device *dev)
 	struct mtk_eth *eth = mac->hw;
 
 	mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
-	mtk_rx_irq_disable(eth, eth->soc->rx.irq_done_mask);
-	mtk_handle_irq_rx(eth->irq[MTK_FE_IRQ_RX], dev);
+	mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, 0));
+	mtk_handle_irq_rx(eth->irq_fe[MTK_FE_IRQ_RX], &eth->rx_napi[0]);
 	mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
-	mtk_rx_irq_enable(eth, eth->soc->rx.irq_done_mask);
+	mtk_rx_irq_enable(eth, MTK_RX_DONE_INT(eth, 0));
 }
 #endif
 
@@ -3679,9 +3787,17 @@ static int mtk_open(struct net_device *dev)
 			mtk_ppe_update_mtu(eth->ppe[i], mtu);
 
 		napi_enable(&eth->tx_napi);
-		napi_enable(&eth->rx_napi);
+		napi_enable(&eth->rx_napi[0].napi);
 		mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
-		mtk_rx_irq_enable(eth, soc->rx.irq_done_mask);
+		mtk_rx_irq_enable(eth, MTK_RX_DONE_INT(eth, 0));
+
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+			for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
+				napi_enable(&eth->rx_napi[MTK_RSS_RING(i)].napi);
+				mtk_rx_irq_enable(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(i)));
+			}
+		}
+
 		refcount_set(&eth->dma_refcnt, 1);
 	} else {
 		refcount_inc(&eth->dma_refcnt);
@@ -3766,9 +3882,16 @@ static int mtk_stop(struct net_device *dev)
 		mtk_gdm_config(eth, i, MTK_GDMA_DROP_ALL);
 
 	mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
-	mtk_rx_irq_disable(eth, eth->soc->rx.irq_done_mask);
+	mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, 0));
 	napi_disable(&eth->tx_napi);
-	napi_disable(&eth->rx_napi);
+	napi_disable(&eth->rx_napi[0].napi);
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
+			mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(i)));
+			napi_disable(&eth->rx_napi[MTK_RSS_RING(i)].napi);
+		}
+	}
 
 	cancel_work_sync(&eth->rx_dim.work);
 	cancel_work_sync(&eth->tx_dim.work);
@@ -3888,9 +4011,7 @@ static void mtk_dim_rx(struct work_struct *work)
 						dim->profile_ix);
 	spin_lock_bh(&eth->dim_lock);
 
-	val = mtk_r32(eth, reg_map->pdma.delay_irq);
-	val &= MTK_PDMA_DELAY_TX_MASK;
-	val |= MTK_PDMA_DELAY_RX_EN;
+	val = MTK_PDMA_DELAY_RX_EN;
 
 	cur = min_t(u32, DIV_ROUND_UP(cur_profile.usec, 20), MTK_PDMA_DELAY_PTIME_MASK);
 	val |= cur << MTK_PDMA_DELAY_RX_PTIME_SHIFT;
@@ -3898,9 +4019,19 @@ static void mtk_dim_rx(struct work_struct *work)
 	cur = min_t(u32, cur_profile.pkts, MTK_PDMA_DELAY_PINT_MASK);
 	val |= cur << MTK_PDMA_DELAY_RX_PINT_SHIFT;
 
-	mtk_w32(eth, val, reg_map->pdma.delay_irq);
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
-		mtk_w32(eth, val, reg_map->qdma.delay_irq);
+		mtk_m32(eth, MTK_PDMA_DELAY_TX_MASK,
+			val << MTK_PDMA_DELAY_TX_PTIME_SHIFT, reg_map->qdma.delay_irq);
+
+	if (eth->soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
+		mtk_m32(eth, MTK_PDMA_DELAY_RX_MASK, val, reg_map->pdma.delay_irq);
+		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int);
+		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int + 0x4);
+		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int + 0x8);
+	} else {
+		val = val | (val << MTK_PDMA_DELAY_RX_RING_SHIFT);
+		mtk_w32(eth, val, reg_map->pdma.rx_delay_irq);
+	}
 
 	spin_unlock_bh(&eth->dim_lock);
 
@@ -3919,9 +4050,7 @@ static void mtk_dim_tx(struct work_struct *work)
 						dim->profile_ix);
 	spin_lock_bh(&eth->dim_lock);
 
-	val = mtk_r32(eth, reg_map->pdma.delay_irq);
-	val &= MTK_PDMA_DELAY_RX_MASK;
-	val |= MTK_PDMA_DELAY_TX_EN;
+	val = MTK_PDMA_DELAY_TX_EN;
 
 	cur = min_t(u32, DIV_ROUND_UP(cur_profile.usec, 20), MTK_PDMA_DELAY_PTIME_MASK);
 	val |= cur << MTK_PDMA_DELAY_TX_PTIME_SHIFT;
@@ -3929,9 +4058,16 @@ static void mtk_dim_tx(struct work_struct *work)
 	cur = min_t(u32, cur_profile.pkts, MTK_PDMA_DELAY_PINT_MASK);
 	val |= cur << MTK_PDMA_DELAY_TX_PINT_SHIFT;
 
-	mtk_w32(eth, val, reg_map->pdma.delay_irq);
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
-		mtk_w32(eth, val, reg_map->qdma.delay_irq);
+		mtk_m32(eth, MTK_PDMA_DELAY_RX_MASK,
+			val >> MTK_PDMA_DELAY_TX_PTIME_SHIFT, reg_map->qdma.delay_irq);
+
+	if (eth->soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
+		mtk_m32(eth, MTK_PDMA_DELAY_TX_MASK, val, reg_map->pdma.delay_irq);
+	} else {
+		mtk_w32(eth, val >> MTK_PDMA_DELAY_TX_PTIME_SHIFT,
+			reg_map->pdma.tx_delay_irq);
+	}
 
 	spin_unlock_bh(&eth->dim_lock);
 
@@ -4149,6 +4285,25 @@ static void mtk_hw_reset_monitor_work(struct work_struct *work)
 			      MTK_DMA_MONITOR_TIMEOUT);
 }
 
+static int mtk_napi_init(struct mtk_eth *eth)
+{
+	struct mtk_napi *rx_napi = &eth->rx_napi[0];
+	int i;
+
+	rx_napi->eth = eth;
+	rx_napi->rx_ring = &eth->rx_ring[0];
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
+			rx_napi = &eth->rx_napi[MTK_RSS_RING(i)];
+			rx_napi->eth = eth;
+			rx_napi->rx_ring = &eth->rx_ring[MTK_RSS_RING(i)];
+		}
+	}
+
+	return 0;
+}
+
 static int mtk_hw_init(struct mtk_eth *eth, bool reset)
 {
 	u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
@@ -4238,12 +4393,11 @@ static int mtk_hw_init(struct mtk_eth *eth, bool reset)
 	 */
 	val = mtk_r32(eth, MTK_CDMQ_IG_CTRL);
 	mtk_w32(eth, val | MTK_CDMQ_STAG_EN, MTK_CDMQ_IG_CTRL);
-	if (mtk_is_netsys_v1(eth)) {
-		val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
-		mtk_w32(eth, val | MTK_CDMP_STAG_EN, MTK_CDMP_IG_CTRL);
+	val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
+	mtk_w32(eth, val | MTK_CDMP_STAG_EN, MTK_CDMP_IG_CTRL);
 
+	if (mtk_is_netsys_v1(eth))
 		mtk_w32(eth, 1, MTK_CDMP_EG_CTRL);
-	}
 
 	/* set interrupt delays based on current Net DIM sample */
 	mtk_dim_rx(&eth->rx_dim.work);
@@ -4254,11 +4408,17 @@ static int mtk_hw_init(struct mtk_eth *eth, bool reset)
 	mtk_rx_irq_disable(eth, ~0);
 
 	/* FE int grouping */
-	mtk_w32(eth, MTK_TX_DONE_INT, reg_map->pdma.int_grp);
-	mtk_w32(eth, eth->soc->rx.irq_done_mask, reg_map->pdma.int_grp + 4);
 	mtk_w32(eth, MTK_TX_DONE_INT, reg_map->qdma.int_grp);
-	mtk_w32(eth, eth->soc->rx.irq_done_mask, reg_map->qdma.int_grp + 4);
-	mtk_w32(eth, 0x21021000, MTK_FE_INT_GRP);
+	mtk_w32(eth, MTK_RX_DONE_INT(eth, 0), reg_map->qdma.int_grp + 4);
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_PDMA_INT)) {
+		mtk_w32(eth, 0x210FFFF2, MTK_FE_INT_GRP);
+	} else {
+		mtk_w32(eth, MTK_TX_DONE_INT, reg_map->pdma.int_grp);
+		mtk_w32(eth, MTK_RX_DONE_INT(eth, 0), reg_map->pdma.int_grp + 4);
+		mtk_w32(eth, 0x21021000, MTK_FE_INT_GRP);
+	}
 
 	if (mtk_is_netsys_v3_or_greater(eth)) {
 		/* PSE dummy page mechanism */
@@ -4700,8 +4860,13 @@ static void mtk_get_ethtool_stats(struct net_device *dev,
 
 static u32 mtk_get_rx_ring_count(struct net_device *dev)
 {
+	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
+
 	if (dev->hw_features & NETIF_F_LRO)
 		return MTK_MAX_RX_RING_NUM;
+	else if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS))
+		return MTK_RX_RSS_NUM(eth);
 
 	return 0;
 }
@@ -4784,6 +4949,70 @@ static int mtk_set_eee(struct net_device *dev, struct ethtool_keee *eee)
 	return phylink_ethtool_set_eee(mac->phylink, eee);
 }
 
+static u32 mtk_get_rxfh_key_size(struct net_device *dev)
+{
+	return MTK_RSS_HASH_KEYSIZE;
+}
+
+static u32 mtk_get_rxfh_indir_size(struct net_device *dev)
+{
+	return MTK_RSS_MAX_INDIRECTION_TABLE;
+}
+
+static int mtk_get_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxfh)
+{
+	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
+	struct mtk_rss_params *rss_params = &eth->rss_params;
+	int i;
+
+	rxfh->hfunc = ETH_RSS_HASH_TOP;	/* Toeplitz */
+
+	if (rxfh->key) {
+		memcpy(rxfh->key, rss_params->hash_key,
+		       sizeof(rss_params->hash_key));
+	}
+
+	if (rxfh->indir) {
+		for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
+			rxfh->indir[i] = rss_params->indirection_table[i];
+	}
+
+	return 0;
+}
+
+static int mtk_set_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxfh,
+			struct netlink_ext_ack *extack)
+{
+	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
+	struct mtk_rss_params *rss_params = &eth->rss_params;
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+	int i;
+
+	if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
+	    rxfh->hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+
+	if (rxfh->key) {
+		memcpy(rss_params->hash_key, rxfh->key,
+		       sizeof(rss_params->hash_key));
+		for (i = 0; i < MTK_RSS_HASH_KEYSIZE / sizeof(u32); i++)
+			mtk_w32(eth, rss_params->hash_key[i],
+				MTK_RSS_HASH_KEY_DW(reg_map, i));
+	}
+
+	if (rxfh->indir) {
+		for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
+			rss_params->indirection_table[i] = rxfh->indir[i];
+		for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE / 16; i++)
+			mtk_w32(eth, mtk_rss_indr_table(rss_params, i),
+				MTK_RSS_INDR_TABLE_DW(reg_map, i));
+	}
+
+	return 0;
+}
+
 static u16 mtk_select_queue(struct net_device *dev, struct sk_buff *skb,
 			    struct net_device *sb_dev)
 {
@@ -4819,6 +5048,10 @@ static const struct ethtool_ops mtk_ethtool_ops = {
 	.get_rx_ring_count	= mtk_get_rx_ring_count,
 	.get_eee		= mtk_get_eee,
 	.set_eee		= mtk_set_eee,
+	.get_rxfh_key_size	= mtk_get_rxfh_key_size,
+	.get_rxfh_indir_size	= mtk_get_rxfh_indir_size,
+	.get_rxfh		= mtk_get_rxfh,
+	.set_rxfh		= mtk_set_rxfh,
 };
 
 static const struct net_device_ops mtk_netdev_ops = {
@@ -5012,7 +5245,7 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
 	eth->netdev[id]->features |= eth->soc->hw_features;
 	eth->netdev[id]->ethtool_ops = &mtk_ethtool_ops;
 
-	eth->netdev[id]->irq = eth->irq[MTK_FE_IRQ_SHARED];
+	eth->netdev[id]->irq = eth->irq_fe[MTK_FE_IRQ_SHARED];
 	eth->netdev[id]->dev.of_node = np;
 
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
@@ -5120,6 +5353,7 @@ static int mtk_probe(struct platform_device *pdev)
 	struct resource *res = NULL;
 	struct device_node *mac_np;
 	struct mtk_eth *eth;
+	char *irqname;
 	int err, i;
 
 	eth = devm_kzalloc(&pdev->dev, sizeof(*eth), GFP_KERNEL);
@@ -5251,10 +5485,16 @@ static int mtk_probe(struct platform_device *pdev)
 		}
 	}
 
-	err = mtk_get_irqs(pdev, eth);
+	err = mtk_get_irqs_fe(pdev, eth);
 	if (err)
 		goto err_wed_exit;
 
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_PDMA_INT)) {
+		err = mtk_get_irqs_pdma(pdev, eth);
+		if (err)
+			goto err_wed_exit;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(eth->clks); i++) {
 		eth->clks[i] = devm_clk_get(eth->dev,
 					    mtk_clks_source_name[i]);
@@ -5297,23 +5537,56 @@ static int mtk_probe(struct platform_device *pdev)
 		}
 	}
 
+	err = mtk_napi_init(eth);
+	if (err)
+		goto err_free_dev;
+
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SHARED_INT)) {
-		err = devm_request_irq(eth->dev, eth->irq[MTK_FE_IRQ_SHARED],
+		err = devm_request_irq(eth->dev, eth->irq_fe[MTK_FE_IRQ_SHARED],
 				       mtk_handle_irq, 0,
 				       dev_name(eth->dev), eth);
 	} else {
-		err = devm_request_irq(eth->dev, eth->irq[MTK_FE_IRQ_TX],
+		irqname = devm_kasprintf(eth->dev, GFP_KERNEL, "%s TX",
+					 dev_name(eth->dev));
+		err = devm_request_irq(eth->dev, eth->irq_fe[MTK_FE_IRQ_TX],
 				       mtk_handle_irq_tx, 0,
-				       dev_name(eth->dev), eth);
+				       irqname, eth);
 		if (err)
 			goto err_free_dev;
 
-		err = devm_request_irq(eth->dev, eth->irq[MTK_FE_IRQ_RX],
-				       mtk_handle_irq_rx, 0,
-				       dev_name(eth->dev), eth);
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_PDMA_INT)) {
+			irqname = devm_kasprintf(eth->dev, GFP_KERNEL, "%s PDMA RX %d",
+						 dev_name(eth->dev), 0);
+			err = devm_request_irq(eth->dev, eth->irq_pdma[0],
+					       mtk_handle_irq_rx, IRQF_SHARED,
+					       irqname, &eth->rx_napi[0]);
+			if (err)
+				goto err_free_dev;
+
+			if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+				for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
+					irqname = devm_kasprintf(eth->dev, GFP_KERNEL,
+								 "%s PDMA RX %d",
+								 dev_name(eth->dev), i);
+					err = devm_request_irq(eth->dev,
+							       eth->irq_pdma[MTK_RSS_RING(i)],
+							       mtk_handle_irq_rx, IRQF_SHARED,
+							       irqname,
+							       &eth->rx_napi[MTK_RSS_RING(i)]);
+					if (err)
+						goto err_free_dev;
+				}
+			}
+		} else {
+			irqname = devm_kasprintf(eth->dev, GFP_KERNEL, "%s RX",
+						 dev_name(eth->dev));
+			err = devm_request_irq(eth->dev, eth->irq_fe[MTK_FE_IRQ_RX],
+					       mtk_handle_irq_rx, 0,
+					       irqname, &eth->rx_napi[0]);
+			if (err)
+				goto err_free_dev;
+		}
 	}
-	if (err)
-		goto err_free_dev;
 
 	/* No MT7628/88 support yet */
 	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
@@ -5354,7 +5627,7 @@ static int mtk_probe(struct platform_device *pdev)
 		} else
 			netif_info(eth, probe, eth->netdev[i],
 				   "mediatek frame engine at 0x%08lx, irq %d\n",
-				   eth->netdev[i]->base_addr, eth->irq[MTK_FE_IRQ_SHARED]);
+				   eth->netdev[i]->base_addr, eth->irq_fe[MTK_FE_IRQ_SHARED]);
 	}
 
 	/* we run 2 devices on the same DMA ring so we need a dummy device
@@ -5367,7 +5640,13 @@ static int mtk_probe(struct platform_device *pdev)
 		goto err_unreg_netdev;
 	}
 	netif_napi_add(eth->dummy_dev, &eth->tx_napi, mtk_napi_tx);
-	netif_napi_add(eth->dummy_dev, &eth->rx_napi, mtk_napi_rx);
+	netif_napi_add(eth->dummy_dev, &eth->rx_napi[0].napi, mtk_napi_rx);
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++)
+			netif_napi_add(eth->dummy_dev, &eth->rx_napi[MTK_RSS_RING(i)].napi,
+				       mtk_napi_rx);
+	}
 
 	platform_set_drvdata(pdev, eth);
 	schedule_delayed_work(&eth->reset.monitor_work,
@@ -5411,7 +5690,12 @@ static void mtk_remove(struct platform_device *pdev)
 	mtk_hw_deinit(eth);
 
 	netif_napi_del(&eth->tx_napi);
-	netif_napi_del(&eth->rx_napi);
+	netif_napi_del(&eth->rx_napi[0].napi);
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
+		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++)
+			netif_napi_del(&eth->rx_napi[MTK_RSS_RING(i)].napi);
+	}
 	mtk_cleanup(eth);
 	free_netdev(eth->dummy_dev);
 	mtk_mdio_cleanup(eth);
@@ -5424,6 +5708,7 @@ static const struct mtk_soc_data mt2701_data = {
 	.required_clks = MT7623_CLKS_BITMAP,
 	.required_pctl = true,
 	.version = 1,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5433,7 +5718,6 @@ static const struct mtk_soc_data mt2701_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID,
 		.dma_size = MTK_DMA_SIZE(2K),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5452,6 +5736,7 @@ static const struct mtk_soc_data mt7621_data = {
 	.ppe_num = 1,
 	.hash_offset = 2,
 	.foe_entry_size = MTK_FOE_ENTRY_V1_SIZE,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5461,7 +5746,6 @@ static const struct mtk_soc_data mt7621_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID,
 		.dma_size = MTK_DMA_SIZE(2K),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5482,6 +5766,7 @@ static const struct mtk_soc_data mt7622_data = {
 	.hash_offset = 2,
 	.has_accounting = true,
 	.foe_entry_size = MTK_FOE_ENTRY_V1_SIZE,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5491,7 +5776,6 @@ static const struct mtk_soc_data mt7622_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID,
 		.dma_size = MTK_DMA_SIZE(2K),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5511,6 +5795,7 @@ static const struct mtk_soc_data mt7623_data = {
 	.hash_offset = 2,
 	.foe_entry_size = MTK_FOE_ENTRY_V1_SIZE,
 	.disable_pll_modes = true,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5520,7 +5805,6 @@ static const struct mtk_soc_data mt7623_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID,
 		.dma_size = MTK_DMA_SIZE(2K),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5537,6 +5821,7 @@ static const struct mtk_soc_data mt7629_data = {
 	.required_pctl = false,
 	.has_accounting = true,
 	.version = 1,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5546,7 +5831,6 @@ static const struct mtk_soc_data mt7629_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID,
 		.dma_size = MTK_DMA_SIZE(2K),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5567,16 +5851,16 @@ static const struct mtk_soc_data mt7981_data = {
 	.hash_offset = 4,
 	.has_accounting = true,
 	.foe_entry_size = MTK_FOE_ENTRY_V2_SIZE,
+	.rss_num = 4,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma_v2),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
 		.dma_len_offset = 8,
-		.dma_size = MTK_DMA_SIZE(2K),
+		.dma_size = MTK_DMA_SIZE(4K),
 		.fq_dma_size = MTK_DMA_SIZE(2K),
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID_V2,
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
 		.dma_len_offset = 16,
@@ -5597,6 +5881,7 @@ static const struct mtk_soc_data mt7986_data = {
 	.hash_offset = 4,
 	.has_accounting = true,
 	.foe_entry_size = MTK_FOE_ENTRY_V2_SIZE,
+	.rss_num = 4,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma_v2),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
@@ -5606,7 +5891,6 @@ static const struct mtk_soc_data mt7986_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID_V2,
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
 		.dma_len_offset = 16,
@@ -5627,20 +5911,20 @@ static const struct mtk_soc_data mt7988_data = {
 	.hash_offset = 4,
 	.has_accounting = true,
 	.foe_entry_size = MTK_FOE_ENTRY_V3_SIZE,
+	.rss_num = 4,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma_v2),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
 		.dma_len_offset = 8,
-		.dma_size = MTK_DMA_SIZE(2K),
+		.dma_size = MTK_DMA_SIZE(4K),
 		.fq_dma_size = MTK_DMA_SIZE(4K),
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma_v2),
-		.irq_done_mask = MTK_RX_DONE_INT_V2,
 		.dma_l4_valid = RX_DMA_L4_VALID_V2,
 		.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
 		.dma_len_offset = 8,
-		.dma_size = MTK_DMA_SIZE(2K),
+		.dma_size = MTK_DMA_SIZE(1K),
 	},
 };
 
@@ -5651,6 +5935,7 @@ static const struct mtk_soc_data rt5350_data = {
 	.required_clks = MT7628_CLKS_BITMAP,
 	.required_pctl = false,
 	.version = 1,
+	.rss_num = 0,
 	.tx = {
 		.desc_size = sizeof(struct mtk_tx_dma),
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
@@ -5659,7 +5944,6 @@ static const struct mtk_soc_data rt5350_data = {
 	},
 	.rx = {
 		.desc_size = sizeof(struct mtk_rx_dma),
-		.irq_done_mask = MTK_RX_DONE_INT,
 		.dma_l4_valid = RX_DMA_L4_VALID_PDMA,
 		.dma_max_len = MTK_TX_DMA_BUF_LEN,
 		.dma_len_offset = 16,
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 334625814b97..378cf47913ef 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -76,6 +76,8 @@
 #define	MTK_HW_LRO_BW_THRE		3000
 #define	MTK_HW_LRO_REPLACE_DELTA	1000
 #define	MTK_HW_LRO_SDL_REMAIN_ROOM	1522
+#define MTK_RSS_HASH_KEYSIZE		40
+#define MTK_RSS_MAX_INDIRECTION_TABLE	128
 
 /* Frame Engine Global Configuration */
 #define MTK_FE_GLO_CFG(x)	(((x) == MTK_GMAC3_ID) ? 0x24 : 0x00)
@@ -97,6 +99,8 @@
 #define MTK_GDM1_AF		BIT(28)
 #define MTK_GDM2_AF		BIT(29)
 
+#define MTK_PDMA_IRQ_NUM	(4)
+
 /* PDMA HW LRO Alter Flow Timer Register */
 #define MTK_PDMA_LRO_ALT_REFRESH_TIMER	0x1c
 
@@ -179,7 +183,10 @@
 
 /* PDMA HW LRO Control Registers */
 #define MTK_PDMA_LRO_CTRL_DW0	0x980
+#define MTK_HW_LRO_RING_NUM(eth)		(mtk_is_netsys_v3_or_greater(eth) ? 4 : 3)
 #define MTK_LRO_EN			BIT(0)
+#define MTK_NON_LRO_MULTI_EN		BIT(2)
+#define MTK_LRO_DLY_INT_EN		BIT(5)
 #define MTK_L3_CKS_UPD_EN		BIT(7)
 #define MTK_L3_CKS_UPD_EN_V2		BIT(19)
 #define MTK_LRO_ALT_PKT_CNT_MODE	BIT(21)
@@ -198,6 +205,19 @@
 #define MTK_MULTI_EN		BIT(10)
 #define MTK_PDMA_SIZE_8DWORDS	(1 << 4)
 
+/* PDMA RSS Control Registers */
+#define MTK_RX_NAPI_NUM			(4)
+#define MTK_RX_RSS_NUM(eth)		((eth)->soc->rss_num)
+#define MTK_RSS_RING(x)			(x)
+#define MTK_RSS_EN			BIT(0)
+#define MTK_RSS_CFG_REQ			BIT(2)
+#define MTK_RSS_IPV6_STATIC_HASH	(0x7 << 8)
+#define MTK_RSS_IPV4_STATIC_HASH	(0x7 << 12)
+#define MTK_RSS_HASH_KEY_DW(reg_map, x)		((reg_map)->pdma.rss_glo_cfg + \
+						0x20 + ((x) * 0x4))
+#define MTK_RSS_INDR_TABLE_DW(reg_map, x)	((reg_map)->pdma.rss_glo_cfg + \
+						0x50 + ((x) * 0x4))
+
 /* PDMA Global Configuration Register */
 #define MTK_PDMA_LRO_SDL	0x3000
 #define MTK_RX_CFG_SDL_OFFSET	16
@@ -209,6 +229,7 @@
 /* PDMA Delay Interrupt Register */
 #define MTK_PDMA_DELAY_RX_MASK		GENMASK(15, 0)
 #define MTK_PDMA_DELAY_RX_EN		BIT(15)
+#define MTK_PDMA_DELAY_RX_RING_SHIFT	16
 #define MTK_PDMA_DELAY_RX_PINT_SHIFT	8
 #define MTK_PDMA_DELAY_RX_PTIME_SHIFT	0
 
@@ -229,14 +250,15 @@
 #define MTK_RING_MYIP_VLD		BIT(9)
 
 /* PDMA HW LRO Ring Control Registers */
-#define MTK_LRO_RX_RING0_CTRL_DW1	0xb28
-#define MTK_LRO_RX_RING0_CTRL_DW2	0xb2c
-#define MTK_LRO_RX_RING0_CTRL_DW3	0xb30
-#define MTK_LRO_CTRL_DW1_CFG(x)		(MTK_LRO_RX_RING0_CTRL_DW1 + (x * 0x40))
-#define MTK_LRO_CTRL_DW2_CFG(x)		(MTK_LRO_RX_RING0_CTRL_DW2 + (x * 0x40))
-#define MTK_LRO_CTRL_DW3_CFG(x)		(MTK_LRO_RX_RING0_CTRL_DW3 + (x * 0x40))
+#define MTK_LRO_CTRL_DW1_CFG(reg_map, x)	((reg_map)->pdma.lro_ring_ctrl_dw1 + \
+						((x) * 0x40))
+#define MTK_LRO_CTRL_DW2_CFG(reg_map, x)	((reg_map)->pdma.lro_ring_ctrl_dw1 + \
+						0x4 + ((x) * 0x40))
+#define MTK_LRO_CTRL_DW3_CFG(reg_map, x)	((reg_map)->pdma.lro_ring_ctrl_dw1 + \
+						0x8 + ((x) * 0x40))
 #define MTK_RING_AGE_TIME_L		((MTK_HW_LRO_AGE_TIME & 0x3ff) << 22)
 #define MTK_RING_AGE_TIME_H		((MTK_HW_LRO_AGE_TIME >> 10) & 0x3f)
+#define MTK_RING_PSE_MODE		BIT(6)
 #define MTK_RING_AUTO_LERAN_MODE	(3 << 6)
 #define MTK_RING_VLD			BIT(8)
 #define MTK_RING_MAX_AGG_TIME		((MTK_HW_LRO_AGG_TIME & 0xffff) << 10)
@@ -290,7 +312,20 @@
 #define FC_THRES_MIN		0x4444
 
 /* QDMA Interrupt Status Register */
-#define MTK_RX_DONE_DLY		BIT(30)
+#define MTK_RX_DONE_INT_V1(ring_no) \
+	((ring_no) ? BIT(24 + (ring_no)) : BIT(30))
+
+#define MTK_RX_DONE_INT_V2(ring_no)	BIT(24 + (ring_no))
+
+#define MTK_RX_DONE_INT(eth, ring_no)		\
+	(mtk_is_netsys_v3_or_greater(eth) ?  \
+	 MTK_RX_DONE_INT_V2(ring_no) : \
+	 MTK_RX_DONE_INT_V1(ring_no))
+
 #define MTK_TX_DONE_DLY		BIT(28)
 #define MTK_RX_DONE_INT3	BIT(19)
 #define MTK_RX_DONE_INT2	BIT(18)
@@ -300,11 +335,8 @@
 #define MTK_TX_DONE_INT2	BIT(2)
 #define MTK_TX_DONE_INT1	BIT(1)
 #define MTK_TX_DONE_INT0	BIT(0)
-#define MTK_RX_DONE_INT		MTK_RX_DONE_DLY
 #define MTK_TX_DONE_INT		MTK_TX_DONE_DLY
 
-#define MTK_RX_DONE_INT_V2	BIT(14)
-
 #define MTK_CDM_TXFIFO_RDY	BIT(7)
 
 /* QDMA Interrupt grouping registers */
@@ -942,6 +974,7 @@ struct mtk_tx_ring {
 	struct mtk_tx_dma *dma_pdma;	/* For MT7628/88 PDMA handling */
 	dma_addr_t phys_pdma;
 	int cpu_idx;
+	bool in_sram;
 };
 
 /* PDMA rx ring mode */
@@ -967,13 +1000,38 @@ struct mtk_rx_ring {
 	u16 buf_size;
 	u16 dma_size;
 	bool calc_idx_update;
+	bool in_sram;
 	u16 calc_idx;
 	u32 crx_idx_reg;
+	u32 ring_no;
 	/* page_pool */
 	struct page_pool *page_pool;
 	struct xdp_rxq_info xdp_q;
 };
 
+/* struct mtk_rss_params -	structure holding the parameters
+ *				for the RSS rings
+ * @hash_key:			secret key for the RSS hash
+ * @indirection_table:		indirection table mapping hash buckets
+ *				to RSS rings
+ */
+struct mtk_rss_params {
+	u32		hash_key[MTK_RSS_HASH_KEYSIZE / sizeof(u32)];
+	u8		indirection_table[MTK_RSS_MAX_INDIRECTION_TABLE];
+};
+
+/* struct mtk_napi -	structure holding NAPI-related information;
+ *			each mtk_napi struct is bound to one interrupt group
+ * @napi:		The NAPI struct
+ * @eth:		Pointer to the mtk_eth instance the NAPI belongs to
+ * @rx_ring:		Pointer to the memory holding info about the RX ring
+ */
+struct mtk_napi {
+	struct napi_struct	napi;
+	struct mtk_eth		*eth;
+	struct mtk_rx_ring	*rx_ring;
+};
+
 enum mkt_eth_capabilities {
 	MTK_RGMII_BIT = 0,
 	MTK_TRGMII_BIT,
@@ -985,7 +1043,9 @@ enum mkt_eth_capabilities {
 	MTK_INFRA_BIT,
 	MTK_SHARED_SGMII_BIT,
 	MTK_HWLRO_BIT,
+	MTK_RSS_BIT,
 	MTK_SHARED_INT_BIT,
+	MTK_PDMA_INT_BIT,
 	MTK_TRGMII_MT7621_CLK_BIT,
 	MTK_QDMA_BIT,
 	MTK_SOC_MT7628_BIT,
@@ -1025,7 +1085,9 @@ enum mkt_eth_capabilities {
 #define MTK_INFRA		BIT_ULL(MTK_INFRA_BIT)
 #define MTK_SHARED_SGMII	BIT_ULL(MTK_SHARED_SGMII_BIT)
 #define MTK_HWLRO		BIT_ULL(MTK_HWLRO_BIT)
+#define MTK_RSS			BIT_ULL(MTK_RSS_BIT)
 #define MTK_SHARED_INT		BIT_ULL(MTK_SHARED_INT_BIT)
+#define MTK_PDMA_INT		BIT_ULL(MTK_PDMA_INT_BIT)
 #define MTK_TRGMII_MT7621_CLK	BIT_ULL(MTK_TRGMII_MT7621_CLK_BIT)
 #define MTK_QDMA		BIT_ULL(MTK_QDMA_BIT)
 #define MTK_SOC_MT7628		BIT_ULL(MTK_SOC_MT7628_BIT)
@@ -1117,15 +1179,15 @@ enum mkt_eth_capabilities {
 #define MT7981_CAPS  (MTK_GMAC1_SGMII | MTK_GMAC2_SGMII | MTK_GMAC2_GEPHY | \
 		      MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA | \
 		      MTK_MUX_U3_GMAC2_TO_QPHY | MTK_U3_COPHY_V2 | \
-		      MTK_RSTCTRL_PPE1 | MTK_SRAM)
+		      MTK_RSTCTRL_PPE1 | MTK_SRAM | MTK_PDMA_INT)
 
 #define MT7986_CAPS  (MTK_GMAC1_SGMII | MTK_GMAC2_SGMII | \
 		      MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA | \
-		      MTK_RSTCTRL_PPE1 | MTK_SRAM)
+		      MTK_RSTCTRL_PPE1 | MTK_SRAM | MTK_PDMA_INT)
 
 #define MT7988_CAPS  (MTK_36BIT_DMA | MTK_GDM1_ESW | MTK_GMAC2_2P5GPHY | \
 		      MTK_MUX_GMAC2_TO_2P5GPHY | MTK_QDMA | MTK_RSTCTRL_PPE1 | \
-		      MTK_RSTCTRL_PPE2 | MTK_SRAM)
+		      MTK_RSTCTRL_PPE2 | MTK_SRAM | MTK_PDMA_INT | MTK_RSS)
 
 struct mtk_tx_dma_desc_info {
 	dma_addr_t	addr;
@@ -1223,6 +1285,7 @@ struct mtk_reg_map {
 struct mtk_soc_data {
 	const struct mtk_reg_map *reg_map;
 	u32             ana_rgc3;
+	u32		rss_num;
 	u64		caps;
 	u64		required_clks;
 	bool		required_pctl;
@@ -1270,7 +1333,8 @@ struct mtk_soc_data {
  *			dummy for NAPI to work
  * @netdev:		The netdev instances
  * @mac:		Each netdev is linked to a physical MAC
- * @irq:		The IRQ that we are using
+ * @irq_fe:		Array of IRQs of the frame engine
+ * @irq_pdma:		Array of IRQs of the PDMA used for RSS
  * @msg_enable:		Ethtool msg level
  * @ethsys:		The register map pointing at the range used to setup
  *			MII modes
@@ -1314,7 +1378,8 @@ struct mtk_eth {
 	struct net_device		*dummy_dev;
 	struct net_device		*netdev[MTK_MAX_DEVS];
 	struct mtk_mac			*mac[MTK_MAX_DEVS];
-	int				irq[MTK_FE_IRQ_NUM];
+	int				irq_fe[MTK_FE_IRQ_NUM];
+	int				irq_pdma[MTK_PDMA_IRQ_NUM];
 	u32				msg_enable;
 	unsigned long			sysclk;
 	struct regmap			*ethsys;
@@ -1327,7 +1392,8 @@ struct mtk_eth {
 	struct mtk_rx_ring		rx_ring[MTK_MAX_RX_RING_NUM];
 	struct mtk_rx_ring		rx_ring_qdma;
 	struct napi_struct		tx_napi;
-	struct napi_struct		rx_napi;
+	struct mtk_napi			rx_napi[MTK_RX_NAPI_NUM];
+	struct mtk_rss_params		rss_params;
 	void				*scratch_ring;
 	dma_addr_t			phy_scratch_ring;
 	void				*scratch_head[MTK_FQ_DMA_HEAD];
-- 
2.43.0
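[Editor's note] Once this patch is applied, the new rxfh hooks can be exercised with plain ethtool; a usage sketch (the interface name and the even spread over 4 rings are assumptions for illustration, not taken from the patch):

```shell
# dump the RSS hash key and 128-entry indirection table (mtk_get_rxfh)
ethtool -x eth0

# spread flows evenly across the 4 RSS rings (mtk_set_rxfh)
ethtool -X eth0 equal 4

# program a new 40-byte Toeplitz hash key (mtk_set_rxfh)
ethtool -X eth0 hkey 6d:5a:56:da:25:5b:0e:c2:41:67:25:3d:43:a3:8f:b0:d0:ca:2b:cb:ae:7b:30:b4:77:cb:2d:a3:80:30:f2:0c:6a:42:b7:3b:be:ac:01:fa
```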



* [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support
  2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
  2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
@ 2026-05-09 19:09 ` Frank Wunderlich
  2026-05-14  1:52   ` Jakub Kicinski
  2026-05-14  1:53   ` Jakub Kicinski
  2 siblings, 2 replies; 8+ messages in thread
From: Frank Wunderlich @ 2026-05-09 19:09 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King
  Cc: Frank Wunderlich, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Mason Chang, Daniel Golle

From: Mason Chang <mason-cw.chang@mediatek.com>

Add Large Receive Offload (LRO) support to the MediaTek ethernet driver
and activate it for MT7988.

Signed-off-by: Mason Chang <mason-cw.chang@mediatek.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
---
v8:
- fix remaining u32 vs. __be32 sparse warnings; all touched lines should be
  clean now, except one unrelated warning

v7:
- fix u32 vs. be32 reported by patchwork check
- add L4 PSH check
  https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/7521c42b0bd5be20d52e20b110daea8c756fc069%5E%21/#F1
- Add HW LRO max 4-depth VLAN support including switch special tag.
  https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/35490cec6a2e5982532935fb0a1c884f7c4efdb0%5E%21/#F2

v6:
- parenthesize LRO macro arguments to fix checkpatch's "Macro argument '...'
  may be better as '(...)' to avoid precedence issues" warnings
- drop unused MTK_CTRL_DW0_SDL_MASK

v5:
- fix too long lines reported by checkpatch
  MTK_LRO_RING_RELINQUISH_REQ
  MTK_LRO_RING_RELINQUISH_DONE
  irq handling (MTK_HW_LRO_IRQ + MTK_HW_LRO_RING)

v4:
- fix LRO reverse christmas tree ordering and LRO params as suggested by Andrew
- drop mtk_hwlro_stats_ebl and the unused IS_HW_LRO_RING (only used in
  proprietary debugfs)

v2:
- drop link to commit for 6.6 patch
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 263 +++++++++++++++++---
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |  53 ++--
 2 files changed, 254 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 908fd88287ac..8035fc2557de 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2793,7 +2793,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
 
 	if (rx_flag == MTK_RX_FLAGS_HWLRO) {
 		rx_data_len = MTK_MAX_LRO_RX_LENGTH;
-		rx_dma_size = MTK_HW_LRO_DMA_SIZE;
+		rx_dma_size = MTK_HW_LRO_DMA_SIZE(eth);
 	} else {
 		rx_data_len = ETH_DATA_LEN;
 		rx_dma_size = soc->rx.dma_size;
@@ -2806,7 +2806,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
 	if (!ring->data)
 		return -ENOMEM;
 
-	if (mtk_page_pool_enabled(eth)) {
+	if (mtk_page_pool_enabled(eth) && rcu_access_pointer(eth->prog)) {
 		struct page_pool *pp;
 
 		pp = mtk_create_page_pool(eth, &ring->xdp_q, ring_no,
@@ -2952,10 +2952,11 @@ static void mtk_rx_clean(struct mtk_eth *eth, struct mtk_rx_ring *ring, bool in_
 
 static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 {
-	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
-	int i;
 	u32 ring_ctrl_dw1 = 0, ring_ctrl_dw2 = 0, ring_ctrl_dw3 = 0;
+	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+	const struct mtk_soc_data *soc = eth->soc;
 	u32 lro_ctrl_dw0 = 0, lro_ctrl_dw3 = 0;
+	int i, val;
 
 	/* set LRO rings to auto-learn modes */
 	ring_ctrl_dw2 |= MTK_RING_AUTO_LERAN_MODE;
@@ -2974,30 +2975,50 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 	ring_ctrl_dw2 |= MTK_RING_MAX_AGG_CNT_L;
 	ring_ctrl_dw3 |= MTK_RING_MAX_AGG_CNT_H;
 
-	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
+	for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
 		mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(reg_map, i));
 		mtk_w32(eth, ring_ctrl_dw2, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
 		mtk_w32(eth, ring_ctrl_dw3, MTK_LRO_CTRL_DW3_CFG(reg_map, i));
 	}
 
 	/* IPv4 checksum update enable */
-	lro_ctrl_dw0 |= MTK_L3_CKS_UPD_EN;
+	lro_ctrl_dw0 |= MTK_L3_CKS_UPD_EN(eth);
 
 	/* switch priority comparison to packet count mode */
 	lro_ctrl_dw0 |= MTK_LRO_ALT_PKT_CNT_MODE;
 
+	/* enable L4 PSH flag check */
+	lro_ctrl_dw0 |= MTK_LRO_L4_CTRL_PSH_EN;
+
 	/* bandwidth threshold setting */
-	mtk_w32(eth, MTK_HW_LRO_BW_THRE, MTK_PDMA_LRO_CTRL_DW2);
+	mtk_w32(eth, MTK_HW_LRO_BW_THRE, MTK_PDMA_LRO_CTRL_DW2(reg_map));
 
 	/* auto-learn score delta setting */
-	mtk_w32(eth, MTK_HW_LRO_REPLACE_DELTA, MTK_PDMA_LRO_ALT_SCORE_DELTA);
+	mtk_w32(eth, MTK_HW_LRO_REPLACE_DELTA, MTK_PDMA_LRO_ALT_SCORE_DELTA(reg_map));
 
 	/* set refresh timer for altering flows to 1 sec. (unit: 20us) */
 	mtk_w32(eth, (MTK_HW_LRO_TIMER_UNIT << 16) | MTK_HW_LRO_REFRESH_TIME,
 		MTK_PDMA_LRO_ALT_REFRESH_TIMER);
 
-	/* set HW LRO mode & the max aggregation count for rx packets */
-	lro_ctrl_dw3 |= MTK_ADMA_MODE | (MTK_HW_LRO_MAX_AGG_CNT & 0xff);
+	if (mtk_is_netsys_v3_or_greater(eth)) {
+		val = mtk_r32(eth, reg_map->pdma.rx_cfg);
+		mtk_w32(eth, val | ((MTK_PDMA_LRO_SDL + MTK_MAX_RX_LENGTH) <<
+			MTK_RX_CFG_SDL_OFFSET), reg_map->pdma.rx_cfg);
+
+		lro_ctrl_dw0 |= MTK_PDMA_LRO_SDL << MTK_CTRL_DW0_SDL_OFFSET;
+
+		/* enable the CPU reason blacklist */
+		lro_ctrl_dw0 |= MTK_LRO_CRSN_BNW(eth);
+
+		/* do not use the PPE CPU reason */
+		mtk_w32(eth, 0xffffffff, MTK_PDMA_LRO_CTRL_DW1(reg_map));
+	} else {
+		/* set HW LRO mode & the max aggregation count for rx packets */
+		lro_ctrl_dw3 |= MTK_ADMA_MODE | (MTK_HW_LRO_MAX_AGG_CNT & 0xff);
+	}
+
+	/* enable max 4-depth VLAN support including switch special tag */
+	lro_ctrl_dw3 |= MTK_LRO_VLAN_VID_CMP_DEPTH | MTK_LRO_VLAN_EN;
 
 	/* the minimal remaining room of SDL0 in RXD for lro aggregation */
 	lro_ctrl_dw3 |= MTK_LRO_MIN_RXD_SDL;
@@ -3005,8 +3026,19 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 	/* enable HW LRO */
 	lro_ctrl_dw0 |= MTK_LRO_EN;
 
-	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3);
-	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0);
+	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3(reg_map));
+	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0(reg_map));
+
+	if (mtk_is_netsys_v2_or_greater(eth)) {
+		i = (soc->rx.desc_size == sizeof(struct mtk_rx_dma_v2)) ? 1 : 0;
+		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)),
+			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)), reg_map->pdma.int_grp);
+		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
+			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
+			reg_map->pdma.int_grp + 0x4);
+		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)),
+			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)), reg_map->pdma.int_grp3);
+	}
 
 	return 0;
 }
@@ -3014,16 +3046,16 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
 static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
 {
 	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
-	int i;
 	u32 val;
+	int i;
 
 	/* relinquish lro rings, flush aggregated packets */
-	mtk_w32(eth, MTK_LRO_RING_RELINQUISH_REQ, MTK_PDMA_LRO_CTRL_DW0);
+	mtk_w32(eth, MTK_LRO_RING_RELINQUISH_REQ(eth), MTK_PDMA_LRO_CTRL_DW0(reg_map));
 
 	/* wait for relinquishments done */
 	for (i = 0; i < 10; i++) {
-		val = mtk_r32(eth, MTK_PDMA_LRO_CTRL_DW0);
-		if (val & MTK_LRO_RING_RELINQUISH_DONE) {
+		val = mtk_r32(eth, MTK_PDMA_LRO_CTRL_DW0(reg_map));
+		if (val & MTK_LRO_RING_RELINQUISH_DONE(eth)) {
 			msleep(20);
 			continue;
 		}
@@ -3031,14 +3063,14 @@ static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
 	}
 
 	/* invalidate lro rings */
-	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++)
+	for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++)
 		mtk_w32(eth, 0, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
 
 	/* disable HW LRO */
-	mtk_w32(eth, 0, MTK_PDMA_LRO_CTRL_DW0);
+	mtk_w32(eth, 0, MTK_PDMA_LRO_CTRL_DW0(reg_map));
 }
 
-static void mtk_hwlro_val_ipaddr(struct mtk_eth *eth, int idx, __be32 ip)
+static void mtk_hwlro_val_ipaddr(struct mtk_eth *eth, int idx, u32 ip)
 {
 	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
 	u32 reg_val;
@@ -3048,7 +3080,7 @@ static void mtk_hwlro_val_ipaddr(struct mtk_eth *eth, int idx, __be32 ip)
 	/* invalidate the IP setting */
 	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
-	mtk_w32(eth, ip, MTK_LRO_DIP_DW0_CFG(idx));
+	mtk_w32(eth, ip, MTK_LRO_DIP_DW0_CFG(reg_map, idx));
 
 	/* validate the IP setting */
 	mtk_w32(eth, (reg_val | MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
@@ -3064,7 +3096,7 @@ static void mtk_hwlro_inval_ipaddr(struct mtk_eth *eth, int idx)
 	/* invalidate the IP setting */
 	mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(reg_map, idx));
 
-	mtk_w32(eth, 0, MTK_LRO_DIP_DW0_CFG(idx));
+	mtk_w32(eth, 0, MTK_LRO_DIP_DW0_CFG(reg_map, idx));
 }
 
 static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
@@ -3080,6 +3112,65 @@ static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
 	return cnt;
 }
 
+static int mtk_hwlro_add_ipaddr_idx(struct net_device *dev, u32 ip4dst)
+{
+	struct mtk_mac *mac = netdev_priv(dev);
+	const struct mtk_reg_map *reg_map;
+	struct mtk_eth *eth = mac->hw;
+	u32 reg_val;
+	int i;
+
+	reg_map = eth->soc->reg_map;
+
+	/* check for duplicate IP address in the current DIP list */
+	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
+		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
+		if (reg_val == ip4dst)
+			break;
+	}
+
+	if (i < MTK_HW_LRO_DIP_NUM(eth) + 1) {
+		netdev_warn(dev, "Duplicate IP address at DIP(%d)!\n", i);
+		return -EEXIST;
+	}
+
+	/* find an available DIP index */
+	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
+		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
+		if (reg_val == 0UL)
+			break;
+	}
+
+	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
+		netdev_warn(dev, "no free DIP index available!\n");
+		return -EBUSY;
+	}
+
+	return i;
+}
+
+static int mtk_hwlro_get_ipaddr_idx(struct net_device *dev, u32 ip4dst)
+{
+	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
+	u32 reg_val;
+	int i;
+
+	/* find the DIP index matching the given IP address */
+	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
+		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(eth->soc->reg_map, i));
+		if (reg_val == ip4dst)
+			break;
+	}
+
+	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
+		netdev_warn(dev, "DIP address does not exist!\n");
+		return -ENOENT;
+	}
+
+	return i;
+}
+
 static int mtk_hwlro_add_ipaddr(struct net_device *dev,
 				struct ethtool_rxnfc *cmd)
 {
@@ -3088,18 +3179,22 @@ static int mtk_hwlro_add_ipaddr(struct net_device *dev,
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	int hwlro_idx;
+	u32 ip4dst;
 
 	if ((fsp->flow_type != TCP_V4_FLOW) ||
 	    (!fsp->h_u.tcp_ip4_spec.ip4dst) ||
 	    (fsp->location > 1))
 		return -EINVAL;
 
-	mac->hwlro_ip[fsp->location] = htonl(fsp->h_u.tcp_ip4_spec.ip4dst);
-	hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
+	ip4dst = ntohl(fsp->h_u.tcp_ip4_spec.ip4dst);
+	hwlro_idx = mtk_hwlro_add_ipaddr_idx(dev, ip4dst);
+	if (hwlro_idx < 0)
+		return hwlro_idx;
 
+	mac->hwlro_ip[fsp->location] = ip4dst;
 	mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
 
-	mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[fsp->location]);
+	mtk_hwlro_val_ipaddr(eth, hwlro_idx, ip4dst);
 
 	return 0;
 }
@@ -3112,13 +3207,17 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	int hwlro_idx;
+	u32 ip4dst;
 
 	if (fsp->location > 1)
 		return -EINVAL;
 
-	mac->hwlro_ip[fsp->location] = 0;
-	hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
+	ip4dst = mac->hwlro_ip[fsp->location];
+	hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, ip4dst);
+	if (hwlro_idx < 0)
+		return hwlro_idx;
 
+	mac->hwlro_ip[fsp->location] = 0;
 	mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
 
 	mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
@@ -3126,6 +3225,24 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
 	return 0;
 }
 
+static void mtk_hwlro_netdev_enable(struct net_device *dev)
+{
+	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
+	int i, hwlro_idx;
+
+	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
+		if (mac->hwlro_ip[i] == 0)
+			continue;
+
+		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
+		if (hwlro_idx < 0)
+			continue;
+
+		mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[i]);
+	}
+}
+
 static void mtk_hwlro_netdev_disable(struct net_device *dev)
 {
 	struct mtk_mac *mac = netdev_priv(dev);
@@ -3133,8 +3250,14 @@ static void mtk_hwlro_netdev_disable(struct net_device *dev)
 	int i, hwlro_idx;
 
 	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
+		if (mac->hwlro_ip[i] == 0)
+			continue;
+
+		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
+		if (hwlro_idx < 0)
+			continue;
+
 		mac->hwlro_ip[i] = 0;
-		hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + i;
 
 		mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
 	}
@@ -3154,15 +3277,15 @@ static int mtk_hwlro_get_fdir_entry(struct net_device *dev,
 
 	/* only tcp dst ipv4 is meaningful, others are meaningless */
 	fsp->flow_type = TCP_V4_FLOW;
-	fsp->h_u.tcp_ip4_spec.ip4dst = ntohl(mac->hwlro_ip[fsp->location]);
+	fsp->h_u.tcp_ip4_spec.ip4dst = htonl(mac->hwlro_ip[fsp->location]);
 	fsp->m_u.tcp_ip4_spec.ip4dst = 0;
 
 	fsp->h_u.tcp_ip4_spec.ip4src = 0;
-	fsp->m_u.tcp_ip4_spec.ip4src = 0xffffffff;
+	fsp->m_u.tcp_ip4_spec.ip4src = htonl(~0U);
 	fsp->h_u.tcp_ip4_spec.psrc = 0;
-	fsp->m_u.tcp_ip4_spec.psrc = 0xffff;
+	fsp->m_u.tcp_ip4_spec.psrc = htons(~0);
 	fsp->h_u.tcp_ip4_spec.pdst = 0;
-	fsp->m_u.tcp_ip4_spec.pdst = 0xffff;
+	fsp->m_u.tcp_ip4_spec.pdst = htons(~0);
 	fsp->h_u.tcp_ip4_spec.tos = 0;
 	fsp->m_u.tcp_ip4_spec.tos = 0xff;
 
@@ -3314,6 +3437,8 @@ static int mtk_set_features(struct net_device *dev, netdev_features_t features)
 
 	if ((diff & NETIF_F_LRO) && !(features & NETIF_F_LRO))
 		mtk_hwlro_netdev_disable(dev);
+	else if ((diff & NETIF_F_LRO) && (features & NETIF_F_LRO))
+		mtk_hwlro_netdev_enable(dev);
 
 	return 0;
 }
@@ -3371,8 +3496,8 @@ static int mtk_dma_init(struct mtk_eth *eth)
 		return err;
 
 	if (eth->hwlro) {
-		for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
-			err = mtk_rx_alloc(eth, i, MTK_RX_FLAGS_HWLRO);
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+			err = mtk_rx_alloc(eth, MTK_HW_LRO_RING(eth, i), MTK_RX_FLAGS_HWLRO);
 			if (err)
 				return err;
 		}
@@ -3434,8 +3559,8 @@ static void mtk_dma_free(struct mtk_eth *eth)
 
 	if (eth->hwlro) {
 		mtk_hwlro_rx_uninit(eth);
-		for (i = 1; i < MTK_MAX_RX_RING_NUM; i++)
-			mtk_rx_clean(eth, &eth->rx_ring[i], false);
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++)
+			mtk_rx_clean(eth, &eth->rx_ring[MTK_HW_LRO_RING(eth, i)], false);
 	}
 
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
@@ -3630,16 +3755,21 @@ static int mtk_start_dma(struct mtk_eth *eth)
 			val |= MTK_RX_BT_32DWORDS;
 		mtk_w32(eth, val, reg_map->qdma.glo_cfg);
 
-		mtk_w32(eth,
-			MTK_RX_DMA_EN | rx_2b_offset |
-			MTK_RX_BT_32DWORDS | MTK_MULTI_EN,
-			reg_map->pdma.glo_cfg);
+		val = mtk_r32(eth, reg_map->pdma.glo_cfg);
+		val |= MTK_RX_DMA_EN | rx_2b_offset |
+		       MTK_RX_BT_32DWORDS | MTK_MULTI_EN;
+		mtk_w32(eth, val, reg_map->pdma.glo_cfg);
 	} else {
 		mtk_w32(eth, MTK_TX_WB_DDONE | MTK_TX_DMA_EN | MTK_RX_DMA_EN |
 			MTK_MULTI_EN | MTK_PDMA_SIZE_8DWORDS,
 			reg_map->pdma.glo_cfg);
 	}
 
+	if (eth->hwlro && mtk_is_netsys_v3_or_greater(eth)) {
+		val = mtk_r32(eth, reg_map->pdma.glo_cfg);
+		mtk_w32(eth, val | MTK_RX_DMA_LRO_EN, reg_map->pdma.glo_cfg);
+	}
+
 	return 0;
 }
 
@@ -3798,6 +3928,14 @@ static int mtk_open(struct net_device *dev)
 			}
 		}
 
+		if (eth->hwlro) {
+			for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+				napi_enable(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);
+				mtk_rx_irq_enable(eth, MTK_RX_DONE_INT(eth,
+								       MTK_HW_LRO_RING(eth, i)));
+			}
+		}
+
 		refcount_set(&eth->dma_refcnt, 1);
 	} else {
 		refcount_inc(&eth->dma_refcnt);
@@ -3893,6 +4031,14 @@ static int mtk_stop(struct net_device *dev)
 		}
 	}
 
+	if (eth->hwlro) {
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+			mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)));
+			napi_synchronize(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);
+			napi_disable(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);
+		}
+	}
+
 	cancel_work_sync(&eth->rx_dim.work);
 	cancel_work_sync(&eth->tx_dim.work);
 
@@ -4301,6 +4447,14 @@ static int mtk_napi_init(struct mtk_eth *eth)
 		}
 	}
 
+	if (eth->hwlro) {
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+			rx_napi = &eth->rx_napi[MTK_HW_LRO_RING(eth, i)];
+			rx_napi->eth = eth;
+			rx_napi->rx_ring = &eth->rx_ring[MTK_HW_LRO_RING(eth, i)];
+		}
+	}
+
 	return 0;
 }
 
@@ -5352,6 +5506,7 @@ static int mtk_probe(struct platform_device *pdev)
 {
 	struct resource *res = NULL;
 	struct device_node *mac_np;
+	u8 lro_irq, lro_ring;
 	struct mtk_eth *eth;
 	char *irqname;
 	int err, i;
@@ -5577,6 +5732,23 @@ static int mtk_probe(struct platform_device *pdev)
 						goto err_free_dev;
 				}
 			}
+
+			if (eth->hwlro) {
+				for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+					lro_irq = MTK_HW_LRO_IRQ(eth, i);
+					lro_ring = MTK_HW_LRO_RING(eth, i);
+					irqname = devm_kasprintf(eth->dev, GFP_KERNEL,
+								 "%s LRO RX %d",
+								 dev_name(eth->dev), i);
+					err = devm_request_irq(eth->dev,
+							       eth->irq_pdma[lro_irq],
+							       mtk_handle_irq_rx, IRQF_SHARED,
+							       irqname,
+							       &eth->rx_napi[lro_ring]);
+					if (err)
+						goto err_free_dev;
+				}
+			}
 		} else {
 			irqname = devm_kasprintf(eth->dev, GFP_KERNEL, "%s RX",
 						 dev_name(eth->dev));
@@ -5648,6 +5820,13 @@ static int mtk_probe(struct platform_device *pdev)
 				       mtk_napi_rx);
 	}
 
+	if (eth->hwlro) {
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++) {
+			netif_napi_add(eth->dummy_dev, &eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi,
+				       mtk_napi_rx);
+		}
+	}
+
 	platform_set_drvdata(pdev, eth);
 	schedule_delayed_work(&eth->reset.monitor_work,
 			      MTK_DMA_MONITOR_TIMEOUT);
@@ -5696,6 +5875,12 @@ static void mtk_remove(struct platform_device *pdev)
 		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++)
 			netif_napi_del(&eth->rx_napi[MTK_RSS_RING(i)].napi);
 	}
+
+	if (eth->hwlro) {
+		for (i = 0; i < MTK_HW_LRO_RING_NUM(eth); i++)
+			netif_napi_del(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);
+	}
+
 	mtk_cleanup(eth);
 	free_netdev(eth->dummy_dev);
 	mtk_mdio_cleanup(eth);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 378cf47913ef..f7e7299fef6b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -35,7 +35,7 @@
 #define MTK_DMA_SIZE(x)		(SZ_##x)
 #define MTK_FQ_DMA_HEAD		32
 #define MTK_FQ_DMA_LENGTH	2048
-#define MTK_RX_ETH_HLEN		(ETH_HLEN + ETH_FCS_LEN)
+#define MTK_RX_ETH_HLEN		(VLAN_ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
 #define MTK_RX_HLEN		(NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN)
 #define MTK_DMA_DUMMY_DESC	0xffffffff
 #define MTK_DEFAULT_MSG_ENABLE	(NETIF_MSG_DRV | \
@@ -63,10 +63,9 @@
 
 #define MTK_QRX_OFFSET		0x10
 
-#define MTK_MAX_RX_RING_NUM	4
-#define MTK_HW_LRO_DMA_SIZE	8
-
-#define	MTK_MAX_LRO_RX_LENGTH		(4096 * 3)
+#define MTK_MAX_RX_RING_NUM	(8)
+#define MTK_HW_LRO_DMA_SIZE(eth)	(mtk_is_netsys_v3_or_greater(eth) ? 64 : 8)
+#define	MTK_MAX_LRO_RX_LENGTH		(4096 * 3 + MTK_MAX_RX_LENGTH)
 #define	MTK_MAX_LRO_IP_CNT		2
 #define	MTK_HW_LRO_TIMER_UNIT		1	/* 20 us */
 #define	MTK_HW_LRO_REFRESH_TIME		50000	/* 1 sec. */
@@ -182,31 +181,39 @@
 #define MTK_CDMM_THRES		0x165c
 
 /* PDMA HW LRO Control Registers */
-#define MTK_PDMA_LRO_CTRL_DW0	0x980
-#define MTK_HW_LRO_RING_NUM(eth)		(mtk_is_netsys_v3_or_greater(eth) ? 4 : 3)
+#define MTK_HW_LRO_DIP_NUM(eth)		(mtk_is_netsys_v3_or_greater(eth) ? 4 : 3)
+#define MTK_HW_LRO_RING_NUM(eth)	(mtk_is_netsys_v3_or_greater(eth) ? 4 : 3)
+#define MTK_HW_LRO_RING(eth, x)		((x) + (mtk_is_netsys_v3_or_greater(eth) ? 4 : 1))
+#define MTK_HW_LRO_IRQ(eth, x)		((x) + (mtk_is_netsys_v3_or_greater(eth) ? 0 : 1))
+#define MTK_LRO_CRSN_BNW(eth)		BIT((mtk_is_netsys_v3_or_greater(eth) ? 22 : 6))
 #define MTK_LRO_EN			BIT(0)
 #define MTK_NON_LRO_MULTI_EN		BIT(2)
 #define MTK_LRO_DLY_INT_EN		BIT(5)
-#define MTK_L3_CKS_UPD_EN		BIT(7)
-#define MTK_L3_CKS_UPD_EN_V2		BIT(19)
+#define MTK_L3_CKS_UPD_EN(eth)		BIT(mtk_is_netsys_v3_or_greater(eth) ? 19 : 7)
 #define MTK_LRO_ALT_PKT_CNT_MODE	BIT(21)
-#define MTK_LRO_RING_RELINQUISH_REQ	(0x7 << 26)
-#define MTK_LRO_RING_RELINQUISH_REQ_V2	(0xf << 24)
-#define MTK_LRO_RING_RELINQUISH_DONE	(0x7 << 29)
-#define MTK_LRO_RING_RELINQUISH_DONE_V2	(0xf << 28)
-
-#define MTK_PDMA_LRO_CTRL_DW1	0x984
-#define MTK_PDMA_LRO_CTRL_DW2	0x988
-#define MTK_PDMA_LRO_CTRL_DW3	0x98c
+#define MTK_LRO_RING_RELINQUISH_REQ(eth)	(mtk_is_netsys_v3_or_greater(eth) ? \
+						0xf << 24 : 0x7 << 26)
+#define MTK_LRO_RING_RELINQUISH_DONE(eth)	(mtk_is_netsys_v3_or_greater(eth) ? \
+						0xf << 28 : 0x7 << 29)
+
+#define MTK_PDMA_LRO_CTRL_DW0(reg_map)	((reg_map)->pdma.lro_ctrl_dw0)
+#define MTK_PDMA_LRO_CTRL_DW1(reg_map)	((reg_map)->pdma.lro_ctrl_dw0 + 0x04)
+#define MTK_PDMA_LRO_CTRL_DW2(reg_map)	((reg_map)->pdma.lro_ctrl_dw0 + 0x08)
+#define MTK_PDMA_LRO_CTRL_DW3(reg_map)	((reg_map)->pdma.lro_ctrl_dw0 + 0x0c)
+#define MTK_LRO_VLAN_EN			(0xf << 8)
+#define MTK_LRO_VLAN_VID_CMP_DEPTH	(0x3 << 12)
+#define MTK_LRO_L4_CTRL_PSH_EN		BIT(23)
 #define MTK_ADMA_MODE		BIT(15)
 #define MTK_LRO_MIN_RXD_SDL	(MTK_HW_LRO_SDL_REMAIN_ROOM << 16)
 
+#define MTK_CTRL_DW0_SDL_OFFSET	(3)
+
 #define MTK_RX_DMA_LRO_EN	BIT(8)
 #define MTK_MULTI_EN		BIT(10)
 #define MTK_PDMA_SIZE_8DWORDS	(1 << 4)
 
 /* PDMA RSS Control Registers */
-#define MTK_RX_NAPI_NUM			(4)
+#define MTK_RX_NAPI_NUM			(8)
 #define MTK_RX_RSS_NUM(eth)		((eth)->soc->rss_num)
 #define MTK_RSS_RING(x)			(x)
 #define MTK_RSS_EN			BIT(0)
@@ -242,11 +249,10 @@
 #define MTK_PDMA_DELAY_PTIME_MASK	0xff
 
 /* PDMA HW LRO Alter Flow Delta Register */
-#define MTK_PDMA_LRO_ALT_SCORE_DELTA	0xa4c
+#define MTK_PDMA_LRO_ALT_SCORE_DELTA(reg_map)	((reg_map)->pdma.lro_alt_score_delta)
 
 /* PDMA HW LRO IP Setting Registers */
-#define MTK_LRO_RX_RING0_DIP_DW0	0xb04
-#define MTK_LRO_DIP_DW0_CFG(x)		(MTK_LRO_RX_RING0_DIP_DW0 + (x * 0x40))
+#define MTK_LRO_DIP_DW0_CFG(reg_map, x)	((reg_map)->pdma.lro_ring_dip_dw0 + ((x) * 0x40))
 #define MTK_RING_MYIP_VLD		BIT(9)
 
 /* PDMA HW LRO Ring Control Registers */
@@ -1187,7 +1193,8 @@ enum mkt_eth_capabilities {
 
 #define MT7988_CAPS  (MTK_36BIT_DMA | MTK_GDM1_ESW | MTK_GMAC2_2P5GPHY | \
 		      MTK_MUX_GMAC2_TO_2P5GPHY | MTK_QDMA | MTK_RSTCTRL_PPE1 | \
-		      MTK_RSTCTRL_PPE2 | MTK_SRAM | MTK_PDMA_INT | MTK_RSS)
+		      MTK_RSTCTRL_PPE2 | MTK_SRAM | MTK_PDMA_INT | MTK_RSS | \
+		      MTK_HWLRO)
 
 struct mtk_tx_dma_desc_info {
 	dma_addr_t	addr;
@@ -1454,7 +1461,7 @@ struct mtk_mac {
 	struct phylink_config		phylink_config;
 	struct mtk_eth			*hw;
 	struct mtk_hw_stats		*hw_stats;
-	__be32				hwlro_ip[MTK_MAX_LRO_IP_CNT];
+	u32				hwlro_ip[MTK_MAX_LRO_IP_CNT];
 	int				hwlro_ip_cnt;
 	unsigned int			syscfg0;
 	struct notifier_block		device_notifier;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support
  2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
@ 2026-05-14  1:52   ` Jakub Kicinski
  2026-05-14  1:56   ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-14  1:52 UTC (permalink / raw)
  To: linux
  Cc: Jakub Kicinski, nbd, lorenzo, andrew+netdev, davem, edumazet,
	pabeni, matthias.bgg, angelogioacchino.delregno, linux,
	mason-cw.chang, netdev, linux-kernel, daniel, linux-mediatek,
	linux-arm-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: ethernet: mtk_eth_soc: Add RSS support

This commit adds Receive Side Scaling support for the mtk_eth_soc driver,
enabling up to four RX rings on MT7988 with ethtool -x and
--set-rxfh-indir support. It also reworks the RX NAPI / IRQ handling
into per-ring instances and renames eth->irq[] to eth->irq_fe[] /
eth->irq_pdma[].

> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c

[ ... ]

> @@ -2235,16 +2199,15 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
[ ... ]
> @@ -2436,7 +2399,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
>  		 * we continue
>  		 */
>  		wmb();
> -		mtk_update_rx_cpu_idx(eth);
> +		mtk_update_rx_cpu_idx(eth, ring);
>  	}
>  
>  	eth->rx_packets += done;

[High]
With RSS enabled, up to MTK_RX_NAPI_NUM (=4) NAPI instances can run
concurrently on different CPUs.  eth->rx_packets, eth->rx_bytes (here)
and eth->rx_events (incremented in mtk_handle_irq_rx with IRQF_SHARED)
are plain u32 fields with no atomic, no spinlock, and no per-CPU
storage:

	eth->rx_packets += done;
	eth->rx_bytes += bytes;
	dim_update_sample(eth->rx_events, eth->rx_packets, eth->rx_bytes,
			  &dim_sample);
	net_dim(&eth->rx_dim, &dim_sample);

Pre-patch a single rx_napi serialised these accesses.  Are these
unsynchronised cross-CPU updates intentional?  The corrupted values
feed dim_update_sample() / net_dim() which drives interrupt
moderation, and eth->rx_dim is a single struct dim shared across all
RX NAPIs.
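One conventional answer to this kind of race (not necessarily the fix the driver should take) is to keep the counters per ring, written only from that ring's NAPI context, and sum them when a total is needed, as u64_stats-style drivers do. A minimal user-space model of the idea; struct and function names here are illustrative, not the driver's actual fields:

```c
#include <stdint.h>

#define NUM_RX_RINGS 4

/* Per-ring counters: each instance is written only from its own
 * ring's NAPI poll, so no cross-CPU write sharing occurs. */
struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

static void ring_stats_add(struct ring_stats *rs, unsigned int done,
			   unsigned int bytes)
{
	/* only ever called from the owning ring's NAPI context */
	rs->packets += done;
	rs->bytes += bytes;
}

/* Totals are computed at read time instead of being maintained in a
 * single shared field. */
static uint64_t total_packets(const struct ring_stats *rings, int n)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += rings[i].packets;
	return sum;
}
```

In the kernel proper this would additionally want `u64_stats_sync` (or per-CPU storage) for tear-free 64-bit reads on 32-bit machines.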

> @@ -2645,7 +2608,9 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
>  
>  static int mtk_napi_rx(struct napi_struct *napi, int budget)
>  {
> -	struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
> +	struct mtk_napi *rx_napi = container_of(napi, struct mtk_napi, napi);
> +	struct mtk_eth *eth = rx_napi->eth;
> +	struct mtk_rx_ring *ring = rx_napi->rx_ring;
>  	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
>  	int rx_done_total = 0;

[Medium]
mtk_napi_rx() calls mtk_handle_status_irq(eth) on entry of every poll.
mtk_handle_status_irq() does an unprotected read / clear of
MTK_INT_STATUS2 and calls mtk_stats_update():

	status2 = mtk_r32(eth, MTK_INT_STATUS2);
	if (unlikely(status2 & (MTK_GDM1_AF | MTK_GDM2_AF))) {
		mtk_stats_update(eth);
		mtk_w32(eth, (MTK_GDM1_AF | MTK_GDM2_AF), MTK_INT_STATUS2);
	}

With multiple rx_napi instances polling on different CPUs, can a fresh
AF event arriving between the read and the write be silently dropped,
or stats be double-counted?  The per-MAC stats_lock taken inside
mtk_stats_update_mac does not synchronise the status read or the
clear.

[ ... ]

> @@ -3222,6 +3192,105 @@ static int mtk_hwlro_get_fdir_all(struct net_device *dev,
[ ... ]
> +static int mtk_rss_init(struct mtk_eth *eth)
> +{
> +	const struct mtk_soc_data *soc = eth->soc;
> +	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
> +	struct mtk_rss_params *rss_params = &eth->rss_params;
> +	u32 val;
> +	int i;
> +
> +	netdev_rss_key_fill(rss_params->hash_key, MTK_RSS_HASH_KEYSIZE);
> +
> +	for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
> +		rss_params->indirection_table[i] = ethtool_rxfh_indir_default(i, eth->soc->rss_num);
> +
> +	if (soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
> +		/* Set RSS rings to PSE modes */
> +		for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
> +			val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
> +			val |= MTK_RING_PSE_MODE;
> +			mtk_w32(eth, val, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
> +		}
> +
> +		/* Enable non-lro multiple rx */
> +		val = mtk_r32(eth, reg_map->pdma.lro_ctrl_dw0);
> +		val |= MTK_NON_LRO_MULTI_EN;
> +		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
> +
> +		/* Enable RSS dly int supoort */
> +		val |= MTK_LRO_DLY_INT_EN;
> +		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
> +	}

[Medium]
Is this block reachable on any SoC?  mtk_rss_init() is only invoked
from mtk_dma_init() under MTK_HAS_CAPS(MTK_RSS).  Only mt7988_data
sets MTK_RSS in caps, and mt7988_data uses
rx.desc_size = sizeof(struct mtk_rx_dma_v2), not
sizeof(struct mtk_rx_dma).

If the PSE-mode / MTK_NON_LRO_MULTI_EN / MTK_LRO_DLY_INT_EN
programming is needed on MT7988, is it being silently skipped?  If it
was intended for mt7981 / mt7986, those SoCs never reach this path
because their caps lack MTK_RSS.

There is also a typo above: "RSS dly int supoort" -> "support".

[ ... ]

> +	/* Set perRSS GRP INT */
> +	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)),
> +		MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)), reg_map->pdma.int_grp);

[ ... ]

> +
> +	return 0;
> +}

[ ... ]

> @@ -3312,6 +3381,17 @@ static int mtk_dma_init(struct mtk_eth *eth)
[ ... ]
> +	if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS)) {
> +		for (i = 1; i < MTK_RX_RSS_NUM(eth); i++) {
> +			err = mtk_rx_alloc(eth, MTK_RSS_RING(i), MTK_RX_FLAGS_NORMAL);
> +			if (err)
> +				return err;
> +		}
> +		err = mtk_rss_init(eth);
> +		if (err)
> +			return err;
> +	}
> +

[ ... ]

> @@ -3437,14 +3523,36 @@ static int mtk_get_irqs(struct platform_device *pdev, struct mtk_eth *eth)
>  	return 0;
>  }
>  
> -static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
> +static int mtk_get_irqs_pdma(struct platform_device *pdev, struct mtk_eth *eth)
>  {
> -	struct mtk_eth *eth = _eth;
> +	char rxring[] = "pdma0";
> +	int i;
> +
> +	for (i = 0; i < MTK_PDMA_IRQ_NUM; i++) {
> +		rxring[4] = '0' + i;
> +		eth->irq_pdma[i] = platform_get_irq_byname(pdev, rxring);
> +		if (eth->irq_pdma[i] < 0)
> +			return eth->irq_pdma[i];
> +	}
> +
> +	return 0;
> +}

[High]
This loop demands all four pdma0..pdma3 IRQs whenever
MTK_HAS_CAPS(MTK_PDMA_INT) is set, but the same patch adds
MTK_PDMA_INT to MT7981_CAPS and MT7986_CAPS — SoCs which do not
have MTK_RSS.  On those SoCs only rx_napi[0] / pdma0 is ever
referenced (every pdma1..pdma3 use site is gated on
MTK_HAS_CAPS(MTK_RSS) in probe / open / stop / dma_init / dma_free).

Existing MT7981 / MT7986 device trees declare only fe1 / fe2.  Does
this turn into a probe-time -ENXIO failure for those boards?  Should
the call to mtk_get_irqs_pdma(), or at least the requests for
pdma1..pdma3, be folded under MTK_HAS_CAPS(MTK_RSS) so older DTs
continue to probe?
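A sketch of the gating being asked about, assuming pdma1..pdma3 are only needed on RSS-capable SoCs; the helper names are illustrative, and the name generation mirrors the `rxring[4] = '0' + i` scheme in the quoted hunk:

```c
#include <stdbool.h>
#include <stdio.h>

/* Only RSS-capable SoCs would demand all four pdma interrupts;
 * others fall back to a single pdma0 line so existing MT7981/MT7986
 * device trees that declare only fe1/fe2 keep probing. */
static int pdma_irqs_required(bool has_rss)
{
	return has_rss ? 4 : 1;
}

/* Generates the IRQ names "pdma0".."pdma3" looked up by name. */
static void pdma_irq_name(char *buf, size_t len, int i)
{
	snprintf(buf, len, "pdma%d", i);
}
```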

> +
> +static irqreturn_t mtk_handle_irq_rx(int irq, void *priv)
> +{
> +	struct mtk_napi *rx_napi = priv;
> +	struct mtk_eth *eth = rx_napi->eth;
> +	struct mtk_rx_ring *ring = rx_napi->rx_ring;
>  
>  	eth->rx_events++;
> -	if (likely(napi_schedule_prep(&eth->rx_napi))) {
> -		mtk_rx_irq_disable(eth, eth->soc->rx.irq_done_mask);
> -		__napi_schedule(&eth->rx_napi);
> +	if (unlikely(!(mtk_r32(eth, eth->soc->reg_map->pdma.irq_status) &
> +		       mtk_r32(eth, eth->soc->reg_map->pdma.irq_mask) &
> +		       MTK_RX_DONE_INT(eth, ring->ring_no))))
> +		return IRQ_NONE;

[High]
Pre-patch mt7988_data set rx.irq_done_mask = MTK_RX_DONE_INT_V2 =
BIT(14).  Post-patch the per-soc field is removed and replaced with
MTK_RX_DONE_INT(eth, ring_no), which for V3 (mt7988) expands to
BIT(24 + ring_no), so MTK_RX_DONE_INT(eth, 0) is now BIT(24) instead
of BIT(14).

Is the bit number change intentional?  If MT7988 RX-done is actually
signalled on BIT(14), the new mask never matches, mtk_handle_irq_rx
returns IRQ_NONE, and RX traffic stalls.  If BIT(14) was wrong all
along, could the commit message describe the silent fix?  The patch
otherwise advertises only RSS.
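A small model of the mask change being questioned. The bit positions come from the review text above, not from a datasheet, so treat this as illustrative only:

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Pre-patch fixed RX-done mask for V2+ hardware. */
#define MTK_RX_DONE_INT_V2 BIT(14)

/* Post-patch per-ring mask on netsys v3: one done bit per RX ring
 * starting at bit 24, as the review describes. */
static uint32_t mtk_rx_done_int_v3(int ring_no)
{
	return BIT(24 + ring_no);
}
```

The point of the question is visible in the bit values: if the hardware actually signals RX-done on bit 14, a handler matching only against the new per-ring bits can never see it.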

[ ... ]

> @@ -3888,9 +4011,7 @@ static void mtk_dim_rx(struct work_struct *work)
[ ... ]
> -	val = mtk_r32(eth, reg_map->pdma.delay_irq);
> -	val &= MTK_PDMA_DELAY_TX_MASK;
> -	val |= MTK_PDMA_DELAY_RX_EN;
> +	val = MTK_PDMA_DELAY_RX_EN;
[ ... ]
> -	mtk_w32(eth, val, reg_map->pdma.delay_irq);
>  	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
> -		mtk_w32(eth, val, reg_map->qdma.delay_irq);
> +		mtk_m32(eth, MTK_PDMA_DELAY_TX_MASK,
> +			val << MTK_PDMA_DELAY_TX_PTIME_SHIFT, reg_map->qdma.delay_irq);
> +
> +	if (eth->soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
> +		mtk_m32(eth, MTK_PDMA_DELAY_RX_MASK, val, reg_map->pdma.delay_irq);
> +		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int);
> +		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int + 0x4);
> +		mtk_w32(eth, val, reg_map->pdma.lro_rx1_dly_int + 0x8);
> +	} else {
> +		val = val | (val << MTK_PDMA_DELAY_RX_RING_SHIFT);
> +		mtk_w32(eth, val, reg_map->pdma.rx_delay_irq);
> +	}

[Critical]
The new branch is gated on rx.desc_size, not on a capability bit.
rt5350_data (MT7628) has rx.desc_size = sizeof(struct mtk_rx_dma) and
uses mt7628_reg_map, which does not initialise lro_rx1_dly_int (it
defaults to 0).

mtk_hw_init() calls mtk_dim_rx(&eth->rx_dim.work) inside the MT7628
branch, and net_dim invokes it again on every moderation update.  Each
call therefore performs:

	mtk_w32(eth, val, 0);
	mtk_w32(eth, val, 0x4);
	mtk_w32(eth, val, 0x8);

These offsets land in the Frame Engine global config region
(MTK_FE_GLO_CFG(0) = 0x00 etc.) instead of any LRO/RSS register, with
val containing MTK_PDMA_DELAY_RX_EN = BIT(15).  Can this corrupt
MT7628 frame-engine state on every boot and every NAPI moderation
update?  Should the branch be gated on MTK_HWLRO or MTK_RSS rather
than on rx.desc_size?
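The default-to-zero hazard is plain C semantics: members left out of a designated initialiser are zero-initialised, so a reg_map that never sets `lro_rx1_dly_int` silently yields register offset 0, which on MT7628 is Frame Engine space rather than any LRO register. A minimal model (field names mirror the driver; the `delay_irq` offset is illustrative):

```c
#include <stdint.h>

struct pdma_map {
	uint32_t delay_irq;
	uint32_t lro_rx1_dly_int;
};

/* An MT7628-style map that, like mt7628_reg_map in the driver, never
 * initialises lro_rx1_dly_int: C leaves it at 0. */
static const struct pdma_map mt7628_pdma = {
	.delay_irq = 0x0a08,
	/* .lro_rx1_dly_int intentionally not set */
};
```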

[Medium]
On V3 hardware (the only chip enabling RSS) only two ring slots are
programmed here:

	val = val | (val << MTK_PDMA_DELAY_RX_RING_SHIFT);
	mtk_w32(eth, val, reg_map->pdma.rx_delay_irq);

With MTK_PDMA_DELAY_RX_RING_SHIFT = 16 the register holds two rings,
but RSS uses MTK_RX_RSS_NUM(eth) = 4 rings.  Do rings 2 and 3 ever
receive DIM-driven moderation updates after probe, or do they stay at
the hardware default?

[ ... ]

> @@ -4149,6 +4285,25 @@ static void mtk_hw_reset_monitor_work(struct work_struct *work)
[ ... ]
> +static int mtk_napi_init(struct mtk_eth *eth)
> +{
[ ... ]
> +}

[ ... ]

> @@ -4238,12 +4393,11 @@ static int mtk_hw_init(struct mtk_eth *eth, bool reset)
[ ... ]

> @@ -4700,8 +4860,13 @@ static void mtk_get_ethtool_stats(struct net_device *dev,
>  
>  static u32 mtk_get_rx_ring_count(struct net_device *dev)
>  {
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +
>  	if (dev->hw_features & NETIF_F_LRO)
>  		return MTK_MAX_RX_RING_NUM;
> +	else if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSS))
> +		return MTK_RX_RSS_NUM(eth);
>  
>  	return 0;
>  }

[ ... ]

> @@ -4784,6 +4949,70 @@ static int mtk_set_eee(struct net_device *dev, struct ethtool_keee *eee)
[ ... ]
> +static int mtk_set_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxfh,
> +			struct netlink_ext_ack *extack)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +	struct mtk_rss_params *rss_params = &eth->rss_params;
> +	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
> +	int i;
> +
> +	if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
> +	    rxfh->hfunc != ETH_RSS_HASH_TOP)
> +		return -EOPNOTSUPP;
> +
> +	if (rxfh->key) {
> +		memcpy(rss_params->hash_key, rxfh->key,
> +		       sizeof(rss_params->hash_key));
> +		for (i = 0; i < MTK_RSS_HASH_KEYSIZE / sizeof(u32); i++)
> +			mtk_w32(eth, rss_params->hash_key[i],
> +				MTK_RSS_HASH_KEY_DW(reg_map, i));
> +	}
> +
> +	if (rxfh->indir) {
> +		for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
> +			rss_params->indirection_table[i] = rxfh->indir[i];
> +		for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE / 16; i++)
> +			mtk_w32(eth, mtk_rss_indr_table(rss_params, i),
> +				MTK_RSS_INDR_TABLE_DW(reg_map, i));
> +	}
> +
> +	return 0;
> +}

[High]
Should this op (and .get_rxfh / .get_rxfh_key_size /
.get_rxfh_indir_size) be gated on MTK_HAS_CAPS(MTK_RSS)?  As written,
mtk_ethtool_ops exposes them unconditionally for every SoC the driver
supports.

MTK_RSS_HASH_KEY_DW(reg_map, x) and MTK_RSS_INDR_TABLE_DW(reg_map, x)
are computed as (reg_map)->pdma.rss_glo_cfg + 0x20 + ... and + 0x50 +
... .  mtk_reg_map (mt2701/mt7621/mt7622/mt7623/mt7629) and
mt7628_reg_map do not initialise rss_glo_cfg, so it defaults to 0.

A CAP_NET_ADMIN user invoking ethtool -X dev hkey ... on those SoCs
would write user-controlled u32 values into absolute frame-engine
offsets 0x20..0x44 (and indirection writes to 0x50..0x6c).  Can these
ops be made conditional on MTK_RSS, or filled in only for the RSS-
capable SoC?

[Medium]
mtk_set_rxfh() does not validate user-supplied indir[i] entries
against MTK_RX_RSS_NUM(eth).  The packing helper allots only 2 bits
per entry:

	val |= (rss_params->indirection_table[i] << (2 * (i % 16)));

ethtool core only enforces indir[i] < num_rx_rings when
num_rx_rings is non-zero, and MTK_RSS hardware has at most 4 rings.
Should the driver reject values >= MTK_RX_RSS_NUM(eth) so they don't
silently bleed across the 2-bit slots in the packed register?  The
indirection table also narrows the raw u32 into a u8 field, so
out-of-range values are truncated rather than rejected.
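A model of the 2-bit packing quoted above: 16 indirection entries per 32-bit register. An entry value >= 4 needs more than two bits, so it spills into the neighbouring slot; validating against rss_num before packing avoids that. Helper names are illustrative:

```c
#include <stdint.h>

/* Pack 16 ring indices, two bits each, into one 32-bit register
 * value, as the quoted shift expression does. */
static uint32_t pack_indir16(const uint8_t *tbl)
{
	uint32_t val = 0;
	int i;

	for (i = 0; i < 16; i++)
		val |= (uint32_t)tbl[i] << (2 * i);
	return val;
}

/* The validation the review suggests: reject entries that do not fit
 * in two bits / exceed the ring count. */
static int validate_indir(const uint32_t *indir, int n, uint32_t rss_num)
{
	int i;

	for (i = 0; i < n; i++)
		if (indir[i] >= rss_num)
			return -1;	/* -EINVAL in the driver */
	return 0;
}
```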

[Medium]
mtk_rss_init() wraps key / indirection writes with an
MTK_RSS_CFG_REQ pause / release handshake:

	/* Pause */
	val |= MTK_RSS_CFG_REQ;
	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
	...
	/* Release pause */
	val &= ~(MTK_RSS_CFG_REQ);

mtk_set_rxfh() writes the same registers at runtime without that
handshake.  Can the hardware sample partially-updated key /
indirection state and steer packets to incorrect or out-of-range
rings while the write loop is in flight?

[ ... ]

> @@ -5424,6 +5708,7 @@ static const struct mtk_soc_data mt2701_data = {

[ ... entries for mt2701/mt7621/mt7622/mt7623/mt7629/rt5350 ... ]

> @@ -5567,16 +5851,16 @@ static const struct mtk_soc_data mt7981_data = {
[ ... ]
>  	.foe_entry_size = MTK_FOE_ENTRY_V2_SIZE,
> +	.rss_num = 4,
[ ... ]
> @@ -5597,6 +5881,7 @@ static const struct mtk_soc_data mt7986_data = {
[ ... ]
>  	.foe_entry_size = MTK_FOE_ENTRY_V2_SIZE,
> +	.rss_num = 4,

[Low]
mt7981_data and mt7986_data set rss_num = 4, but MT7981_CAPS and
MT7986_CAPS do not include MTK_RSS (only MT7988_CAPS does).  Every
RSS code path is gated on MTK_HAS_CAPS(MTK_RSS) before consulting
rss_num, so this has no current effect, but the SoC capability table
becomes internally inconsistent.  Should rss_num be 0 on these SoCs
until MTK_RSS is actually wired up and tested for them?

> @@ -5627,20 +5911,20 @@ static const struct mtk_soc_data mt7988_data = {
[ ... ]
>  	.foe_entry_size = MTK_FOE_ENTRY_V3_SIZE,
> +	.rss_num = 4,
[ ... ]
>  	.rx = {
>  		.desc_size = sizeof(struct mtk_rx_dma_v2),
> -		.irq_done_mask = MTK_RX_DONE_INT_V2,
>  		.dma_l4_valid = RX_DMA_L4_VALID_V2,
>  		.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
>  		.dma_len_offset = 8,
> -		.dma_size = MTK_DMA_SIZE(2K),
> +		.dma_size = MTK_DMA_SIZE(1K),
>  	},
>  };

[ ... ]
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support
  2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
@ 2026-05-14  1:52   ` Jakub Kicinski
  2026-05-14  1:53   ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-14  1:52 UTC (permalink / raw)
  To: linux
  Cc: Jakub Kicinski, nbd, lorenzo, andrew+netdev, davem, edumazet,
	pabeni, matthias.bgg, angelogioacchino.delregno, linux,
	mason-cw.chang, netdev, linux-kernel, daniel, linux-mediatek,
	linux-arm-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: ethernet: mtk_eth_soc: Add LRO support

This patch adds Hardware Large Receive Offload support to the
mediatek ethernet driver and enables it on MT7988 by adding
MTK_HWLRO to MT7988_CAPS. The patch parameterizes the LRO
register accessors and ring counts on the SoC version, and adds
LRO-specific NAPI/IRQ wiring.

A few questions on the changes below.

> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 908fd88287ac..8035fc2557de 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -2806,7 +2806,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
>  	if (!ring->data)
>  		return -ENOMEM;
>  
> -	if (mtk_page_pool_enabled(eth)) {
> +	if (mtk_page_pool_enabled(eth) && rcu_access_pointer(eth->prog))  {
>  		struct page_pool *pp;
>  
>  		pp = mtk_create_page_pool(eth, &ring->xdp_q, ring_no,

[High]
Could the rationale for this gating change be explained in the commit
message?  The subject is "Add LRO support" but this hunk changes the
page_pool creation policy for every netsys v2+ chip.

Before the patch, every v2+ RX ring was backed by a page_pool (since
mtk_page_pool_enabled() returns mtk_is_netsys_v2_or_greater(eth)).
After the patch, a page_pool is only created when an XDP program is
already attached at ring allocation time.

mtk_xdp_setup() also rejects XDP attach with -EOPNOTSUPP when
eth->hwlro is true:

    if (eth->hwlro) {
            NL_SET_ERR_MSG_MOD(extack, "HW LRO and XDP are mutually exclusive");
            return -EOPNOTSUPP;
    }

So on MT7988 (where this patch enables MTK_HWLRO) eth->prog is
permanently NULL and mtk_create_page_pool() is now never reached
for any RX ring, including the non-LRO ring 0.

Does this intentionally disable page_pool/DMA buffer recycling on
MT7988, and on every other v2+ chip until a BPF program is loaded?
mtk_ethtool_pp_stats() depends on ring->page_pool, so the page-pool
counters reported through ethtool would also become empty.

The driver still advertises NETDEV_XDP_ACT_BASIC|REDIRECT|NDO_XMIT|
NDO_XMIT_SG via xdp_features for v2+, which appears to diverge from
the buffer-pool capability after this change.

> @@ -2974,30 +2975,50 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
>  	ring_ctrl_dw2 |= MTK_RING_MAX_AGG_CNT_L;
>  	ring_ctrl_dw3 |= MTK_RING_MAX_AGG_CNT_H;
>  
> -	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
> +	for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
>  		mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(reg_map, i));

[ ... ]

>  	/* set refresh timer for altering flows to 1 sec. (unit: 20us) */
>  	mtk_w32(eth, (MTK_HW_LRO_TIMER_UNIT << 16) | MTK_HW_LRO_REFRESH_TIME,
>  		MTK_PDMA_LRO_ALT_REFRESH_TIMER);

[Medium]
Should MTK_PDMA_LRO_ALT_REFRESH_TIMER also have been converted to
go through reg_map like every other LRO register touched by this
patch?  Looking at the header, it is still defined as a hardcoded
absolute offset 0x1c that has lived in the v1 PDMA window since
2016.  On MT7988 (newly added to MTK_HWLRO via MT7988_CAPS) the
LRO block lives at 0x6c08-0x6c44 per mt7988_reg_map, while offset
0x1c on v3 falls in the Frame Engine register area between
MTK_FE_INT_ENABLE and MTK_FE_INT_GRP.

Every other LRO register access in this function was switched to
MTK_PDMA_LRO_CTRL_DW0(reg_map), MTK_PDMA_LRO_ALT_SCORE_DELTA(reg_map),
MTK_LRO_DIP_DW0_CFG(reg_map, x) and so on.  Was the omission of
this one register intentional, or should it be a similar
reg_map-based accessor so that on MT7988 the auto-learn refresh
timer is programmed in the LRO block instead of in the FE engine
region?

> -	/* set HW LRO mode & the max aggregation count for rx packets */
> -	lro_ctrl_dw3 |= MTK_ADMA_MODE | (MTK_HW_LRO_MAX_AGG_CNT & 0xff);
> +	if (mtk_is_netsys_v3_or_greater(eth)) {

[ ... ]

>  	/* enable HW LRO */
>  	lro_ctrl_dw0 |= MTK_LRO_EN;
>  
> -	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3);
> -	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0);
> +	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3(reg_map));
> +	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0(reg_map));
> +
> +	if (mtk_is_netsys_v2_or_greater(eth)) {
> +		i = (soc->rx.desc_size == sizeof(struct mtk_rx_dma_v2)) ? 1 : 0;
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)), reg_map->pdma.int_grp);
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> +			reg_map->pdma.int_grp + 0x4);
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)), reg_map->pdma.int_grp3);
> +	}

[Medium]
On v3 (the only chip with both MTK_HWLRO and v2+ enabled by this
patch), the desc_size check yields i = 1, so this block routes
MTK_HW_LRO_RING(eth, 1..3) — physical rings 5, 6, 7 with done bits
BIT(29..31) — into int_grp, int_grp+0x4 and int_grp3 respectively.
What routes MTK_HW_LRO_RING(eth, 0), i.e. physical ring 4 with done
bit BIT(28)?

mtk_probe() registers an IRQ handler for all MTK_HW_LRO_RING_NUM(eth)
= 4 LRO rings using IRQF_SHARED, so a NAPI is attached for ring 4 on
eth->irq_pdma[MTK_HW_LRO_IRQ(eth, 0)] (= irq_pdma[0] on v3).  Without
ring 4's BIT(28) being OR'd into any int_grp register, does ring 4's
RX interrupt actually reach NAPI, or does that depend on hardware
reset defaults of int_grp on MT7988?

>  	return 0;
>  }
> @@ -3080,6 +3112,65 @@ static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
>  	return cnt;
>  }
>  
> +static int mtk_hwlro_add_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	const struct mtk_reg_map *reg_map;
> +	struct mtk_eth *eth = mac->hw;
> +	u32 reg_val;
> +	int i;
> +
> +	reg_map = eth->soc->reg_map;
> +
> +	/* check for duplicate IP address in the current DIP list */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> +		if (reg_val == ip4dst)
> +			break;
> +	}
> +
> +	if (i < MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "Duplicate IP address at DIP(%d)!\n", i);
> +		return -EEXIST;
> +	}
> +
> +	/* find out available DIP index */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> +		if (reg_val == 0UL)
> +			break;
> +	}
> +
> +	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "DIP index is currently out of resource!\n");
> +		return -EBUSY;
> +	}
> +
> +	return i;
> +}

[High]
Is it intentional that the DIP-table allocation policy is changed
from a per-MAC partition (previously hwlro_idx = mac->id *
MTK_MAX_LRO_IP_CNT + fsp->location) to a single global pool searched
across all MACs?  The duplicate check and free-slot search above
both iterate every DIP register without scoping by mac->id.

Two observations:

1. Two netdevs that both want HW LRO on the same destination IP
   collide on the duplicate-check and the second one fails with
   -EEXIST.  A single MAC can also fill all DIP slots and starve
   the other; non-v3 silicon has only 3 slots while the uAPI
   advertises 2 MACs * MTK_MAX_LRO_IP_CNT = 4 rules.

2. mtk_hwlro_add_ipaddr() (below) does not invalidate any
   pre-existing entry at mac->hwlro_ip[fsp->location] before
   calling this helper, so adding a new IP at a location that
   already had one finds a fresh slot for the new IP and then
   overwrites the software state.  The old slot still has the
   original IP with MYIP_VLD set in hardware, and a later del at
   the same location only finds and invalidates the new entry.

Together they look like a CAP_NET_ADMIN-reachable way to leak DIP
slots until device reset, while mac->hwlro_ip[] looks consistent.

> +static int mtk_hwlro_get_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +	u32 reg_val;
> +	int i;
> +
> +	/* find out DIP index that matches the given IP address */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(eth->soc->reg_map, i));
> +		if (reg_val == ip4dst)
> +			break;
> +	}
> +
> +	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "DIP address is not exist!\n");
> +		return -ENOENT;
> +	}
> +
> +	return i;
> +}
> +
>  static int mtk_hwlro_add_ipaddr(struct net_device *dev,
>  				struct ethtool_rxnfc *cmd)
>  {
> @@ -3112,13 +3207,17 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
>  	struct mtk_mac *mac = netdev_priv(dev);
>  	struct mtk_eth *eth = mac->hw;
>  	int hwlro_idx;
> +	u32 ip4dst;
>  
>  	if (fsp->location > 1)
>  		return -EINVAL;
>  
> -	mac->hwlro_ip[fsp->location] = 0;
> -	hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
> +	ip4dst = mac->hwlro_ip[fsp->location];
> +	hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, ip4dst);
> +	if (hwlro_idx < 0)
> +		return hwlro_idx;
>  
> +	mac->hwlro_ip[fsp->location] = 0;
>  	mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
>  
>  	mtk_hwlro_inval_ipaddr(eth, hwlro_idx);

[Medium]
Should this path validate that mac->hwlro_ip[fsp->location] is
non-zero before passing it to mtk_hwlro_get_ipaddr_idx()?  When the
location was never populated (or was already deleted), ip4dst is 0.
mtk_hwlro_get_ipaddr_idx() then matches the first slot whose register
reads 0 — i.e. an arbitrary unrelated empty slot — and returns a
positive index, after which mtk_hwlro_inval_ipaddr() clears MYIP_VLD
on that unrelated slot and the call returns 0 to userspace.

Either rejecting the call with -ENOENT here when
mac->hwlro_ip[fsp->location] == 0, or having mtk_hwlro_get_ipaddr_idx()
refuse ip4dst == 0, would avoid that.

> @@ -3126,6 +3225,24 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
>  	return 0;
>  }
>  
> +static void mtk_hwlro_netdev_enable(struct net_device *dev)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +	int i, hwlro_idx;
> +
> +	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> +		if (mac->hwlro_ip[i] == 0)
> +			continue;
> +
> +		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> +		if (hwlro_idx < 0)
> +			continue;
> +
> +		mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[i]);
> +	}
> +}
> +

[Low]
This isn't a bug, but is mtk_hwlro_netdev_enable() reachable in
practice?  It uses mtk_hwlro_get_ipaddr_idx(), which only succeeds
when the IP is already programmed in hardware.  The only path that
disables LRO is mtk_hwlro_netdev_disable() below, which clears
mac->hwlro_ip[i] = 0 (so the first guard always continues) and also
invalidates the DIP entries.  After ethtool -K lro off the IPs are
gone from both software and hardware, so the subsequent ethtool -K
lro on appears to do nothing.

If the intent is restore-on-re-enable, should the disable preserve
mac->hwlro_ip[] and/or should this enable path call
mtk_hwlro_add_ipaddr_idx() to allocate fresh slots?

>  static void mtk_hwlro_netdev_disable(struct net_device *dev)
>  {
>  	struct mtk_mac *mac = netdev_priv(dev);
> @@ -3133,8 +3250,14 @@ static void mtk_hwlro_netdev_disable(struct net_device *dev)
>  	int i, hwlro_idx;
>  
>  	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> +		if (mac->hwlro_ip[i] == 0)
> +			continue;
> +
> +		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> +		if (hwlro_idx < 0)
> +			continue;
> +
>  		mac->hwlro_ip[i] = 0;
> -		hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + i;
>  
>  		mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
>  	}

[Medium]
Can this leave mac->hwlro_ip[i] populated while mac->hwlro_ip_cnt is
reset to 0?  When mtk_hwlro_get_ipaddr_idx() returns < 0 (for example
because the DIP table was already cleared by a prior
mtk_hwlro_rx_uninit, by a parallel teardown on another MAC, or by a
hardware reset), the continue skips the mac->hwlro_ip[i] = 0
assignment.  After the loop, mac->hwlro_ip_cnt = 0 still runs
unconditionally.

The previous code unconditionally cleared mac->hwlro_ip[i].  In the
new shape, ETHTOOL_GRXCLSRLCNT would report 0 while
ETHTOOL_GRXCLSRLALL via mtk_hwlro_get_fdir_all() still enumerates
the stale entries, and mtk_hwlro_get_ip_cnt() called from a later
add would recount them.

> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> index 378cf47913ef..f7e7299fef6b 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> @@ -35,7 +35,7 @@
>  #define MTK_DMA_SIZE(x)		(SZ_##x)
>  #define MTK_FQ_DMA_HEAD		32
>  #define MTK_FQ_DMA_LENGTH	2048
> -#define MTK_RX_ETH_HLEN		(ETH_HLEN + ETH_FCS_LEN)
> +#define MTK_RX_ETH_HLEN		(VLAN_ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
>  #define MTK_RX_HLEN		(NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN)

[High]
Could this change be called out in the commit message?  It widens
MTK_RX_ETH_HLEN from 18 to 26 globally for every supported chip,
not just MT7988/LRO.

mtk_add_mac() sets:

    eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH_2K - MTK_RX_ETH_HLEN;

(and MTK_MAX_RX_LENGTH - MTK_RX_ETH_HLEN for MT7628), so the
maximum MTU advertised by every supported MediaTek SoC drops by 8
bytes (e.g. 2030 -> 2022 on the 2K path, 1518 -> 1510 on MT7628).
Userspace configuration that previously set MTU values up to the
prior limit will now be rejected with -EINVAL.

If headroom for stacked VLANs is needed only on the LRO path, would
it be possible to confine the +8 to MTK_MAX_LRO_RX_LENGTH instead
of changing MTK_RX_ETH_HLEN for every chip?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support
  2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
  2026-05-14  1:52   ` Jakub Kicinski
@ 2026-05-14  1:53   ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-14  1:53 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King, Frank Wunderlich,
	netdev, linux-kernel, linux-arm-kernel, linux-mediatek,
	Mason Chang, Daniel Golle

On Sat,  9 May 2026 21:09:32 +0200 Frank Wunderlich wrote:
> +			mtk_rx_irq_disable(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)));
> +			napi_synchronize(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);
> +			napi_disable(&eth->rx_napi[MTK_HW_LRO_RING(eth, i)].napi);

What purpose does that napi_synchronize() serve?
Also, we don't charge for temporary variables; maybe save that
MTK_HW_LRO_RING(eth, i) to a local to make this slightly more readable.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support
  2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
  2026-05-14  1:52   ` Jakub Kicinski
@ 2026-05-14  1:56   ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-14  1:56 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King, Frank Wunderlich,
	netdev, linux-kernel, linux-arm-kernel, linux-mediatek,
	Mason Chang, Daniel Golle

On Sat,  9 May 2026 21:09:31 +0200 Frank Wunderlich wrote:
> From: Mason Chang <mason-cw.chang@mediatek.com>
> 
> Add support for Receive Side Scaling.
> 
> We can adjust SMP affinity with the following command:
> echo [CPU bitmap num] > /proc/irq/[virtual IRQ ID]/smp_affinity,
> with interrupts evenly assigned to 4 CPUs, we were able to measure
> an RX throughput of 7.3Gbps using iperf3 on the MT7988. Further
> optimizations will be carried out in the future.

Would be great to split this up a little more for ease of review.

> +static int mtk_rss_init(struct mtk_eth *eth)
> +{
> +	const struct mtk_soc_data *soc = eth->soc;
> +	const struct mtk_reg_map *reg_map = eth->soc->reg_map;
> +	struct mtk_rss_params *rss_params = &eth->rss_params;

Reverse xmas tree ordering should be followed; please fix everywhere
in this submission.

> +	u32 val;
> +	int i;
> +
> +	netdev_rss_key_fill(rss_params->hash_key, MTK_RSS_HASH_KEYSIZE);
> +
> +	for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE; i++)
> +		rss_params->indirection_table[i] = ethtool_rxfh_indir_default(i, eth->soc->rss_num);
> +
> +	if (soc->rx.desc_size == sizeof(struct mtk_rx_dma)) {
> +		/* Set RSS rings to PSE modes */
> +		for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
> +			val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
> +			val |= MTK_RING_PSE_MODE;
> +			mtk_w32(eth, val, MTK_LRO_CTRL_DW2_CFG(reg_map, i));
> +		}
> +
> +		/* Enable non-lro multiple rx */
> +		val = mtk_r32(eth, reg_map->pdma.lro_ctrl_dw0);
> +		val |= MTK_NON_LRO_MULTI_EN;
> +		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
> +
> +		/* Enable RSS dly int supoort */
> +		val |= MTK_LRO_DLY_INT_EN;
> +		mtk_w32(eth, val, reg_map->pdma.lro_ctrl_dw0);
> +	}
> +
> +	/* Hash Type */
> +	val = mtk_r32(eth, reg_map->pdma.rss_glo_cfg);
> +	val |= MTK_RSS_IPV4_STATIC_HASH;
> +	val |= MTK_RSS_IPV6_STATIC_HASH;
> +	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
> +
> +	/* Hash Key */
> +	for (i = 0; i < MTK_RSS_HASH_KEYSIZE / sizeof(u32); i++)
> +		mtk_w32(eth, rss_params->hash_key[i], MTK_RSS_HASH_KEY_DW(reg_map, i));
> +
> +	/* Select the size of indirection table */
> +	for (i = 0; i < MTK_RSS_MAX_INDIRECTION_TABLE / 16; i++)
> +		mtk_w32(eth, mtk_rss_indr_table(rss_params, i),
> +			MTK_RSS_INDR_TABLE_DW(reg_map, i));
> +
> +	/* Pause */
> +	val |= MTK_RSS_CFG_REQ;
> +	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
> +
> +	/* Enable RSS */
> +	val |= MTK_RSS_EN;
> +	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
> +
> +	/* Release pause */
> +	val &= ~(MTK_RSS_CFG_REQ);
> +	mtk_w32(eth, val, reg_map->pdma.rss_glo_cfg);
> +
> +	/* Set perRSS GRP INT */
> +	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)),
> +		MTK_RX_DONE_INT(eth, MTK_RSS_RING(1)), reg_map->pdma.int_grp);
> +	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(2)),
> +		MTK_RX_DONE_INT(eth, MTK_RSS_RING(2)), reg_map->pdma.int_grp + 0x4);
> +	mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_RSS_RING(3)),
> +		MTK_RX_DONE_INT(eth, MTK_RSS_RING(3)), reg_map->pdma.int_grp3);
> +
> +	return 0;
> +}

> +/* struct mtk_rss_params -	This is the structure holding parameters
> + *				for the RSS ring
> + * @hash_key			The element is used to record the
> + *				secret key for the RSS ring
> + * indirection_table		The element is used to record the
> + *				indirection table for the RSS ring
> + */

Quite odd looking comment. Having the right side aligned like that
makes it harder to correlate where the doc for each field starts.
And the @ is missing for indirection_table.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-05-14  1:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
2026-05-14  1:52   ` Jakub Kicinski
2026-05-14  1:56   ` Jakub Kicinski
2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
2026-05-14  1:52   ` Jakub Kicinski
2026-05-14  1:53   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox