Re: [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: linux@fw-web.de
Cc: Jakub Kicinski <kuba@kernel.org>,
	nbd@nbd.name, lorenzo@kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
	linux@armlinux.org.uk, mason-cw.chang@mediatek.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	daniel@makrotopia.org, linux-mediatek@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support
Date: Wed, 13 May 2026 18:52:24 -0700	[thread overview]
Message-ID: <20260514015224.521125-1-kuba@kernel.org> (raw)
In-Reply-To: <20260509190938.169290-4-linux@fw-web.de>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: ethernet: mtk_eth_soc: Add LRO support

This patch adds Hardware Large Receive Offload support to the
mediatek ethernet driver and enables it on MT7988 by adding
MTK_HWLRO to MT7988_CAPS. The patch parameterizes the LRO
register accessors and ring counts on the SoC version, and adds
LRO-specific NAPI/IRQ wiring.

A few questions on the changes below.

> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 908fd88287ac..8035fc2557de 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -2806,7 +2806,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
>  	if (!ring->data)
>  		return -ENOMEM;
>  
> -	if (mtk_page_pool_enabled(eth)) {
> +	if (mtk_page_pool_enabled(eth) && rcu_access_pointer(eth->prog))  {
>  		struct page_pool *pp;
>  
>  		pp = mtk_create_page_pool(eth, &ring->xdp_q, ring_no,

[High]
Could the rationale for this gating change be explained in the commit
message?  The subject is "Add LRO support" but this hunk changes the
page_pool creation policy for every netsys v2+ chip.

Before the patch, every v2+ RX ring was backed by a page_pool (since
mtk_page_pool_enabled() returns mtk_is_netsys_v2_or_greater(eth)).
After the patch, a page_pool is only created when an XDP program is
already attached at ring allocation time.

mtk_xdp_setup() also rejects XDP attach with -EOPNOTSUPP when
eth->hwlro is true:

    if (eth->hwlro) {
            NL_SET_ERR_MSG_MOD(extack, "HW LRO and XDP are mutually exclusive");
            return -EOPNOTSUPP;
    }

So on MT7988 (where this patch enables MTK_HWLRO) eth->prog is
permanently NULL and mtk_create_page_pool() is now never reached
for any RX ring, including the non-LRO ring 0.

Does this intentionally disable page_pool/DMA buffer recycling on
MT7988, and on every other v2+ chip until a BPF program is loaded?
mtk_ethtool_pp_stats() depends on ring->page_pool, so the page-pool
counters reported through ethtool would also become empty.

The driver still advertises NETDEV_XDP_ACT_BASIC|REDIRECT|NDO_XMIT|
NDO_XMIT_SG via xdp_features for v2+, which appears to diverge from
the buffer-pool capability after this change.

> @@ -2974,30 +2975,50 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
>  	ring_ctrl_dw2 |= MTK_RING_MAX_AGG_CNT_L;
>  	ring_ctrl_dw3 |= MTK_RING_MAX_AGG_CNT_H;
>  
> -	for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
> +	for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
>  		mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(reg_map, i));

[ ... ]

>  	/* set refresh timer for altering flows to 1 sec. (unit: 20us) */
>  	mtk_w32(eth, (MTK_HW_LRO_TIMER_UNIT << 16) | MTK_HW_LRO_REFRESH_TIME,
>  		MTK_PDMA_LRO_ALT_REFRESH_TIMER);

[Medium]
Should MTK_PDMA_LRO_ALT_REFRESH_TIMER also have been converted to
go through reg_map like every other LRO register touched by this
patch?  Looking at the header, it is still defined as a hardcoded
absolute offset 0x1c that has lived in the v1 PDMA window since
2016.  On MT7988 (newly added to MTK_HWLRO via MT7988_CAPS) the
LRO block lives at 0x6c08-0x6c44 per mt7988_reg_map, while offset
0x1c on v3 falls in the Frame Engine register area between
MTK_FE_INT_ENABLE and MTK_FE_INT_GRP.

Every other LRO register access in this function was switched to
MTK_PDMA_LRO_CTRL_DW0(reg_map), MTK_PDMA_LRO_ALT_SCORE_DELTA(reg_map),
MTK_LRO_DIP_DW0_CFG(reg_map, x) and so on.  Was the omission of
this one register intentional, or should it be a similar
reg_map-based accessor so that on MT7988 the auto-learn refresh
timer is programmed in the LRO block instead of in the FE engine
region?

> -	/* set HW LRO mode & the max aggregation count for rx packets */
> -	lro_ctrl_dw3 |= MTK_ADMA_MODE | (MTK_HW_LRO_MAX_AGG_CNT & 0xff);
> +	if (mtk_is_netsys_v3_or_greater(eth)) {

[ ... ]

>  	/* enable HW LRO */
>  	lro_ctrl_dw0 |= MTK_LRO_EN;
>  
> -	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3);
> -	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0);
> +	mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3(reg_map));
> +	mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0(reg_map));
> +
> +	if (mtk_is_netsys_v2_or_greater(eth)) {
> +		i = (soc->rx.desc_size == sizeof(struct mtk_rx_dma_v2)) ? 1 : 0;
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)), reg_map->pdma.int_grp);
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> +			reg_map->pdma.int_grp + 0x4);
> +		mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)),
> +			MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)), reg_map->pdma.int_grp3);
> +	}

[Medium]
On v3 (the only chip with both MTK_HWLRO and v2+ enabled by this
patch), the desc_size check yields i = 1, so this block routes
MTK_HW_LRO_RING(eth, 1..3) — physical rings 5, 6, 7 with done bits
BIT(29..31) — into int_grp, int_grp+0x4 and int_grp3 respectively.
What routes MTK_HW_LRO_RING(eth, 0), i.e. physical ring 4 with done
bit BIT(28)?

mtk_probe() registers an IRQ handler for all MTK_HW_LRO_RING_NUM(eth)
= 4 LRO rings using IRQF_SHARED, so a NAPI is attached for ring 4 on
eth->irq_pdma[MTK_HW_LRO_IRQ(eth, 0)] (= irq_pdma[0] on v3).  Without
ring 4's BIT(28) being OR'd into any int_grp register, does ring 4's
RX interrupt actually reach NAPI, or does that depend on hardware
reset defaults of int_grp on MT7988?

>  	return 0;
>  }
> @@ -3080,6 +3112,65 @@ static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
>  	return cnt;
>  }
>  
> +static int mtk_hwlro_add_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	const struct mtk_reg_map *reg_map;
> +	struct mtk_eth *eth = mac->hw;
> +	u32 reg_val;
> +	int i;
> +
> +	reg_map = eth->soc->reg_map;
> +
> +	/* check for duplicate IP address in the current DIP list */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> +		if (reg_val == ip4dst)
> +			break;
> +	}
> +
> +	if (i < MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "Duplicate IP address at DIP(%d)!\n", i);
> +		return -EEXIST;
> +	}
> +
> +	/* find out available DIP index */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> +		if (reg_val == 0UL)
> +			break;
> +	}
> +
> +	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "DIP index is currently out of resource!\n");
> +		return -EBUSY;
> +	}
> +
> +	return i;
> +}

[High]
Is it intentional that the DIP-table allocation policy is changed
from a per-MAC partition (previously hwlro_idx = mac->id *
MTK_MAX_LRO_IP_CNT + fsp->location) to a single global pool searched
across all MACs?  The duplicate check and free-slot search above
both iterate every DIP register without scoping by mac->id.

Two observations:

1. Two netdevs that both want HW LRO on the same destination IP
   collide on the duplicate-check and the second one fails with
   -EEXIST.  A single MAC can also fill all DIP slots and starve
   the other; non-v3 silicon has only 3 slots while the uAPI
   advertises 2 MACs * MTK_MAX_LRO_IP_CNT = 4 rules.

2. mtk_hwlro_add_ipaddr() (below) does not invalidate any
   pre-existing entry at mac->hwlro_ip[fsp->location] before
   calling this helper, so adding a new IP at a location that
   already had one finds a fresh slot for the new IP and then
   overwrites the software state.  The old slot still has the
   original IP with MYIP_VLD set in hardware, and a later del at
   the same location only finds and invalidates the new entry.

Together they look like a CAP_NET_ADMIN-reachable way to leak DIP
slots until device reset, while mac->hwlro_ip[] looks consistent.

> +static int mtk_hwlro_get_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +	u32 reg_val;
> +	int i;
> +
> +	/* find out DIP index that matches the given IP address */
> +	for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> +		reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(eth->soc->reg_map, i));
> +		if (reg_val == ip4dst)
> +			break;
> +	}
> +
> +	if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> +		netdev_warn(dev, "DIP address is not exist!\n");
> +		return -ENOENT;
> +	}
> +
> +	return i;
> +}
> +
>  static int mtk_hwlro_add_ipaddr(struct net_device *dev,
>  				struct ethtool_rxnfc *cmd)
>  {
> @@ -3112,13 +3207,17 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
>  	struct mtk_mac *mac = netdev_priv(dev);
>  	struct mtk_eth *eth = mac->hw;
>  	int hwlro_idx;
> +	u32 ip4dst;
>  
>  	if (fsp->location > 1)
>  		return -EINVAL;
>  
> -	mac->hwlro_ip[fsp->location] = 0;
> -	hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
> +	ip4dst = mac->hwlro_ip[fsp->location];
> +	hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, ip4dst);
> +	if (hwlro_idx < 0)
> +		return hwlro_idx;
>  
> +	mac->hwlro_ip[fsp->location] = 0;
>  	mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
>  
>  	mtk_hwlro_inval_ipaddr(eth, hwlro_idx);

[Medium]
Should this path validate that mac->hwlro_ip[fsp->location] is
non-zero before passing it to mtk_hwlro_get_ipaddr_idx()?  When the
location was never populated (or was already deleted), ip4dst is 0.
mtk_hwlro_get_ipaddr_idx() then matches the first slot whose register
reads 0 — i.e. an arbitrary unrelated empty slot — and returns a
positive index, after which mtk_hwlro_inval_ipaddr() clears MYIP_VLD
on that unrelated slot and the call returns 0 to userspace.

Either rejecting the call with -ENOENT here when
mac->hwlro_ip[fsp->location] == 0, or having mtk_hwlro_get_ipaddr_idx()
refuse ip4dst == 0, would avoid that.

> @@ -3126,6 +3225,24 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
>  	return 0;
>  }
>  
> +static void mtk_hwlro_netdev_enable(struct net_device *dev)
> +{
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_eth *eth = mac->hw;
> +	int i, hwlro_idx;
> +
> +	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> +		if (mac->hwlro_ip[i] == 0)
> +			continue;
> +
> +		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> +		if (hwlro_idx < 0)
> +			continue;
> +
> +		mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[i]);
> +	}
> +}
> +

[Low]
This isn't a bug, but is mtk_hwlro_netdev_enable() reachable in
practice?  It uses mtk_hwlro_get_ipaddr_idx(), which only succeeds
when the IP is already programmed in hardware.  The only path that
disables LRO is mtk_hwlro_netdev_disable() below, which clears
mac->hwlro_ip[i] = 0 (so the first guard always continues) and also
invalidates the DIP entries.  After ethtool -K lro off the IPs are
gone from both software and hardware, so the subsequent ethtool -K
lro on appears to do nothing.

If the intent is restore-on-re-enable, should the disable preserve
mac->hwlro_ip[] and/or should this enable path call
mtk_hwlro_add_ipaddr_idx() to allocate fresh slots?

>  static void mtk_hwlro_netdev_disable(struct net_device *dev)
>  {
>  	struct mtk_mac *mac = netdev_priv(dev);
> @@ -3133,8 +3250,14 @@ static void mtk_hwlro_netdev_disable(struct net_device *dev)
>  	int i, hwlro_idx;
>  
>  	for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> +		if (mac->hwlro_ip[i] == 0)
> +			continue;
> +
> +		hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> +		if (hwlro_idx < 0)
> +			continue;
> +
>  		mac->hwlro_ip[i] = 0;
> -		hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + i;
>  
>  		mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
>  	}

[Medium]
Can this leave mac->hwlro_ip[i] populated while mac->hwlro_ip_cnt is
reset to 0?  When mtk_hwlro_get_ipaddr_idx() returns < 0 (for example
because the DIP table was already cleared by a prior
mtk_hwlro_rx_uninit, by a parallel teardown on another MAC, or by a
hardware reset), the continue skips the mac->hwlro_ip[i] = 0
assignment.  After the loop, mac->hwlro_ip_cnt = 0 still runs
unconditionally.

The previous code unconditionally cleared mac->hwlro_ip[i].  In the
new shape, ETHTOOL_GRXCLSRLCNT would report 0 while
ETHTOOL_GRXCLSRLALL via mtk_hwlro_get_fdir_all() still enumerates
the stale entries, and mtk_hwlro_get_ip_cnt() called from a later
add would recount them.

> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> index 378cf47913ef..f7e7299fef6b 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> @@ -35,7 +35,7 @@
>  #define MTK_DMA_SIZE(x)		(SZ_##x)
>  #define MTK_FQ_DMA_HEAD		32
>  #define MTK_FQ_DMA_LENGTH	2048
> -#define MTK_RX_ETH_HLEN		(ETH_HLEN + ETH_FCS_LEN)
> +#define MTK_RX_ETH_HLEN		(VLAN_ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
>  #define MTK_RX_HLEN		(NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN)

[High]
Could this change be called out in the commit message?  It widens
MTK_RX_ETH_HLEN from 18 to 26 globally for every supported chip,
not just MT7988/LRO.

mtk_add_mac() sets:

    eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH_2K - MTK_RX_ETH_HLEN;

(and MTK_MAX_RX_LENGTH - MTK_RX_ETH_HLEN for MT7628), so the
maximum MTU advertised by every supported MediaTek SoC drops by 8
bytes (e.g. 2030 -> 2022 on the 2K path, 1518 -> 1510 on MT7628).
Userspace configuration that previously set MTU values up to the
prior limit will now be rejected with -EINVAL.

If headroom for stacked VLANs is needed only on the LRO path, would
it be possible to confine the +8 to MTK_MAX_LRO_RX_LENGTH instead
of changing MTK_RX_ETH_HLEN for every chip?

next prev parent reply	other threads:[~2026-05-14  1:52 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
2026-05-09 19:09 ` Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
2026-05-09 19:09   ` Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
2026-05-09 19:09   ` Frank Wunderlich
2026-05-14  1:52   ` Jakub Kicinski
2026-05-14  1:56   ` Jakub Kicinski
2026-05-14  1:56     ` Jakub Kicinski
2026-05-15 11:13     ` Frank Wunderlich
2026-05-15 22:45       ` Jakub Kicinski
2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
2026-05-09 19:09   ` Frank Wunderlich
2026-05-14  1:52   ` Jakub Kicinski [this message]
2026-05-14  1:53   ` Jakub Kicinski
2026-05-14  1:53     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514015224.521125-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=daniel@makrotopia.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    --cc=linux@fw-web.de \
    --cc=lorenzo@kernel.org \
    --cc=mason-cw.chang@mediatek.com \
    --cc=matthias.bgg@gmail.com \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.