From: Jakub Kicinski <kuba@kernel.org>
To: linux@fw-web.de
Cc: Jakub Kicinski <kuba@kernel.org>,
nbd@nbd.name, lorenzo@kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
linux@armlinux.org.uk, mason-cw.chang@mediatek.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
daniel@makrotopia.org, linux-mediatek@lists.infradead.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support
Date: Wed, 13 May 2026 18:52:24 -0700 [thread overview]
Message-ID: <20260514015224.521125-1-kuba@kernel.org> (raw)
In-Reply-To: <20260509190938.169290-4-linux@fw-web.de>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: ethernet: mtk_eth_soc: Add LRO support
This patch adds Hardware Large Receive Offload support to the
mediatek ethernet driver and enables it on MT7988 by adding
MTK_HWLRO to MT7988_CAPS. The patch parameterizes the LRO
register accessors and ring counts on the SoC version, and adds
LRO-specific NAPI/IRQ wiring.
A few questions on the changes below.
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 908fd88287ac..8035fc2557de 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -2806,7 +2806,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
> if (!ring->data)
> return -ENOMEM;
>
> - if (mtk_page_pool_enabled(eth)) {
> + if (mtk_page_pool_enabled(eth) && rcu_access_pointer(eth->prog)) {
> struct page_pool *pp;
>
> pp = mtk_create_page_pool(eth, &ring->xdp_q, ring_no,
[High]
Could the rationale for this gating change be explained in the commit
message? The subject is "Add LRO support" but this hunk changes the
page_pool creation policy for every netsys v2+ chip.
Before the patch, every v2+ RX ring was backed by a page_pool (since
mtk_page_pool_enabled() returns mtk_is_netsys_v2_or_greater(eth)).
After the patch, a page_pool is only created when an XDP program is
already attached at ring allocation time.
mtk_xdp_setup() also rejects XDP attach with -EOPNOTSUPP when
eth->hwlro is true:
if (eth->hwlro) {
NL_SET_ERR_MSG_MOD(extack, "HW LRO and XDP are mutually exclusive");
return -EOPNOTSUPP;
}
So on MT7988 (where this patch enables MTK_HWLRO) eth->prog is
permanently NULL and mtk_create_page_pool() is now never reached
for any RX ring, including the non-LRO ring 0.
Does this intentionally disable page_pool/DMA buffer recycling on
MT7988, and on every other v2+ chip until a BPF program is loaded?
mtk_ethtool_pp_stats() depends on ring->page_pool, so the page-pool
counters reported through ethtool would also become empty.
The driver still advertises NETDEV_XDP_ACT_BASIC|REDIRECT|NDO_XMIT|
NDO_XMIT_SG via xdp_features for v2+, which appears to diverge from
the buffer-pool capability after this change.
> @@ -2974,30 +2975,50 @@ static int mtk_hwlro_rx_init(struct mtk_eth *eth)
> ring_ctrl_dw2 |= MTK_RING_MAX_AGG_CNT_L;
> ring_ctrl_dw3 |= MTK_RING_MAX_AGG_CNT_H;
>
> - for (i = 1; i < MTK_MAX_RX_RING_NUM; i++) {
> + for (i = 1; i <= MTK_HW_LRO_RING_NUM(eth); i++) {
> mtk_w32(eth, ring_ctrl_dw1, MTK_LRO_CTRL_DW1_CFG(reg_map, i));
[ ... ]
> /* set refresh timer for altering flows to 1 sec. (unit: 20us) */
> mtk_w32(eth, (MTK_HW_LRO_TIMER_UNIT << 16) | MTK_HW_LRO_REFRESH_TIME,
> MTK_PDMA_LRO_ALT_REFRESH_TIMER);
[Medium]
Should MTK_PDMA_LRO_ALT_REFRESH_TIMER also have been converted to
go through reg_map like every other LRO register touched by this
patch? Looking at the header, it is still defined as a hardcoded
absolute offset 0x1c that has lived in the v1 PDMA window since
2016. On MT7988 (newly added to MTK_HWLRO via MT7988_CAPS) the
LRO block lives at 0x6c08-0x6c44 per mt7988_reg_map, while offset
0x1c on v3 falls in the Frame Engine register area between
MTK_FE_INT_ENABLE and MTK_FE_INT_GRP.
Every other LRO register access in this function was switched to
MTK_PDMA_LRO_CTRL_DW0(reg_map), MTK_PDMA_LRO_ALT_SCORE_DELTA(reg_map),
MTK_LRO_DIP_DW0_CFG(reg_map, x) and so on. Was the omission of
this one register intentional, or should it be a similar
reg_map-based accessor so that on MT7988 the auto-learn refresh
timer is programmed in the LRO block instead of in the FE engine
region?
> - /* set HW LRO mode & the max aggregation count for rx packets */
> - lro_ctrl_dw3 |= MTK_ADMA_MODE | (MTK_HW_LRO_MAX_AGG_CNT & 0xff);
> + if (mtk_is_netsys_v3_or_greater(eth)) {
[ ... ]
> /* enable HW LRO */
> lro_ctrl_dw0 |= MTK_LRO_EN;
>
> - mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3);
> - mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0);
> + mtk_w32(eth, lro_ctrl_dw3, MTK_PDMA_LRO_CTRL_DW3(reg_map));
> + mtk_w32(eth, lro_ctrl_dw0, MTK_PDMA_LRO_CTRL_DW0(reg_map));
> +
> + if (mtk_is_netsys_v2_or_greater(eth)) {
> + i = (soc->rx.desc_size == sizeof(struct mtk_rx_dma_v2)) ? 1 : 0;
> + mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)),
> + MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i)), reg_map->pdma.int_grp);
> + mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> + MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 1)),
> + reg_map->pdma.int_grp + 0x4);
> + mtk_m32(eth, MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)),
> + MTK_RX_DONE_INT(eth, MTK_HW_LRO_RING(eth, i + 2)), reg_map->pdma.int_grp3);
> + }
[Medium]
On v3 (the only chip with both MTK_HWLRO and v2+ enabled by this
patch), the desc_size check yields i = 1, so this block routes
MTK_HW_LRO_RING(eth, 1..3) — physical rings 5, 6, 7 with done bits
BIT(29..31) — into int_grp, int_grp+0x4 and int_grp3 respectively.
What routes MTK_HW_LRO_RING(eth, 0), i.e. physical ring 4 with done
bit BIT(28)?
mtk_probe() registers an IRQ handler for all MTK_HW_LRO_RING_NUM(eth)
= 4 LRO rings using IRQF_SHARED, so a NAPI is attached for ring 4 on
eth->irq_pdma[MTK_HW_LRO_IRQ(eth, 0)] (= irq_pdma[0] on v3). Without
ring 4's BIT(28) being OR'd into any int_grp register, does ring 4's
RX interrupt actually reach NAPI, or does that depend on hardware
reset defaults of int_grp on MT7988?
> return 0;
> }
> @@ -3080,6 +3112,65 @@ static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
> return cnt;
> }
>
> +static int mtk_hwlro_add_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> + struct mtk_mac *mac = netdev_priv(dev);
> + const struct mtk_reg_map *reg_map;
> + struct mtk_eth *eth = mac->hw;
> + u32 reg_val;
> + int i;
> +
> + reg_map = eth->soc->reg_map;
> +
> + /* check for duplicate IP address in the current DIP list */
> + for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> + reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> + if (reg_val == ip4dst)
> + break;
> + }
> +
> + if (i < MTK_HW_LRO_DIP_NUM(eth) + 1) {
> + netdev_warn(dev, "Duplicate IP address at DIP(%d)!\n", i);
> + return -EEXIST;
> + }
> +
> + /* find out available DIP index */
> + for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> + reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(reg_map, i));
> + if (reg_val == 0UL)
> + break;
> + }
> +
> + if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> + netdev_warn(dev, "DIP index is currently out of resource!\n");
> + return -EBUSY;
> + }
> +
> + return i;
> +}
[High]
Is it intentional that the DIP-table allocation policy is changed
from a per-MAC partition (previously hwlro_idx = mac->id *
MTK_MAX_LRO_IP_CNT + fsp->location) to a single global pool searched
across all MACs? The duplicate check and free-slot search above
both iterate every DIP register without scoping by mac->id.
Two observations:
1. Two netdevs that both want HW LRO on the same destination IP
collide on the duplicate-check and the second one fails with
-EEXIST. A single MAC can also fill all DIP slots and starve
the other; non-v3 silicon has only 3 slots while the uAPI
advertises 2 MACs * MTK_MAX_LRO_IP_CNT = 4 rules.
2. mtk_hwlro_add_ipaddr() (below) does not invalidate any
pre-existing entry at mac->hwlro_ip[fsp->location] before
calling this helper, so adding a new IP at a location that
already had one finds a fresh slot for the new IP and then
overwrites the software state. The old slot still has the
original IP with MYIP_VLD set in hardware, and a later del at
the same location only finds and invalidates the new entry.
Together they look like a CAP_NET_ADMIN-reachable way to leak DIP
slots until device reset, while mac->hwlro_ip[] looks consistent.
> +static int mtk_hwlro_get_ipaddr_idx(struct net_device *dev, u32 ip4dst)
> +{
> + struct mtk_mac *mac = netdev_priv(dev);
> + struct mtk_eth *eth = mac->hw;
> + u32 reg_val;
> + int i;
> +
> + /* find out DIP index that matches the given IP address */
> + for (i = 1; i <= MTK_HW_LRO_DIP_NUM(eth); i++) {
> + reg_val = mtk_r32(eth, MTK_LRO_DIP_DW0_CFG(eth->soc->reg_map, i));
> + if (reg_val == ip4dst)
> + break;
> + }
> +
> + if (i >= MTK_HW_LRO_DIP_NUM(eth) + 1) {
> + netdev_warn(dev, "DIP address is not exist!\n");
> + return -ENOENT;
> + }
> +
> + return i;
> +}
> +
> static int mtk_hwlro_add_ipaddr(struct net_device *dev,
> struct ethtool_rxnfc *cmd)
> {
> @@ -3112,13 +3207,17 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
> struct mtk_mac *mac = netdev_priv(dev);
> struct mtk_eth *eth = mac->hw;
> int hwlro_idx;
> + u32 ip4dst;
>
> if (fsp->location > 1)
> return -EINVAL;
>
> - mac->hwlro_ip[fsp->location] = 0;
> - hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
> + ip4dst = mac->hwlro_ip[fsp->location];
> + hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, ip4dst);
> + if (hwlro_idx < 0)
> + return hwlro_idx;
>
> + mac->hwlro_ip[fsp->location] = 0;
> mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
>
> mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
[Medium]
Should this path validate that mac->hwlro_ip[fsp->location] is
non-zero before passing it to mtk_hwlro_get_ipaddr_idx()? When the
location was never populated (or was already deleted), ip4dst is 0.
mtk_hwlro_get_ipaddr_idx() then matches the first slot whose register
reads 0 — i.e. an arbitrary unrelated empty slot — and returns a
positive index, after which mtk_hwlro_inval_ipaddr() clears MYIP_VLD
on that unrelated slot and the call returns 0 to userspace.
Either rejecting the call with -ENOENT here when
mac->hwlro_ip[fsp->location] == 0, or having mtk_hwlro_get_ipaddr_idx()
refuse ip4dst == 0, would avoid that.
> @@ -3126,6 +3225,24 @@ static int mtk_hwlro_del_ipaddr(struct net_device *dev,
> return 0;
> }
>
> +static void mtk_hwlro_netdev_enable(struct net_device *dev)
> +{
> + struct mtk_mac *mac = netdev_priv(dev);
> + struct mtk_eth *eth = mac->hw;
> + int i, hwlro_idx;
> +
> + for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> + if (mac->hwlro_ip[i] == 0)
> + continue;
> +
> + hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> + if (hwlro_idx < 0)
> + continue;
> +
> + mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[i]);
> + }
> +}
> +
[Low]
This isn't a bug, but is mtk_hwlro_netdev_enable() reachable in
practice? It uses mtk_hwlro_get_ipaddr_idx(), which only succeeds
when the IP is already programmed in hardware. The only path that
disables LRO is mtk_hwlro_netdev_disable() below, which clears
mac->hwlro_ip[i] = 0 (so the first guard always continues) and also
invalidates the DIP entries. After ethtool -K lro off the IPs are
gone from both software and hardware, so the subsequent ethtool -K
lro on appears to do nothing.
If the intent is restore-on-re-enable, should the disable preserve
mac->hwlro_ip[] and/or should this enable path call
mtk_hwlro_add_ipaddr_idx() to allocate fresh slots?
> static void mtk_hwlro_netdev_disable(struct net_device *dev)
> {
> struct mtk_mac *mac = netdev_priv(dev);
> @@ -3133,8 +3250,14 @@ static void mtk_hwlro_netdev_disable(struct net_device *dev)
> int i, hwlro_idx;
>
> for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
> + if (mac->hwlro_ip[i] == 0)
> + continue;
> +
> + hwlro_idx = mtk_hwlro_get_ipaddr_idx(dev, mac->hwlro_ip[i]);
> + if (hwlro_idx < 0)
> + continue;
> +
> mac->hwlro_ip[i] = 0;
> - hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + i;
>
> mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
> }
[Medium]
Can this leave mac->hwlro_ip[i] populated while mac->hwlro_ip_cnt is
reset to 0? When mtk_hwlro_get_ipaddr_idx() returns < 0 (for example
because the DIP table was already cleared by a prior
mtk_hwlro_rx_uninit, by a parallel teardown on another MAC, or by a
hardware reset), the continue skips the mac->hwlro_ip[i] = 0
assignment. After the loop, mac->hwlro_ip_cnt = 0 still runs
unconditionally.
The previous code unconditionally cleared mac->hwlro_ip[i]. In the
new shape, ETHTOOL_GRXCLSRLCNT would report 0 while
ETHTOOL_GRXCLSRLALL via mtk_hwlro_get_fdir_all() still enumerates
the stale entries, and mtk_hwlro_get_ip_cnt() called from a later
add would recount them.
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> index 378cf47913ef..f7e7299fef6b 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> @@ -35,7 +35,7 @@
> #define MTK_DMA_SIZE(x) (SZ_##x)
> #define MTK_FQ_DMA_HEAD 32
> #define MTK_FQ_DMA_LENGTH 2048
> -#define MTK_RX_ETH_HLEN (ETH_HLEN + ETH_FCS_LEN)
> +#define MTK_RX_ETH_HLEN (VLAN_ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
> #define MTK_RX_HLEN (NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN)
[High]
Could this change be called out in the commit message? It widens
MTK_RX_ETH_HLEN from 18 to 26 globally for every supported chip,
not just MT7988/LRO.
mtk_add_mac() sets:
eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH_2K - MTK_RX_ETH_HLEN;
(and MTK_MAX_RX_LENGTH - MTK_RX_ETH_HLEN for MT7628), so the
maximum MTU advertised by every supported MediaTek SoC drops by 8
bytes (e.g. 2030 -> 2022 on the 2K path, 1518 -> 1510 on MT7628).
Userspace configuration that previously set MTU values up to the
prior limit will now be rejected with -EINVAL.
If headroom for stacked VLANs is needed only on the LRO path, would
it be possible to confine the +8 to MTK_MAX_LRO_RX_LENGTH instead
of changing MTK_RX_ETH_HLEN for every chip?
next prev parent reply other threads:[~2026-05-14 1:52 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-09 19:09 [net-next v8 0/3] Add RSS and LRO support Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 1/3] net: ethernet: mtk_eth_soc: Add register definitions for RSS and LRO Frank Wunderlich
2026-05-09 19:09 ` [net-next v8 2/3] net: ethernet: mtk_eth_soc: Add RSS support Frank Wunderlich
2026-05-14 1:52 ` Jakub Kicinski
2026-05-14 1:56 ` Jakub Kicinski
2026-05-09 19:09 ` [net-next v8 3/3] net: ethernet: mtk_eth_soc: Add LRO support Frank Wunderlich
2026-05-14 1:52 ` Jakub Kicinski [this message]
2026-05-14 1:53 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260514015224.521125-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=angelogioacchino.delregno@collabora.com \
--cc=daniel@makrotopia.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux@armlinux.org.uk \
--cc=linux@fw-web.de \
--cc=lorenzo@kernel.org \
--cc=mason-cw.chang@mediatek.com \
--cc=matthias.bgg@gmail.com \
--cc=nbd@nbd.name \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox