* [PATCH net] MAINTAINERS: Update Marvell octeontx2 driver maintainers
From: Ratheesh Kannoth @ 2026-06-26 4:48 UTC
To: netdev, linux-kernel
Cc: sgoutham, davem, edumazet, kuba, pabeni, andrew+netdev,
Ratheesh Kannoth
Update the maintainer entries for the Marvell OcteonTX2 (RVU) and PEM PMU
drivers to reflect recent organizational changes.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
MAINTAINERS | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 4a290dc1284e..6fda185033e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15608,8 +15608,8 @@ F: drivers/net/ethernet/marvell/octeon_ep_vf
MARVELL OCTEONTX2 PHYSICAL FUNCTION DRIVER
M: Sunil Goutham <sgoutham@marvell.com>
M: Geetha sowjanya <gakula@marvell.com>
+M: Ratheesh Kannoth <rkannoth@marvell.com>
M: Subbaraya Sundeep <sbhatta@marvell.com>
-M: hariprasad <hkelam@marvell.com>
M: Bharat Bhushan <bbhushan2@marvell.com>
L: netdev@vger.kernel.org
S: Maintained
@@ -15618,9 +15618,8 @@ F: include/linux/soc/marvell/octeontx2/
MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER
M: Sunil Goutham <sgoutham@marvell.com>
-M: Linu Cherian <lcherian@marvell.com>
+M: Ratheesh Kannoth <rkannoth@marvell.com>
M: Geetha sowjanya <gakula@marvell.com>
-M: hariprasad <hkelam@marvell.com>
M: Subbaraya Sundeep <sbhatta@marvell.com>
L: netdev@vger.kernel.org
S: Maintained
@@ -15628,8 +15627,8 @@ F: Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
F: drivers/net/ethernet/marvell/octeontx2/af/
MARVELL PEM PMU DRIVER
-M: Linu Cherian <lcherian@marvell.com>
M: Gowthami Thiagarajan <gthiagarajan@marvell.com>
+M: Geetha sowjanya <gakula@marvell.com>
S: Supported
F: drivers/perf/marvell_pem_pmu.c
--
2.43.0
* [PATCH iwl-next v5 0/2] ice: implement symmetric RSS hash configuration
From: Aleksandr Loktionov @ 2026-06-26 5:47 UTC
To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev
The ice driver has advertised symmetric RSS support via
supported_input_xfrm since the capability was added, but ice_set_rxfh()
ignored the input_xfrm parameter entirely, so enabling symmetric hashing
had no actual effect.
This series fixes that. Patch 1 extends the ethtool core so that
drivers hashing GTP flows on TEID can report it honestly without
blocking symmetric-xor configuration. Patch 2 wires up the ice driver.
The need for patch 1 surfaced because GTP flow profiles in ice always
include TEID in the hash. ethtool_check_flow_types() calls
get_rxfh_fields() for every hashable flow type before allowing
symmetric-xor; ethtool_rxfh_config_is_sym() rejected any bitmap
containing RXH_GTP_TEID since it has no src/dst counterpart. TEID
is the same value in both tunnel directions, so this rejection is
incorrect: including it does not break symmetry.
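The symmetry argument above can be illustrated with a toy model. This is a minimal sketch, not the ice or Toeplitz implementation: the flow structure, the mixer, and all `demo_` names are hypothetical, and it assumes (as stated above) that the TEID is identical in both tunnel directions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key for this sketch; not a kernel or ice structure. */
struct demo_gtp_flow {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t teid;
};

/* Toy mixer standing in for the real hash function. */
static uint32_t demo_mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

/* XOR each src/dst pair before hashing, so swapping the endpoints cannot
 * change the hash input. TEID is fed in unchanged.
 */
static uint32_t demo_sym_xor_hash(const struct demo_gtp_flow *f)
{
	uint32_t h = 0;

	h = demo_mix(h, f->src_ip ^ f->dst_ip);
	h = demo_mix(h, (uint32_t)(f->src_port ^ f->dst_port));
	h = demo_mix(h, f->teid);
	return h;
}

/* Build the reverse direction of a flow: endpoints swapped, same TEID. */
static struct demo_gtp_flow demo_reverse(struct demo_gtp_flow f)
{
	struct demo_gtp_flow r = f;

	r.src_ip = f.dst_ip;
	r.dst_ip = f.src_ip;
	r.src_port = f.dst_port;
	r.dst_port = f.src_port;
	return r;
}
```

In this model a flow and its reverse always hash to the same value, because the only direction-dependent inputs are XORed pairwise and the TEID contributes the same bits either way.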
Rather than hiding TEID from the reported fields (which would silently
misrepresent hardware behaviour), patch 1 fixes the validator, and
patch 2 reports TEID honestly.
Tested with tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py
on an E810 card running kernel 6.19-rc8.
---
v4 -> v5:
- remove redundant (u64) type conversion
v3 -> v4:
- Drop the ICE_HASH_INVALID fallback in ice_get_rxfh_fields() that
fabricated default L3+L4 hash fields when no hardware RSS config
exists for a flow type; returning zero fields is more honest and
avoids misrepresenting hardware state
- Drop the companion "if (!l3 && !l4)" special case in the
pair-completion block; it was only necessary to cover the synthetic
defaults added by the fallback, which is now gone
- No functional change to ice_set_rxfh() or the ethtool core patch
v2 -> v3:
- Split into 2 patches: ethtool core fix separate from driver change
- Drop the RXH_GTP_TEID stripping workaround from the driver; instead
fix ethtool_rxfh_config_is_sym() to accept TEID as intrinsically
symmetric (patch 1)
- Fix ice_get_rxfh_fields(): the v2 unconditional assignment
"nfc->data = ICE_RSS_ALLOWED_FIELDS" clobbered fields set earlier in
the same function; replaced with pair-completion using |= so only
the missing half of a partial pair is filled in
- Remove ICE_RSS_ALLOWED_FIELDS define (no longer needed)
- Report RXH_GTP_TEID honestly for GTP flow types
v1 -> v2:
- Preserve valid symmetric RSS fields instead of overwriting nfc->data
unconditionally
Aleksandr Loktionov (2):
ethtool: treat RXH_GTP_TEID as intrinsically symmetric
ice: implement symmetric RSS hash configuration
drivers/net/ethernet/intel/ice/ice_ethtool.c | 40 +++++++++++++---
drivers/net/ethernet/intel/ice/ice_lib.c | 7 ++--
drivers/net/ethernet/intel/ice/ice_lib.h | 1 +
net/ethtool/common.c | 3 +++
4 files changed, 40 insertions(+), 11 deletions(-)
--
2.43.0
* [PATCH iwl-next v5 1/2] ethtool: treat RXH_GTP_TEID as intrinsically symmetric
From: Aleksandr Loktionov @ 2026-06-26 5:47 UTC
To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev
In-Reply-To: <20260626054730.1126969-1-aleksandr.loktionov@intel.com>
A GTP tunnel uses the same TEID value in both directions of a flow;
including TEID in the hash input does not break src/dst symmetry.
ethtool_rxfh_config_is_sym() currently rejects any hash field bitmap
that contains bits outside the four paired L3/L4 fields. This causes
drivers that hash GTP flows on TEID to fail the kernel's preflight
validation in ethtool_check_flow_types(), making it impossible for
those drivers to support symmetric-xor transforms at all.
Strip RXH_GTP_TEID from the bitmap before the paired-field check so
that drivers may honestly report TEID hashing without blocking the
configuration of symmetric transforms.
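As a rough userspace sketch of the validator behaviour described above: the `DEMO_` bit values are stand-ins chosen for illustration, and the L4 pair check is assumed to mirror the L3 check that is visible in the patch context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the RXH_* hash-field bits; the values are
 * assumptions for this sketch, not copied from the kernel UAPI header.
 */
#define DEMO_RXH_IP_SRC    (1ULL << 4)
#define DEMO_RXH_IP_DST    (1ULL << 5)
#define DEMO_RXH_L4_B_0_1  (1ULL << 6)
#define DEMO_RXH_L4_B_2_3  (1ULL << 7)
#define DEMO_RXH_GTP_TEID  (1ULL << 8)

/* Returns true when the field bitmap is acceptable for symmetric-xor. */
static bool demo_rxfh_config_is_sym(uint64_t rxfh)
{
	bool sym;

	/* TEID carries no src/dst asymmetry, so ignore it for this check. */
	rxfh &= ~DEMO_RXH_GTP_TEID;

	/* No bits may remain outside the four paired L3/L4 fields. */
	sym = rxfh == (rxfh & (DEMO_RXH_IP_SRC | DEMO_RXH_IP_DST |
			       DEMO_RXH_L4_B_0_1 | DEMO_RXH_L4_B_2_3));
	/* Each pair must be fully present or fully absent. */
	sym &= !!(rxfh & DEMO_RXH_IP_SRC) == !!(rxfh & DEMO_RXH_IP_DST);
	sym &= !!(rxfh & DEMO_RXH_L4_B_0_1) == !!(rxfh & DEMO_RXH_L4_B_2_3);

	return sym;
}
```

With the strip in place, a bitmap of both IP fields plus TEID passes, while a bitmap with only one half of a pair is still rejected.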
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
net/ethtool/common.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 5fae329..9a3fd76 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -911,6 +911,9 @@ int ethtool_rxfh_config_is_sym(u64 rxfh)
{
bool sym;
+ /* Strip TEID before checking - it carries no src/dst asymmetry */
+ rxfh &= ~RXH_GTP_TEID;
+
sym = rxfh == (rxfh & (RXH_IP_SRC | RXH_IP_DST |
RXH_L4_B_0_1 | RXH_L4_B_2_3));
sym &= !!(rxfh & RXH_IP_SRC) == !!(rxfh & RXH_IP_DST);
--
2.52.0
* [PATCH iwl-next v5 2/2] ice: implement symmetric RSS hash configuration
From: Aleksandr Loktionov @ 2026-06-26 5:47 UTC
To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev
In-Reply-To: <20260626054730.1126969-1-aleksandr.loktionov@intel.com>
The driver advertises symmetric RSS support via supported_input_xfrm
but ice_set_rxfh() always programmed plain Toeplitz regardless of the
requested input_xfrm, making it impossible to actually enable symmetric
hashing.
Fix ice_set_rxfh() to honour rxfh->input_xfrm: program symmetric
Toeplitz (ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ) when RXH_XFRM_SYM_XOR
is requested, revert to plain Toeplitz when the transform is cleared,
and skip the hardware write when the function has not changed.
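The decision logic described above can be sketched in isolation. This is a simplified model, not the driver code: the `demo_vsi` structure, the write counter, and the `DEMO_` constant values are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in values for this sketch only (assumptions, not the kernel's). */
#define DEMO_XFRM_NONE       0x00u
#define DEMO_XFRM_SYM_XOR    0x01u
#define DEMO_XFRM_NO_CHANGE  0xffu
#define DEMO_HASH_TPLZ       0x0u
#define DEMO_HASH_SYM_TPLZ   0x1u

/* Hypothetical VSI state: current hash function plus a hardware write count. */
struct demo_vsi {
	uint8_t rss_hfunc;
	unsigned int hw_writes;
};

static int demo_apply_input_xfrm(struct demo_vsi *vsi, uint8_t input_xfrm)
{
	uint8_t new_hfunc;

	/* Caller did not ask for a transform change. */
	if (input_xfrm == DEMO_XFRM_NO_CHANGE)
		return 0;

	if (input_xfrm == DEMO_XFRM_SYM_XOR)
		new_hfunc = DEMO_HASH_SYM_TPLZ;
	else if (!input_xfrm)
		new_hfunc = DEMO_HASH_TPLZ;
	else
		return -EOPNOTSUPP;

	/* Only touch "hardware" when the function actually changes. */
	if (new_hfunc != vsi->rss_hfunc) {
		vsi->rss_hfunc = new_hfunc;
		vsi->hw_writes++;
	}
	return 0;
}
```

Requesting the same transform twice results in a single write in this model, and any transform other than none or symmetric XOR is rejected.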
Make ice_set_rss_vsi_ctx() non-static and export it so ice_set_rxfh()
can reprogram the VSI context directly. Change it to preserve
vsi->rss_hfunc across VSI reinitialisation instead of always resetting
to plain Toeplitz, which would silently undo any previously configured
symmetric hash function.
Fix ice_get_rxfh_fields() to report the hash fields actually programmed
in hardware. When the hardware hashes on only one half of an L3 or L4
pair, complete the pair in the reported bitmap to satisfy the kernel's
symmetry validator. For GTP flow types, report RXH_GTP_TEID honestly;
ethtool_rxfh_config_is_sym() now accepts TEID as an intrinsically
symmetric field (see preceding patch).
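A small standalone model of the pair-completion step may help when reasoning about which bitmaps get reported. The `DEMO_` bit values are illustrative assumptions; only the completion rule itself follows the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; assumptions for this sketch only. */
#define DEMO_RXH_IP_SRC    (1ULL << 4)
#define DEMO_RXH_IP_DST    (1ULL << 5)
#define DEMO_RXH_L4_B_0_1  (1ULL << 6)
#define DEMO_RXH_L4_B_2_3  (1ULL << 7)
#define DEMO_RXH_GTP_TEID  (1ULL << 8)
#define DEMO_L3_PAIR       (DEMO_RXH_IP_SRC | DEMO_RXH_IP_DST)
#define DEMO_L4_PAIR       (DEMO_RXH_L4_B_0_1 | DEMO_RXH_L4_B_2_3)

/* If exactly one half of a pair is set, report the whole pair.
 * Pairs that are absent, or already complete, are left untouched.
 */
static uint64_t demo_complete_pairs(uint64_t data)
{
	uint64_t l3 = data & DEMO_L3_PAIR;
	uint64_t l4 = data & DEMO_L4_PAIR;

	if (l3 && l3 != DEMO_L3_PAIR)
		data |= DEMO_L3_PAIR;
	if (l4 && l4 != DEMO_L4_PAIR)
		data |= DEMO_L4_PAIR;
	return data;
}
```

Using `|=` means fields set earlier, such as TEID, survive the completion step, and an empty bitmap stays empty instead of being replaced by synthetic defaults.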
Tested with tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
drivers/net/ethernet/intel/ice/ice_ethtool.c | 40 ++++++++++++++++----
drivers/net/ethernet/intel/ice/ice_lib.c | 7 ++--
drivers/net/ethernet/intel/ice/ice_lib.h | 1 +
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index c6bc29c..6ccfe36 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3008,14 +3008,17 @@ ice_set_rxfh_fields(struct net_device *netdev,
return 0;
}
+#define ICE_RSS_L3_PAIR (RXH_IP_SRC | RXH_IP_DST)
+#define ICE_RSS_L4_PAIR (RXH_L4_B_0_1 | RXH_L4_B_2_3)
+
static int
ice_get_rxfh_fields(struct net_device *netdev, struct ethtool_rxfh_fields *nfc)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
struct ice_pf *pf = vsi->back;
+ u64 l3, l4, hash_flds;
struct device *dev;
- u64 hash_flds;
bool symm;
u32 hdrs;
@@ -3067,6 +3070,13 @@ ice_get_rxfh_fields(struct net_device *netdev, struct ethtool_rxfh_fields *nfc)
hash_flds & ICE_FLOW_HASH_FLD_GTPU_DWN_TEID)
nfc->data |= (u64)RXH_GTP_TEID;
+ l3 = nfc->data & ICE_RSS_L3_PAIR;
+ l4 = nfc->data & ICE_RSS_L4_PAIR;
+ if (l3 && l3 != ICE_RSS_L3_PAIR)
+ nfc->data |= ICE_RSS_L3_PAIR;
+ if (l4 && l4 != ICE_RSS_L4_PAIR)
+ nfc->data |= ICE_RSS_L4_PAIR;
+
return 0;
}
@@ -3667,7 +3677,6 @@ ice_set_rxfh(struct net_device *netdev, struct ethtool_rxfh_param *rxfh,
struct netlink_ext_ack *extack)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
- u8 hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
struct ice_vsi *vsi = np->vsi;
struct ice_pf *pf = vsi->back;
struct device *dev;
@@ -3689,13 +3698,28 @@ ice_set_rxfh(struct net_device *netdev, struct ethtool_rxfh_param *rxfh,
return -EOPNOTSUPP;
}
- /* Update the VSI's hash function */
- if (rxfh->input_xfrm & RXH_XFRM_SYM_XOR)
- hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ;
+ /* Handle RSS symmetric hash transformation */
+ if (rxfh->input_xfrm != RXH_XFRM_NO_CHANGE) {
+ u8 new_hfunc;
- err = ice_set_rss_hfunc(vsi, hfunc);
- if (err)
- return err;
+ if (rxfh->input_xfrm == RXH_XFRM_SYM_XOR)
+ new_hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ;
+ else if (!rxfh->input_xfrm)
+ new_hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
+ else
+ return -EOPNOTSUPP;
+
+ if (new_hfunc != vsi->rss_hfunc) {
+ err = ice_set_rss_hfunc(vsi, new_hfunc);
+ if (err) {
+ netdev_err(netdev, "Failed to set RSS hash function\n");
+ return err;
+ }
+ netdev_dbg(netdev, "RSS hash function: %sToeplitz\n",
+ new_hfunc == ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ ?
+ "Symmetric " : "");
+ }
+ }
if (rxfh->key) {
if (!vsi->rss_hkey_user) {
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d921269..5b1934b 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1155,7 +1155,7 @@ static void ice_set_fd_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
* @ctxt: the VSI context being set
* @vsi: the VSI being configured
*/
-static void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
+void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
{
u8 lut_type, hash_type;
struct device *dev;
@@ -1181,8 +1181,9 @@ static void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
return;
}
- hash_type = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
- vsi->rss_hfunc = hash_type;
+ if (!vsi->rss_hfunc)
+ vsi->rss_hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
+ hash_type = vsi->rss_hfunc;
ctxt->info.q_opt_rss =
FIELD_PREP(ICE_AQ_VSI_Q_OPT_RSS_LUT_M, lut_type) |
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 49454d98..29ba335 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -46,6 +46,7 @@ void ice_vsi_delete(struct ice_vsi *vsi);
int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc);
int ice_vsi_cfg_rss_lut_key(struct ice_vsi *vsi);
+void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi);
void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
--
2.52.0
* RE: [Intel-wired-lan] [TEST] Weird RSS state on ice
From: Loktionov, Aleksandr @ 2026-06-26 5:51 UTC
To: Jakub Kicinski, Pielech, Adrian
Cc: Kitszel, Przemyslaw, netdev@vger.kernel.org,
intel-wired-lan@lists.osuosl.org
In-Reply-To: <20260625190625.0f5ffe01@kernel.org>
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, June 26, 2026 4:06 AM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: Pielech, Adrian <adrian.pielech@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; netdev@vger.kernel.org;
> intel-wired-lan@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [TEST] Weird RSS state on ice
>
> On Thu, 25 Jun 2026 07:11:14 +0000 Loktionov, Aleksandr wrote:
> > The patchset didn't help?
> >
> > [PATCH iwl-next v5 2/2] ice: implement symmetric RSS hash
> > configuration
>
> Not sure, it's not in tree, and lore doesn't want to point me at it
> either. What I don't get is how we get into the bad state in the first
> place.
>
> Looking at other tests today I spotted that the rss flow label test is
> also behaving oddly. Most of the time the first case fails and the
> second passes:
>
> test "rss-flow-label-py"
> group "selftests-drivers-net-hw"
> result "fail"
> link "https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-26--00-00/ice-E810-XXV4/rss_flow_label.py/stdout"
> results
> 0
> test "rss-flow-label-test-rss-flow-label"
> result "fail"
> 1
> test "rss-flow-label-test-rss-flow-label-6only"
> result "pass"
>
>
> But every now and then they skip:
>
> ok 1 rss_flow_label.test_rss_flow_label # SKIP Device doesn't support Flow Label for UDP6
> ok 2 rss_flow_label.test_rss_flow_label_6only # SKIP Device doesn't support Flow Label for UDP6
>
> test "rss-flow-label-py"
> group "selftests-drivers-net-hw"
> result "skip"
> link "https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-25--16-00/ice-E810-XXV4/rss_flow_label.py/stdout"
> results
> 0
> test "rss-flow-label-test-rss-flow-label"
> result "skip"
> 1
> test "rss-flow-label-test-rss-flow-label-6only"
> result "skip"
>
>
> The devlink info is identical so it must be that the device is in
> unclean state sometimes?? Do y'all power cycle these machines between
> runs?
Good day, Jakub
I heard from Adrian Pielech that we experienced infrastructure issues, but reboots helped. Please ask him about the CI infrastructure.
As for my v5 symmetric RSS fix from March 16, which worked for me: I have just resent it today, please bless it.
With the best regards
Alex
* Re: [syzbot] [net?] BUG: soft lockup in perf_event_open (2)
From: syzbot @ 2026-06-26 6:20 UTC
To: acme, adrian.hunter, alexander.shishkin, andrew, davem, edumazet,
eperezma, irogers, james.clark, jasowang, jolsa, kuba,
linux-kernel, linux-perf-users, mark.rutland, mingo, mst,
namhyung, netdev, pabeni, peterz, syzkaller-bugs, virtualization,
xuanzhuo
In-Reply-To: <69ad01b6.050a0220.310d8.0006.GAE@google.com>
syzbot has found a reproducer for the following issue on:
HEAD commit: 4edcdefd4083 Merge tag 'bpf-fixes' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16da941e580000
kernel config: https://syzkaller.appspot.com/x/.config?x=3c3d59be33cf7e9a
dashboard link: https://syzkaller.appspot.com/bug?extid=e04801269a8f6321dd79
compiler: Debian clang version 22.1.8 (++20260613092233+e80beda6e255-1~exp1~20260613092250.77), Debian LLD 22.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=164054ea580000
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-4edcdefd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/01cc5d298db0/vmlinux-4edcdefd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/59e4cf862ca3/bzImage-4edcdefd.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e04801269a8f6321dd79@syzkaller.appspotmail.com
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 0, t=10502 jiffies, g=61569, q=162 ncpus=1)
rcu: All QSes seen, last rcu_preempt kthread activity 10502 (4294967357-4294956855), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 10502 jiffies! g61569 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:27464 pid:16 tgid:16 ppid:2 task_flags:0x208040 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5510 [inline]
__schedule+0x17d9/0x56c0 kernel/sched/core.c:7234
__schedule_loop kernel/sched/core.c:7311 [inline]
schedule+0x164/0x2b0 kernel/sched/core.c:7326
schedule_timeout+0x152/0x2c0 kernel/time/sleep_timeout.c:99
rcu_gp_fqs_loop+0x30c/0x11f0 kernel/rcu/tree.c:2123
rcu_gp_kthread+0x9e/0x2b0 kernel/rcu/tree.c:2325
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 0 UID: 0 PID: 5714 Comm: syz.3.20 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:unwind_next_frame+0x19ae/0x2550 arch/x86/kernel/unwind_orc.c:677
Code: e8 03 42 0f b6 04 20 84 c0 0f 85 b6 0a 00 00 4c 89 e8 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 c5 0a 00 00 48 0f bf 03 49 01 c0 <49> 8d 56 40 4c 89 f7 4c 89 c6 e8 33 0e 00 00 84 c0 0f 84 4e 01 00
RSP: 0018:ffffc900000074e0 EFLAGS: 00000283
RAX: fffffffffffffff0 RBX: ffffffff91709a28 RCX: 0000000000000000
RDX: ffffffff91709a2a RSI: 0000000000000008 RDI: ffffc900000075e8
RBP: 1ffffffff22e1345 R08: ffffc900034df7c8 R09: 0000000000000000
R10: ffffc900000075d8 R11: fffff52000000ebd R12: dffffc0000000000
R13: ffffffff91709a29 R14: ffffc90000007588 R15: ffffc900000075d0
FS: 0000555594cee500(0000) GS:ffff88808c81b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5209872780 CR3: 0000000035426000 CR4: 0000000000352ef0
Call Trace:
<IRQ>
arch_stack_walk+0x11b/0x150 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0xa9/0x100 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2705 [inline]
slab_free mm/slub.c:6405 [inline]
kmem_cache_free+0x182/0x650 mm/slub.c:6532
kfree_skb_reason include/linux/skbuff.h:1323 [inline]
kfree_skb include/linux/skbuff.h:1332 [inline]
hsr_forward_skb+0x1a27/0x28c0 net/hsr/hsr_forward.c:753
send_hsr_supervision_frame+0x733/0xcf0 net/hsr/hsr_device.c:364
hsr_announce+0x1db/0x370 net/hsr/hsr_device.c:421
call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
expire_timers kernel/time/timer.c:1799 [inline]
__run_timers kernel/time/timer.c:2374 [inline]
__run_timer_base+0x652/0x8b0 kernel/time/timer.c:2386
run_timer_base kernel/time/timer.c:2395 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
handle_softirqs+0x225/0x840 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
invoke_softirq kernel/softirq.c:496 [inline]
__irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1062 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1062
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:674
RIP: 0010:finish_task_switch+0x417/0xc60 kernel/sched/core.c:5361
Code: 04 00 00 41 c7 84 24 20 0e 00 00 00 00 00 00 0f 1f 44 00 00 49 83 c4 48 4c 89 e7 e8 a3 5a 23 0a e8 6e bc 39 00 fb 4c 8b 65 c8 <49> 8d bc 24 f8 16 00 00 48 89 f8 48 c1 e8 03 42 0f b6 04 30 84 c0
RSP: 0018:ffffc900034df880 EFLAGS: 00000206
RAX: 00000000000009c3 RBX: ffff88801fc3bf20 RCX: 0000000080000001
RDX: 0000000000000006 RSI: ffffffff8dfe613c RDI: ffffffff8c2aaf80
RBP: ffffc900034df8d0 R08: ffffffff9032f8f7 R09: 1ffffffff2065f1e
R10: dffffc0000000000 R11: fffffbfff2065f1f R12: ffff88803292a540
R13: ffff88801fc3bee8 R14: dffffc0000000000 R15: 1ffff11003f877e4
context_switch kernel/sched/core.c:5513 [inline]
__schedule+0x17e1/0x56c0 kernel/sched/core.c:7234
preempt_schedule_common+0x82/0xd0 kernel/sched/core.c:7413
preempt_schedule_thunk+0x16/0x40 arch/x86/entry/thunk.S:12
__mutex_lock_common kernel/locking/mutex.c:656 [inline]
__mutex_lock+0x321/0x1550 kernel/locking/mutex.c:821
__do_sys_perf_event_open kernel/events/core.c:14249 [inline]
__se_sys_perf_event_open+0x1984/0x1d40 kernel/events/core.c:13881
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f520999ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd9f0e1458 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007f5209c15fa0 RCX: 00007f520999ce59
RDX: bfffffffffffffff RSI: 0000000000000000 RDI: 0000200000000180
RBP: 00007f5209a32e6f R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5209c15fac R14: 00007f5209c15fa0 R15: 00007f5209c15fa0
</TASK>
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
* [PATCH net v2] octeontx2-pf: check DMAC extraction support before filtering
From: nshettyj @ 2026-06-26 6:23 UTC
To: netdev, linux-kernel
Cc: sgoutham, gakula, sbhatta, hkelam, bbhushan2, andrew+netdev,
davem, edumazet, kuba, pabeni, naveenm, tduszynski, sumang,
Nitin Shetty J
In-Reply-To: <20260625172552.258631-1-nshettyj@marvell.com>
From: Suman Ghosh <sumang@marvell.com>
Currently, configuring a VF MAC address via the PF (e.g., 'ip link
set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
hardware filter. However, the hardware parser profile might not
support DMAC extraction.
Check if the hardware parsing profile supports DMAC extraction
before adding the filter. Additionally, emit a warning message
to inform the operator if the MAC filter installation fails due
to missing DMAC extraction support.
Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
---
v2:
- Move the DMAC extraction check from otx2_set_vf_mac() into
otx2_do_set_vf_mac() which already holds pf->mbox.lock, so all
mbox operations are under a single lock/unlock pair. All error
paths now use the existing goto-out pattern, eliminating the
scattered mutex_unlock() + return calls from v1.
- Return -EOPNOTSUPP instead of 0 when DMAC extraction is not
supported, so the caller gets an explicit error rather than a
silent success.
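The v2 structure described in these notes can be sketched as follows. This is a hedged model rather than the driver code: `demo_mbox`, its fields, and the lock counter are hypothetical stand-ins for the PF mailbox, its mutex, and the field-status query.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PF mailbox state used by this sketch. */
struct demo_mbox {
	int lock_depth;        /* +1 on lock, -1 on unlock; must end at 0 */
	bool dmac_extractable; /* what the field-status query would report */
	bool query_fails;      /* simulate a failed mailbox round trip */
	int flows_installed;   /* number of filters installed so far */
};

static void demo_lock(struct demo_mbox *m)
{
	m->lock_depth++;
}

static void demo_unlock(struct demo_mbox *m)
{
	m->lock_depth--;
}

/* Query support first, install second, all under one lock/unlock pair. */
static int demo_do_set_vf_mac(struct demo_mbox *m)
{
	int err;

	demo_lock(m);

	if (m->query_fails) {
		err = -EINVAL;
		goto out;
	}

	if (!m->dmac_extractable) {
		/* Explicit error instead of a silent success. */
		err = -EOPNOTSUPP;
		goto out;
	}

	m->flows_installed++;
	err = 0;
out:
	demo_unlock(m);
	return err;
}
```

Every exit path funnels through the single `out` label, so the lock balance stays at zero whether the query fails, the field is unsupported, or the filter is installed.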
---
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 33 +++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index b63df5737ff2..dc7e4a225dd0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -2517,10 +2517,43 @@ EXPORT_SYMBOL(otx2_config_hwtstamp_set);
static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
{
+ struct npc_get_field_status_req *freq;
+ struct npc_get_field_status_rsp *frsp;
struct npc_install_flow_req *req;
int err;
mutex_lock(&pf->mbox.lock);
+
+ /* Skip installing the DMAC filter if the hardware parser profile
+ * does not support DMAC extraction.
+ */
+ freq = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
+ if (!freq) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ freq->field = NPC_DMAC;
+ if (otx2_sync_mbox_msg(&pf->mbox)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ frsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
+ (&pf->mbox.mbox, 0, &freq->hdr);
+ if (IS_ERR(frsp)) {
+ err = PTR_ERR(frsp);
+ goto out;
+ }
+
+ if (!frsp->enable) {
+ netdev_warn(pf->netdev,
+ "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
+ vf);
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
req = otx2_mbox_alloc_msg_npc_install_flow(&pf->mbox);
if (!req) {
err = -ENOMEM;
--
2.48.1
* RE: [Intel-wired-lan] [PATCH] ice: propagate ETH56G deskew read errors
From: Loktionov, Aleksandr @ 2026-06-26 6:45 UTC
To: Pengpeng Hou, Nguyen, Anthony L, Kitszel, Przemyslaw
Cc: Andrew Lunn, davem@davemloft.net, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Richard Cochran, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260625030305.85304-1-pengpeng@iscas.ac.cn>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Pengpeng Hou
> Sent: Thursday, June 25, 2026 5:03 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>; davem@davemloft.net; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Richard Cochran <richardcochran@gmail.com>;
> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; pengpeng@iscas.ac.cn
> Subject: [Intel-wired-lan] [PATCH] ice: propagate ETH56G deskew read
> errors
>
> ice_ptp_calc_deskew_eth56g() returns a u32 deskew value, but it also
> returns the negative read_poll_timeout() error when the DESKEW valid
> bit never appears. That converts the negative error into a large
> unsigned deskew contribution, which can then be folded into the RX
> timestamp offset and programmed into hardware.
>
> Return the deskew value through an output parameter and propagate the
> read error from ice_phy_set_offsets_eth56g() instead of using it as
> offset data.
>
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
I recommend adding a Fixes: tag and the proper tree prefix (net or net-next) in the [PATCH] subject so that the fix is routed correctly.
With the best regards
Alex
> ---
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 27 +++++++++++++++------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> index 8e5f97835954..bd2e31b816a8 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> @@ -1736,17 +1736,21 @@ static u32 ice_ptp_calc_bitslip_eth56g(struct ice_hw *hw, u8 port, u32 bs,
> * @ds: deskew multiplier
> * @rs: RS-FEC enabled
> * @spd: link speed
> + * @deskew: calculated deskew value
> *
> - * Return: calculated deskew value
> + * Return: 0 on success, negative error code otherwise
> */
> -static u32 ice_ptp_calc_deskew_eth56g(struct ice_hw *hw, u8 port, u32 ds,
> - bool rs, enum ice_eth56g_link_spd spd)
> +static int ice_ptp_calc_deskew_eth56g(struct ice_hw *hw, u8 port, u32 ds,
> + bool rs, enum ice_eth56g_link_spd spd,
> + u32 *deskew)
> {
> u32 deskew_i, deskew_f;
> int err;
>
> - if (!ds)
> + if (!ds) {
> + *deskew = 0;
> return 0;
> + }
>
> read_poll_timeout(ice_read_ptp_reg_eth56g, err,
> FIELD_GET(PHY_REG_DESKEW_0_VALID, deskew_i), 500,
> @@ -1766,7 +1770,9 @@ static u32 ice_ptp_calc_deskew_eth56g(struct ice_hw *hw, u8 port, u32 ds,
> deskew_i = FIELD_PREP(ICE_ETH56G_MAC_CFG_RX_OFFSET_INT, deskew_i);
> /* Shift 3 fractional bits to the end of the integer part */
> deskew_f <<= ICE_ETH56G_MAC_CFG_FRAC_W - PHY_REG_DESKEW_0_RLEVEL_FRAC_W;
> - return mul_u32_u32_fx_q9(deskew_i | deskew_f, ds);
> + *deskew = mul_u32_u32_fx_q9(deskew_i | deskew_f, ds);
> +
> + return 0;
> }
>
> /**
> @@ -1789,6 +1795,7 @@ static int ice_phy_set_offsets_eth56g(struct ice_hw *hw, u8 port,
> {
> u32 rx_offset, tx_offset, bs_ds;
> bool onestep, sfd;
> + int err;
>
> onestep = hw->ptp.phy.eth56g.onestep_ena;
> sfd = hw->ptp.phy.eth56g.sfd_ena;
> @@ -1805,11 +1812,15 @@ static int ice_phy_set_offsets_eth56g(struct ice_hw *hw, u8 port,
> if (sfd)
> rx_offset = add_u32_u32_fx(rx_offset, cfg->rx_offset.sfd);
>
> - if (spd < ICE_ETH56G_LNK_SPD_40G)
> + if (spd < ICE_ETH56G_LNK_SPD_40G) {
> bs_ds = ice_ptp_calc_bitslip_eth56g(hw, port, bs_ds, fc, rs,
> spd);
> - else
> - bs_ds = ice_ptp_calc_deskew_eth56g(hw, port, bs_ds, rs, spd);
> + } else {
> + err = ice_ptp_calc_deskew_eth56g(hw, port, bs_ds, rs, spd,
> + &bs_ds);
> + if (err)
> + return err;
> + }
> + if (err)
> + return err;
> + }
> rx_offset = add_u32_u32_fx(rx_offset, bs_ds);
> rx_offset &= ICE_ETH56G_MAC_CFG_RX_OFFSET_INT |
> ICE_ETH56G_MAC_CFG_RX_OFFSET_FRAC;
> --
> 2.50.1 (Apple Git-155)
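The failure mode described in the quoted commit message can be reproduced in a few lines of userspace C. This is only a sketch: the `demo_` functions are hypothetical, the `valid` flag stands in for the DESKEW valid bit, and -ETIMEDOUT is assumed as the poll timeout error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy shape: an error code leaks out through an unsigned return value. */
static uint32_t demo_calc_deskew_buggy(bool valid, uint32_t raw)
{
	int err = valid ? 0 : -ETIMEDOUT;

	if (err)
		return (uint32_t)err; /* negative errno becomes a huge offset */
	return raw;
}

/* Fixed shape: status in the return value, data through an out-parameter. */
static int demo_calc_deskew_fixed(bool valid, uint32_t raw, uint32_t *deskew)
{
	if (!valid)
		return -ETIMEDOUT;

	*deskew = raw;
	return 0;
}
```

In the buggy variant a caller cannot tell a timeout from a very large deskew value; in the fixed variant the caller checks the status first and the output is left untouched on error.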
* RE: [Intel-wired-lan] [PATCH v4 net 1/3] i40e: unregister netdev before clearing VSI on reinit failure
From: Loktionov, Aleksandr @ 2026-06-26 6:45 UTC
To: Fijalkowski, Maciej, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Karlsson, Magnus, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, Kitszel, Przemyslaw,
Keller, Jacob E, Fijalkowski, Maciej
In-Reply-To: <20260625151431.1102838-2-maciej.fijalkowski@intel.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Maciej Fijalkowski
> Sent: Thursday, June 25, 2026 5:14 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Karlsson, Magnus
> <magnus.karlsson@intel.com>; kuba@kernel.org; pabeni@redhat.com;
> horms@kernel.org; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>;
> Keller, Jacob E <jacob.e.keller@intel.com>; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>
> Subject: [Intel-wired-lan] [PATCH v4 net 1/3] i40e: unregister netdev
> before clearing VSI on reinit failure
>
> i40e_vsi_reinit_setup() tears down the existing VSI queue/ring backing
> state before allocating replacement arrays and queue tracking. If one
> of these early allocations fails, the function jumps directly to
> err_vsi and calls i40e_vsi_clear().
>
> For a registered netdev, this frees the VSI while
> netdev_priv(netdev)->vsi can still point at it, leaving the registered
> netdev with dangling private driver state.
>
> Split the error path so failures after destructive reinit teardown
> first unregister and free the netdev before clearing the VSI.
>
> Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index a04683004a56..471fa7f7b643 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -14274,7 +14274,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
> i40e_set_num_rings_in_vsi(vsi);
> ret = i40e_vsi_alloc_arrays(vsi, false);
> if (ret)
> - goto err_vsi;
> + goto err_netdev;
>
> alloc_queue_pairs = vsi->alloc_queue_pairs *
> (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
> @@ -14284,7 +14284,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
> dev_info(&pf->pdev->dev,
> "failed to get tracking for %d queues for VSI %d err %d\n",
> alloc_queue_pairs, vsi->seid, ret);
> - goto err_vsi;
> + goto err_netdev;
> }
> vsi->base_queue = ret;
>
> @@ -14309,6 +14309,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>
> err_rings:
> i40e_vsi_free_q_vectors(vsi);
> +err_netdev:
> if (vsi->netdev_registered) {
> vsi->netdev_registered = false;
> unregister_netdev(vsi->netdev);
> @@ -14318,7 +14319,6 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
> if (vsi->type == I40E_VSI_MAIN)
> i40e_devlink_destroy_port(pf);
> i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
> -err_vsi:
> i40e_vsi_clear(vsi);
> return NULL;
> }
> --
> 2.43.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH v4 net 2/3] i40e: fix potential UAF in i40e_vsi_setup()'s error path
From: Loktionov, Aleksandr @ 2026-06-26 6:45 UTC (permalink / raw)
To: Fijalkowski, Maciej, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Karlsson, Magnus, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, Kitszel, Przemyslaw,
Keller, Jacob E, Fijalkowski, Maciej
In-Reply-To: <20260625151431.1102838-3-maciej.fijalkowski@intel.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Maciej Fijalkowski
> Sent: Thursday, June 25, 2026 5:15 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Karlsson, Magnus
> <magnus.karlsson@intel.com>; kuba@kernel.org; pabeni@redhat.com;
> horms@kernel.org; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>;
> Keller, Jacob E <jacob.e.keller@intel.com>; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>
> Subject: [Intel-wired-lan] [PATCH v4 net 2/3] i40e: fix potential UAF
> in i40e_vsi_setup()'s error path
>
> Sashiko pointed out an issue where the error path in
> i40e_vsi_reinit_setup() released ring memory, but then, when freeing
> q_vectors, the rings mapped to q_vectors were touched, which implies a
> regular use-after-free bug.
>
> Apparently i40e_vsi_setup() has the same problem, so swap the
> allocation and freeing order and fix the 13-year-old bug.
>
> Fixes: 41c445ff0f48 ("i40e: main driver core")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 471fa7f7b643..4adc7b0fb2f4 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -14460,14 +14460,14 @@ struct i40e_vsi *i40e_vsi_setup(struct
> i40e_pf *pf, u8 type,
> fallthrough;
> case I40E_VSI_FDIR:
> /* set up vectors and rings if needed */
> - ret = i40e_vsi_setup_vectors(vsi);
> - if (ret)
> - goto err_msix;
> -
> ret = i40e_alloc_rings(vsi);
> if (ret)
> goto err_rings;
>
> + ret = i40e_vsi_setup_vectors(vsi);
> + if (ret)
> + goto err_qvec;
> +
> /* map all of the rings to the q_vectors */
> i40e_vsi_map_rings_to_vectors(vsi);
>
> @@ -14487,10 +14487,10 @@ struct i40e_vsi *i40e_vsi_setup(struct
> i40e_pf *pf, u8 type,
> return vsi;
>
> err_config:
> + i40e_vsi_free_q_vectors(vsi);
> +err_qvec:
> i40e_vsi_clear_rings(vsi);
> err_rings:
> - i40e_vsi_free_q_vectors(vsi);
> -err_msix:
> if (vsi->netdev_registered) {
> vsi->netdev_registered = false;
> unregister_netdev(vsi->netdev);
> --
> 2.43.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
^ permalink raw reply
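The label reordering in the 2/3 patch above follows the usual rule that error labels unwind in the reverse of the setup order, so q_vectors (which reference rings) are freed before the rings themselves. A minimal userspace sketch of that pattern, assuming a hypothetical setup() function and a character log in place of the real i40e teardown calls:

```c
#include <string.h>

/* Records the teardown order; purely illustrative, not driver state. */
static char unwind_log[16];

static void log_step(char c)
{
	size_t n = strlen(unwind_log);

	if (n + 1 < sizeof(unwind_log)) {
		unwind_log[n] = c;
		unwind_log[n + 1] = '\0';
	}
}

/*
 * Hypothetical model of the reordered setup path.
 * fail_at: 0 = ring allocation fails, 1 = vector setup fails,
 *          2 = a later configuration step fails, anything else = success.
 */
static int setup(int fail_at)
{
	unwind_log[0] = '\0';

	if (fail_at == 0)
		goto err_rings;		/* nothing ring-related to undo yet */
	if (fail_at == 1)
		goto err_qvec;		/* rings exist, vectors do not */
	if (fail_at == 2)
		goto err_config;	/* both rings and vectors exist */
	return 0;

err_config:
	log_step('V');	/* free q_vectors first: they point at the rings */
err_qvec:
	log_step('R');	/* then release the rings */
err_rings:
	log_step('N');	/* finally the common netdev/VSI teardown */
	return -1;
}
```

Calling setup(2) leaves "VRN" in unwind_log, so the vectors are always released while the rings they reference are still valid.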
* Re: [PATCH v2] netfilter: nf_log: validate MAC header was set before dumping it
From: Greg KH @ 2026-06-26 6:51 UTC (permalink / raw)
To: Alexander Martyniuk
Cc: sashal, bestswngs, coreteam, davem, fw, kaber, kadlec, kuba,
kuznet, linux-kernel, netdev, netfilter-devel, pablo, stable,
xmei5, yoshfuji
In-Reply-To: <20260625164755.161383-1-alexevgmart@gmail.com>
On Thu, Jun 25, 2026 at 07:47:55PM +0300, Alexander Martyniuk wrote:
> From: Xiang Mei <xmei5@asu.edu>
>
> commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream.
>
> The fallback path of dump_mac_header() guards the MAC header access
> only with "skb->mac_header != skb->network_header", without checking
> skb_mac_header_was_set(). When the MAC header is unset, mac_header is
> 0xffff, so the test passes and skb_mac_header(skb) returns
> skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
> dev->hard_header_len bytes out of bounds into the kernel log.
>
> This is reachable via the netdev logger: nf_log_unknown_packet() calls
> dump_mac_header() unconditionally, and an skb sent through AF_PACKET
> with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
> unset (__dev_queue_xmit(), which would reset it, is bypassed).
>
> Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
> uses, and replace the open-coded MAC header length test with
> skb_mac_header_len(). Only skbs with an unset MAC header are affected;
> valid ones are dumped as before.
>
> BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> Read of size 1 at addr ffff88800ea49d3f by task exploit/148
> Call Trace:
> kasan_report (mm/kasan/report.c:595)
> dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
> nf_log_packet (net/netfilter/nf_log.c:260)
> nft_log_eval (net/netfilter/nft_log.c:60)
> nft_do_chain (net/netfilter/nf_tables_core.c:285)
> nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
> nf_hook_slow (net/netfilter/core.c:619)
> nf_hook_direct_egress (net/packet/af_packet.c:257)
> packet_xmit (net/packet/af_packet.c:280)
> packet_sendmsg (net/packet/af_packet.c:3114)
> __sys_sendto (net/socket.c:2265)
>
> Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
> ---
> net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
> net/ipv6/netfilter/nf_log_ipv6.c | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
> index d07583fac8f8..d6164e8e2c73 100644
> --- a/net/ipv4/netfilter/nf_log_ipv4.c
> +++ b/net/ipv4/netfilter/nf_log_ipv4.c
> @@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
>
> fallback:
> nf_log_buf_add(m, "MAC=");
> - if (dev->hard_header_len &&
> - skb->mac_header != skb->network_header) {
> + if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> + skb_mac_header_len(skb) != 0) {
> const unsigned char *p = skb_mac_header(skb);
> unsigned int i;
>
> diff --git a/net/ipv6/netfilter/nf_log_ipv6.c b/net/ipv6/netfilter/nf_log_ipv6.c
> index 8210ff34ed9b..cc724870a467 100644
> --- a/net/ipv6/netfilter/nf_log_ipv6.c
> +++ b/net/ipv6/netfilter/nf_log_ipv6.c
> @@ -309,8 +309,8 @@ static void dump_ipv6_mac_header(struct nf_log_buf *m,
>
> fallback:
> nf_log_buf_add(m, "MAC=");
> - if (dev->hard_header_len &&
> - skb->mac_header != skb->network_header) {
> + if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> + skb_mac_header_len(skb) != 0) {
> const unsigned char *p = skb_mac_header(skb);
> unsigned int len = dev->hard_header_len;
> unsigned int i;
> --
> 2.43.0
>
>
What kernel(s) is this for?
^ permalink raw reply
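The difference between the old and new guard in the quoted nf_log patch can be modelled in userspace. This is a sketch only: struct pkt and the helper names are hypothetical stand-ins for the sk_buff fields and helpers named in the commit message, and the dev->hard_header_len test is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two sk_buff offsets involved. */
struct pkt {
	uint16_t mac_header;
	uint16_t network_header;
};

/* The commit message states that an unset MAC header reads as 0xffff. */
#define HDR_UNSET ((uint16_t)~0U)

/* Old guard: only compares the two offsets. */
static bool old_check(const struct pkt *p)
{
	return p->mac_header != p->network_header;
}

/* Stand-in for skb_mac_header_was_set(). */
static bool mac_header_was_set(const struct pkt *p)
{
	return p->mac_header != HDR_UNSET;
}

/* New guard: the header must be set and have a non-zero length. */
static bool new_check(const struct pkt *p)
{
	return mac_header_was_set(p) &&
	       (uint16_t)(p->network_header - p->mac_header) != 0;
}
```

With mac_header left at 0xffff the old guard passes, because the offsets differ, while the new guard rejects the packet before any header bytes are read.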
* [PATCH v2] vhost/net: fix clear_user start address in VHOST_GET_FEATURES_ARRAY
From: rom.wang @ 2026-06-26 7:04 UTC (permalink / raw)
To: Michael S . Tsirkin, Jason Wang, Eugenio Pérez, Paolo Abeni,
kvm, virtualization, netdev
Cc: linux-kernel, Yufeng Wang
From: Yufeng Wang <wangyufeng@kylinos.cn>
The clear_user() call in VHOST_GET_FEATURES_ARRAY incorrectly starts
at argp, which is the beginning of the features array, overwriting the
data just written by copy_to_user(). It should start after the copied
elements at argp + copied * sizeof(u64) to only zero the trailing
unused space.
Use size_mul() for both the offset and length calculations so the
arithmetic stays consistent with the surrounding code and remains
overflow-safe.
Fixes: 333c515d1896 ("vhost-net: allow configuring extended features")
Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
---
Changes in v2:
- Use size_mul() for the offset calculation as well, per review feedback.
Link to v1: https://lore.kernel.org/all/20260526080336.61296-1-r4o5m6e8o@163.com/
Note:
Thank you for your review and suggestions.
I tried to add a switch in tools/virtio/vhost_net_test.c.
The switch is meant to use VHOST_GET_FEATURES_ARRAY and
VHOST_SET_FEATURES_ARRAY instead of the legacy versions.
However, when I ran `make virtio` in the tools directory,
the build failed with an error: missing asm/percpu_types.h.
I fixed that error, but then another error appeared.
Would it be acceptable to postpone the submission of
this test case until I have sorted out all the build
errors?
---
drivers/vhost/net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 77b59f49bddb..4b963dafa233 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1784,7 +1784,8 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
return -EFAULT;
/* Zero the trailing space provided by user-space, if any */
- if (clear_user(argp, size_mul(count - copied, sizeof(u64))))
+ if (clear_user(argp + size_mul(copied, sizeof(u64)),
+ size_mul(count - copied, sizeof(u64))))
return -EFAULT;
return 0;
case VHOST_SET_FEATURES_ARRAY:
--
2.34.1
^ permalink raw reply related
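The offset fix in the vhost/net patch above can be illustrated with a small userspace model. It is a sketch under stated assumptions: fill_features_array() is a hypothetical helper, and memcpy()/memset() stand in for copy_to_user()/clear_user().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the copy-out step: copy the available feature
 * words, then zero only the unused tail of the caller's array.
 */
static void fill_features_array(uint64_t *dst, size_t count,
				const uint64_t *src, size_t avail)
{
	size_t copied = count < avail ? count : avail;

	memcpy(dst, src, copied * sizeof(uint64_t));

	/* Start after the copied elements; starting at dst would wipe them. */
	memset((unsigned char *)dst + copied * sizeof(uint64_t), 0,
	       (count - copied) * sizeof(uint64_t));
}
```

For a four-element destination and two available words, the first two elements keep the copied values and only the last two are zeroed.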
* [PATCH net] net: enetc: check the number of BDs needed for xdp_frame
From: wei.fang @ 2026-06-26 7:32 UTC (permalink / raw)
To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend,
sdf
Cc: wei.fang, imx, netdev, linux-kernel, bpf
From: Wei Fang <wei.fang@nxp.com>
The size of xdp_redirect_arr array is ENETC_MAX_SKB_FRAGS. However, the
number of fragments contained in xdp_frame may be greater than or equal
to ENETC_MAX_SKB_FRAGS, which will cause the access to xdp_redirect_arr
to be out of bounds.
Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index aa8a87124b10..8e3f345dd9aa 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1783,6 +1783,7 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
{
struct enetc_tx_swbd xdp_redirect_arr[ENETC_MAX_SKB_FRAGS] = {0};
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct skb_shared_info *shinfo;
struct enetc_bdr *tx_ring;
int xdp_tx_bd_cnt, i, k;
int xdp_tx_frm_cnt = 0;
@@ -1798,6 +1799,12 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
prefetchw(ENETC_TXBD(*tx_ring, tx_ring->next_to_use));
for (k = 0; k < num_frames; k++) {
+ if (xdp_frame_has_frags(frames[k])) {
+ shinfo = xdp_get_shared_info_from_frame(frames[k]);
+ if (unlikely((shinfo->nr_frags + 1) > ENETC_MAX_SKB_FRAGS))
+ break;
+ }
+
xdp_tx_bd_cnt = enetc_xdp_frame_to_xdp_tx_swbd(tx_ring,
xdp_redirect_arr,
frames[k]);
--
2.34.1
^ permalink raw reply related
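The bound checked by the enetc patch above is one buffer descriptor for the linear part plus one per fragment. A minimal sketch of that arithmetic and of the early break in the transmit loop, assuming a hypothetical MAX_BDS value in place of ENETC_MAX_SKB_FRAGS and hypothetical helper names:

```c
#include <stdbool.h>

/* Hypothetical array capacity, standing in for ENETC_MAX_SKB_FRAGS. */
#define MAX_BDS 13

/* A frame needs one BD for its linear part plus one per fragment. */
static bool frame_fits(unsigned int nr_frags)
{
	return nr_frags + 1 <= MAX_BDS;
}

/*
 * Models only the early break: returns how many leading frames would be
 * handed to the conversion step before one is found that does not fit.
 */
static int count_sendable(const unsigned int *nr_frags, int num_frames)
{
	int k;

	for (k = 0; k < num_frames; k++) {
		if (!frame_fits(nr_frags[k]))
			break;
	}
	return k;
}
```

A frame with MAX_BDS fragments is rejected even though its fragment count alone does not exceed the array size, because the linear part takes the first slot.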
* [PATCH net] nfc: pn533: hold a reference to the request skb during send_frame
From: Yinhao Hu @ 2026-06-26 7:34 UTC (permalink / raw)
To: David Heidelberg
Cc: Kees Cook, Krzysztof Kozlowski, Dan Carpenter, Jakub Kicinski,
Samuel Ortiz, Michael Thalmeier, netdev, dzm91,
hust-os-kernel-patches, Yinhao Hu
__pn533_send_async() publishes the command and then calls
dev->phy_ops->send_frame(). Once dev->cmd is set, an incoming frame
can be matched to this command: the I2C threaded IRQ runs
pn533_recv_frame(), which queues cmd_complete_work, and
pn533_send_async_complete() frees cmd->req with consume_skb().
On the I2C transport, pn533_i2c_send_frame() still dereferences the same
skb after i2c_master_send() returns, so a completion that races the
send can free the skb while the transport is still using it.
The request skb is owned by the command object and may be freed by
command completion at any time after dev->cmd is published, so the
transport send path must not assume it stays alive. Hold a temporary
reference to the request skb across the send_frame() call so the
transport always sees a live skb even if completion races the send.
Add a pn533_send_cmd_frame() helper and use it from all three send
paths.
Fixes: 9815c7cf22da ("NFC: pn533: Separate physical layer from the core implementation")
Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
---
drivers/nfc/pn533/pn533.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c
index d7bdbc82e2ba..55bbfa32d695 100644
--- a/drivers/nfc/pn533/pn533.c
+++ b/drivers/nfc/pn533/pn533.c
@@ -434,6 +434,18 @@ static int pn533_send_async_complete(struct pn533 *dev)
return rc;
}
+static int pn533_send_cmd_frame(struct pn533 *dev, struct pn533_cmd *cmd)
+{
+ struct sk_buff *req = cmd->req;
+ int rc;
+
+ skb_get(req);
+ dev->cmd = cmd;
+ rc = dev->phy_ops->send_frame(dev, req);
+ dev_kfree_skb(req);
+ return rc;
+}
+
static int __pn533_send_async(struct pn533 *dev, u8 cmd_code,
struct sk_buff *req,
pn533_send_async_complete_t complete_cb,
@@ -458,8 +470,7 @@ static int __pn533_send_async(struct pn533 *dev, u8 cmd_code,
mutex_lock(&dev->cmd_lock);
if (!dev->cmd_pending) {
- dev->cmd = cmd;
- rc = dev->phy_ops->send_frame(dev, req);
+ rc = pn533_send_cmd_frame(dev, cmd);
if (rc) {
dev->cmd = NULL;
goto error;
@@ -529,8 +540,7 @@ static int pn533_send_cmd_direct_async(struct pn533 *dev, u8 cmd_code,
pn533_build_cmd_frame(dev, cmd_code, req);
- dev->cmd = cmd;
- rc = dev->phy_ops->send_frame(dev, req);
+ rc = pn533_send_cmd_frame(dev, cmd);
if (rc < 0) {
dev->cmd = NULL;
kfree(cmd);
@@ -569,8 +579,7 @@ static void pn533_wq_cmd(struct work_struct *work)
mutex_unlock(&dev->cmd_lock);
- dev->cmd = cmd;
- rc = dev->phy_ops->send_frame(dev, cmd->req);
+ rc = pn533_send_cmd_frame(dev, cmd);
if (rc < 0) {
dev->cmd = NULL;
dev_kfree_skb(cmd->req);
--
2.43.0
^ permalink raw reply related
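The reference-holding idea in the pn533 patch above can be shown with a single-threaded model. Everything here is a hypothetical stand-in: struct req models the request skb, req_get()/req_put() model skb_get()/dev_kfree_skb(), and the "race" is simulated by having the transport drop the command's reference itself.

```c
/* Hypothetical refcounted request; not the kernel's struct sk_buff. */
struct req {
	int refcnt;	/* models the skb user count */
	int released;	/* set when the last reference is dropped */
};

static void req_get(struct req *r)
{
	r->refcnt++;
}

static void req_put(struct req *r)
{
	if (--r->refcnt == 0)
		r->released = 1;	/* real code would free the buffer here */
}

/*
 * Models a transport whose completion races the send: the command's
 * reference is dropped, then the transport looks at the request again.
 * Returns 1 if the request was still live at that point.
 */
static int send_frame_racy(struct req *r)
{
	req_put(r);
	return !r->released;
}

/* Hold a temporary reference across the send, as the patch does. */
static int send_with_ref(struct req *r)
{
	int live;

	req_get(r);
	live = send_frame_racy(r);
	req_put(r);
	return live;
}
```

Without the extra reference the transport sees a released request; with it, the request stays live until the send returns and is released only afterwards.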
* [PATCH] Subject: [PATCH] net: gro: fix double aggregation of flush-marked skbs
From: Shiming Cheng @ 2026-06-26 7:40 UTC (permalink / raw)
To: netdev, davem, edumazet, kuba, pabeni, horms, matthias.bgg,
angelogioacchino.delregno, willemb, imv4bel, alice,
eilaimemedsnaimel, sd
Cc: lena.wang, stable, Shiming Cheng
The new skb_gro_receive_list() function is missing a critical safety check
present in the legacy skb_gro_receive() path. Specifically, it does not
validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.
This allows already-GRO'd packets with existing frag_list to be
re-aggregated into a new GRO session, corrupting the frag_list chain
structure. When skb_segment() attempts to unpack these malformed packets,
it encounters invalid state and triggers a kernel panic.
Scenario (Tethering/Device forwarding):
1. Driver: generates aggregated packet P1 via LRO with frag_list
2. Dev A: Receives the aggregated fraglist packet with the flush flag set
3. Dev A: Re-enters GRO, skb_gro_receive_list() is called
4. Missing flush check allows re-aggregation despite flush flag
5. Frag_list chain becomes corrupted (loops or dangling refs)
6. Dev B: TX path calls skb_segment(), crashes on corrupted frag_list
Root cause in skb_segment():
The check at line ~4891:
if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
(skb_headlen(list_skb) == len || sg)) {
When frag_list is corrupted by double aggregation and list_skb is a
NULL pointer taken from skb->next, skb_headlen(list_skb) dereferences
a NULL or corrupted pointer.
Call Trace:
skb_headlen(NULL skb)
skb_segment
tcp_gso_segment
tcp4_gso_segment
inet_gso_segment
skb_mac_gso_segment
__skb_gso_segment
skb_gso_segment
validate_xmit_skb
validate_xmit_skb_list
sch_direct_xmit
qdisc_restart
__qdisc_run
qdisc_run
net_tx_action
Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
skb_gro_receive_list(), matching the defensive programming pattern of
skb_gro_receive().
Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
---
net/core/gro.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/gro.c b/net/core/gro.c
index 35f2f708f010..076247c1e662 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
{
- if (unlikely(p->len + skb->len >= 65536))
+ if (unlikely(p->len + skb->len >= 65536 ||
+ NAPI_GRO_CB(skb)->flush))
return -E2BIG;
if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
--
2.45.2
^ permalink raw reply related
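The early-return condition added by the GRO patch above can be sketched in isolation. This is a userspace model only: struct pkt and gro_receive_list_precheck() are hypothetical, with the flush field standing in for NAPI_GRO_CB(skb)->flush.

```c
#include <errno.h>

/* Hypothetical model of the two values the check looks at. */
struct pkt {
	unsigned int len;
	unsigned int flush;	/* stands in for NAPI_GRO_CB(skb)->flush */
};

/*
 * Refuse to merge when the combined length would reach 64 KiB or when
 * the incoming packet is already marked for flushing.
 */
static int gro_receive_list_precheck(const struct pkt *p,
				     const struct pkt *skb)
{
	if (p->len + skb->len >= 65536 || skb->flush)
		return -E2BIG;
	return 0;
}
```

A small packet with the flush flag set is rejected by the same early return as an oversized one, so it never reaches the frag_list chaining step.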
* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Maciej Fijalkowski @ 2026-06-26 7:42 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jason Xing, netdev, bpf, magnus.karlsson, stfomichev, kuba,
pabeni, horms, bjorn
In-Reply-To: <aj1O4vKzCuodwgYL@devvm7509.cco0.facebook.com>
On Thu, Jun 25, 2026 at 09:05:28AM -0700, Stanislav Fomichev wrote:
> On 06/25, Jason Xing wrote:
> > On Thu, Jun 25, 2026 at 12:37 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> > > > On 06/23, Maciej Fijalkowski wrote:
> > > > > Hi,
> > > > >
> > > > > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > > > > consumed from the Tx ring are not consistently returned to userspace
> > > > > through the completion ring when the packet is later dropped as invalid.
> > > > >
> > > > > The affected cases are invalid or oversized multi-buffer Tx packets in
> > > > > both the generic and zero-copy paths. In these cases, the kernel can
> > > > > consume one or more Tx descriptors while building or validating a
> > > > > multi-buffer packet, then drop the packet before it reaches the device.
> > > > > Userspace still owns the UMEM buffers only after the corresponding
> > > > > addresses are returned through the CQ. Missing completions therefore
> > > > > make userspace lose track of those buffers.
> > > > >
> > > > > The generic path fixes cover three related cases:
> > > > > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> > > > > continuation descriptors left in the Tx ring after xsk_build_skb()
> > > > > reports overflow;
> > > > > * invalid descriptors encountered in the middle of a multi-buffer
> > > > > packet, including the offending invalid descriptor itself.
> > > > >
> > > > > The zero-copy path is handled separately. The batched Tx parser now
> > > > > distinguishes descriptors that can be passed to the driver from
> > > > > descriptors that are consumed only because they belong to an invalid
> > > > > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > > > > address area and published in completion order, after any earlier
> > > > > driver-visible Tx descriptors.
> > > > >
> > > > > The ZC batching path can also retain drain state when userspace has not
> > > > > yet provided the end of an invalid multi-buffer packet. To keep this
> > > > > state local to the singular batched path, the series prevents a second
> > > > > Tx socket from joining the same pool while such drain state exists.
> > > > > During the singular-to-shared transition, Tx batching is gated,
> > > > > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > > > > existing socket still has pending drain state. This avoids adding
> > > > > multi-buffer drain handling to the shared-UMEM fallback path.
> > > > >
> > > > > The last two patches update xskxceiver so the tests account invalid
> > > > > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > > > > still not expecting those invalid packets on the Rx side.
> > > > >
> > > > > This is a follow-up to Jason's changes [0] which were addressing generic
> > > > > xmit only and this set allows me to pass full xskxceiver test suite run
> > > > > against ice driver.
> > > >
> > > > There is a fair amount of feedback from sashiko already :-( So the meta
> > > > question from me is: is it time to scrap our current approach where
> > > > we parse descriptor by descriptor? (and maintain half-baked skb and
> > > > half-consumed descriptor queues)
> > > >
> > > > Should we:
> > > >
> > > > 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> > > > PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> > > > and do a full stop here)
> > > > 2. now that we really know the number of valid descriptors -> reserve
> > > > the cq space (if not -> EAGAIN)
> > > > 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> > > > locally, don't ever create semi-initialized skb)
> > > > 4. construct the skb
> > > > 5. xmit
> > >
> > > Yeah, generic xmit became utterly horrible. I haven't gone through sashiko
> > > reviews yet, but bear in mind this set also aligns the zc side to what was
> > > previously being addressed by Jason.
> > >
> > > I believe planned logistics were to get these fixes onto net and then
> > > Jason had an implementation of batching on generic xmit, directed towards
> > > -next and that's where we could address current flow.
> >
> > Agreed. That's what I'm hoping for. There would be much more
> > discussion on how to do batch xmit in an elegant way, I believe.
>
> This doesn't have to depend on the batch rewrite, we should be able to rewrite
> this non-zc in net, these are still technically fixes, not feature work.
>
> There were already a couple of revisions with this drain_cont approach
> and every time I look at it, it feels like the cure is worse than the
> disease :-( Obviously not gonna stop you from going with the current approach,
> but these fixes feel like a bit of a wasted effort to me (since the bugs keep
> coming and we are piling on more complexity).
Well this is my fault as I took Jason's patches as-is and did not realize
Sashiko had issues with it. I *think* I got ZC side almost right so I'd
like to have at least one last round with trying to make the generic side
right...
^ permalink raw reply
* Re: [PATCH] Subject: [PATCH] net: gro: fix double aggregation of flush-marked skbs
From: Greg KH @ 2026-06-26 7:47 UTC (permalink / raw)
To: Shiming Cheng
Cc: netdev, davem, edumazet, kuba, pabeni, horms, matthias.bgg,
angelogioacchino.delregno, willemb, imv4bel, alice,
eilaimemedsnaimel, sd, lena.wang, stable
In-Reply-To: <20260626074059.25244-1-shiming.cheng@mediatek.com>
On Fri, Jun 26, 2026 at 03:40:59PM +0800, Shiming Cheng wrote:
> The new skb_gro_receive_list() function is missing a critical safety check
> present in the legacy skb_gro_receive() path. Specifically, it does not
> validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.
>
> This allows already-GRO'd packets with existing frag_list to be
> re-aggregated into a new GRO session, corrupting the frag_list chain
> structure. When skb_segment() attempts to unpack these malformed packets,
> it encounters invalid state and triggers a kernel panic.
>
> Scenario (Tethering/Device forwarding):
> 1. Driver: generates aggregated packet P1 via LRO with frag_list
> 2. Dev A: Receives the aggregated fraglist packet with the flush flag set
> 3. Dev A: Re-enters GRO, skb_gro_receive_list() is called
> 4. Missing flush check allows re-aggregation despite flush flag
> 5. Frag_list chain becomes corrupted (loops or dangling refs)
> 6. Dev B: TX path calls skb_segment(), crashes on corrupted frag_list
>
> Root cause in skb_segment():
> The check at line ~4891:
> if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> (skb_headlen(list_skb) == len || sg)) {
>
> When frag_list is corrupted by double aggregation and list_skb is a
> NULL pointer taken from skb->next, skb_headlen(list_skb) dereferences
> a NULL or corrupted pointer.
>
> Call Trace:
> skb_headlen(NULL skb)
> skb_segment
> tcp_gso_segment
> tcp4_gso_segment
> inet_gso_segment
> skb_mac_gso_segment
> __skb_gso_segment
> skb_gso_segment
> validate_xmit_skb
> validate_xmit_skb_list
> sch_direct_xmit
> qdisc_restart
> __qdisc_run
> qdisc_run
> net_tx_action
>
> Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
> skb_gro_receive_list(), matching the defensive programming pattern of
> skb_gro_receive().
>
> Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> ---
> net/core/gro.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/gro.c b/net/core/gro.c
> index 35f2f708f010..076247c1e662 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>
> int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
> {
> - if (unlikely(p->len + skb->len >= 65536))
> + if (unlikely(p->len + skb->len >= 65536 ||
> + NAPI_GRO_CB(skb)->flush))
> return -E2BIG;
>
> if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
> --
> 2.45.2
>
>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
^ permalink raw reply
* Re: [RFC net-next 00/17] MPTCP KTLS support
From: Geliang Tang @ 2026-06-26 7:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Matthieu Baerts, Mat Martineau, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Neal Cardwell, Kuniyuki Iwashima,
John Fastabend, Sabrina Dubroca, Hannes Reinecke, Geliang Tang,
netdev, mptcp, Gang Yan, Zqiang
In-Reply-To: <20260622090059.5d1813dd@kernel.org>
On Mon, 2026-06-22 at 09:00 -0700, Jakub Kicinski wrote:
> On Mon, 22 Jun 2026 18:43:20 +0800 Geliang Tang wrote:
> > Subject: [RFC net-next 00/17] MPTCP KTLS support
>
> Please no. We have a ton of unfixed bugs and may have to revert some of
> the features we dropped back in. I'd prefer to avoid large new bug
> surfaces until we reach an LTS release.
Sure, I can wait. During this time, I'll go over the implementation
more carefully to make sure there are no issues on the MPTCP side.
Thanks,
-Geliang
^ permalink raw reply
* Re: [PATCH net] net: airoha: fix max receive size configuration
From: Lorenzo Bianconi @ 2026-06-26 8:18 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: linux-arm-kernel, linux-mediatek, netdev, Madhur Agrawal
In-Reply-To: <20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 12249 bytes --]
> Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
> during hardware initialization instead of updating it according to the
> configured MTU. This avoids dropping incoming frames that exceed the
> current MTU but could still be processed by the networking stack, which
> is able to fragment the reply on the TX side (e.g. ICMP echo requests).
> Move the per-port MTU configuration to the PPE egress path where it
> belongs, and set the tx frame size running airoha_ppe_set_xmit_frame_size()
> to dynamically track the maximum MTU across running interfaces sharing
> the same PPE instance.
> Fix the PPE MTU register addressing to pack two port entries per
> register word and add WAN_MTU0 configuration for non-LAN GDM devices.
commenting on sashiko's report:
https://sashiko.dev/#/patchset/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d%40kernel.org
>
> Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
> Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 68 ++++++++++---------------------
> drivers/net/ethernet/airoha/airoha_eth.h | 2 +
> drivers/net/ethernet/airoha/airoha_ppe.c | 39 +++++++++++++-----
> drivers/net/ethernet/airoha/airoha_regs.h | 9 ++--
> 4 files changed, 58 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 932b3a3df2e5..3f451c2d4c24 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -178,10 +178,15 @@ static void airoha_fe_maccr_init(struct airoha_eth *eth)
> {
> int p;
>
> - for (p = 1; p <= ARRAY_SIZE(eth->ports); p++)
> + for (p = 1; p <= ARRAY_SIZE(eth->ports); p++) {
> airoha_fe_set(eth, REG_GDM_FWD_CFG(p),
> GDM_TCP_CKSUM_MASK | GDM_UDP_CKSUM_MASK |
> GDM_IP4_CKSUM_MASK | GDM_DROP_CRC_ERR_MASK);
> + airoha_fe_rmw(eth, REG_GDM_LEN_CFG(p),
> + GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> + FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> + FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_RX_SIZE));
> + }
>
> airoha_fe_rmw(eth, REG_CDM_VLAN_CTRL(1), CDM_VLAN_MASK,
> FIELD_PREP(CDM_VLAN_MASK, 0x8100));
> @@ -1831,13 +1836,24 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
> spin_unlock(&port->stats_lock);
- This is a pre-existing issue, but can the spin_lock() used in
airoha_update_hw_stats() cause a deadlock?
If a process context holds port->stats_lock via spin_lock() and is preempted
by a networking softirq on the same CPU that calls dev_get_stats()
(which invokes ndo_get_stats64 -> airoha_update_hw_stats()), will the softirq
spin forever trying to acquire the same lock? Should this use spin_lock_bh()
instead?
- The reported issue has not been introduced by this patch. Moreover, I do
not think this is a real problem since in the current codebase
airoha_update_hw_stats() is always run in process context and not in-irq
context.
> }
>
> +static void airoha_dev_set_xmit_frame_size(struct net_device *netdev)
> +{
> + struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +
> + airoha_ppe_set_xmit_frame_size(dev);
> + if (!airoha_is_lan_gdm_dev(dev))
> + airoha_fe_rmw(dev->eth, REG_WAN_MTU0, WAN_MTU0_MASK,
> + FIELD_PREP(WAN_MTU0_MASK,
> + VLAN_ETH_HLEN + netdev->mtu));
> +}
- Does this unconditional write to REG_WAN_MTU0 break sibling network devices
sharing the same WAN port?
If multiple interfaces share the same hardware port, this appears to overwrite
the shared register using only the current interface's MTU, ignoring the
maximum MTU of any active sibling interfaces. Could this cause the hardware to
drop frames for sibling interfaces if their MTU is larger than the most
recently configured interface?
- This is not a real issue since we can have at most a single WAN port in the
system
> +
> static int airoha_dev_open(struct net_device *netdev)
> {
> - int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> struct airoha_gdm_port *port = dev->port;
> - u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
> struct airoha_qdma *qdma = dev->qdma;
> + u32 pse_port = FE_PSE_PORT_PPE1;
> + int err;
>
> netif_tx_start_all_queues(netdev);
> err = airoha_set_vip_for_gdm_port(dev, true);
[...]
> static int airoha_dev_stop(struct net_device *netdev)
> {
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> @@ -1909,7 +1889,7 @@ static int airoha_dev_stop(struct net_device *netdev)
> airoha_set_vip_for_gdm_port(dev, false);
>
> if (--port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_ppe_set_xmit_frame_size(dev);
- Does this stop path fail to update the WAN MTU limit?
When an interface is stopped, airoha_ppe_set_xmit_frame_size() recalculates
the PPE MTU, but it looks like the global REG_WAN_MTU0 register is not updated
here. Will this prevent the MTU limit from correctly shrinking when an
interface is brought down?
- This is not a real issue since, as pointed out above, we can have at most a
single WAN port in the system, so there is no point in reconfiguring
REG_WAN_MTU0 when this interface is stopped.
> else
> airoha_set_gdm_port_fwd_cfg(qdma->eth,
> REG_GDM_FWD_CFG(port->id),
> @@ -1962,10 +1942,6 @@ static int airoha_enable_gdm2_loopback(struct airoha_gdm_dev *dev)
> FIELD_PREP(LPBK_CHAN_MASK, chan) |
> LBK_GAP_MODE_MASK | LBK_LEN_MODE_MASK |
> LBK_CHAN_MODE_MASK | LPBK_EN_MASK);
> - airoha_fe_rmw(eth, REG_GDM_LEN_CFG(AIROHA_GDM2_IDX),
> - GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> - FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_MTU));
> /* Forward the traffic to the proper GDM port */
> pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
> : FE_PSE_PORT_GDM4;
> @@ -2098,7 +2074,7 @@ static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>
> WRITE_ONCE(netdev->mtu, mtu);
> if (port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_dev_set_xmit_frame_size(netdev);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index d7ff8c5200e2..0c3fb6e5d7f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -23,6 +23,7 @@
> #define AIROHA_MAX_DSA_PORTS 7
> #define AIROHA_MAX_NUM_RSTS 3
> #define AIROHA_MAX_MTU 9220
> +#define AIROHA_MAX_RX_SIZE 16128
> #define AIROHA_MAX_PACKET_SIZE 2048
> #define AIROHA_NUM_QOS_CHANNELS 4
> #define AIROHA_NUM_QOS_QUEUES 8
> @@ -676,6 +677,7 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev);
> bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
> struct airoha_gdm_dev *dev);
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev);
> void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
> bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
> void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 42f4b0f21d17..e7c78293002a 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -97,6 +97,33 @@ void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport)
> __field_prep(DFT_CPORT_MASK(fport), fe_cpu_port));
> }
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev)
> +{
> + struct airoha_gdm_port *port = dev->port;
> + struct airoha_eth *eth = dev->eth;
> + int i, ppe_id, index;
> + u32 len = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> + struct airoha_gdm_dev *d = port->devs[i];
> + struct net_device *netdev;
> +
> + if (!d)
> + continue;
> +
> + netdev = netdev_from_priv(d);
> + if (netif_running(netdev))
> + len = max_t(u32, len, netdev->mtu);
> + }
> + len += VLAN_ETH_HLEN;
> +
> + ppe_id = !airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(eth, 1);
> + index = port->id == AIROHA_GDM4_IDX ? 7 : port->id;
> + airoha_fe_rmw(eth, REG_PPE_MTU(ppe_id, index),
> + FP_EGRESS_MTU_MASK(index),
> + __field_prep(FP_EGRESS_MTU_MASK(index), len));
- Does this leave the egress MTU limit uninitialized for other PPE engines?
The patch removes the loop in airoha_ppe_hw_init() that previously initialized
REG_PPE_MTU for all ports on all available PPEs. This function now only
configures it for a single ppe_id.
During cross-PPE routing (such as WAN-to-LAN), if PPE1 (WAN) forwards a packet
to a LAN port, it will check REG_PPE_MTU(1, LAN_port). Since this register was
only configured for PPE0, will the uninitialized limit (0) cause the packet to
be dropped?
- This is not a real issue since every airoha_gdm_dev/net_device is
associated with a PPE engine/QDMA according to the logic in
airoha_dev_open()/airoha_dev_set_qdma(). The other PPE engine's MTU will be
updated according to the assigned net_device.
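To illustrate the REG_PPE_MTU addressing discussed above, here is a minimal
user-space sketch of how the new macros map a port index to a register word
and a 14-bit field. It mirrors the macro arithmetic from the patch, but the
helper names (ppe_mtu_word_offset(), ppe_mtu_field_mask(), ppe_mtu_rmw()) and
the user-space GENMASK32() are assumptions made for illustration. Offsets are
relative to REG_PPE_MTU_BASE, since the PPE base addresses are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel GENMASK() on 32-bit values (assumption). */
#define GENMASK32(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

/* Byte offset from REG_PPE_MTU_BASE: two port entries share one 32-bit word. */
static uint32_t ppe_mtu_word_offset(unsigned int index)
{
	return (index / 2) << 2;
}

/* Even indexes use bits 13:0, odd indexes use bits 29:16 of the word. */
static uint32_t ppe_mtu_field_mask(unsigned int index)
{
	unsigned int shift = (index % 2) << 4;

	return GENMASK32(13 + shift, shift);
}

/* Read-modify-write of one field, leaving the sibling field untouched. */
static uint32_t ppe_mtu_rmw(uint32_t word, unsigned int index, uint32_t len)
{
	uint32_t mask = ppe_mtu_field_mask(index);
	unsigned int shift = (index % 2) << 4;

	assert(len <= 0x3fff); /* the field is 14 bits wide */
	return (word & ~mask) | ((len << shift) & mask);
}
```

With this mapping, index 7 (used for GDM4 in the patch) lands in the upper
half of the word at offset 12, whereas the previous macro, (_n) << 2, would
have addressed offset 28.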
Regards,
Lorenzo
> +}
> +
> static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> {
> u32 sram_ppe_num_data_entries = PPE_SRAM_NUM_ENTRIES, sram_num_entries;
> @@ -115,8 +142,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> PPE_RAM_NUM_ENTRIES_SHIFT(sram_ppe_num_data_entries);
>
> for (i = 0; i < eth->soc->num_ppe; i++) {
> - int p;
> -
> airoha_fe_wr(eth, REG_PPE_TB_BASE(i),
> ppe->foe_dma + sram_tb_size);
>
> @@ -166,15 +191,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
> airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
> PPE_FLOW_CFG_IP6_6RD_MASK);
> -
> - for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
> - airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
> - FP0_EGRESS_MTU_MASK |
> - FP1_EGRESS_MTU_MASK,
> - FIELD_PREP(FP0_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU) |
> - FIELD_PREP(FP1_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU));
> }
>
> for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> @@ -196,6 +212,7 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_ppe_is_enabled(eth, 1);
> fport = airoha_get_fe_port(dev);
> airoha_ppe_set_cpu_port(dev, ppe_id, fport);
> + airoha_ppe_set_xmit_frame_size(dev);
> }
> }
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..6fed63d013b4 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -327,9 +327,8 @@
> #define PPE_SRAM_TABLE_EN_MASK BIT(0)
>
> #define REG_PPE_MTU_BASE(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x304)
> -#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + ((_n) << 2))
> -#define FP1_EGRESS_MTU_MASK GENMASK(29, 16)
> -#define FP0_EGRESS_MTU_MASK GENMASK(13, 0)
> +#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + (((_n) / 2) << 2))
> +#define FP_EGRESS_MTU_MASK(_n) GENMASK(13 + (((_n) % 2) << 4), ((_n) % 2) << 4)
>
> #define REG_PPE_RAM_CTRL(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x31c)
> #define PPE_SRAM_CTRL_ACK_MASK BIT(31)
> @@ -377,6 +376,10 @@
> #define REG_SRC_PORT_FC_MAP6 0x2298
> #define FC_ID_OF_SRC_PORT_MASK(_n) GENMASK(4 + ((_n) << 3), ((_n) << 3))
>
> +#define REG_WAN_MTU0 0x2300
> +#define WAN_MTU1_MASK GENMASK(29, 16)
> +#define WAN_MTU0_MASK GENMASK(13, 0)
> +
> #define REG_CDM5_RX_OQ1_DROP_CNT 0x29d4
>
> /* QDMA */
>
> ---
> base-commit: fd1269e454089abda0e4f9e5e25ecd02a90ab009
> change-id: 20260618-airoha-fix-rx-max-len-57654b661646
>
> Best regards,
> --
> Lorenzo Bianconi <lorenzo@kernel.org>
>
^ permalink raw reply
* Re: [PATCH 0/6] lib: rework bitreverse
From: patchwork-bot+linux-riscv @ 2026-06-26 8:20 UTC (permalink / raw)
To: Yury Norov
Cc: linux-riscv, pjw, palmer, aou, alex, yury.norov, linux, arnd,
andrew+netdev, davem, edumazet, kuba, pabeni, akpm, ast, daniel,
hawk, john.fastabend, sdf, ruanjinjie, linux-kernel, linux-arch,
netdev, bpf
In-Reply-To: <20260430211351.658193-1-ynorov@nvidia.com>
Hello:
This series was applied to riscv/linux.git (fixes)
by Yury Norov <ynorov@nvidia.com>:
On Thu, 30 Apr 2026 17:13:44 -0400 you wrote:
> This series is a resend for Jinjie Ruan's "arch/riscv: Add bitrev.h file
> to support rev8 and brev8" [1], my follow-up "lib: compile generic
> bitrev based on GENERIC_BITREVERSE" [2], and the fix for a build error
> reported by Nathan Chancellor [3].
>
> No changes, except for combining pieces together and rebasing on top of
> the tree.
>
> [...]
Here is the summary with links:
- [1/6] lib: include crc32.h conditionally on CONFIG_CRC32
(no matching commit)
- [2/6] lib/bitrev: Introduce GENERIC_BITREVERSE and cleanup Kconfig
(no matching commit)
- [3/6] bitops: Define generic __bitrev8/16/32 for reuse
(no matching commit)
- [4/6] arch/riscv: Add bitrev.h file to support rev8 and brev8
(no matching commit)
- [5/6] lib: compile generic bitrev.c conditionally on GENERIC_BITREVERSE
(no matching commit)
- [6/6] MAINTAINERS: BITOPS: include bitrev.[ch]
https://git.kernel.org/riscv/c/7b2c5b4e43aa
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v2 1/5] arch: select HAVE_ARCH_BITREVERSE conditionally on BITREVERSE
From: patchwork-bot+linux-riscv @ 2026-06-26 8:20 UTC (permalink / raw)
To: Yury Norov
Cc: linux-riscv, pjw, palmer, aou, alex, yury.norov, linux, arnd,
ebiggers, andrew+netdev, davem, edumazet, kuba, pabeni, akpm, ast,
daniel, hawk, john.fastabend, sdf, ruanjinjie, linux-kernel,
linux-arch, netdev, bpf
In-Reply-To: <20260506175207.110893-2-ynorov@nvidia.com>
Hello:
This series was applied to riscv/linux.git (fixes)
by Yury Norov <ynorov@nvidia.com>:
On Wed, 6 May 2026 13:52:02 -0400 you wrote:
> Architectures may have bit reversal instructions, but if the API is not
> needed, the corresponding option should not be selected because it may
> lead to generating the unneeded code.
>
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---
> arch/arm/Kconfig | 2 +-
> arch/arm64/Kconfig | 2 +-
> arch/loongarch/Kconfig | 2 +-
> arch/mips/Kconfig | 2 +-
> lib/Kconfig | 1 +
> 5 files changed, 5 insertions(+), 4 deletions(-)
Here is the summary with links:
- [v2,1/5] arch: select HAVE_ARCH_BITREVERSE conditionally on BITREVERSE
https://git.kernel.org/riscv/c/42d9c75e8b9c
- [v2,2/5] lib/bitrev: Introduce GENERIC_BITREVERSE
https://git.kernel.org/riscv/c/00751d655ece
- [v2,3/5] bitops: Define generic___bitrev8/16/32 for reuse
https://git.kernel.org/riscv/c/83aede8131af
- [v2,4/5] arch/riscv: Add bitrev.h file to support rev8 and brev8
https://git.kernel.org/riscv/c/e8620bd7e5e0
- [v2,5/5] MAINTAINERS: BITOPS: include bitrev.[ch]
https://git.kernel.org/riscv/c/7b2c5b4e43aa
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v5 0/9] driver core: Fix some race conditions
From: patchwork-bot+linux-riscv @ 2026-06-26 8:21 UTC (permalink / raw)
To: Doug Anderson
Cc: linux-riscv, gregkh, rafael, dakr, stern, aik, johan, edumazet,
leon, hch, robin.murphy, maz, aleksander.lobakin, saravanak, akpm,
Frank.Li, jgg, alex, alexander.stein, andre.przywara, andrew,
andrew, andriy.shevchenko, aou, ardb, astewart, bhelgaas, brgl,
broonie, catalin.marinas, chleroy, davem, david, devicetree,
dmaengine, driver-core, gbatra, gregory.clement, hkallweit1,
iommu, jirislaby, joel, joro, kees, kevin.brodsky, kuba, lenb,
lgirdwood, linux-acpi, linux-arm-kernel, linux-aspeed, linux-cxl,
linux-kernel, linux-mips, linux-mm, linux-pci, linux-serial,
linux-snps-arc, linux-usb, linux, linuxppc-dev, m.szyprowski,
maddy, mani, miko.lenczewski, mpe, netdev, npiggin, osalvador,
oupton, pabeni, palmer, peter.ujfalusi, peterz, pjw, robh,
sebastian.hesselbarth, tglx, tsbogend, vgupta, vkoul, will, willy,
yangyicong, yeoreum.yun
In-Reply-To: <20260406232444.3117516-1-dianders@chromium.org>
Hello:
This patch was applied to riscv/linux.git (fixes)
by Danilo Krummrich <dakr@kernel.org>:
On Mon, 6 Apr 2026 16:22:53 -0700 you wrote:
> The main goal of this series is to fix the observed bug talked about
> in the first patch ("driver core: Don't let a device probe until it's
> ready"). That patch fixes a problem that has been observed in the real
> world and could land even if the rest of the patches are found
> unacceptable or need to be spun.
>
> That said, during patch review Danilo correctly pointed out that many
> of the bitfield accesses in "struct device" are unsafe. I added a
> bunch of patches in the series to address each one.
>
> [...]
Here is the summary with links:
- [v5,7/9] driver core: Replace dev->dma_coherent with dev_dma_coherent()
https://git.kernel.org/riscv/c/3e2c1e213ac2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-26 8:15 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Simon Horman, Grzegorz Nitka, anthony.l.nguyen, intel-wired-lan,
netdev
In-Reply-To: <CADcc-bwd2CcWJ1AFDm1GR1HBzo2OOh=Xr3moNS+-RVuai6yVBA@mail.gmail.com>
Hey Przemek,
I ran some tests and unfortunately, the following sentence from the
datasheet is true:
"For specific resources, such as Change Lock (0x0003) and Global Config Lock
(0x0004), this field is used by software to override the default timeout for the
operation, and also to specify the timeout used for this operation."
This means we can only change the default timeout for 0x0003 and 0x0004,
but not for 0x0001 (NVM resource).
Whatever timeout I provide, the FW falls back to 0xBB8 (3000 ms).
Input:
[ 2209.656758] ice 0000:31:00.0: CQ CMD: opcode 0x0008, flags 0x2000,
datalen 0x0000, retval 0x0000
[ 2209.656760] ice 0000:31:00.0: cookie (h,l) 0x00000000 0x00000000
[ 2209.656761] ice 0000:31:00.0: param (0,1) 0x00010001 0x00000BB9
Output:
[ 2209.656927] ice 0000:31:00.0: CQ CMD: opcode 0x0008, flags 0x2003,
datalen 0x0000, retval 0x0000
[ 2209.656929] ice 0000:31:00.0: cookie (h,l) 0x00000000 0x00000000
[ 2209.656931] ice 0000:31:00.0: param (0,1) 0x00010001 0x00000BB8
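As a reading aid for the logs above, here is a minimal sketch that decodes the
two param words of the request-resource command (opcode 0x0008). The field
layout (resource ID in the low 16 bits of param0, access type in the high 16
bits, timeout in milliseconds in param1) is an assumption inferred from the
logged values, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the two logged param words (layout is an assumption). */
struct req_res_params {
	uint16_t res_id;      /* 0x0001 is the NVM resource in this thread */
	uint16_t access_type; /* assumed to sit in the high half of param0 */
	uint32_t timeout_ms;  /* requested on input, granted on writeback */
};

/* Split param0 into its two 16-bit halves and take param1 as the timeout. */
static struct req_res_params decode_req_res(uint32_t param0, uint32_t param1)
{
	struct req_res_params p = {
		.res_id = (uint16_t)(param0 & 0xffff),
		.access_type = (uint16_t)(param0 >> 16),
		.timeout_ms = param1,
	};

	return p;
}
```

Decoded this way, the input above requests 3001 ms (0xBB9) for resource
0x0001, and the writeback reports 3000 ms (0xBB8).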
Correct me if I'm wrong, but the only way to properly handle it is to
ensure the resource is locked and released between every
ice_acquire_nvm call.
I'll start working on this.
Regards,
Robert
On Thu, Jun 25, 2026 at 12:14 PM Robert Malz <robert.malz@canonical.com> wrote:
>
> Hey Przemek,
> Thanks a lot for the feedback.
> I was sure that we use ICE_NVM_TIMEOUT (180s) as a timeout every time
> (ice_acquire_nvm) but your proposal made me rethink it a little.
> First of all, the datasheet for E810 specifies the timeout as: "As an
> input, the software might specify timeout longer than the default
> taken for this resource, and up to one minute."
> 180s is greater than one minute so I took a look into AQC logs:
> [ 110.698471] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2000,
> datalen 0x0000, retval 0x0000
> [ 110.698474] ice 0000:05:00.0: cookie (h,l) 0x00000000 0x00000000
> [ 110.698477] ice 0000:05:00.0: param (0,1) 0x00010001 0x0002BF20
> [ 110.698480] ice 0000:05:00.0: addr (h,l) 0x00000000 0x00000000
> [ 110.698645] ice 0000:05:00.0: ATQ: desc and buffer writeback:
> [ 110.698648] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2003,
> datalen 0x0000, retval 0x0000
> [ 110.698651] ice 0000:05:00.0: cookie (h,l) 0x00000000 0x00000000
> [ 110.698654] ice 0000:05:00.0: param (0,1) 0x00010001 0x00000BB8
> [ 110.698657] ice 0000:05:00.0: addr (h,l) 0x00000000 0x00000000
> Based on the above, the driver requested a 0x0002BF20 timeout (180 000
> ms) but the FW returned only 0x00000BB8 (3s).
> I'm assuming this is expected behavior since the maximum timeout for
> NVM read should be 60,000 ms.
>
> If changing the timeout requested by the driver to 60s for read ops is
> handled correctly by the FW and the FW respects that lock, the retry
> patch submitted in this email thread might not be required at all.
> Let me quickly prepare a new patch and test it. I'll update this
> thread once I have results.
>
> Regards,
> Robert
>
>
> On Thu, Jun 25, 2026 at 9:51 AM Przemek Kitszel
> <przemyslaw.kitszel@intel.com> wrote:
> >
> > >> This is an AI-generated review of your patch. The human sending this
> >
> >
> > >> [Severity: Low]
> > >> Does this check allow an extra retry execution? Since retry_cnt starts at 0,
> > >> using a strictly greater-than check allows the loop to retry 4 times
> > >> (5 total executions) instead of the 3 executions intended by
> > >> ICE_SQ_SEND_MAX_EXECUTE.
> > >>
> > >>> + break;
> > >>> + ice_debug(hw, ICE_DBG_NVM,
> > >>> + "NVM read EBUSY error, retry %d\n",
> > >>> + retry_cnt + 1);
> > >>> + last_cmd = false;
> > >>> + ice_release_nvm(hw);
> > >>> + msleep(ICE_SQ_SEND_DELAY_TIME_MS);
> > >>
> > > The review is correct that this allows one extra execution. However, I
> > > intentionally kept this verbatim from the OOT driver (1.15.4) to
> >
> > This OOT change was made entirely by Grzegorz, for E825C. Would be great
> > to acknowledge his authorship (+ SoB ofc, below).
> > Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
> >
> > What was the platform that you have encountered the bug on?
> >
> > > retain the significant test exposure it has there. The absolute worst
> > > case is one harmless extra retry before giving up. If you strongly
> > > prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the
> > > condition in a v4.
> >
> > Please update (to have one attempt less).
> >
> > >
> > >> [Severity: High]
> > >> Does dropping the NVM lock here violate the caller's assumption of mutual
> > >> exclusion?
> > >>
> > >> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to
> > >> ensure an atomic read of multiple sectors without interleaved writes. By
> > >> explicitly dropping the lock during the msleep() window, another thread, VF,
> > >> or PF could acquire the NVM lock and modify the NVM before the read resumes,
> > >> potentially resulting in reading torn or inconsistent NVM data.
> > >>
> > >>> + status = ice_acquire_nvm(hw, ICE_RES_READ);
> > >>> + if (status)
> > >>> + break;
> > >>
> > >
> > > I don't think this is an issue. The NVM resource (req_res 0x0008) is
> > > an ownership lease, not a content snapshot. The FW grants it with a
> > > hold timeout (see the timeout field handling in ice_aq_req_res). It
> > > never guaranteed a point-in-time view of the flash.
> > > Very similar behavior already exists in ice_devlink_nvm_snapshot(),
> > > which reads the NVM in chunks via ice_read_flat_nvm() and deliberately
> > > acquires/releases the lock between chunks (with a comment explaining
> > > the FW reclaim).
> > > Additionally, the lock is only dropped on the failure path, where
> > > neither bytes_read nor offset have been advanced. The driver simply
> > > re-reads the failed chunk after re-acquiring. So no data that is
> > > returned to the caller is committed across the drop window.
> > > Overall the lock drop does not weaken any guarantee callers actually
> > > have. If we wanted to introduce a true snapshot guarantee, that would
> > > probably require different approach and separate patch.
> > >
> > >
> > >> [Severity: High]
> > >> Can this lead to an unbalanced NVM resource lock release and potential data
> > >> corruption?
> > >>
> > >> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the
> > >> lock currently unheld. However, callers like ice_read_flash_module() are
> > >> unaware of this internal lock state change and will unconditionally call
> > >> ice_release_nvm() upon failure:
> > >>
> > >> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() {
> > >> status = ice_acquire_nvm(hw, ICE_RES_READ);
> > >> if (status)
> > >> return status;
> > >>
> > >> status = ice_read_flat_nvm(hw, start + offset, &length, data, false);
> > >>
> > >> ice_release_nvm(hw);
> > >> ...
> > >>
> > >> Because firmware tracks resource locks per-PF rather than per-OS-thread, this
> > >> errant second release could forcefully strip the lock from another thread on
> > >> the same PF that successfully acquired it, exposing the NVM to concurrent
> > >> modification.
> > >>
> > >
> > > Agreed, this might be a real bug, and the one of the three I think is
> > > worth investigating.
> > > If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns
> > > with the lock unheld while callers unconditionally call
> > > ice_release_nvm(), so a stray release is issued.
> > >
> > > On probability, though, the window is very small. Reaching it requires
> > > sustained EBUSY across the retry budget plus a failed re-acquire
> > > (which itself polls up to ICE_NVM_TIMEOUT), and concurrently another
> > > requester taking the lock. Most reads happen during init (ice_probe,
> > > and reset/rebuild via ice_init_nvm), and NVM writes only happen on an
> > > already initialized driver. The devlink/ethtool nvm_read paths are
> > > also exposed, but hitting this race would require precise timing
> > > against a concurrent NVM owner on the device.
> > >
> > > I'd prefer to keep the scope of this patch limited to the EBUSY retry
> > > path and not take on the unbalanced-release fix here. A proper fix
> > > should change the lock-ownership contract of ice_read_flat_nvm() (on
> > > error, the lock must be released by ice_read_flat_nvm(), callers
> > > release only on success) and update all callers. Code change should be
> > > simple for all callers but ice_discover_flash_size(), it intentionally
> > > holds one lease across a read loop and would need to re-acquire after
> > > each expected boundary failure.
> > >
> > > Given how small the original window is, I'd rather not trade tested
> > > OOT behavior for the risk of a complex unbalanced NVM lock fix. I
> > > actually have a patch mostly ready that fixes the lock-ownership
> > > contract, but I really don't like it. It changes the design of
> > > ice_read_flat_nvm(), making it less intuitive for callers. More
> > > importantly, I just don't have the resources or test coverage right
> > > now to properly verify such change.
> > >
> > > However, I can modify the failure path for ice_acquire_nvm inside
> > > ice_read_flat_nvm. Instead of bailing out immediately, we can just
> > > retry it within the existing retry budget. In this case, the
> > > probability of leaving ice_read_flat_nvm without holding the lock is
> > > reduced even further without needing a refactor.
> > >
> > > Please let me know what you think about my thought process on this.
> >
> > I think that both AI-reported issues against the lock are valid
> > concerns.
> >
> > I think that sleep was the actual fix, and re-locking was merely
> > a necessity due to the lock's expiration (as you said).
> >
> > A proper fix would be to just increase lock-timeout to accommodate all
> > attempts (and still do the retries&sleep, but without unlocking).
> >
> >
> > >
> > >
> > >
> > >>> + retry_cnt++;
> > >>> + } else {
> > >>> + bytes_read += read_size;
> > >>> + offset += read_size;
> > >>> + retry_cnt = 0;
> > >>> + }
> > >>> } while (!last_cmd);
> > >>>
> > >>> *length = bytes_read;
> > >
> > > Thanks,
> > > Robert
> >
^ permalink raw reply
* Re: [PATCH net] net: airoha: fix max receive size configuration
From: Lorenzo Bianconi @ 2026-06-26 8:25 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: linux-arm-kernel, linux-mediatek, netdev, Madhur Agrawal
In-Reply-To: <20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org>
> Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
> during hardware initialization instead of updating it according to the
> configured MTU. This avoids dropping incoming frames that exceed the
> current MTU but could still be processed by the networking stack, which
> is able to fragment the reply on the TX side (e.g. ICMP echo requests).
> Move the per-port MTU configuration to the PPE egress path where it
> belongs, and set the tx frame size running airoha_ppe_set_xmit_frame_size()
> to dynamically track the maximum MTU across running interfaces sharing
> the same PPE instance.
> Fix the PPE MTU register addressing to pack two port entries per
> register word and add WAN_MTU0 configuration for non-LAN GDM devices.
>
> Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
> Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
commenting on sashiko's report:
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d%40kernel.org
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 68 ++++++++++---------------------
> drivers/net/ethernet/airoha/airoha_eth.h | 2 +
> drivers/net/ethernet/airoha/airoha_ppe.c | 39 +++++++++++++-----
> drivers/net/ethernet/airoha/airoha_regs.h | 9 ++--
> 4 files changed, 58 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 932b3a3df2e5..3f451c2d4c24 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -178,10 +178,15 @@ static void airoha_fe_maccr_init(struct airoha_eth *eth)
> {
> int p;
>
> - for (p = 1; p <= ARRAY_SIZE(eth->ports); p++)
> + for (p = 1; p <= ARRAY_SIZE(eth->ports); p++) {
> airoha_fe_set(eth, REG_GDM_FWD_CFG(p),
> GDM_TCP_CKSUM_MASK | GDM_UDP_CKSUM_MASK |
> GDM_IP4_CKSUM_MASK | GDM_DROP_CRC_ERR_MASK);
> + airoha_fe_rmw(eth, REG_GDM_LEN_CFG(p),
> + GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> + FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> + FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_RX_SIZE));
> + }
>
> airoha_fe_rmw(eth, REG_CDM_VLAN_CTRL(1), CDM_VLAN_MASK,
> FIELD_PREP(CDM_VLAN_MASK, 0x8100));
> @@ -1831,13 +1836,24 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
> spin_unlock(&port->stats_lock);
> }
>
> +static void airoha_dev_set_xmit_frame_size(struct net_device *netdev)
> +{
> + struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +
> + airoha_ppe_set_xmit_frame_size(dev);
> + if (!airoha_is_lan_gdm_dev(dev))
> + airoha_fe_rmw(dev->eth, REG_WAN_MTU0, WAN_MTU0_MASK,
> + FIELD_PREP(WAN_MTU0_MASK,
> + VLAN_ETH_HLEN + netdev->mtu));
> +}
- Could the WAN_MTU0 update here use the same max-across-siblings
aggregation as airoha_ppe_set_xmit_frame_size()?
- This is the same issue reported by sashiko-gemini. There is just one WAN device
in the system, so we do not need to calculate the max MTU here.
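For reference, the max-across-siblings aggregation mentioned in the question
can be sketched in plain C as below. This is a simplified user-space model of
the loop in airoha_ppe_set_xmit_frame_size(), not driver code: struct
sibling_dev and xmit_frame_size() are hypothetical names, and VLAN_ETH_HLEN is
redefined locally with its usual value of 18 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_ETH_HLEN 18 /* Ethernet header plus one VLAN tag */

/* Hypothetical stand-in for a net_device sharing a GDM port. */
struct sibling_dev {
	bool running;
	uint32_t mtu;
};

/* Largest MTU among running siblings, plus the L2 header length. */
static uint32_t xmit_frame_size(const struct sibling_dev *devs, size_t n)
{
	uint32_t len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].running && devs[i].mtu > len)
			len = devs[i].mtu;
	}

	return len + VLAN_ETH_HLEN;
}
```

In this model, a stopped sibling does not contribute to the result, so the
value follows whichever running device has the largest MTU.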
> +
> static int airoha_dev_open(struct net_device *netdev)
> {
> - int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> struct airoha_gdm_port *port = dev->port;
> - u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
> struct airoha_qdma *qdma = dev->qdma;
> + u32 pse_port = FE_PSE_PORT_PPE1;
> + int err;
>
> netif_tx_start_all_queues(netdev);
> err = airoha_set_vip_for_gdm_port(dev, true);
> @@ -1851,19 +1867,7 @@ static int airoha_dev_open(struct net_device *netdev)
> airoha_fe_clear(qdma->eth, REG_GDM_INGRESS_CFG(port->id),
> GDM_STAG_EN_MASK);
>
> - cur_len = airoha_fe_get(qdma->eth, REG_GDM_LEN_CFG(port->id),
> - GDM_LONG_LEN_MASK);
> - if (!port->users || len > cur_len) {
> - /* Opening a sibling net_device with a larger MTU updates the
> - * MTU of already running devices. This is required to allow
> - * multiple net_devices with different MTUs to share the same
> - * GDM port.
> - */
> - airoha_fe_rmw(qdma->eth, REG_GDM_LEN_CFG(port->id),
> - GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> - FIELD_PREP(GDM_LONG_LEN_MASK, len));
> - }
> + airoha_dev_set_xmit_frame_size(netdev);
> port->users++;
>
> if (!airoha_is_lan_gdm_dev(dev) &&
> @@ -1875,30 +1879,6 @@ static int airoha_dev_open(struct net_device *netdev)
> return 0;
> }
>
> -static void airoha_set_port_mtu(struct airoha_eth *eth,
> - struct airoha_gdm_port *port)
> -{
> - u32 len = 0;
> - int i;
> -
> - for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> - struct airoha_gdm_dev *dev = port->devs[i];
> - struct net_device *netdev;
> -
> - if (!dev)
> - continue;
> -
> - netdev = netdev_from_priv(dev);
> - if (netif_running(netdev))
> - len = max_t(u32, len, netdev->mtu);
> - }
> - len += ETH_HLEN + ETH_FCS_LEN;
> -
> - airoha_fe_rmw(eth, REG_GDM_LEN_CFG(port->id),
> - GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_LONG_LEN_MASK, len));
> -}
> -
> static int airoha_dev_stop(struct net_device *netdev)
> {
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> @@ -1909,7 +1889,7 @@ static int airoha_dev_stop(struct net_device *netdev)
> airoha_set_vip_for_gdm_port(dev, false);
>
> if (--port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_ppe_set_xmit_frame_size(dev);
- On the close path, the call is to airoha_ppe_set_xmit_frame_size()
directly rather than the airoha_dev_set_xmit_frame_size() wrapper.
Does this mean WAN_MTU0 is never refreshed when a WAN dev is closed?
For example, if a small-MTU sibling is closed while a larger-MTU dev
remains running, the PPE MTU register gets recomputed to the larger
value but WAN_MTU0 retains the smaller value written at the last open
or change_mtu.
The commit message states:
set the tx frame size running airoha_ppe_set_xmit_frame_size()
to dynamically track the maximum MTU across running interfaces sharing
the same PPE instance.
Is the asymmetry between PPE MTU (max across siblings) and WAN_MTU0
(per-netdev write) intentional?
- This is the same issue reported by sashiko-gemini. There is just one WAN device
in the system, so there is no point in updating WAN_MTU0 if the WAN device is
stopped.
Regards,
Lorenzo
> else
> airoha_set_gdm_port_fwd_cfg(qdma->eth,
> REG_GDM_FWD_CFG(port->id),
> @@ -1962,10 +1942,6 @@ static int airoha_enable_gdm2_loopback(struct airoha_gdm_dev *dev)
> FIELD_PREP(LPBK_CHAN_MASK, chan) |
> LBK_GAP_MODE_MASK | LBK_LEN_MODE_MASK |
> LBK_CHAN_MODE_MASK | LPBK_EN_MASK);
> - airoha_fe_rmw(eth, REG_GDM_LEN_CFG(AIROHA_GDM2_IDX),
> - GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> - FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_MTU));
> /* Forward the traffic to the proper GDM port */
> pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
> : FE_PSE_PORT_GDM4;
> @@ -2098,7 +2074,7 @@ static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>
> WRITE_ONCE(netdev->mtu, mtu);
> if (port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_dev_set_xmit_frame_size(netdev);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index d7ff8c5200e2..0c3fb6e5d7f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -23,6 +23,7 @@
> #define AIROHA_MAX_DSA_PORTS 7
> #define AIROHA_MAX_NUM_RSTS 3
> #define AIROHA_MAX_MTU 9220
> +#define AIROHA_MAX_RX_SIZE 16128
> #define AIROHA_MAX_PACKET_SIZE 2048
> #define AIROHA_NUM_QOS_CHANNELS 4
> #define AIROHA_NUM_QOS_QUEUES 8
> @@ -676,6 +677,7 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev);
> bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
> struct airoha_gdm_dev *dev);
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev);
> void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
> bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
> void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 42f4b0f21d17..e7c78293002a 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -97,6 +97,33 @@ void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport)
> __field_prep(DFT_CPORT_MASK(fport), fe_cpu_port));
> }
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev)
> +{
> + struct airoha_gdm_port *port = dev->port;
> + struct airoha_eth *eth = dev->eth;
> + int i, ppe_id, index;
> + u32 len = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> + struct airoha_gdm_dev *d = port->devs[i];
> + struct net_device *netdev;
> +
> + if (!d)
> + continue;
> +
> + netdev = netdev_from_priv(d);
> + if (netif_running(netdev))
> + len = max_t(u32, len, netdev->mtu);
> + }
> + len += VLAN_ETH_HLEN;
> +
> + ppe_id = !airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(eth, 1);
> + index = port->id == AIROHA_GDM4_IDX ? 7 : port->id;
> + airoha_fe_rmw(eth, REG_PPE_MTU(ppe_id, index),
> + FP_EGRESS_MTU_MASK(index),
> + __field_prep(FP_EGRESS_MTU_MASK(index), len));
> +}
> +
> static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> {
> u32 sram_ppe_num_data_entries = PPE_SRAM_NUM_ENTRIES, sram_num_entries;
> @@ -115,8 +142,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> PPE_RAM_NUM_ENTRIES_SHIFT(sram_ppe_num_data_entries);
>
> for (i = 0; i < eth->soc->num_ppe; i++) {
> - int p;
> -
> airoha_fe_wr(eth, REG_PPE_TB_BASE(i),
> ppe->foe_dma + sram_tb_size);
>
> @@ -166,15 +191,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
> airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
> PPE_FLOW_CFG_IP6_6RD_MASK);
> -
> - for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
> - airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
> - FP0_EGRESS_MTU_MASK |
> - FP1_EGRESS_MTU_MASK,
> - FIELD_PREP(FP0_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU) |
> - FIELD_PREP(FP1_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU));
> }
>
> for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> @@ -196,6 +212,7 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_ppe_is_enabled(eth, 1);
> fport = airoha_get_fe_port(dev);
> airoha_ppe_set_cpu_port(dev, ppe_id, fport);
> + airoha_ppe_set_xmit_frame_size(dev);
> }
> }
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..6fed63d013b4 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -327,9 +327,8 @@
> #define PPE_SRAM_TABLE_EN_MASK BIT(0)
>
> #define REG_PPE_MTU_BASE(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x304)
> -#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + ((_n) << 2))
> -#define FP1_EGRESS_MTU_MASK GENMASK(29, 16)
> -#define FP0_EGRESS_MTU_MASK GENMASK(13, 0)
> +#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + (((_n) / 2) << 2))
> +#define FP_EGRESS_MTU_MASK(_n) GENMASK(13 + (((_n) % 2) << 4), ((_n) % 2) << 4)
>
> #define REG_PPE_RAM_CTRL(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x31c)
> #define PPE_SRAM_CTRL_ACK_MASK BIT(31)
> @@ -377,6 +376,10 @@
> #define REG_SRC_PORT_FC_MAP6 0x2298
> #define FC_ID_OF_SRC_PORT_MASK(_n) GENMASK(4 + ((_n) << 3), ((_n) << 3))
>
> +#define REG_WAN_MTU0 0x2300
> +#define WAN_MTU1_MASK GENMASK(29, 16)
> +#define WAN_MTU0_MASK GENMASK(13, 0)
> +
> #define REG_CDM5_RX_OQ1_DROP_CNT 0x29d4
>
> /* QDMA */
>
> ---
> base-commit: fd1269e454089abda0e4f9e5e25ecd02a90ab009
> change-id: 20260618-airoha-fix-rx-max-len-57654b661646
>
> Best regards,
> --
> Lorenzo Bianconi <lorenzo@kernel.org>
>
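For anyone checking the new register macros by hand, here is a small user-space sketch of the arithmetic behind REG_PPE_MTU() and FP_EGRESS_MTU_MASK() as quoted above. The helper names (genmask32, ppe_mtu_reg_offset, fp_egress_mtu_mask, ppe_mtu_field) are made up for illustration and are not part of the driver. The sketch only computes the offset relative to REG_PPE_MTU_BASE(), since the PPE base addresses are not shown in the quoted hunk.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(h, l) on 32-bit values. */
static inline uint32_t genmask32(unsigned int h, unsigned int l)
{
	return (~UINT32_C(0) >> (31 - h)) & (~UINT32_C(0) << l);
}

/* Byte offset from REG_PPE_MTU_BASE(): two ports share one 32-bit register. */
static inline uint32_t ppe_mtu_reg_offset(unsigned int n)
{
	return (n / 2) << 2;
}

/* Even ports use bits 13:0, odd ports use bits 29:16. */
static inline uint32_t fp_egress_mtu_mask(unsigned int n)
{
	unsigned int shift = (n % 2) << 4;

	return genmask32(13 + shift, shift);
}

/* Place a frame length into the field for port index n (hypothetical helper). */
static inline uint32_t ppe_mtu_field(unsigned int n, uint32_t len)
{
	unsigned int shift = (n % 2) << 4;

	return (len << shift) & fp_egress_mtu_mask(n);
}
```

With index 7 (the value the patch uses for AIROHA_GDM4_IDX) this selects the register at offset 0xc and the upper field, and an MTU of 1500 plus VLAN_ETH_HLEN (18) gives a length of 1518 in that field.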
^ permalink raw reply
* [PATCH net v2 1/1] net: sched: ets: avoid deficit wrap and bound empty dequeue rounds
From: Ren Wei @ 2026-06-26 8:32 UTC (permalink / raw)
To: netdev
Cc: jhs, jiri, davem, petrm, yuantan098, yifanwucs, tomapufckgml,
zcliangcn, bird, bronzed_45_vested, n05ec
From: Wyatt Feng <bronzed_45_vested@icloud.com>
ETS keeps each DRR-style deficit in a u32 and replenishes it with
the configured quantum whenever the head packet is too large. Both
the quantum and qdisc_pkt_len() are user-controlled inputs: a large
quantum can wrap the deficit counter, while a tiny quantum combined
with an inflated qdisc_pkt_len() can force billions of iterations in
softirq context before any packet becomes eligible.
Store the deficit in u64 so replenishment cannot wrap the counter.
This keeps the existing dequeue logic unchanged while fixing the
overflow condition.
Bound one dequeue attempt to at most nbands * 2 ETS rotations, as
suggested in review. This avoids the livelock without adding heavier
logic to the fast path.
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
changes in v2:
- Instead of doing a div() in the fast path, simply bound the loop per
dequeue
- v1 Link: https://lore.kernel.org/all/20260615103759.2404228-2-n05ec@lzu.edu.cn/
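The two failure modes described in the commit message can be modelled in user space. The following is a minimal sketch, not the sch_ets code: struct band, replenish_u32(), replenish_u64() and drr_pick() are hypothetical names, every band is assumed to be backlogged, and the loop bound simply mirrors the nbands * 2 check added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one ETS band; not the kernel's struct ets_class. */
struct band {
	uint32_t quantum;   /* credit added per rotation */
	uint64_t deficit;   /* accumulated credit (u64, as in the patch) */
	uint32_t head_len;  /* length of the packet at the head of the band */
};

/* Old behaviour: a u32 deficit wraps when deficit + quantum exceeds 2^32 - 1. */
static uint32_t replenish_u32(uint32_t deficit, uint32_t quantum)
{
	return deficit + quantum;
}

/* New behaviour: the same addition in u64 keeps the full sum. */
static uint64_t replenish_u64(uint64_t deficit, uint32_t quantum)
{
	return deficit + quantum;
}

/*
 * Pick the band whose head packet fits its deficit, rotating round-robin.
 * Returns the band index, or -1 once more than nbands * 2 replenishments
 * have been done without any packet becoming eligible.
 */
static int drr_pick(struct band *bands, size_t nbands)
{
	size_t max_loops = nbands * 2;
	size_t loops = 0;
	size_t cur = 0;

	if (nbands == 0)
		return -1;

	for (;;) {
		struct band *b = &bands[cur];

		if (b->head_len <= b->deficit) {
			b->deficit -= b->head_len;
			return (int)cur;
		}

		/* Head packet too large: add one quantum and move on. */
		b->deficit += b->quantum;
		cur = (cur + 1) % nbands;
		if (++loops > max_loops)
			return -1;
	}
}
```

In this model a band with a quantum of 1 and a very large head packet makes drr_pick() give up after a few rotations instead of spinning until the deficit catches up, and the accumulated deficit is kept for the next attempt.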
net/sched/sch_ets.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index cb8cf437ce87..12a156ccb0a6 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -40,7 +40,7 @@ struct ets_class {
struct list_head alist; /* In struct ets_sched.active. */
struct Qdisc *qdisc;
u32 quantum;
- u32 deficit;
+ u64 deficit;
struct gnet_stats_basic_sync bstats;
struct gnet_stats_queue qstats;
};
@@ -463,6 +463,8 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
{
struct ets_sched *q = qdisc_priv(sch);
+ unsigned int max_loops = READ_ONCE(q->nbands) * 2;
+ unsigned int loops = 0;
struct ets_class *cl;
struct sk_buff *skb;
unsigned int band;
@@ -499,6 +501,8 @@ static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
cl->deficit += READ_ONCE(cl->quantum);
list_move_tail(&cl->alist, &q->active);
+ if (++loops > max_loops)
+ goto out;
}
out:
return NULL;
--
2.47.3
^ permalink raw reply related