* Re: [PATCH iwl-net 4/4] ice: report EIPE checksum errors to the OS on E830
From: Simon Horman @ 2026-04-20 15:57 UTC (permalink / raw)
To: Aleksandr Loktionov; +Cc: intel-wired-lan, anthony.l.nguyen, netdev, Jan Glaza
In-Reply-To: <20260417062954.1241900-5-aleksandr.loktionov@intel.com>
On Fri, Apr 17, 2026 at 08:29:54AM +0200, Aleksandr Loktionov wrote:
> From: Jan Glaza <jan.glaza@intel.com>
>
> For E830 adapters the hardware-reported EIPE (Ethernet Inline IPsec
> Engine) error is a reliable indication that a received packet failed
> decryption and has a bad checksum. Route EIPE errors through the
> generic checksum error path on E830 so the error is visible via
> standard ethtool statistics (rx_csum_bad).
>
> On previous devices (E810, E82X) the EIPE flag can be spuriously set
> on encapsulated packets with inner L2 padding, so those adapters only
> increment the driver-private hw_rx_eipe_error counter without routing
> through the checksum error path.
>
> Fixes: 0ca6755f3cc2 ("ice: Add a new counter for Rx EIPE errors")
> Signed-off-by: Jan Glaza <jan.glaza@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Sashiko has provided a review of this patch. However, I don't believe any of
the issues flagged there should block progress of this patch.
You may wish to look over that review for possible follow-up activity.
* Re: [PATCH v6 3/3] dts: s32g: Add GPR syscon region
From: Jared Kangas @ 2026-04-20 16:04 UTC (permalink / raw)
To: Dan Carpenter
Cc: Chester Lin, Matthias Brugger, Ghennadi Procopciuc,
NXP S32 Linux Team, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-arm-kernel, imx,
devicetree, linux-kernel, linaro-s32, netdev
In-Reply-To: <aeKn2dvOOO43zdev@jkangas-thinkpadp1gen3.rmtuswa.csb>
Fixing Dan's address based on mailmap update, sorry for the noise.
On Fri, Apr 17, 2026 at 02:36:25PM -0700, Jared Kangas wrote:
> Hi Dan,
>
> On Fri, Jan 30, 2026 at 04:19:52PM +0300, Dan Carpenter wrote:
> > Add the GPR syscon region for the s32 chipset.
> >
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > ---
> >
> > [snip]
> >
> > diff --git a/arch/arm64/boot/dts/freescale/s32g3.dtsi b/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > index e314f3c7d61d..be03db737384 100644
> > --- a/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > @@ -383,6 +383,11 @@ usdhc0-200mhz-grp4 {
> > };
> > };
> >
> > + gpr: syscon@4007c000 {
> > + compatible = "nxp,s32g3-gpr", "syscon";
> > + reg = <0x4007c000 0x3000>;
> > + };
> > +
> > ocotp: nvmem@400a4000 {
> > compatible = "nxp,s32g3-ocotp", "nxp,s32g2-ocotp";
> > reg = <0x400a4000 0x400>;
> > @@ -808,6 +813,7 @@ gmac0: ethernet@4033c000 {
> > compatible = "nxp,s32g2-dwmac";
> > reg = <0x4033c000 0x2000>, /* gmac IP */
> > <0x4007c004 0x4>; /* GMAC_0_CTRL_STS */
> > + nxp,phy-sel = <&gpr 0x4>;
> > interrupt-parent = <&gic>;
> > interrupts = <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
> > interrupt-names = "macirq";
>
> I gave this a test on an S32G-VNP-RDB3 and didn't see any issues on the
> dwmac-s32 side, but this appears to trigger a panic when reading the new
> debugfs regmap/*/registers file for the syscon node:
>
> # grep 4007c000 /proc/vmallocinfo
> 0xffff800083da8000-0xffff800083dac000 16384 ioremap_prot+0x74/0xe0 phys=0x000000004007c000 ioremap
> # cat /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers
> Internal error: synchronous external abort: 0000000096000210 [#1] SMP
> [...]
> CPU: 0 UID: 0 PID: 4344 Comm: cat Tainted: G M E X ------ --- 6.12.0+ #226 PREEMPT_RT
> Tainted: [M]=MACHINE_CHECK, [E]=UNSIGNED_MODULE, [X]=AUX
> [...]
> pc : regmap_mmio_read32le+0x44/0xa0
> lr : regmap_mmio_read32le+0x44/0xa0
> [...]
> x23: ffff00080c080000 x22: ffff000802ac4c00 x21: ffff800087b13c9c
> x20: ffff800080a46494 x19: ffff800083da810c x18: 0000000000000004
> [...]
> x5 : ffff800080a46448 x4 : ffff800083da8000 x3 : ffff800080a46494
> x2 : ffff800080a47230 x1 : ffff800083da810c x0 : 0000000000000020
> Call trace:
> regmap_mmio_read32le+0x44/0xa0 (P)
> regmap_mmio_read+0x4c/0x80
> [...]
> Code: 52800400 8b214093 aa1303e1 97f4caf0 (b9400275)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: synchronous external abort: Fatal exception
>
> Running this through decodecode gives:
>
> All code
> ========
> 0: 52800400 mov w0, #0x20 // #32
> 4: 8b214093 add x19, x4, w1, uxtw
> 8: aa1303e1 mov x1, x19
> c: 97f4caf0 bl 0xffffffffffd32bcc
> 10:* b9400275 ldr w21, [x19] <-- trapping instruction
>
> Code starting with the faulting instruction
> ===========================================
> 0: b9400275 ldr w21, [x19]
>
> x19's offset from the base address in /proc/vmallocinfo is 0x10c, which
> points to a bad read at physical address 0x4007c10c; I also confirmed
> that the preceding memory reads back without issues:
>
> # head -c 990 /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers | tail -1
> 0104: 00000000
> # head -c 1005 /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers | tail -1
> 0108: 00000000
> # head -c 1020 /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers | tail -1
> <panic>
>
> Best,
> Jared
>
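The head -c byte counts in the quoted test line up with register offsets if every line of the debugfs dump is 15 bytes long (four hex digits of offset, ": ", eight hex digits of value, newline), which is what the quoted output suggests. A minimal sketch of that arithmetic, assuming that fixed line width and a 4-byte register stride; the helper name is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed from the quoted dump: "0104: 00000000\n" is 15 bytes per line. */
#define DUMP_LINE_BYTES 15u
/* Assumed 32-bit registers, so consecutive lines are 4 bytes apart. */
#define REG_STRIDE 4u

/*
 * Hypothetical helper: offset of the last register that is fully covered
 * by the first nbytes of the dump (what "head -c N | tail -1" prints).
 */
static unsigned int last_reg_in_prefix(size_t nbytes)
{
	size_t lines = nbytes / DUMP_LINE_BYTES;

	assert(lines > 0);	/* need at least one complete line */
	return (unsigned int)((lines - 1) * REG_STRIDE);
}
```

Under those assumptions, 990 and 1005 bytes end at offsets 0x104 and 0x108, and 1020 bytes forces a read of 0x10c, matching the faulting address computed from x19.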
* [PATCH bpf] bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
From: Weiming Shi @ 2026-04-20 16:14 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Martin KaFai Lau, Alexei Starovoitov, Amery Hung,
Leon Hwang, Kees Cook, Fushuai Wang, Menglong Dong, netdev, bpf,
Xiang Mei, Weiming Shi
bpf_selem_unlink_nofail() sets SDATA(selem)->smap to NULL before
removing the selem from the storage hlist. A concurrent RCU reader in
bpf_sk_storage_clone() can observe the selem still on the list with
smap already NULL, causing a NULL pointer dereference.
general protection fault, probably for non-canonical address 0xdffffc000000000a:
KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
RIP: 0010:bpf_sk_storage_clone+0x1cd/0xaa0 net/core/bpf_sk_storage.c:174
Call Trace:
<IRQ>
sk_clone+0xfed/0x1980 net/core/sock.c:2591
inet_csk_clone_lock+0x30/0x760 net/ipv4/inet_connection_sock.c:1222
tcp_create_openreq_child+0x35/0x2680 net/ipv4/tcp_minisocks.c:571
tcp_v4_syn_recv_sock+0x123/0xf90 net/ipv4/tcp_ipv4.c:1729
tcp_check_req+0x8e1/0x2580 include/net/tcp.h:855
tcp_v4_rcv+0x1845/0x3b80 net/ipv4/tcp_ipv4.c:2347
While at it, also add NULL checks in bpf_sk_storage_diag_put_all() and
diag_get() which have the same unprotected dereference pattern and could
theoretically hit the same race during an inet_diag dump.
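The reader-side pattern can be sketched in plain userspace C. This is only a single-threaded model, assuming a simplified element with one map pointer that an unlinking writer may already have cleared; the demo_* names and DEMO_F_CLONE are stand-ins, not kernel identifiers:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_F_CLONE 0x1u	/* stand-in for BPF_F_CLONE */

struct demo_map {
	unsigned int flags;
};

struct demo_elem {
	struct demo_map *smap;	/* may already be NULL while being unlinked */
	struct demo_elem *next;
};

/* Count the elements a clone pass would copy, skipping unlinked ones. */
static int demo_count_clonable(const struct demo_elem *head)
{
	int n = 0;

	for (const struct demo_elem *e = head; e; e = e->next) {
		/* Load the pointer once; the kernel uses rcu_dereference(). */
		const struct demo_map *smap = e->smap;

		if (!smap || !(smap->flags & DEMO_F_CLONE))
			continue;
		n++;
	}
	return n;
}

/* Three elements: no clone flag, mid-unlink (NULL smap), clonable. */
static void demo_selftest(void)
{
	struct demo_map plain = { .flags = 0 };
	struct demo_map clonable = { .flags = DEMO_F_CLONE };
	struct demo_elem c = { &clonable, NULL };
	struct demo_elem b = { NULL, &c };
	struct demo_elem a = { &plain, &b };

	assert(demo_count_clonable(&a) == 1);
}
```

The point of the sketch is the order of the checks: the NULL test has to come before any field of the map is touched, because the element can still be reachable on the list after its map pointer has been cleared.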
Fixes: 5d800f87d0a5 ("bpf: Support lockless unlink when freeing map or local storage")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
net/core/bpf_sk_storage.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f8338acebf077..3b487280f50fa 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -172,7 +172,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
struct bpf_map *map;
smap = rcu_dereference(SDATA(selem)->smap);
- if (!(smap->map.map_flags & BPF_F_CLONE))
+ if (!smap || !(smap->map.map_flags & BPF_F_CLONE))
continue;
/* Note that for lockless listeners adding new element
@@ -547,6 +547,8 @@ static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
return -EMSGSIZE;
smap = rcu_dereference(sdata->smap);
+ if (!smap)
+ goto errout;
if (nla_put_u32(skb, SK_DIAG_BPF_STORAGE_MAP_ID, smap->map.id))
goto errout;
@@ -599,6 +601,8 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
saved_len = skb->len;
hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
smap = rcu_dereference(SDATA(selem)->smap);
+ if (!smap)
+ continue;
diag_size += nla_value_size(smap->map.value_size);
if (nla_stgs && diag_get(SDATA(selem), skb))
--
2.43.0
* AW: pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-20 16:16 UTC (permalink / raw)
To: 'Andrew Lunn'
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd,
'Daniel Golle'
In-Reply-To: <90958cc3-e291-44ff-8fc3-102c0f62a269@lunn.ch>
> From: markus.stockhausen@gmx.de <markus.stockhausen@gmx.de>
> Sent: Sunday, April 19, 2026 10:49
> To: 'Andrew Lunn' <andrew@lunn.ch>
> Subject: AW: pre-boot plugged SFP autoneg advertisement
>
> Took that hint/question and dug deeper. Added further debug
> to each and every linkmode_copy. I think I found the culprit in
> a userspace ethtool call. For now I assume OpenWrt netifd.
Hi Andrew,
once again, thanks for your help. After further investigation I can hopefully
add more details. I think I have the whole picture now, so here is some
additional background information about the environment.
- Realtek RTL930x devices with SFP+ module slots
- These are driven directly by a SerDes (controlled by downstream PCS driver)
- The DTS reads
port11: port@11 {
reg = <11>;
label = "lan12" ;
pcs-handle = <&serdes8>;
phy-mode = "1000base-x";
sfp = <&sfp1>;
managed = "in-band-status";
};
Sequence of events during boot is as follows:
- SFP module is already inserted (in my case 1G)
- phylink_sfp_config_phy() runs long before any network config starts
- OpenWrt netifd daemon starts and wants to configure the network interfaces
- It reads current settings via ethtool ioctl and gets autoneg=off
- It writes basic config values via ethtool ioctl including autoneg=off
- Later on it starts the interface and phylink_start() is issued
With my limited knowledge I would patch phylink_ethtool_ksettings_get().
/* The MAC is reporting the link results from its own PCS
* layer via in-band status. Report these as the current
* link settings.
*/
phylink_get_ksettings(&link_state, kset);
break;
+ case MLO_AN_PHY:
+ /* SFP module present at boot but phylink not yet started.
+ * Return autonegotiation as set by phylink_sfp_config_phy().
+ */
+ if (pl->sfp_bus && !pl->phydev)
+ kset->base.autoneg =
+ linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ pl->link_config.advertising)
+ ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+ break;
}
return 0;
}
This comes from the observation that
- pl->link_config.advertising is filled by phylink_sfp_set_config()
- state MLO_AN_PHY is reported before phylink_start()
- state MLO_AN_INBAND is reported after phylink_start()
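The decision made by the proposed MLO_AN_PHY case can be modelled outside the kernel. The sketch below is only an illustration under simplifying assumptions: struct demo_phylink reduces the phylink state to three booleans, and all demo_* names are hypothetical rather than phylink API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's AUTONEG_DISABLE / AUTONEG_ENABLE values. */
enum demo_autoneg { DEMO_AUTONEG_DISABLE = 0, DEMO_AUTONEG_ENABLE = 1 };

/* Hypothetical, reduced view of the state the proposal looks at. */
struct demo_phylink {
	bool has_sfp_bus;	/* models pl->sfp_bus != NULL */
	bool has_phydev;	/* models pl->phydev != NULL */
	bool adv_autoneg;	/* models the Autoneg bit in the advertising mask */
};

/*
 * Value reported in the MLO_AN_PHY case: with an SFP bus but no PHY
 * attached, take autoneg from the advertising mask; otherwise keep
 * whatever value was already filled in (passed here as prev).
 */
static enum demo_autoneg demo_report_autoneg(const struct demo_phylink *pl,
					     enum demo_autoneg prev)
{
	assert(pl);
	if (pl->has_sfp_bus && !pl->has_phydev)
		return pl->adv_autoneg ? DEMO_AUTONEG_ENABLE
				       : DEMO_AUTONEG_DISABLE;
	return prev;
}
```

In this model a module plugged before boot with the Autoneg bit advertised reports autoneg enabled, so a read-modify-write from userspace would no longer write autoneg=off back.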
Is this reasonable or am I totally off?
Markus
* [PATCH net 0/2] mptcp: sync the msk->sndbuf at accept() time
From: Matthieu Baerts (NGI0) @ 2026-04-20 16:19 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Gang Yan,
stable, Shuah Khan, linux-kselftest
On passive MPTCP connections, the MPTCP socket send buffer doesn't have
the expected size at accept() time.
Patch 1 fixes the regression introduced in v6.7, while the following one
validates the fix in the selftests.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Gang Yan (2):
mptcp: sync the msk->sndbuf at accept() time
selftests: mptcp: add a check for sndbuf of S/C
net/mptcp/protocol.c | 2 +-
tools/testing/selftests/net/mptcp/diag.sh | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 1 deletion(-)
---
base-commit: 0cf004ffb61cd32d140531c3a84afe975f9fc7ea
change-id: 20260420-net-mptcp-sync-sndbuf-accept-5079a6f9c407
Best regards,
--
Matthieu Baerts (NGI0) <matttbe@kernel.org>
* [PATCH net 1/2] mptcp: sync the msk->sndbuf at accept() time
From: Matthieu Baerts (NGI0) @ 2026-04-20 16:19 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Gang Yan,
stable
In-Reply-To: <20260420-net-mptcp-sync-sndbuf-accept-v1-0-e3523e3aeb44@kernel.org>
From: Gang Yan <yangang@kylinos.cn>
On passive MPTCP connections, the msk sndbuf is not updated correctly.
The root cause is an order issue in the accept path:
- tcp_check_req() -> subflow_syn_recv_sock() -> mptcp_sk_clone_init()
calls __mptcp_propagate_sndbuf() to copy the ssk sndbuf into msk
- Later, tcp_child_process() -> tcp_init_transfer() ->
tcp_sndbuf_expand() grows the ssk sndbuf.
So __mptcp_propagate_sndbuf() runs before the ssk sndbuf has been
expanded and the msk ends up with a much smaller sndbuf than the
subflow:
MPTCP: msk->sndbuf:20480, msk->first->sndbuf:2626560
Fix this by moving the __mptcp_propagate_sndbuf() call from
mptcp_sk_clone_init() -- the ssk sndbuf is not yet finalized there -- to
mptcp_stream_accept(), i.e. at accept() time, when the ssk sndbuf has
been fully expanded by tcp_sndbuf_expand().
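The ordering problem can be illustrated with a small userspace model. It is a sketch under assumptions: propagation is reduced to a plain copy, the two buffer sizes are the ones from the debug line above, and every demo_* name is hypothetical:

```c
#include <assert.h>

/* Values taken from the debug line quoted in the commit message. */
#define DEMO_INITIAL_SNDBUF	20480
#define DEMO_EXPANDED_SNDBUF	2626560

struct demo_sock {
	int sndbuf;
};

/* Models __mptcp_propagate_sndbuf() as a plain copy from ssk to msk. */
static void demo_propagate(struct demo_sock *msk, const struct demo_sock *ssk)
{
	msk->sndbuf = ssk->sndbuf;
}

/* Models tcp_sndbuf_expand() growing the subflow send buffer. */
static void demo_expand(struct demo_sock *ssk)
{
	ssk->sndbuf = DEMO_EXPANDED_SNDBUF;
}

/* Returns the msk sndbuf observed at accept() for either ordering. */
static int demo_accept_sndbuf(int propagate_at_accept)
{
	struct demo_sock ssk = { DEMO_INITIAL_SNDBUF };
	struct demo_sock msk = { 0 };

	if (!propagate_at_accept)
		demo_propagate(&msk, &ssk);	/* old: at clone time, too early */
	demo_expand(&ssk);			/* later, during child processing */
	if (propagate_at_accept)
		demo_propagate(&msk, &ssk);	/* new: at accept() time */

	assert(ssk.sndbuf == DEMO_EXPANDED_SNDBUF);
	return msk.sndbuf;
}
```

With the old ordering the model returns 20480, with the new one 2626560, which mirrors the mismatch reported above.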
Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/602
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/protocol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fbffd3a43fe8..718e910ff23f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3594,7 +3594,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
* uses the correct data
*/
mptcp_copy_inaddrs(nsk, ssk);
- __mptcp_propagate_sndbuf(nsk, ssk);
mptcp_rcv_space_init(msk, ssk);
msk->rcvq_space.time = mptcp_stamp();
@@ -4252,6 +4251,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_graft_subflows(newsk);
mptcp_rps_record_subflows(msk);
+ __mptcp_propagate_sndbuf(newsk, mptcp_subflow_tcp_sock(subflow));
/* Do late cleanup for the first subflow as necessary. Also
* deal with bad peers not doing a complete shutdown.
--
2.53.0
* [PATCH net 2/2] selftests: mptcp: add a check for sndbuf of S/C
From: Matthieu Baerts (NGI0) @ 2026-04-20 16:19 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Gang Yan,
Shuah Khan, linux-kselftest
In-Reply-To: <20260420-net-mptcp-sync-sndbuf-accept-v1-0-e3523e3aeb44@kernel.org>
From: Gang Yan <yangang@kylinos.cn>
Add a new chk_sndbuf() helper to diag.sh that extracts the sndbuf
(the 'tb' field from 'ss -m' skmem output) for both server and
client MPTCP sockets, and verifies they are equal.
Without the previous patch, it will fail:
'''
07 ....chk sndbuf server/client [FAIL] sndbuf S=20480 != C=2630656
'''
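For readers less used to sed, the extraction step can be expressed in C. The sketch below assumes the skmem layout documented for ss -m, where the send buffer appears as "tb<number>"; the function name and the sample strings in the test are illustrative only:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the number following the last "tb" in line, or -1 if there is
 * none. This mirrors the greedy sed expression, where the last
 * "tb<digits>" occurrence on the line is the one that is captured.
 */
static long demo_parse_tb(const char *line)
{
	long val = -1;

	assert(line);
	for (const char *p = strstr(line, "tb"); p; p = strstr(p + 1, "tb")) {
		/* p[2] is at worst the terminating NUL, so this is in bounds. */
		if (isdigit((unsigned char)p[2]))
			val = strtol(p + 2, NULL, 10);
	}
	return val;
}
```

The helper then only has to compare the server and client values, as chk_sndbuf() does with a string comparison.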
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/diag.sh | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index d847ff1737c3..27cbda68144e 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -322,6 +322,33 @@ wait_connected()
done
}
+chk_sndbuf()
+{
+ local server_sndbuf client_sndbuf msg
+ local port=${1}
+
+ msg="....chk sndbuf server/client"
+ server_sndbuf=$(ss -N "${ns}" -inmHM "sport" "${port}" | \
+ sed -n 's/.*tb\([0-9]\+\).*/\1/p')
+ client_sndbuf=$(ss -N "${ns}" -inmHM "dport" "${port}" | \
+ sed -n 's/.*tb\([0-9]\+\).*/\1/p')
+
+ mptcp_lib_print_title "${msg}"
+ if [ -z "${server_sndbuf}" ] || [ -z "${client_sndbuf}" ]; then
+ mptcp_lib_pr_fail "sndbuf S=${server_sndbuf} C=${client_sndbuf}"
+ mptcp_lib_result_fail "${msg}"
+ ret=${KSFT_FAIL}
+ elif [ "${server_sndbuf}" != "${client_sndbuf}" ]; then
+ mptcp_lib_pr_fail "sndbuf S=${server_sndbuf} != C=${client_sndbuf}"
+ mptcp_lib_result_fail "${msg}"
+ ret=${KSFT_FAIL}
+ else
+ mptcp_lib_pr_ok
+ mptcp_lib_result_pass "${msg}"
+ fi
+}
+
+
trap cleanup EXIT
mptcp_lib_ns_init ns
@@ -341,6 +368,7 @@ echo "b" | \
127.0.0.1 >/dev/null &
wait_connected $ns 10000
chk_msk_nr 2 "after MPC handshake"
+chk_sndbuf 10000
chk_last_time_info 10000
chk_msk_remote_key_nr 2 "....chk remote_key"
chk_msk_fallback_nr 0 "....chk no fallback"
--
2.53.0
* Re: Question/proposal for DPLL NCO (DCO) mode
From: Ivan Vecera @ 2026-04-20 16:24 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev@vger.kernel.org, Arkadiusz Kubalewski, Vadim Fedorenko,
Jakub Kicinski
In-Reply-To: <3rgm35qzwvzdrubvtdrccl3sb5cx2ufiy7b5wnjun223bosi45@2bnbfrlqnsbk>
On 4/20/26 3:43 PM, Jiri Pirko wrote:
> Mon, Apr 20, 2026 at 01:42:29PM +0200, ivecera@redhat.com wrote:
>> Hi all,
>>
>> I am currently working on adding PTP clock support (PHC) to the ZL3073x
>> driver, and I would like to discuss an architectural addition to the
>> DPLL subsystem enum dpll_mode before sending the formal patch series.
>>
>> To support IEEE 1588 PTP, the hardware DPLL must be decoupled from
>> tracking physical reference pins. Instead, its output frequency needs to
>> be continuously steered by a software PTP stack (e.g., linuxptp) using
>> the .adjfine() callback. In the industry, this specific hardware state
>> is widely known as NCO (Numerically Controlled Oscillator) or DCO
>> (Digitally Controlled Oscillator) mode.
>>
>> Currently, the DPLL subsystem defines two modes in enum dpll_mode:
>>
>> MANUAL: The user explicitly selects a physical input pin to track.
>>
>> AUTOMATIC: The hardware automatically selects the best physical input pin
>> based on priorities.
>>
>> Neither of these accurately represents the NCO/DCO state, where physical
>> pins are ignored, and the loop is fully software-steered.
>
> The manual/automatic mode is defining strategy/behaviour about how the
> input pins are selected.
>
> <header_quote>
> * enum dpll_mode - working modes a dpll can support, differentiates if and how
> * dpll selects one of its inputs to syntonize with it
> </header_quote>
>
> You don't want to change the strategy. You just
> don't want that/don't care. Why does it make sense to add a mode then?
It depends on the perspective. NCO is a selection strategy where the
DPLL does not lock onto an input reference (select nothing); instead,
its frequency is software-driven.
Btw. we’ve included DPLL_MODE_DETACHED in the documentation, which could
describe this use case.
But...
> I was thinking exposing this over STATUS. We have:
> UNLOCKED (free running)
> LOCKED
> LOCKED_HO_ACQ
> HOLDOVER
>
> So something maybe like SW_STEERED? But:
>
> The problem is you need to do selection. Perhaps we can have a
> "magic pin" to select for this :S ? We have pin of type
> DPLL_PIN_TYPE_INT_OSCILLATOR already. Perhaps we can have
> DPLL_PIN_TYPE_INT_NCO/NDO? It's a source.
>
> Makes sense?
Yes, this makes sense from a selection standpoint, and I have to say I
like the idea. It also has an additional benefit: if we have an NCO pin
type, that pin's frequency can be used to configure the delta frequency
offset of the DPLL device itself.
I’ll proceed this way. Thanks for the advice!
Ivan
* Re: [PATCH net 1/1] net/rose: hold listener socket during call request handling
From: Simon Horman @ 2026-04-20 16:26 UTC (permalink / raw)
To: Ren Wei
Cc: linux-hams, netdev, davem, edumazet, kuba, pabeni, kees, takamitz,
kuniyu, jiayuan.chen, mingo, stanksal, jlayton, yifanwucs,
tomapufckgml, bird, yuantan098, tonanli66
In-Reply-To: <52776256bf0fc38de92fe3edf39434538b672b69.1776327338.git.tonanli66@gmail.com>
On Fri, Apr 17, 2026 at 07:01:51PM +0800, Ren Wei wrote:
> From: Nan Li <tonanli66@gmail.com>
>
> The call request receive path keeps using the listener socket after the
> lookup lock has been dropped. Keep the listener alive across the
> remaining validation and child socket setup by taking a reference in the
> lookup path and releasing it once request handling is finished.
>
> This makes listener lifetime handling explicit and avoids races with
> concurrent socket teardown.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@kernel.org
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Nan Li <tonanli66@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
> net/rose/af_rose.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
Reviewed-by: Simon Horman <horms@kernel.org>
Sashiko has provided some feedback on this patch.
I do not believe it relates to shortcomings in this patch,
and I do not believe it should block progress of this patch.
You may want to look over it for areas to investigate as follow-up
(maybe you already did :)
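The lifetime rule described in the commit message can be sketched as a toy model. The diff itself is not quoted here, so this is an assumption about the general pattern (take a reference in the lookup, drop it when request handling is done), written single-threaded with a plain counter instead of the kernel's atomic socket refcount; all demo_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

struct demo_listener {
	int refcnt;
	bool freed;
};

static void demo_hold(struct demo_listener *l)
{
	l->refcnt++;
}

static void demo_put(struct demo_listener *l)
{
	assert(l->refcnt > 0);
	if (--l->refcnt == 0)
		l->freed = true;	/* stands in for freeing the socket */
}

/* Lookup: the reference is taken before the lookup lock would be dropped. */
static struct demo_listener *demo_lookup_get(struct demo_listener *l)
{
	demo_hold(l);
	return l;
}

/* Returns true if the listener stayed alive while the request was handled. */
static bool demo_handle_call_request(struct demo_listener *table_entry)
{
	struct demo_listener *sk = demo_lookup_get(table_entry);
	bool alive_during_setup;

	demo_put(table_entry);		/* models a concurrent teardown */
	alive_during_setup = !sk->freed;	/* validation and child setup */
	demo_put(sk);			/* request handling finished */
	return alive_during_setup;
}
```

Without the reference taken in the lookup, the teardown in the middle would drop the count to zero while the request path is still using the listener.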
...
* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
From: Stanislav Fomichev @ 2026-04-20 16:27 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
In-Reply-To: <aeZI3ig6w0PhGaVU@devvm17672.vll0.facebook.com>
On 04/20, Stanislav Fomichev wrote:
> On 04/20, Stanislav Fomichev wrote:
> > On 04/18, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Fix it by explicitly adding kfree_skb() before returning back to its
> > > caller.
> > >
> > > How to reproduce it in virtio_net:
> > > 1. the current skb is the first one (which means no frag and xs->skb is
> > > NULL) and users enable metadata feature.
> > > 2. xsk_skb_metadata() returns an error code.
> > > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > > 4. there is no chance to free this skb anymore.
> > >
> > > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >
> > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
>
> Actually, I take that back.. While looking at patch 2 (which always
> confuses me with that -EOVERFLOW handling), looks like we set
> skb->destructor in xsk_skb_init_misc? So this kfree will do
> xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
> move xsk_skb_init_misc to happen after xsk_skb_metadata?
Ooops, looks like you already addressed these in v2? Let me look into
that..
* [PATCH net-next] netlink: clean up failed initial dump-start state
From: Michael Bommarito @ 2026-04-20 16:27 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Simon Horman, Kuniyuki Iwashima, Kees Cook, Feng Yang,
linux-kernel
When __netlink_dump_start() has already installed cb->skb, taken the
module reference and set cb_running, a failure from the first
netlink_dump(sk, true) call returns via errout_skb without unwinding the
callback lifetime. That leaves cb_running set and defers module_put()
and consume_skb(cb->skb) until userspace drains the socket or closes it.
Share the normal callback teardown in a helper and use it on successful
completion and on the initial lock_taken=true failure path. Keep the
lock_taken=false continuation path unchanged, because recvmsg()-driven
retries legitimately preserve cb_running when they run out of receive
room.
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Validation inside a UML guest on current mainline:
- An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
- Stock kernel: the initial __netlink_dump_start() hits the rmem gate
and returns via errout_skb with cb_running stuck at 1 until
recvmsg() or close() drives forward progress.
- Patched kernel: the same probe leaves cb_running clear immediately
on the lock_taken=true failure, and the larger-rcvbuf continuation
path (legitimate dump in progress) is unchanged.
A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
roughly 1.1 KiB/socket. Real accumulation, but the test hits
RLIMIT_NOFILE long before the guest approaches OOM, so this still
looks like a local availability cleanup rather than an exhaustion
primitive.
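The per-socket figure follows directly from the two measured numbers. A small sketch of the arithmetic, using only the values quoted above; the function name is made up:

```c
#include <assert.h>

/* Extra unreclaimable slab per wedged socket, in KiB. */
static double demo_kib_per_socket(double extra_mib, unsigned int sockets)
{
	assert(sockets > 0);
	return extra_mib * 1024.0 / sockets;
}
```

For 3500 sockets, both ends of the 3.8-3.9 MiB range land between 1.11 and 1.15 KiB per socket, consistent with the "roughly 1.1 KiB/socket" estimate.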
No Cc: stable@ on the theory that the bug self-heals on
recvmsg()/close and the accumulation is mild. Happy to add it and
route to net if you'd rather see it backported.
net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4d609d5cf406..7019c17e6879 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
return 0;
}
+static void netlink_dump_cleanup(struct netlink_sock *nlk)
+{
+ struct module *module = nlk->cb.module;
+ struct sk_buff *skb = nlk->cb.skb;
+
+ if (nlk->cb.done)
+ nlk->cb.done(&nlk->cb);
+
+ WRITE_ONCE(nlk->cb_running, false);
+ mutex_unlock(&nlk->nl_cb_mutex);
+ module_put(module);
+ consume_skb(skb);
+}
+
static int netlink_dump(struct sock *sk, bool lock_taken)
{
struct netlink_sock *nlk = nlk_sk(sk);
@@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
struct sk_buff *skb = NULL;
unsigned int rmem, rcvbuf;
size_t max_recvmsg_len;
- struct module *module;
int err = -ENOBUFS;
int alloc_min_size;
int alloc_size;
@@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
else
__netlink_sendskb(sk, skb);
- if (cb->done)
- cb->done(cb);
-
- WRITE_ONCE(nlk->cb_running, false);
- module = cb->module;
- skb = cb->skb;
- mutex_unlock(&nlk->nl_cb_mutex);
- module_put(module);
- consume_skb(skb);
+ netlink_dump_cleanup(nlk);
return 0;
errout_skb:
- mutex_unlock(&nlk->nl_cb_mutex);
+ if (lock_taken)
+ netlink_dump_cleanup(nlk);
+ else
+ mutex_unlock(&nlk->nl_cb_mutex);
kfree_skb(skb);
return err;
}
--
2.53.0
* Re: [PATCH v6 3/3] dts: s32g: Add GPR syscon region
From: Dan Carpenter @ 2026-04-20 16:45 UTC (permalink / raw)
To: Jared Kangas
Cc: Chester Lin, Matthias Brugger, Ghennadi Procopciuc,
NXP S32 Linux Team, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-arm-kernel, imx,
devicetree, linux-kernel, linaro-s32, netdev
In-Reply-To: <aeZOcCOgMy2g9wqp@rh-jkangas-kernel>
On Mon, Apr 20, 2026 at 09:04:00AM -0700, Jared Kangas wrote:
> Fixing Dan's address based on mailmap update, sorry for the noise.
>
> On Fri, Apr 17, 2026 at 02:36:25PM -0700, Jared Kangas wrote:
> > Hi Dan,
> >
> > On Fri, Jan 30, 2026 at 04:19:52PM +0300, Dan Carpenter wrote:
> > > Add the GPR syscon region for the s32 chipset.
> > >
> > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > > ---
> > >
> > > [snip]
> > >
> > > diff --git a/arch/arm64/boot/dts/freescale/s32g3.dtsi b/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > > index e314f3c7d61d..be03db737384 100644
> > > --- a/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > > +++ b/arch/arm64/boot/dts/freescale/s32g3.dtsi
> > > @@ -383,6 +383,11 @@ usdhc0-200mhz-grp4 {
> > > };
> > > };
> > >
> > > + gpr: syscon@4007c000 {
> > > + compatible = "nxp,s32g3-gpr", "syscon";
> > > + reg = <0x4007c000 0x3000>;
> > > + };
> > > +
> > > ocotp: nvmem@400a4000 {
> > > compatible = "nxp,s32g3-ocotp", "nxp,s32g2-ocotp";
> > > reg = <0x400a4000 0x400>;
> > > @@ -808,6 +813,7 @@ gmac0: ethernet@4033c000 {
> > > compatible = "nxp,s32g2-dwmac";
> > > reg = <0x4033c000 0x2000>, /* gmac IP */
> > > <0x4007c004 0x4>; /* GMAC_0_CTRL_STS */
> > > + nxp,phy-sel = <&gpr 0x4>;
> > > interrupt-parent = <&gic>;
> > > interrupts = <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
> > > interrupt-names = "macirq";
> >
> > I gave this a test on an S32G-VNP-RDB3 and didn't see any issues on the
> > dwmac-s32 side, but this appears to trigger a panic when reading the new
> > debugfs regmap/*/registers file for the syscon node:
> >
> > # grep 4007c000 /proc/vmallocinfo
> > 0xffff800083da8000-0xffff800083dac000 16384 ioremap_prot+0x74/0xe0 phys=0x000000004007c000 ioremap
> > # cat /sys/kernel/debug/regmap/dummy-syscon@0x000000004007c000/registers
> > Internal error: synchronous external abort: 0000000096000210 [#1] SMP
> > [...]
> > CPU: 0 UID: 0 PID: 4344 Comm: cat Tainted: G M E X ------ --- 6.12.0+ #226 PREEMPT_RT
> > Tainted: [M]=MACHINE_CHECK, [E]=UNSIGNED_MODULE, [X]=AUX
> > [...]
> > pc : regmap_mmio_read32le+0x44/0xa0
> > lr : regmap_mmio_read32le+0x44/0xa0
> > [...]
> > x23: ffff00080c080000 x22: ffff000802ac4c00 x21: ffff800087b13c9c
> > x20: ffff800080a46494 x19: ffff800083da810c x18: 0000000000000004
> > [...]
> > x5 : ffff800080a46448 x4 : ffff800083da8000 x3 : ffff800080a46494
> > x2 : ffff800080a47230 x1 : ffff800083da810c x0 : 0000000000000020
> > Call trace:
> > regmap_mmio_read32le+0x44/0xa0 (P)
> > regmap_mmio_read+0x4c/0x80
> > [...]
> > Code: 52800400 8b214093 aa1303e1 97f4caf0 (b9400275)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: synchronous external abort: Fatal exception
> >
> > Running this through decodecode gives:
> >
> > All code
> > ========
> > 0: 52800400 mov w0, #0x20 // #32
> > 4: 8b214093 add x19, x4, w1, uxtw
> > 8: aa1303e1 mov x1, x19
> > c: 97f4caf0 bl 0xffffffffffd32bcc
> > 10:* b9400275 ldr w21, [x19] <-- trapping instruction
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: b9400275 ldr w21, [x19]
> >
> > x19's offset from the base address in /proc/vmallocinfo is 0x10c, which
> > points to a bad read at physical address 0x4007c10c; I also confirmed
> > that the preceding memory reads back without issues:
Oh, ugh... I didn't realize that this wasn't merged. I don't have a
way to test this any more. The simplest fix would be to change the
0x3000 to 0x100. The GPR63 register is at 0xFC.
reg = <0x4007c000 0x100>;
That's probably the best fix as well. The later register areas would
be their own syscons.
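As a sanity check on the numbers in this thread, here is a minimal sketch that recomputes the faulting offset from the register dump and tests which addresses fall inside a region of a given size. The helper names are hypothetical; the addresses come from the quoted report and the DTS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a faulting virtual address from its ioremap base. */
static uint64_t demo_fault_offset(uint64_t vaddr, uint64_t vbase)
{
	assert(vaddr >= vbase);
	return vaddr - vbase;
}

/* True if phys lies inside the region [base, base + size). */
static bool demo_reg_in_region(uint32_t base, uint32_t size, uint32_t phys)
{
	assert(size > 0);
	return phys >= base && phys - base < size;
}
```

With x19 = 0xffff800083da810c and the mapping base 0xffff800083da8000 the offset is 0x10c. A 0x3000-byte region still covers 0x4007c10c, while a 0x100-byte region excludes it but keeps GMAC_0_CTRL_STS at 0x4007c004 and GPR63 at 0x4007c0fc.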
regards,
dan carpenter
* Re: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
From: Li Li @ 2026-04-20 17:03 UTC (permalink / raw)
To: Kwapulinski, Piotr
Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
Brian Vazquez, Tantilov, Emil S, stable
In-Reply-To: <PH7PR11MB59834C3C7785D1E69B7E954EF32F2@PH7PR11MB5983.namprd11.prod.outlook.com>
On Mon, Apr 20, 2026 at 1:23 AM Kwapulinski, Piotr
<piotr.kwapulinski@intel.com> wrote:
>
> >-----Original Message-----
> >From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Li Li via Intel-wired-lan
> >Sent: Sunday, April 19, 2026 9:26 PM
> >To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> >Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David Decotigny <decot@google.com>; Singhai, Anjali <anjali.singhai@intel.com>; Samudrala, Sridhar <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> >Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
> >
> >Even though commit 2e281e1155fc ("idpf: detach and close netdevs while handling a reset") prevents ethtool -N/-n operations to operate on detached netdevs, we found that out-of-tree workflows like OpenOnload can bypass ethtool core locks and call idpf_set_rxnfc directly during an idpf HW reset. When this happens, we could get kernel crashes like the following:
> >
> >[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
> >[ 4045.794420] #PF: supervisor read access in kernel mode
> >[ 4045.799580] #PF: error_code(0x0000) - not-present page
> >[ 4045.804739] PGD 0
> >[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
> >...
> >[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
> >[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
> >...
> >[ 4045.946323] Call Trace:
> >[ 4045.948796] <TASK>
> >[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
> >[ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
> >[ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
> >[ 4045.964142] ? __die_body.cold+0x8/0x12
> >[ 4045.968000] ? page_fault_oops+0x148/0x160
> >[ 4045.972117] ? exc_page_fault+0x6f/0x160
> >[ 4045.976060] ? asm_exc_page_fault+0x22/0x30
> >[ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf]
> >[ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf]
> >[ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
> >[ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
> >[ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload]
> >[ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
> >[ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload]
> >[ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload]
> >[ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload]
> >[ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload]
> >[ 4046.033277] process_one_work+0x174/0x330
> >[ 4046.037304] worker_thread+0x246/0x390
> >[ 4046.041074] ? __pfx_worker_thread+0x10/0x10
> >[ 4046.045364] kthread+0xf6/0x240
> >[ 4046.048530] ? __pfx_kthread+0x10/0x10
> >[ 4046.052297] ret_from_fork+0x2d/0x50
> >[ 4046.055896] ? __pfx_kthread+0x10/0x10
> >[ 4046.059664] ret_from_fork_asm+0x1a/0x30
> >[ 4046.063613] </TASK>
> >
> >To prevent this, we need to add checks in idpf_set_rxnfc and idpf_get_rxnfc to error out if the netdev is already detached.
> >
> >Tested: implemented the following patch to synthetically force idpf into a HW reset:
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >index 4fc0bb14c5b1..27476d57bcf0 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >@@ -10,6 +10,9 @@
> > #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> > LIBETH_SQE_CHECK_PRIV(u32);
> >
> >+static bool SIMULATE_TX_TIMEOUT;
> >+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> >+
> > /**
> > * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> > * @skb: send buffer
> >@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> >
> > adapter->tx_timeout_count++;
> >
> >+ SIMULATE_TX_TIMEOUT = false;
> >+
> > netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> > adapter->tx_timeout_count, txqueue);
> > if (!idpf_is_reset_in_prog(adapter)) {
> >@@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> > goto fetch_next_desc;
> > }
> > tx_q = complq->txq_grp->txqs[rel_tx_qid];
> >+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> >+ goto fetch_next_desc;
> >
> > /* Determine completion type */
> > ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >index be66f9b2e101..ba5da2a86c15 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >@@ -8,6 +8,9 @@
> > #include "idpf_virtchnl.h"
> > #include "idpf_ptp.h"
> >
> >+static bool VIRTCHNL_FAILED;
> >+module_param(VIRTCHNL_FAILED, bool, 0644);
> >+
> > /**
> > * struct idpf_vc_xn_manager - Manager for tracking transactions
> > * @ring: backing and lookup for transactions
> >@@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> > switch (adapter->state) {
> > case __IDPF_VER_CHECK:
> > err = idpf_send_ver_msg(adapter);
> >+
> >+ if (unlikely(VIRTCHNL_FAILED)) {
> >+ err = -EIO;
> >+ }
> Please remove redundant parenthesis
> Piotr
Hi Piotr,
The block you are commenting on is not part of the patch; it's just a
block of test code in the commit message I used to reproduce the
failures.
Thanks!
>
> >+
> > switch (err) {
> > case 0:
> > /* success, move state machine forward */
> >
> >And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> >and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> >idpf_get_rxnfc() right after the HW reset.
> >
> >Without the patch: encountered NULL pointer and kernel crash.
> >
> >With the patch: no crashes.
> >
> >Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> >Signed-off-by: Li Li <boolli@google.com>
> >---
> > drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >index bb99d9e7c65d..8368a7e6a754 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> > unsigned int cnt = 0;
> > int err = 0;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > vport = idpf_netdev_to_vport(netdev);
> > vport_config = np->adapter->vport_config[np->vport_idx];
> >@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
> > {
> > int ret = -EOPNOTSUPP;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > switch (cmd->cmd) {
> > case ETHTOOL_SRXCLSRLINS:
> >--
> >2.54.0.rc1.513.gad8abe7a5a-goog
^ permalink raw reply
* Re: [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Jakub Kicinski @ 2026-04-20 17:09 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: Tariq Toukan, Boris Pismenny, willemdebruijn.kernel@gmail.com,
andrew+netdev@lunn.ch, daniel.zahka@gmail.com,
davem@davemloft.net, leon@kernel.org, Rahul Rameshbabu,
pabeni@redhat.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Raed Salem, Dragos Tatulea,
kees@kernel.org, Mark Bloch, edumazet@google.com, Saeed Mahameed,
netdev@vger.kernel.org, Gal Pressman
In-Reply-To: <d7e2d46769e120a16ce12d345c51a47349733828.camel@nvidia.com>
On Mon, 20 Apr 2026 10:30:46 +0000 Cosmin Ratiu wrote:
> > When psp_dev_create() fails, this function now returns without
> > setting
> > psp->psp, leaving it as NULL. However, priv->psp remains allocated
> > and
> > non-NULL.
> >
> > Does this leave the RX datapath vulnerable to a NULL pointer
> > dereference?
> >
> > If priv->psp is non-NULL, the NIC RX initialization path can still
> > call
> > mlx5_accel_psp_fs_init_rx_tables(), which creates hardware flow
> > steering
> > rules to intercept UDP traffic.
> >
> > If a UDP packet triggers these rules, the hardware flags the CQE with
> > MLX5E_PSP_MARKER_BIT. The RX fast-path sees the marker and invokes
> > mlx5e_psp_offload_handle_rx_skb(), which dereferences the pointer
> > unconditionally:
> >
> > u16 dev_id = priv->psp->psp->id;
> >
> > Since priv->psp->psp is NULL, this will cause a kernel panic. Should
> > priv->psp be cleaned up, or the error propagated, to prevent flow
> > rules
> > from being installed when registration fails?
>
> First, this is preexisting. But more importantly, it's impossible to
> trigger:
> - with no PSP devs, there can be no PSP SAs installed.
> - with no SAs, PSP decryption cannot succeed.
> - all unsuccessfully decrypted PSP packets are dropped by steering.
> - the RX handler will not see any PSP packets with the marker set.
>
> This patch fixes the comparatively way more likely scenario of
> psp_dev_register failing and then mlx5e_psp_unregister passing the
> error pointer to psp_dev_unregister, which will do unpleasant things
> with it.
Sure but why are you leaving the priv->psp struct in place and whatever
FS init has been done? IOW if you really want PSP init to not block
probe why is mlx5e_psp_register() a void function rather than
mlx5e_psp_init() ? Ignoring errors from psp_dev_create()
makes no sense to me - what are you protecting from? kmalloc(GFP_KERNEL)
failing?
^ permalink raw reply
* Re: [PATCH bpf v3 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
From: Martin KaFai Lau @ 2026-04-20 17:09 UTC (permalink / raw)
To: KaFai Wan
Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest
In-Reply-To: <9b3e7c118f5b9105b92f53b66034c1b7884c1372.camel@linux.dev>
On Sat, Apr 18, 2026 at 10:19:47AM +0800, KaFai Wan wrote:
> > > + ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> >
> > Same comment as in v2. Why this setsockopt is needed?
>
> Sorry I miss this. It's from the review of v1, my first version would break the syscall setsockopt
> and other CB besides HDR_OPT_LEN/WRITE_HDR_OPT. So in the test I check setsockopt() and
> bpf_setsockopt() in PASSIVE_ESTABLISHED_CB to make sure patch#1 would not break user space and other
> CB.
ic. Yep, remove it since v3 is not changing the syscall setsockopt.
>
> > The setsockopt in userspace is unnecessary.
>
> Is bpf_setsockopt() in PASSIVE_ESTABLISHED_CB also unnecessary? I'll respin if they are unnecessary.
This one is fine. It checks that bpf_setsockopt is not affected in other CBs.
^ permalink raw reply
* Re: [PATCH v2 0/3] mISDN: fix socket/device lifetime and naming races
From: Jakub Kicinski @ 2026-04-20 17:11 UTC (permalink / raw)
To: Shuvam Pandey; +Cc: netdev
In-Reply-To: <CANAAWHJKfRVO++ZdXYAKSaTH8JCBi_Xeu4DE-Ldaa6zu2Kub6A@mail.gmail.com>
On Sun, 19 Apr 2026 06:09:03 +0545 Shuvam Pandey wrote:
> Please ignore v2 for now. I’m reworking the series and will send a
> corrected v3.
Please hold off until Thursday, I'm intending to delete ISDN from the
tree. Please resend if the code still exists in Linus's tree at that
point.
^ permalink raw reply
* [PATCH net 0/4] gve: Fixes for issues discovered via net selftests
From: Harshitha Ramamurthy @ 2026-04-20 17:18 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, willemb, maolson, nktgrg, jfraker, ziweixiao,
jacob.e.keller, pkaligineedi, shailend, jordanrhee, stable,
linux-kernel, Pin-yen Lin
From: Pin-yen Lin <treapking@google.com>
This patch series addresses several issues in the gve driver. All four of these
fixes were uncovered by running the net selftests.
The series includes the following changes:
- Patch 1 adds NULL pointer checks for the per-queue statistics code to
prevent crashes when the rings are queried while the link is down. This
was discovered by the drivers/net/stats.py selftest.
- Patch 2 fixes an issue where interface stats would go backwards when the
interface was brought down or its configuration was adjusted. This was
also discovered by the drivers/net/stats.py selftest.
- Patch 3 ensures the driver falls back to the default minimum ring size if
the corresponding device option values are exposed as 0. This prevents
userspace from configuring unexpectedly small ring sizes. This was
discovered by the drivers/net/ring_reconfig.py selftest.
- Patch 4 makes sure ethtool configuration modifications are done
synchronously before returning to the userspace. This was discovered by
the drivers/net/ping.py selftest.
Debarghya Kundu (2):
gve: Add NULL pointer checks for per-queue statistics
gve: Fix backward stats when interface goes down or configuration is
adjusted
Pin-yen Lin (2):
gve: Use default min ring size when device option values are 0
gve: Make ethtool config changes synchronous
drivers/net/ethernet/google/gve/gve.h | 6 +
drivers/net/ethernet/google/gve/gve_adminq.c | 4 +-
drivers/net/ethernet/google/gve/gve_main.c | 128 +++++++++++++------
3 files changed, 100 insertions(+), 38 deletions(-)
--
2.54.0.rc0.605.g598a273b03-goog
base-commit: 2dddb34dd0d07b01fa770eca89480a4da4f13153
branch: gve-misc-fixes
^ permalink raw reply
* [PATCH net 1/4] gve: Add NULL pointer checks for per-queue statistics
From: Harshitha Ramamurthy @ 2026-04-20 17:18 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, willemb, maolson, nktgrg, jfraker, ziweixiao,
jacob.e.keller, pkaligineedi, shailend, jordanrhee, stable,
linux-kernel, Debarghya Kundu, Pin-yen Lin
In-Reply-To: <20260420171837.455487-1-hramamurthy@google.com>
From: Debarghya Kundu <debarghyak@google.com>
gve_get_[tx/rx]_queue_stats dereferences the [tx/rx] rings, which are
NULL when the link is down. Add NULL pointer checks to guard this.
This was discovered by drivers/net/stats.py selftest.
Cc: stable@vger.kernel.org
Fixes: 2e5e0932dff5 ("gve: add support for basic queue stats")
Signed-off-by: Debarghya Kundu <debarghyak@google.com>
Signed-off-by: Pin-yen Lin <treapking@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 0ee864b0afe0..675382e9756c 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2705,9 +2705,13 @@ static void gve_get_rx_queue_stats(struct net_device *dev, int idx,
struct netdev_queue_stats_rx *rx_stats)
{
struct gve_priv *priv = netdev_priv(dev);
- struct gve_rx_ring *rx = &priv->rx[idx];
+ struct gve_rx_ring *rx;
unsigned int start;
+ if (!priv->rx)
+ return;
+ rx = &priv->rx[idx];
+
do {
start = u64_stats_fetch_begin(&rx->statss);
rx_stats->packets = rx->rpackets;
@@ -2721,9 +2725,13 @@ static void gve_get_tx_queue_stats(struct net_device *dev, int idx,
struct netdev_queue_stats_tx *tx_stats)
{
struct gve_priv *priv = netdev_priv(dev);
- struct gve_tx_ring *tx = &priv->tx[idx];
+ struct gve_tx_ring *tx;
unsigned int start;
+ if (!priv->tx)
+ return;
+ tx = &priv->tx[idx];
+
do {
start = u64_stats_fetch_begin(&tx->statss);
tx_stats->packets = tx->pkt_done;
--
2.54.0.rc0.605.g598a273b03-goog
^ permalink raw reply related
* [PATCH net 2/4] gve: Fix backward stats when interface goes down or configuration is adjusted
From: Harshitha Ramamurthy @ 2026-04-20 17:18 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, willemb, maolson, nktgrg, jfraker, ziweixiao,
jacob.e.keller, pkaligineedi, shailend, jordanrhee, stable,
linux-kernel, Debarghya Kundu, Pin-yen Lin
In-Reply-To: <20260420171837.455487-1-hramamurthy@google.com>
From: Debarghya Kundu <debarghyak@google.com>
gve_get_base_stats() sets all the stats to 0, so the stats go backwards
when the interface goes down or its configuration is adjusted.
Fix this by persisting baseline stats across interface down.
This was discovered by drivers/net/stats.py selftest.
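The baseline idea can be sketched in plain user-space C. This is only an
illustration of the pattern, not the driver code: the struct and function
names below are hypothetical, and locking and u64_stats synchronization
are left out.

```c
#include <stddef.h>
#include <stdint.h>

struct counters {
	uint64_t packets;
	uint64_t bytes;
};

struct dev_stats {
	struct counters base;	/* totals from rings that were already freed */
	struct counters *rings;	/* live per-ring counters, NULL while down */
	size_t num_rings;
};

/* Add the live ring counters on top of whatever *out already holds. */
static void add_ring_stats(const struct dev_stats *d, struct counters *out)
{
	size_t i;

	if (!d->rings)
		return;
	for (i = 0; i < d->num_rings; i++) {
		out->packets += d->rings[i].packets;
		out->bytes += d->rings[i].bytes;
	}
}

/* Call right before the rings are freed, so their counts are not lost. */
static void save_baseline(struct dev_stats *d)
{
	add_ring_stats(d, &d->base);
}

/* Reported total = baseline + live rings, so it never goes backwards. */
static struct counters get_stats(const struct dev_stats *d)
{
	struct counters s = d->base;

	add_ring_stats(d, &s);
	return s;
}
```

The total stays monotonic because the live counts are folded into the
baseline before the rings go away, and fresh rings start again from zero.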
Cc: stable@vger.kernel.org
Fixes: 2e5e0932dff5 ("gve: add support for basic queue stats")
Signed-off-by: Debarghya Kundu <debarghyak@google.com>
Signed-off-by: Pin-yen Lin <treapking@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve.h | 6 ++
drivers/net/ethernet/google/gve/gve_main.c | 64 +++++++++++++++++++---
2 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index cbdf3a842cfe..ff7797043908 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -794,6 +794,10 @@ struct gve_ptp {
struct gve_priv *priv;
};
+struct gve_ring_err_stats {
+ u64 rx_alloc_fails;
+};
+
struct gve_priv {
struct net_device *dev;
struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
@@ -882,6 +886,8 @@ struct gve_priv {
unsigned long service_task_flags;
unsigned long state_flags;
+ struct gve_ring_err_stats base_ring_err_stats;
+ struct rtnl_link_stats64 base_net_stats;
struct gve_stats_report *stats_report;
u64 stats_report_len;
dma_addr_t stats_report_bus; /* dma address for the stats report */
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 675382e9756c..8617782791e0 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -105,9 +105,22 @@ static netdev_tx_t gve_start_xmit(struct sk_buff *skb, struct net_device *dev)
return gve_tx_dqo(skb, dev);
}
-static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
+static void gve_add_base_stats(struct gve_priv *priv,
+ struct rtnl_link_stats64 *s)
+{
+ struct rtnl_link_stats64 *base_stats = &priv->base_net_stats;
+
+ s->rx_packets += base_stats->rx_packets;
+ s->rx_bytes += base_stats->rx_bytes;
+ s->rx_dropped += base_stats->rx_dropped;
+ s->tx_packets += base_stats->tx_packets;
+ s->tx_bytes += base_stats->tx_bytes;
+ s->tx_dropped += base_stats->tx_dropped;
+}
+
+static void gve_get_ring_stats(struct gve_priv *priv,
+ struct rtnl_link_stats64 *s)
{
- struct gve_priv *priv = netdev_priv(dev);
unsigned int start;
u64 packets, bytes;
int num_tx_queues;
@@ -142,6 +155,14 @@ static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
}
}
+static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
+{
+ struct gve_priv *priv = netdev_priv(dev);
+
+ gve_get_ring_stats(priv, s);
+ gve_add_base_stats(priv, s);
+}
+
static int gve_alloc_flow_rule_caches(struct gve_priv *priv)
{
struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
@@ -1493,6 +1514,23 @@ static int gve_queues_stop(struct gve_priv *priv)
return gve_reset_recovery(priv, false);
}
+static void gve_get_ring_err_stats(struct gve_priv *priv,
+ struct gve_ring_err_stats *err_stats)
+{
+ int ring;
+
+ for (ring = 0; ring < priv->rx_cfg.num_queues; ring++) {
+ unsigned int start;
+ struct gve_rx_ring *rx = &priv->rx[ring];
+
+ do {
+ start = u64_stats_fetch_begin(&rx->statss);
+ err_stats->rx_alloc_fails +=
+ rx->rx_skb_alloc_fail + rx->rx_buf_alloc_fail;
+ } while (u64_stats_fetch_retry(&rx->statss, start));
+ }
+}
+
static int gve_close(struct net_device *dev)
{
struct gve_priv *priv = netdev_priv(dev);
@@ -1502,6 +1540,10 @@ static int gve_close(struct net_device *dev)
if (err)
return err;
+ /* Save ring queue and err stats before closing the interface */
+ gve_get_ring_stats(priv, &priv->base_net_stats);
+ gve_get_ring_err_stats(priv, &priv->base_ring_err_stats);
+
gve_queues_mem_remove(priv);
return 0;
}
@@ -2743,12 +2785,20 @@ static void gve_get_base_stats(struct net_device *dev,
struct netdev_queue_stats_rx *rx,
struct netdev_queue_stats_tx *tx)
{
- rx->packets = 0;
- rx->bytes = 0;
- rx->alloc_fail = 0;
+ const struct gve_ring_err_stats *base_err_stats;
+ const struct rtnl_link_stats64 *base_stats;
+ struct gve_priv *priv;
+
+ priv = netdev_priv(dev);
+ base_stats = &priv->base_net_stats;
+ base_err_stats = &priv->base_ring_err_stats;
+
+ rx->packets = base_stats->rx_packets;
+ rx->bytes = base_stats->rx_bytes;
+ rx->alloc_fail = base_err_stats->rx_alloc_fails;
- tx->packets = 0;
- tx->bytes = 0;
+ tx->packets = base_stats->tx_packets;
+ tx->bytes = base_stats->tx_bytes;
}
static const struct netdev_stat_ops gve_stat_ops = {
--
2.54.0.rc0.605.g598a273b03-goog
^ permalink raw reply related
* [PATCH net 3/4] gve: Use default min ring size when device option values are 0
From: Harshitha Ramamurthy @ 2026-04-20 17:18 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, willemb, maolson, nktgrg, jfraker, ziweixiao,
jacob.e.keller, pkaligineedi, shailend, jordanrhee, stable,
linux-kernel, Pin-yen Lin
In-Reply-To: <20260420171837.455487-1-hramamurthy@google.com>
From: Pin-yen Lin <treapking@google.com>
On gvnic devices that support reporting minimum ring sizes, the device
option always includes the min_(rx|tx)_ring_size fields, and the values
will be 0 if they are not configured to be exposed. This makes the
driver accept unexpectedly small ring size configurations from
userspace.
Use the default ring size in the driver if the min ring sizes from the
device option are 0.
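A minimal user-space sketch of the fallback rule follows. The default
values and all names are hypothetical and chosen only for illustration,
and the device values are assumed to be in host byte order already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver defaults, for illustration only. */
#define DEFAULT_MIN_RX_RING_SIZE 512u
#define DEFAULT_MIN_TX_RING_SIZE 256u

struct min_ring_sizes {
	uint16_t rx;
	uint16_t tx;
};

/*
 * has_min_fields: the option is long enough to carry the min fields.
 * dev_rx/dev_tx: device-reported minimums, already in host byte order.
 */
static struct min_ring_sizes pick_min_ring_sizes(bool has_min_fields,
						 uint16_t dev_rx,
						 uint16_t dev_tx)
{
	struct min_ring_sizes out = {
		DEFAULT_MIN_RX_RING_SIZE, DEFAULT_MIN_TX_RING_SIZE
	};

	/* A zero in either field means "not exposed": keep both defaults. */
	if (has_min_fields && dev_rx != 0 && dev_tx != 0) {
		out.rx = dev_rx;
		out.tx = dev_tx;
	}
	return out;
}
```

Treating a zero in either field as "use the defaults for both" mirrors the
single default_min_ring_size flag used in the change below.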
This was discovered by drivers/net/ring_reconfig.py selftest.
Cc: stable@vger.kernel.org
Fixes: ed4fb326947d ("gve: add support to read ring size ranges from the device")
Reviewed-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Signed-off-by: Pin-yen Lin <treapking@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve_adminq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index b72cc0fa2ba2..57d898f6fa82 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -189,7 +189,9 @@ void gve_parse_device_option(struct gve_priv *priv,
*dev_op_modify_ring = (void *)(option + 1);
/* device has not provided min ring size */
- if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE)
+ if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE ||
+ be16_to_cpu((*dev_op_modify_ring)->min_rx_ring_size) == 0 ||
+ be16_to_cpu((*dev_op_modify_ring)->min_tx_ring_size) == 0)
priv->default_min_ring_size = true;
break;
case GVE_DEV_OPT_ID_FLOW_STEERING:
--
2.54.0.rc0.605.g598a273b03-goog
^ permalink raw reply related
* [PATCH net 4/4] gve: Make ethtool config changes synchronous
From: Harshitha Ramamurthy @ 2026-04-20 17:18 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, willemb, maolson, nktgrg, jfraker, ziweixiao,
jacob.e.keller, pkaligineedi, shailend, jordanrhee, stable,
linux-kernel, Pin-yen Lin
In-Reply-To: <20260420171837.455487-1-hramamurthy@google.com>
From: Pin-yen Lin <treapking@google.com>
When modifying device features via ethtool, the driver queues the
carrier status update to its workqueue (gve_wq). This leads to a
short link-down state after running the ethtool command.
Use `gve_turnup_and_check_status()` instead of `gve_turnup()` in
`gve_queues_start()` to update the carrier status before returning to
the userspace.
This was discovered by drivers/net/ping.py selftest. The test runs the
ping command right after an ethtool configuration change, and without
this fix the interface could still be down at that point.
Cc: stable@vger.kernel.org
Fixes: 5f08cd3d6423 ("gve: Alloc before freeing when adjusting queues")
Reviewed-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Pin-yen Lin <treapking@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 56 +++++++++++-----------
1 file changed, 28 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 8617782791e0..d3b4bec38de5 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1374,6 +1374,33 @@ static void gve_queues_mem_remove(struct gve_priv *priv)
priv->rx = NULL;
}
+static void gve_handle_link_status(struct gve_priv *priv, bool link_status)
+{
+ if (!gve_get_napi_enabled(priv))
+ return;
+
+ if (link_status == netif_carrier_ok(priv->dev))
+ return;
+
+ if (link_status) {
+ netdev_info(priv->dev, "Device link is up.\n");
+ netif_carrier_on(priv->dev);
+ } else {
+ netdev_info(priv->dev, "Device link is down.\n");
+ netif_carrier_off(priv->dev);
+ }
+}
+
+static void gve_turnup_and_check_status(struct gve_priv *priv)
+{
+ u32 status;
+
+ gve_turnup(priv);
+ status = ioread32be(&priv->reg_bar0->device_status);
+ gve_handle_link_status(priv,
+ GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
+}
+
/* The passed-in queue memory is stored into priv and the queues are made live.
* No memory is allocated. Passed-in memory is freed on errors.
*/
@@ -1434,8 +1461,7 @@ static int gve_queues_start(struct gve_priv *priv,
round_jiffies(jiffies +
msecs_to_jiffies(priv->stats_report_timer_period)));
- gve_turnup(priv);
- queue_work(priv->gve_wq, &priv->service_task);
+ gve_turnup_and_check_status(priv);
priv->interface_up_cnt++;
return 0;
@@ -1548,23 +1574,6 @@ static int gve_close(struct net_device *dev)
return 0;
}
-static void gve_handle_link_status(struct gve_priv *priv, bool link_status)
-{
- if (!gve_get_napi_enabled(priv))
- return;
-
- if (link_status == netif_carrier_ok(priv->dev))
- return;
-
- if (link_status) {
- netdev_info(priv->dev, "Device link is up.\n");
- netif_carrier_on(priv->dev);
- } else {
- netdev_info(priv->dev, "Device link is down.\n");
- netif_carrier_off(priv->dev);
- }
-}
-
static int gve_configure_rings_xdp(struct gve_priv *priv,
u16 num_xdp_rings)
{
@@ -2039,15 +2048,6 @@ static void gve_turnup(struct gve_priv *priv)
gve_set_napi_enabled(priv);
}
-static void gve_turnup_and_check_status(struct gve_priv *priv)
-{
- u32 status;
-
- gve_turnup(priv);
- status = ioread32be(&priv->reg_bar0->device_status);
- gve_handle_link_status(priv, GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
-}
-
static struct gve_notify_block *gve_get_tx_notify_block(struct gve_priv *priv,
unsigned int txqueue)
{
--
2.54.0.rc0.605.g598a273b03-goog
^ permalink raw reply related
* Re: [PATCH 2/9] x86/extable: switch to using FIELD_GET_SIGNED()
From: Yury Norov @ 2026-04-20 17:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Jonathan Cameron, David Lechner,
Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
netdev, linux-rtc
In-Reply-To: <20260420112428.GF3102624@noisy.programming.kicks-ass.net>
On Mon, Apr 20, 2026 at 01:24:28PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 17, 2026 at 01:36:13PM -0400, Yury Norov wrote:
> > The EX_DATA register is laid out such that EX_DATA_IMM occupied MSB.
> > It's done to make sure that FIELD_GET() will sign-extend the IMM
> > field during extraction.
> >
> > To enforce that, all EX_DATA masks are made signed integers. This
> > works, but relies on the particular implementation of FIELD_GET(),
> > i.e. masking then shifting, not vice versa; and the particular
> > placement of the fields in the register.
>
> I don't think the order of the mask and shift matters in this case. If
> we were to first shift down and then mask, it would still work (after
> all, the mask would also need to be shifted and would also get sign
> extended, effectively ending up as -1).
FIELD_GET() doesn't require the mask to be signed when the reg is signed,
so the shifted mask may end up zero-extended in an alternative implementation:
(reg >> __bf_shf(mask)) & (mask >> __bf_shf(mask))
This all is hypothetical, anyways.
> But yes, this very much depends on the signed field being the topmost
> field and including the MSB.
This is the part I dislike the most. To the API user this would look just
like undefined behavior: depending on field placement or the type of the
inputs, FIELD_GET() sometimes sign-extends the field, and sometimes does
not.
We could likely force FIELD_GET() to treat both reg and mask as unsigned
types, and state that explicitly in the documentation.
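To illustrate the inconsistency, here is a small user-space sketch. The
helpers are simplified stand-ins for a mask-then-shift FIELD_GET(), not
the kernel macros, and the mask values in the usage note are examples
only. The sketch relies on __builtin_ctz() and on arithmetic right shift
of negative ints, both of which GCC and Clang provide but ISO C does not
guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-then-shift with signed operands: the result may be sign-extended. */
static int get_signed_mask(int mask, int reg)
{
	assert(mask != 0);
	return (reg & mask) >> __builtin_ctz((unsigned int)mask);
}

/* Same operation with unsigned operands: never sign-extended. */
static unsigned int get_unsigned_mask(unsigned int mask, unsigned int reg)
{
	assert(mask != 0);
	return (reg & mask) >> __builtin_ctz(mask);
}
```

For reg = 0xFFFEF001, get_signed_mask() with a topmost 16-bit mask
((int)0xFFFF0000) yields -2, while the 4-bit field at mask 0x0000F000
yields 15 rather than -1, even though all of its bits are set. The
unsigned variant returns 0xFFFE for the top field, so no field is ever
sign-extended.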
^ permalink raw reply
* Re: [PATCH] rhashtable: Restore insecure_elasticity toggle
From: kernel test robot @ 2026-04-20 17:22 UTC (permalink / raw)
To: Herbert Xu, Tejun Heo
Cc: llvm, oe-kbuild-all, Thomas Graf, David Vernet, Andrea Righi,
Changwoo Min, Emil Tsalapatis, linux-crypto, sched-ext,
linux-kernel, Florian Westphal, netdev, NeilBrown
In-Reply-To: <aeLgjAeJuidWNy3N@gondor.apana.org.au>
Hi Herbert,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-nonmm-unstable]
[also build test WARNING on net/main net-next/main linus/master v7.0]
[cannot apply to next-20260420]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Herbert-Xu/rhashtable-Restore-insecure_elasticity-toggle/20260418-233732
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-nonmm-unstable
patch link: https://lore.kernel.org/r/aeLgjAeJuidWNy3N%40gondor.apana.org.au
patch subject: [PATCH] rhashtable: Restore insecure_elasticity toggle
config: sparc64-allmodconfig (https://download.01.org/0day-ci/archive/20260421/202604210112.4dByOk9v-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260421/202604210112.4dByOk9v-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604210112.4dByOk9v-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from net/mac80211/s1g.c:9:
In file included from net/mac80211/ieee80211_i.h:27:
include/linux/rhashtable.h:831:32: error: member reference type 'struct rhashtable_params' is not a pointer; did you mean to use '.'?
831 | if (elasticity <= 0 && !params->insecure_elasticity)
| ~~~~~~^~
| .
include/linux/rhashtable.h:839:13: error: member reference type 'struct rhashtable_params' is not a pointer; did you mean to use '.'?
839 | !params->insecure_elasticity)
| ~~~~~~^~
| .
>> net/mac80211/s1g.c:104:36: warning: implicit conversion from 'unsigned long' to '__u16' (aka 'unsigned short') changes value from 18446744073709551614 to 65534 [-Wconstant-conversion]
104 | twt_agrt->req_type &= cpu_to_le16(~IEEE80211_TWT_REQTYPE_REQUEST);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/byteorder/generic.h:90:21: note: expanded from macro 'cpu_to_le16'
90 | #define cpu_to_le16 __cpu_to_le16
| ^
include/uapi/linux/byteorder/big_endian.h:36:53: note: expanded from macro '__cpu_to_le16'
36 | #define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
| ~~~~~~~~~~^~~
include/uapi/linux/swab.h:107:12: note: expanded from macro '__swab16'
107 | __fswab16(x))
| ~~~~~~~~~ ^
1 warning and 2 errors generated.
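The -Wconstant-conversion warning above can be reproduced in isolation
with a sketch like the following. REQ_BIT is a hypothetical stand-in for
an unsigned long flag with bit 0 set, which is what the value in the
warning suggests; it is not the mac80211 definition.

```c
#include <stdint.h>

#define REQ_BIT (1UL << 0)	/* hypothetical unsigned long flag */

/*
 * ~REQ_BIT is evaluated at unsigned long width and only then narrowed.
 * The explicit cast keeps the low 16 bits and documents the truncation.
 */
static uint16_t clear_mask_wide_then_narrow(void)
{
	return (uint16_t)~REQ_BIT;
}

/* Narrow first, then invert: the operand never exceeds 16 significant bits. */
static uint16_t clear_mask_narrow_then_invert(void)
{
	return (uint16_t)~(uint16_t)REQ_BIT;
}
```

Both helpers return 0xFFFE. The difference is only that the first form
starts from a 64-bit constant on LP64 targets, which is what the compiler
flags when the narrowing is implicit.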
vim +104 net/mac80211/s1g.c
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 95
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 96 static void
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 97 ieee80211_s1g_rx_twt_setup(struct ieee80211_sub_if_data *sdata,
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 98 struct sta_info *sta, struct sk_buff *skb)
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 99 {
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 100 struct ieee80211_mgmt *mgmt = (void *)skb->data;
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 101 struct ieee80211_twt_setup *twt = (void *)mgmt->u.action.u.s1g.variable;
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 102 struct ieee80211_twt_params *twt_agrt = (void *)twt->params;
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 103
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 @104 twt_agrt->req_type &= cpu_to_le16(~IEEE80211_TWT_REQTYPE_REQUEST);
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 105
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 106 /* broadcast TWT not supported yet */
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 107 if (twt->control & IEEE80211_TWT_CONTROL_NEG_TYPE_BROADCAST) {
7ff379ba2d4b7b2 Johannes Berg 2021-09-27 108 twt_agrt->req_type &=
7ff379ba2d4b7b2 Johannes Berg 2021-09-27 109 ~cpu_to_le16(IEEE80211_TWT_REQTYPE_SETUP_CMD);
7ff379ba2d4b7b2 Johannes Berg 2021-09-27 110 twt_agrt->req_type |=
7ff379ba2d4b7b2 Johannes Berg 2021-09-27 111 le16_encode_bits(TWT_SETUP_CMD_REJECT,
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 112 IEEE80211_TWT_REQTYPE_SETUP_CMD);
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 113 goto out;
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 114 }
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 115
30ac96f7cc973bb Howard Hsu 2022-10-27 116 /* TWT Information not supported yet */
30ac96f7cc973bb Howard Hsu 2022-10-27 117 twt->control |= IEEE80211_TWT_CONTROL_RX_DISABLED;
30ac96f7cc973bb Howard Hsu 2022-10-27 118
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 119 drv_add_twt_setup(sdata->local, sdata, &sta->sta, twt);
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 120 out:
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 121 ieee80211_s1g_send_twt_setup(sdata, mgmt->sa, sdata->vif.addr, twt);
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 122 }
f5a4c24e689f54e Lorenzo Bianconi 2021-08-23 123
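The -Wconstant-conversion warning at line 104 can be reproduced in miniature outside the kernel. The sketch below is only an illustration: it assumes the request bit is bit 0 (consistent with the value 18446744073709551614 in the diagnostic) and uses uint64_t to stand in for a 64-bit 'unsigned long'. The TOY_ name and the helper functions are hypothetical and do not exist in mac80211.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the flag in the warning. uint64_t models a
 * 64-bit 'unsigned long', the type named in the diagnostic.
 */
#define TOY_TWT_REQTYPE_REQUEST ((uint64_t)1 << 0)

/* Inverting the wide constant sets every bit except bit 0. */
uint64_t wide_inverted_mask(void)
{
	return ~TOY_TWT_REQTYPE_REQUEST;
}

/*
 * Narrowing to 16 bits keeps only the low half-word. The explicit cast
 * states that the truncation is intended, which is the part the
 * implicit conversion in the report leaves unstated.
 */
uint16_t narrowed_mask(void)
{
	return (uint16_t)~TOY_TWT_REQTYPE_REQUEST;
}

/* Clear the request bit in a 16-bit field using the narrowed mask. */
uint16_t clear_request_bit(uint16_t req_type)
{
	return req_type & narrowed_mask();
}
```

The truncated mask still clears only bit 0, so the arithmetic result is what the code intends; the diagnostic is about the implicit narrowing of a constant. Line 109 of the listing writes the inversion outside cpu_to_le16(), and that form is not flagged in this report.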
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [PATCH-next v2 0/2] ipvs: Fix incorrect use of HK_TYPE_KTHREAD housekeeping cpumask
From: Julian Anastasov @ 2026-04-20 17:24 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Waiman Long, Simon Horman, David S. Miller, David Ahern,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Florian Westphal,
Phil Sutter, Frederic Weisbecker, Chen Ridong, Phil Auld,
linux-kernel, netdev, lvs-devel, netfilter-devel, coreteam,
sheviks
In-Reply-To: <ac_OscBPYRwt73ic@lemonverbena>
Hello,
On Fri, 3 Apr 2026, Pablo Neira Ayuso wrote:
> On Fri, Apr 03, 2026 at 05:15:50PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Tue, 31 Mar 2026, Waiman Long wrote:
> >
> > > v2:
> > > - Rebased on top of linux-next
> > >
> > > Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
> > > affinity management"), the HK_TYPE_KTHREAD housekeeping cpumask may no
> > > longer be correct in showing the actual CPU affinity of kthreads that
> > > have no predefined CPU affinity. As the ipvs networking code is still
> > > using HK_TYPE_KTHREAD, we need to make HK_TYPE_KTHREAD reflect the
> > > reality.
> > >
> > > This patch series makes HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
> > > and uses RCU to protect access to the HK_TYPE_KTHREAD housekeeping
> > > cpumask.
> > >
> > > Waiman Long (2):
> > > sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
> > > ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU
> >
> > The patchset looks good to me for nf-next, thanks!
> >
> > Acked-by: Julian Anastasov <ja@ssi.bg>
> >
> > Pablo, Florian, as a bugfix this patchset missed
> > the chance to be applied before the changes that are in
> > nf-next in ip_vs.h, there is little fuzz there. If there
> > is no chance to resolve it somehow, we can apply it
> > on top of nf-next where it now applies successfully.
>
> One way to handle this is to follow up with nf-next as you suggest,
> then send a backport that applies cleanly for -stable once it is
> released.
>
> Else, let me know if I am misunderstanding.
This patchset is now material for the net tree. To help with that,
I just posted the patch "ipvs: fix races around est_mutex and est_cpulist",
which can be applied to the net tree before this patchset.
Can we get this patchset for the net tree?
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH net-next] netlink: clean up failed initial dump-start state
From: Jakub Kicinski @ 2026-04-20 17:37 UTC (permalink / raw)
To: Michael Bommarito
Cc: David S . Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
Kuniyuki Iwashima, Kees Cook, Feng Yang, linux-kernel
In-Reply-To: <20260420162734.854587-1-michael.bommarito@gmail.com>
On Mon, 20 Apr 2026 12:27:34 -0400 Michael Bommarito wrote:
> When __netlink_dump_start() has already installed cb->skb, taken the
> module reference and set cb_running, a failure from the first
> netlink_dump(sk, true) call returns via errout_skb without unwinding the
> callback lifetime. That leaves cb_running set and defers module_put()
> and consume_skb(cb->skb) until userspace drains the socket or closes it.
On a quick look I can't see which path clears the dump state in case we
keep failing to allocate an skb. Could you add more info on that?
> Share the normal callback teardown in a helper and use it on successful
> completion and on the initial lock_taken=true failure path. Keep the
> lock_taken=false continuation path unchanged, because recvmsg()-driven
> retries legitimately preserve cb_running when they run out of receive
> room.
>
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
> Assisted-by: Claude:claude-opus-4-6
> Assisted-by: Codex:gpt-5-4
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Validation inside a UML guest on current mainline:
>
> - An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
> plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
> NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
> RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
> - Stock kernel: the initial __netlink_dump_start() hits the rmem gate
> and returns via errout_skb with cb_running stuck at 1 until
> recvmsg() or close() drives forward progress.
> - Patched kernel: the same probe leaves cb_running clear immediately
> on the lock_taken=true failure, and the larger-rcvbuf continuation
> path (legitimate dump in progress) is unchanged.
>
> A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
> about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
> SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
> roughly 1.1 KiB/socket. Real accumulation, but the test hits
> RLIMIT_NOFILE long before the guest approaches OOM, so this still
> looks like a local availability cleanup rather than an exhaustion
> primitive.
This should be part of the commit message; it's useful for understanding
the problem. Actually more so than the current commit msg, TBH.
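For readers who want to see what the request in the validation notes looks like on the wire, here is a minimal userspace sketch of an RTM_GETLINK dump request with NLM_F_ACK. It is not the author's reproducer: the struct and function names are made up for illustration, NLM_F_REQUEST is added on the assumption that the request should be processed by the kernel, and the step that preloads sk_rmem_alloc is not shown.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Illustrative layout: netlink header followed by the link payload. */
struct getlink_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Fill in an RTM_GETLINK dump request that also asks for an ACK. */
void build_getlink_dump(struct getlink_req *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifm));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP | NLM_F_ACK;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_UNSPEC;
}

/*
 * Send the request on an already-open NETLINK_ROUTE socket, e.g. one
 * obtained from socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE).
 * Returns 0 when the whole message was handed to the kernel, else -1.
 */
int send_getlink_dump(int fd, unsigned int seq)
{
	struct getlink_req req;
	struct sockaddr_nl kernel;
	ssize_t sent;

	memset(&kernel, 0, sizeof(kernel));
	kernel.nl_family = AF_NETLINK;	/* nl_pid 0 addresses the kernel */

	build_getlink_dump(&req, seq);
	sent = sendto(fd, &req, req.nlh.nlmsg_len, 0,
		      (struct sockaddr *)&kernel, sizeof(kernel));
	return sent == (ssize_t)req.nlh.nlmsg_len ? 0 : -1;
}
```

Building the message is kept separate from sending it so the header fields can be inspected without opening a socket.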
> No Cc: stable@ on the theory that the bug self-heals on
> recvmsg()/close and the accumulation is mild. Happy to add it and
> route to net if you'd rather see it backported.
>
> net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 4d609d5cf406..7019c17e6879 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
> return 0;
> }
>
> +static void netlink_dump_cleanup(struct netlink_sock *nlk)
> +{
> + struct module *module = nlk->cb.module;
> + struct sk_buff *skb = nlk->cb.skb;
> +
> + if (nlk->cb.done)
> + nlk->cb.done(&nlk->cb);
> +
> + WRITE_ONCE(nlk->cb_running, false);
> + mutex_unlock(&nlk->nl_cb_mutex);
> + module_put(module);
> + consume_skb(skb);
> +}
It's probably better to create a helper that shares the code with
the release path as well. And try not to switch the skb freeing
to consume_skb().
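As a rough picture of that suggestion, the toy model below uses one teardown routine for every caller and lets the caller say whether the request skb counts as consumed or dropped. It is a userspace sketch under stated assumptions, not kernel code: every toy_ name is hypothetical, the counters only stand in for cb_running, the module reference and cb->skb, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the dump callback state; none are kernel types. */
struct toy_cb {
	bool running;		/* models cb_running */
	int module_refs;	/* models the module reference */
	int skb_refs;		/* models the reference on the request skb */
	int done_calls;		/* how often the done() hook ran */
	int skbs_consumed;	/* freed as a normal completion */
	int skbs_dropped;	/* freed as a drop */
};

enum toy_free_mode { TOY_FREE_CONSUME, TOY_FREE_DROP };

/* Install the dump state, as a successful dump start would. */
void toy_dump_start(struct toy_cb *cb)
{
	cb->running = true;
	cb->module_refs++;
	cb->skb_refs++;
}

/*
 * Single teardown used by all paths. The caller picks the free mode,
 * so sharing the helper does not change how any path frees the skb.
 * Calling it when no dump is running is a no-op.
 */
void toy_dump_teardown(struct toy_cb *cb, enum toy_free_mode mode)
{
	if (!cb->running)
		return;

	cb->done_calls++;
	cb->running = false;
	cb->module_refs--;
	cb->skb_refs--;
	if (mode == TOY_FREE_DROP)
		cb->skbs_dropped++;
	else
		cb->skbs_consumed++;
}
```

In this model a completed dump would call toy_dump_teardown(cb, TOY_FREE_CONSUME), while a failed first dump or a socket release would pass TOY_FREE_DROP. Which mode each real path should use is for the actual patch to decide.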
> static int netlink_dump(struct sock *sk, bool lock_taken)
> {
> struct netlink_sock *nlk = nlk_sk(sk);
> @@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
> struct sk_buff *skb = NULL;
> unsigned int rmem, rcvbuf;
> size_t max_recvmsg_len;
> - struct module *module;
> int err = -ENOBUFS;
> int alloc_min_size;
> int alloc_size;
> @@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
> else
> __netlink_sendskb(sk, skb);
>
> - if (cb->done)
> - cb->done(cb);
> -
> - WRITE_ONCE(nlk->cb_running, false);
> - module = cb->module;
> - skb = cb->skb;
> - mutex_unlock(&nlk->nl_cb_mutex);
> - module_put(module);
> - consume_skb(skb);
> + netlink_dump_cleanup(nlk);
> return 0;
>
> errout_skb:
> - mutex_unlock(&nlk->nl_cb_mutex);
> + if (lock_taken)
> + netlink_dump_cleanup(nlk);
> + else
> + mutex_unlock(&nlk->nl_cb_mutex);
> kfree_skb(skb);
> return err;
> }
If you're planning to repost - please wait until tomorrow, we ask that
revisions are at least 24h apart so that people across the timezones
have a chance to chime in.
^ permalink raw reply