Netdev List
* Re: [PATCH nf] netfilter: xt_TCPMSS: check skb_dst before path-MTU clamping
From: Florian Westphal @ 2026-04-18 19:58 UTC
  To: Weiming Shi
  Cc: Pablo Neira Ayuso, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Phil Sutter, Simon Horman, netfilter-devel, coreteam,
	netdev, Xiang Mei
In-Reply-To: <20260418163057.2611503-2-bestswngs@gmail.com>

Weiming Shi <bestswngs@gmail.com> wrote:
> When TCPMSS with CLAMP_PMTU is used via nft_compat in a non-base
> chain, par->hook_mask is set to 0, bypassing the checkentry hook
> validation. The target can then run at PRE_ROUTING where skb_dst is
> NULL, causing a null-ptr-deref in tcpmss_mangle_packet():
> 
>  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
>  RIP: 0010:tcpmss_mangle_packet (include/net/dst.h:219 net/netfilter/xt_TCPMSS.c:105)
>   tcpmss_tg4 (net/netfilter/xt_TCPMSS.c:202)
>   nft_target_eval_xt (net/netfilter/nft_compat.c:87)
>   nft_do_chain (net/netfilter/nf_tables_core.c:287)
>   nf_hook_slow (net/netfilter/core.c:623)
> 
> Check skb_dst() for NULL before calling dst_mtu().
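
The check proposed in the quoted patch can be modelled in userspace roughly as follows. This is only a sketch: `model_dst`, `model_skb` and `model_clamp_mss` are hypothetical stand-ins, not the kernel's types or the actual xt_TCPMSS code, and it says nothing about the hook validation problem discussed in the reply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's dst_entry and sk_buff. */
struct model_dst { unsigned int mtu; };
struct model_skb { const struct model_dst *dst; };

/*
 * Return the MSS to clamp to, or 0 when no clamping is possible.
 * A packet without an attached route (for example one seen before
 * routing) has no path MTU to clamp against, so it is left alone.
 */
unsigned int model_clamp_mss(const struct model_skb *skb,
			     unsigned int hdr_len)
{
	assert(skb != NULL);

	if (skb->dst == NULL)		/* no route yet: nothing to clamp to */
		return 0;
	if (skb->dst->mtu <= hdr_len)	/* MTU too small for the headers */
		return 0;
	return skb->dst->mtu - hdr_len;
}
```

With a 1500-byte MTU and 40 bytes of headers the model returns 1460; with no route attached it returns 0 instead of dereferencing a NULL pointer.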

FWIW I will apply this patch even though it's wrong.

nft_compat.c is just too broken, I don't see how it can be
fixed in any reasonable amount of time.

Validation is done too early, at expression instantiation
time.

This doesn't work because we have an incomplete graph; it has
to be done at final table validation time.

But then all required compat info (xtables hints) is gone
and no longer available.

AFAICS the only way to resolve this is to cache the info in
the nft_expr priv area (WHERE IT ABSOLUTELY DOESN'T BELONG!)
because that's the only storage there is.

*puke*


* Re: [PATCH net] sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks
From: patchwork-bot+netdevbpf @ 2026-04-18 19:30 UTC
  To: Michael Bommarito
  Cc: linux-sctp, marcelo.leitner, lucien.xin, davem, edumazet, kuba,
	pabeni, horms, netdev, linux-kernel, stable
In-Reply-To: <20260416031903.1447072-1-michael.bommarito@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 23:19:03 -0400 you wrote:
> sctp_getsockopt_peer_auth_chunks() checks that the caller's optval
> buffer is large enough for the peer AUTH chunk list with
> 
>     if (len < num_chunks)
>             return -EINVAL;
> 
> but then writes num_chunks bytes to p->gauth_chunks, which lives
> at offset offsetof(struct sctp_authchunks, gauth_chunks) == 8
> inside optval.  The check is missing the sizeof(struct
> sctp_authchunks) = 8-byte header.  When the caller supplies
> len == num_chunks (for any num_chunks > 0) the test passes but
> copy_to_user() writes sizeof(struct sctp_authchunks) = 8 bytes
> past the declared buffer.
> 
> [...]
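
The off-by-header arithmetic described in the quoted commit message can be illustrated with a small userspace model. This is a sketch under assumptions: `model_authchunks` and both helper names are hypothetical, the field names are made up, and the second check is just one way to account for the header, not necessarily the form used in the applied patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the layout described above:
 * an 8-byte header followed by the chunk list.
 */
struct model_authchunks {
	uint32_t assoc_id;
	uint32_t number_of_chunks;
	uint8_t  chunks[];
};

/* The check as described in the commit message: the header is ignored. */
int model_len_ok_old(size_t len, size_t num_chunks)
{
	return len >= num_chunks;
}

/* A check that also counts the header in front of the chunk list. */
int model_len_ok_new(size_t len, size_t num_chunks)
{
	return len >= offsetof(struct model_authchunks, chunks) + num_chunks;
}
```

For `len == num_chunks == 4` the old check accepts the buffer although the chunk list would end at byte 12, while the new check rejects it.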

Here is the summary with links:
  - [net] sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks
    https://git.kernel.org/netdev/net/c/0cf004ffb61c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v6 2/2] pppoe: drop PFC frames
From: Jakub Kicinski @ 2026-04-18 19:11 UTC
  To: qingfang.deng
  Cc: Jakub Kicinski, linux-ppp, andrew+netdev, davem, edumazet, pabeni,
	hataegu0826, horms, kees, kuniyu, bigeasy, gnault, ericwouds,
	semen.protsenko, netdev, linux-kernel, paulus, jaco, carlsonj,
	wojciech.drewek, marcin.szycik
In-Reply-To: <20260415022456.141758-2-qingfang.deng@linux.dev>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
pppoe: drop PFC frames

This commit drops Protocol Field Compression (PFC) frames for PPPoE to avoid
unaligned access exceptions caused by the network header being shifted. A
helper function is introduced to identify compressed protocol fields.

> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
[ ... ]
> @@ -434,6 +434,12 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (skb->len < len)
>  		goto drop;
>  
> +	/* skb->data points to the PPP protocol header after skb_pull_rcsum.
> +	 * Drop PFC frames.
> +	 */
> +	if (ppp_skb_is_compressed_proto(skb))

Does this code safely access the PPP protocol byte?

After pulling the PPPoE header, skb->data points to the inner payload.
However, the new ppp_skb_is_compressed_proto() helper unconditionally
dereferences skb->data[0].

If a crafted PPPoE packet is received with an inner length of 0 and no
Ethernet padding, the linear buffer might end exactly after the PPPoE header.
In that scenario, this would read past the allocated linear buffer.

Even if the inner length is greater than 0, could the payload reside entirely
in non-linear paged fragments, causing an out-of-bounds read?

Would it be safer to verify the packet has at least 1 byte and use
pskb_may_pull() to ensure the protocol byte is in the linear region before
inspecting it, perhaps after the pskb_trim_rcsum() call?
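
The concern can be sketched with a userspace model of a buffer whose total length may exceed its directly readable (linear) part. Everything here is hypothetical: `model_buf` and `model_is_compressed_proto` are illustration-only names, and unlike pskb_may_pull() the model does not move paged data into the linear area, it simply refuses to read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: only the first linear_len bytes of data may be
 * read directly; len is the total payload length, the remainder is
 * assumed to live in fragments that are not reachable through data.
 */
struct model_buf {
	const uint8_t *data;
	size_t linear_len;
	size_t len;
};

/*
 * Return 1 if the protocol field looks compressed (low bit of the
 * first byte set), 0 if not, and -1 if the first byte cannot be read
 * safely and the frame should be dropped.
 */
int model_is_compressed_proto(const struct model_buf *b)
{
	assert(b != NULL);

	if (b->len < 1 || b->linear_len < 1)
		return -1;
	return (b->data[0] & 0x01) ? 1 : 0;
}
```

An empty payload and a payload held entirely in fragments both yield -1 in this model, so the byte is never read out of bounds.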

> +		goto drop;
> +
>  	if (pskb_trim_rcsum(skb, len))
>  		goto drop;
[ ... ]
> diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h
> --- a/include/linux/ppp_defs.h
> +++ b/include/linux/ppp_defs.h
[ ... ]
> +static inline bool ppp_skb_is_compressed_proto(const struct sk_buff *skb)
> +{
> +	return unlikely(skb->data[0] & 0x01);
> +}
-- 
pw-bot: cr


* Re: [Intel-wired-lan] [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX reference clock control for E825
From: Jakub Kicinski @ 2026-04-18 19:26 UTC
  To: Kubalewski, Arkadiusz
  Cc: Vecera, Ivan, vadim.fedorenko@linux.dev, edumazet@google.com,
	netdev@vger.kernel.org, richardcochran@gmail.com,
	donald.hunter@gmail.com, linux-kernel@vger.kernel.org,
	davem@davemloft.net, Prathosh.Satish@microchip.com,
	andrew+netdev@lunn.ch, intel-wired-lan@lists.osuosl.org,
	horms@kernel.org, Kitszel, Przemyslaw, Nguyen, Anthony L,
	pabeni@redhat.com, jiri@resnulli.us
In-Reply-To: <IA0PR11MB7378CF62D86454916AE8F9D79B202@IA0PR11MB7378.namprd11.prod.outlook.com>

On Fri, 17 Apr 2026 12:22:05 +0000 Kubalewski, Arkadiusz wrote:
> >> I was thinking that this is more like a purpose specific DPLL device, if
> >> someone would want something similar we would have to review it, right?  
> >
> >We would if it was a Ethernet MAC PLL, but if someone wanted to expose
> >whether some random PLL in their ASIC locks - are we adding a new type
> >for each one of those?  
> 
> Yes, that was the implicit intention within those patches, if other purpose
> specific PLL would have to be present for whatever HW design and user
> control over it would be required, then that would be the easiest to
> maintain in the long term? Multiple types, each with its own function/purpose.
> 
> It would be good as long as there is one PLL for a function per board, once
> there could be multiple ones for single function, we would have to add some
> enumeration (labels, etc.)

Defer on adding identifiers. User knows which driver and bus device
spawned the pll and more importantly what the pin topology is.
Naming in the kernel is rarely a good idea.

> >> It depends, TX clock has one of external pins connected to external
> >> DPLL,
> >> but second is a board-level pin with ability to provide some external
> >> clock signal, the user would have to determine that purpose just based
> >> on the topology of one of the pins, which seems a bit problematic?
> >> I.e. if at some point there would be HW with only external non-DPLL
> >> connected pins?  
> >
> >Not sure I follow, TBH. To me the function of the "MAC PLL" is fairly
> >obvious from the fact that it has a pin exposed via rtnetlink. So it's
> >obviously a DPLL which can drive the Tx clock?
> 
> I am lost a bit now too. You mean clock recovery pin? And EEC type dpll?
> In this solution the 'MAC'/EEC is external and it doesn't drive TX clocks
> directly.

MAC == "tspll" == TXC in this series. On Grzegorz's diagram the new PLL
was in the MAC, which makes sense since it's a pll in the same ASIC as
the MAC.

I'm saying that the function of that pll is obvious since its pin will
plug into the netdev / rtnetlink.

> >It's the function / relation / linking to the EEC DPLL that may not
> >be obvious. But user can see how the pins connect they can get some
> >LLM to draw a diagram of a live system.. et voila :)
> 
> Yes, correct, it would work for this particular HW, but adding a variant
> without an external EEC-connected pin in the picture would make it hard
> to understand the 'generic' dpll purpose, pointing to the labels later.

The function of the "MAC/tspll" is still obvious. The clarity of the
external PLL is not helped by naming the "MAC/tspll".

> Just to make it clear. I believe that generic type dpll could be used in
> any HW and for any purpose, so after all each such usage could possibly
> introduce entropy and confusion on the user side.
> 
> But if you are fine with that, then sure, we can live with generic
> purpose dpll.

Considering all the imperfect options - generic / unnamed type would be
my preference.


* Re: [PATCH v1 net] af_unix: Drop all SCM attributes for SOCKMAP.
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC
  To: Kuniyuki Iwashima
  Cc: davem, edumazet, kuba, pabeni, horms, cong.wang, jiang.wang,
	kuni1840, netdev, xingyuj
In-Reply-To: <20260415184830.3988432-1-kuniyu@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 18:48:29 +0000 you wrote:
> SOCKMAP can hide inflight fd from AF_UNIX GC.
> 
> When a socket in SOCKMAP receives skb with inflight fd,
> sk_psock_verdict_data_ready() looks up the mapped socket and
> enqueues skb to its psock->ingress_skb.
> 
> Since neither the old nor the new GC can inspect the psock
> queue, the hidden skb leaks the inflight sockets.  Note that
> this cannot be detected via kmemleak because inflight sockets
> are linked to a global list.
> 
> [...]

Here is the summary with links:
  - [v1,net] af_unix: Drop all SCM attributes for SOCKMAP.
    https://git.kernel.org/netdev/net/c/965dc93481d1

You are awesome, thank you!

* Re: [PATCH net] ipv6: fix possible UAF in icmpv6_rcv()
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, dsahern, idosch, netdev, eric.dumazet
In-Reply-To: <20260416103505.2380753-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 10:35:05 +0000 you wrote:
> Caching saddr and daddr before pskb_pull() is problematic
> since skb->head can change.
> 
> Remove these temporary variables:
> 
> - We only access &ipv6_hdr(skb)->saddr and &ipv6_hdr(skb)->daddr
>   when net_dbg_ratelimited() is called in the slow path.
> 
> [...]
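
Why a cached header pointer becomes dangerous once the buffer head can move is easy to show in userspace. The sketch below is an analogy only: `model_pkt`, `model_hdr` and `model_grow` are hypothetical, and `model_grow` stands in for any operation that may reallocate the head.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet model: a heap buffer plus a header offset. */
struct model_pkt {
	unsigned char *head;
	size_t size;
	size_t hdr_off;
};

/* Re-derive the header pointer from the current head on every use. */
unsigned char *model_hdr(const struct model_pkt *p)
{
	return p->head + p->hdr_off;
}

/*
 * Grow the buffer by allocating a new head, copying and freeing the
 * old one. Any pointer computed from the old head is dangling after
 * this returns 0. Returns -1 if the allocation fails.
 */
int model_grow(struct model_pkt *p, size_t new_size)
{
	unsigned char *n;

	assert(p != NULL && new_size >= p->size);

	n = malloc(new_size);
	if (n == NULL)
		return -1;
	memcpy(n, p->head, p->size);
	free(p->head);
	p->head = n;
	p->size = new_size;
	return 0;
}
```

Code that saved `model_hdr(&p)` before `model_grow()` would read freed memory afterwards; calling `model_hdr(&p)` again at the point of use, as the quoted commit message describes for the slow path, avoids that.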

Here is the summary with links:
  - [net] ipv6: fix possible UAF in icmpv6_rcv()
    https://git.kernel.org/netdev/net/c/f996edd7615e

You are awesome, thank you!

* Re: [PATCH net,v3 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC
  To: KhaiWenTan
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, netdev,
	linux-stm32, linux-arm-kernel, linux-kernel, yoong.siang.song,
	hong.aun.looi, khai.wen.tan
In-Reply-To: <20260416102609.7953-1-khai.wen.tan@intel.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 18:26:09 +0800 you wrote:
> From: KhaiWenTan <khai.wen.tan@linux.intel.com>
> 
> get_interfaces() will update both the plat->phy_interfaces and
> mdio_bus_data->default_an_inband based on reading a SERDES register. As
> get_interfaces() will be called after default_an_inband had already been
> read, dwmac-intel regressed as a result with incorrect default_an_inband
> value in phylink_config.
> 
> [...]

Here is the summary with links:
  - [net,v3,1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
    https://git.kernel.org/netdev/net/c/8cff9dbe89d8

You are awesome, thank you!

* Re: [net,PATCH v4 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC
  To: Marek Vasut
  Cc: netdev, bigeasy, stable, davem, andrew+netdev, edumazet, kuba, nb,
	pabeni, ronald.wahl, yiconghui, linux-kernel
In-Reply-To: <20260415231020.455298-1-marex@nabladev.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 01:09:44 +0200 you wrote:
> If the driver executes ks8851_irq() AND a TX packet has been sent, then
> the driver enables TX queue via netif_wake_queue() which schedules TX
> softirq to queue packets for this device.
> 
> If CONFIG_PREEMPT_RT=y is set AND a packet has also been received by
> the MAC, then ks8851_rx_pkts() calls netdev_alloc_skb_ip_align() to
> allocate SKBs for the received packets. If netdev_alloc_skb_ip_align()
> is called with BH enabled, then local_bh_enable() at the end of
> netdev_alloc_skb_ip_align() will trigger the pending softirq processing,
> which may ultimately call the .xmit callback ks8851_start_xmit_par().
> The ks8851_start_xmit_par() will try to lock struct ks8851_net_par
> .lock spinlock, which is already locked by ks8851_irq() from which
> ks8851_start_xmit_par() was called. This leads to a deadlock, which
> is reported by the kernel, including a trace listed below.
> 
> [...]

Here is the summary with links:
  - [net,v4,1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
    https://git.kernel.org/netdev/net/c/5c9fcac3c872
  - [net,v4,2/2] net: ks8851: Avoid excess softirq scheduling
    https://git.kernel.org/netdev/net/c/22230e68b2cf

You are awesome, thank you!

* Re: [PATCH net v2 00/12] Intel Wired LAN Driver Updates 2026-04-14 (ice, i40e, iavf, idpf, e1000e)
From: patchwork-bot+netdevbpf @ 2026-04-18 19:10 UTC
  To: Jacob Keller
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	grzegorz.nitka, aleksandr.loktionov, horms, sx.rinitha,
	zoltan.fodor, sunithax.d.mekala, lgs201920130244, stable,
	mschmidt, paul.greenwalt, przemyslaw.kitszel, kmta1236, kohei,
	poros, pmenzel, rafal.romanowski, emil.s.tantilov, patryk.holda,
	tactii, avigailx.dahan
In-Reply-To: <20260416-iwl-net-submission-2026-04-14-v2-0-686c33c9828d@intel.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 17:53:24 -0700 you wrote:
> Grzegorz updates the logic for adjusting the PTP hardware clock on E830,
> fixing a bug that prevented adjustments below S32_MAX/MIN nanoseconds.
> 
> Grzegorz and Zoli update the PCS latency settings for E825 devices at 10GbE
> and 25GbE, improving the accuracy of timestamps based on data from
> production hardware.
> 
> [...]

Here is the summary with links:
  - [net,v2,01/12] ice: fix 'adjust' timer programming for E830 devices
    https://git.kernel.org/netdev/net/c/885c5e57924d
  - [net,v2,02/12] ice: update PCS latency settings for E825 10G/25Gb modes
    https://git.kernel.org/netdev/net/c/05567e405273
  - [net,v2,03/12] ice: fix double free in ice_sf_eth_activate() error path
    https://git.kernel.org/netdev/net/c/9aab1c3d7299
  - [net,v2,04/12] ice: fix double-free of tx_buf skb
    https://git.kernel.org/netdev/net/c/1a303baa715e
  - [net,v2,05/12] ice: fix PHY config on media change with link-down-on-close
    https://git.kernel.org/netdev/net/c/55e74f9ea7fe
  - [net,v2,06/12] ice: fix ICE_AQ_LINK_SPEED_M for 200G
    https://git.kernel.org/netdev/net/c/4a3a940059e9
  - [net,v2,07/12] ice: fix race condition in TX timestamp ring cleanup
    https://git.kernel.org/netdev/net/c/7c72ec18c2a4
  - [net,v2,08/12] ice: fix potential NULL pointer deref in error path of ice_set_ringparam()
    https://git.kernel.org/netdev/net/c/fa28351f970f
  - [net,v2,09/12] i40e: don't advertise IFF_SUPP_NOFCS
    https://git.kernel.org/netdev/net/c/a24162f18825
  - [net,v2,10/12] iavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2
    https://git.kernel.org/netdev/net/c/496d9f91062f
  - [net,v2,11/12] idpf: fix xdp crash in soft reset error path
    (no matching commit)
  - [net,v2,12/12] e1000e: Unroll PTP in probe error handling
    https://git.kernel.org/netdev/net/c/aa3f7fe40935

You are awesome, thank you!

* Re: [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Jakub Kicinski @ 2026-04-18 19:08 UTC
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem, borisp,
	saeedm, leon, mbloch, daniel.zahka, willemdebruijn.kernel, cratiu,
	raeds, rrameshbabu, dtatulea, kees, netdev, linux-rdma,
	linux-kernel, gal
In-Reply-To: <20260417050201.192070-2-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5e: psp: Fix invalid access on PSP dev registration fail

This commit updates PSP device registration to use a local variable,
only saving the PSP device pointer if psp_dev_create() succeeds. This
prevents an error pointer from being stored in priv->psp->psp.

> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c
[ ... ]
> @@ -1070,29 +1070,37 @@ static struct psp_dev_ops mlx5_psp_ops = {
[ ... ]
>  void mlx5e_psp_register(struct mlx5e_priv *priv)
>  {
> +	struct mlx5e_psp *psp = priv->psp;
> +	struct psp_dev *psd;
> +
[ ... ]
> -	priv->psp->psp = psp_dev_create(priv->netdev, &mlx5_psp_ops,
> -					&priv->psp->caps, NULL);
> -	if (IS_ERR(priv->psp->psp))
> +	psd = psp_dev_create(priv->netdev, &mlx5_psp_ops, &psp->caps, NULL);
> +	if (IS_ERR(psd)) {
>  		mlx5_core_err(priv->mdev, "PSP failed to register due to %pe\n",
> -			      priv->psp->psp);
> +			      psd);
> +		return;
> +	}
> +	psp->psp = psd;
>  }

When psp_dev_create() fails, this function now returns without setting
psp->psp, leaving it as NULL. However, priv->psp remains allocated and
non-NULL.

Does this leave the RX datapath vulnerable to a NULL pointer dereference?

If priv->psp is non-NULL, the NIC RX initialization path can still call
mlx5_accel_psp_fs_init_rx_tables(), which creates hardware flow steering
rules to intercept UDP traffic.

If a UDP packet triggers these rules, the hardware flags the CQE with
MLX5E_PSP_MARKER_BIT. The RX fast-path sees the marker and invokes
mlx5e_psp_offload_handle_rx_skb(), which dereferences the pointer
unconditionally:

u16 dev_id = priv->psp->psp->id;

Since priv->psp->psp is NULL, this will cause a kernel panic. Should
priv->psp be cleaned up, or the error propagated, to prevent flow rules
from being installed when registration fails?
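
One of the two options raised above, cleaning up the PSP state when registration fails so that later stages see no PSP at all, might look like this in a userspace model. All names are hypothetical; this is not the mlx5e code, and the real driver may need to keep or release more state than shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures named above. */
struct model_psp_dev { unsigned short id; };
struct model_psp { struct model_psp_dev *dev; };
struct model_priv { struct model_psp *psp; };

/*
 * Register step: `created` is the result of device creation, NULL
 * meaning it failed. On failure the half-initialised PSP state is
 * released so that nothing downstream treats PSP as available.
 */
int model_psp_register(struct model_priv *priv, struct model_psp_dev *created)
{
	assert(priv != NULL);

	if (priv->psp == NULL)
		return -1;
	if (created == NULL) {
		free(priv->psp);
		priv->psp = NULL;
		return -1;
	}
	priv->psp->dev = created;
	return 0;
}

/* Later stage: only install RX steering rules for a registered device. */
int model_should_install_rx_rules(const struct model_priv *priv)
{
	return priv->psp != NULL && priv->psp->dev != NULL;
}
```

In this model a failed registration leaves `priv->psp` NULL, so the rule-installation predicate is false and no RX path can reach an unset device pointer.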
-- 
pw-bot: cr


* Re: [PATCH net 1/6] selftests: ovpn: add nftables config dependencies for test-mark
From: patchwork-bot+netdevbpf @ 2026-04-18 19:00 UTC
  To: Antonio Quartulli
  Cc: netdev, ralf, shuah, horms, sd, kuba, pabeni, andrew+netdev,
	davem, edumazet
In-Reply-To: <20260417090305.2775723-2-antonio@openvpn.net>

Hello:

This series was applied to netdev/net.git (main)
by Antonio Quartulli <antonio@openvpn.net>:

On Fri, 17 Apr 2026 11:03:00 +0200 you wrote:
> From: Ralf Lici <ralf@mandelbit.com>
> 
> test-mark.sh installs nftables rules in an inet/filter output chain and
> verifies packet drops via nft counters. In vmksft this can fail when the
> nftables core is not enabled by the ovpn selftest config.
> 
> Add the missing kernel options required by this test:
> - CONFIG_NETFILTER
> - CONFIG_NF_TABLES
> - CONFIG_NF_TABLES_INET
> 
> [...]

Here is the summary with links:
  - [net,1/6] selftests: ovpn: add nftables config dependencies for test-mark
    https://git.kernel.org/netdev/net/c/e5fd34ab8dff
  - [net,2/6] selftests: ovpn: fail notification check on mismatch
    https://git.kernel.org/netdev/net/c/c409da0fe15e
  - [net,3/6] selftests: ovpn: flatten slurped notification JSON before filtering
    https://git.kernel.org/netdev/net/c/222e7f8d1ca3
  - [net,4/6] selftests: ovpn: add prefix to helpers and shared variables
    https://git.kernel.org/netdev/net/c/7c29665a3a3c
  - [net,5/6] selftests: ovpn: align command flow with TAP
    https://git.kernel.org/netdev/net/c/1be93bb979ab
  - [net,6/6] selftests: ovpn: serialize YNL listener startup
    https://git.kernel.org/netdev/net/c/6c9b1dc218fe

You are awesome, thank you!

* Re: [PATCH net v2 11/12] idpf: fix xdp crash in soft reset error path
From: Jakub Kicinski @ 2026-04-18 19:00 UTC
  To: jacob.e.keller
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	emil.s.tantilov, stable, aleksandr.loktionov, patryk.holda
In-Reply-To: <20260416-iwl-net-submission-2026-04-14-v2-11-686c33c9828d@intel.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

Jakub: I'll drop this patch and apply the rest.
---
idpf: fix xdp crash in soft reset error path

This commit fixes a NULL pointer dereference in idpf_vport_open() when
soft reset fails by restoring vport->xdp_prog to the old value. It also
updates the restart check to use IDPF_VPORT_UP instead of netif_running().

Cc: stable@vger.kernel.org

> diff --git a/drivers/net/ethernet/intel/idpf/xdp.c b/drivers/net/ethernet/intel/idpf/xdp.c
> --- a/drivers/net/ethernet/intel/idpf/xdp.c
> +++ b/drivers/net/ethernet/intel/idpf/xdp.c
> @@ -488,6 +488,7 @@ static int idpf_xdp_setup_prog(struct idpf_vport *vport,
>  				   "Could not reopen the vport after XDP setup");
>  
>  		cfg->user_config.xdp_prog = old;
> +		vport->xdp_prog = old;

Does this also need to roll back vport->num_xdp_txq?

Because idpf_initiate_soft_reset() overwrites the vport struct with the new
queue configuration before failing, vport->num_xdp_txq might remain > 0
while vport->xdp_prog is restored to NULL.

When the interface is brought up, idpf_xdpsqs_get() evaluates
idpf_xdp_enabled(vport) as false and returns early without allocating the
NAPI timers (xdpsq->timer).

Could another interface redirect XDP frames via ndo_xdp_xmit, bypassing the
check because vport->num_xdp_txq > 0, and cause a NULL pointer dereference
in libeth_xdpsq_queue_timer() during idpf_xdp_tx_finalize()?

>  		old = prog;
>  	}

Is this old = prog assignment leading to a double-put of the BPF program?

If idpf_initiate_soft_reset() fails, this error path drops a reference to
the newly provided prog when it calls bpf_prog_put(old) right after this
block.

Since the function returns an error code to the core network stack, won't
dev_xdp_install() in net/core/dev.c also explicitly call bpf_prog_put(prog)
assuming ownership of the error cleanup?

Could this underflow the reference count and cause a use-after-free?
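
The double-put scenario can be made concrete with a toy reference counter. This is a sketch of the question, not a statement about the real code: it assumes, as the question above does, that the caller drops its reference on error, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy program object with a plain integer reference count. */
struct model_prog { int refcnt; };

void model_prog_put(struct model_prog *p)
{
	assert(p != NULL);
	p->refcnt--;
}

/*
 * Driver-side setup. When it fails, driver_puts_new selects whether
 * the driver itself drops the reference on the new program.
 */
int model_driver_setup(struct model_prog *prog, int fail, int driver_puts_new)
{
	if (fail) {
		if (driver_puts_new)
			model_prog_put(prog);
		return -1;
	}
	return 0;
}

/* Caller that is ASSUMED to put the program when setup returns an error. */
int model_core_install(struct model_prog *prog, int fail, int driver_puts_new)
{
	int err = model_driver_setup(prog, fail, driver_puts_new);

	if (err)
		model_prog_put(prog);
	return err;
}
```

Starting from a count of 1, a failing setup in which both sides put the program ends at -1, while leaving the put to the caller ends at 0. Whether the real caller behaves this way is exactly what the review asks.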

[ ... ]

> diff --git a/drivers/net/ethernet/intel/idpf/xsk.c b/drivers/net/ethernet/intel/idpf/xsk.c
> --- a/drivers/net/ethernet/intel/idpf/xsk.c
> +++ b/drivers/net/ethernet/intel/idpf/xsk.c
> @@ -568,7 +569,8 @@ int idpf_xsk_pool_setup(struct idpf_vport *vport, struct netdev_bpf *bpf)
>  		return -EINVAL;
>  	}
>  
> -	restart = idpf_xdp_enabled(vport) && netif_running(vport->netdev);
> +	restart = idpf_xdp_enabled(vport) &&
> +		  test_bit(IDPF_VPORT_UP, np->state);
>  	if (!restart)
>  		goto pool;

Does this function leave the queue pair permanently disabled if the pool
setup fails?

If restart is true, the queue pair is disabled via:
    idpf_qp_switch(vport, qid, false);

If the subsequent call to libeth_xsk_setup_pool() fails, the function
returns early:

    ret = libeth_xsk_setup_pool(vport->netdev, qid, pool);
    if (ret) {
        ...
        return ret;
    }

Does this early return bypass the idpf_qp_switch(vport, qid, true) call
expected later, causing a permanent partial loss of connectivity until the
interface is manually restarted?
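
A common way to avoid this kind of leak is to route the failure through the same re-enable step as the success case. The model below is a hedged sketch with hypothetical names; `setup_err` stands in for the result of the pool setup call and no driver API is implied.

```c
#include <assert.h>
#include <stddef.h>

/* Toy queue pair: 1 when enabled, 0 when disabled. */
struct model_qp { int enabled; };

/*
 * Disable the queue pair, attempt the setup, then re-enable it whether
 * or not the setup succeeded. The setup result is still returned to
 * the caller so the error is not hidden.
 */
int model_pool_setup(struct model_qp *qp, int restart, int setup_err)
{
	int ret;

	assert(qp != NULL);

	if (restart)
		qp->enabled = 0;

	ret = setup_err;	/* stand-in for the pool setup result */

	if (restart)
		qp->enabled = 1;	/* reached on success and on failure */
	return ret;
}
```

With this shape a failing setup still returns its error code, but the queue pair is enabled again afterwards.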


* Re: [PATCH net 0/6] pull request: fixes for ovpn 2026-04-17
From: Jakub Kicinski @ 2026-04-18 18:54 UTC
  To: Antonio Quartulli
  Cc: netdev, ralf, shuah, horms, Sabrina Dubroca, Paolo Abeni,
	Andrew Lunn, David S. Miller, Eric Dumazet
In-Reply-To: <20260417090305.2775723-1-antonio@openvpn.net>

On Fri, 17 Apr 2026 11:02:59 +0200 Antonio Quartulli wrote:
> This is a respin of the PR I originally sent against net-next + an extra
> fix (patch 6).
> 
> Please note that this patch:
> https://lore.kernel.org/all/20260225010833.11301-1-liuhangbin@gmail.com/
> broke the selftests entirely due to the switch from sh to bash.
> 
> There are new commits in the kselftest tree which take care of this:
> https://lore.kernel.org/all/20260416-selftest-fix-readlink-e-v1-0-94e4cabbdec4@kernel.org/
> but they are not in net yet, therefore you won't be able to test/run
> our kselftests for now.

It does work for us, FWIW, maybe because we run tests with make
run_tests. There were some entirely unnecessary changes to ktap 
output which broke our systems but we patched around them :/

> TCP tests are still failing every now and then.
> It seems that sometimes a single ping over a TCP tunnel is lost,
> thus making the selftest fail.

They seem to fail for us around 50% of the time on debug kernel
builds. What's your repro rate?

> We believe this is a bug in ovpn which we are currently hunting down.
> So it's nothing wrong about the tests (they are actually doing their
> job!).

FWIW one of today's runs hit this:

https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-extra-dbg/results/608740/3-test-symmetric-id-tcp-sh/stderr
decoded:
https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-extra-dbg/results/608740/vm-crash-thr0-0

In any case - test-mark.sh looks good now, so I'll take it out of
the ignored list. Thanks! 2 more to go? :)


* Re: [PATCH net 00/14] tcp: take care of tcp_get_timestamping_opt_stats() races
From: patchwork-bot+netdevbpf @ 2026-04-18 18:50 UTC
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, ncardwell, kuniyu, netdev,
	eric.dumazet
In-Reply-To: <20260416200319.3608680-1-edumazet@google.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 20:03:05 +0000 you wrote:
> tcp_get_timestamping_opt_stats() does not own the socket lock,
> this is intentional.
> 
> It calls tcp_get_info_chrono_stats() while other threads could
> change chrono fields in tcp_chrono_set(). It also reads many
> tcp socket fields that can be modified by other cpus/threads.
> 
> [...]
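
The annotation pattern used for lockless readers can be approximated in userspace with volatile accesses. This is a simplified sketch: the `MODEL_*` macros are hypothetical, rely on the GNU C `__typeof__` extension, and do less than the kernel's READ_ONCE()/WRITE_ONCE(); the struct only borrows two field names from the series titles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: force a single volatile load or store. */
#define MODEL_READ_ONCE(x)	(*(volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Toy socket state with two fields named after the series. */
struct model_tp {
	uint32_t srtt_us;
	uint64_t bytes_sent;
};

/* Writer side: assumed to be serialised by the socket lock. */
void model_update(struct model_tp *tp, uint32_t srtt, uint64_t sent)
{
	assert(tp != NULL);
	MODEL_WRITE_ONCE(tp->srtt_us, srtt);
	MODEL_WRITE_ONCE(tp->bytes_sent, tp->bytes_sent + sent);
}

/* Reader side: runs without the lock, so the load is marked. */
uint32_t model_read_srtt(const struct model_tp *tp)
{
	return MODEL_READ_ONCE(tp->srtt_us);
}
```

The markings do not add ordering or mutual exclusion; they only tell the compiler that each access must happen once, untorn where the hardware allows it, which matches the intent of a reader that deliberately does not own the lock.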

Here is the summary with links:
  - [net,01/14] tcp: annotate data-races in tcp_get_info_chrono_stats()
    https://git.kernel.org/netdev/net/c/267bf3cf9a6f
  - [net,02/14] tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans
    https://git.kernel.org/netdev/net/c/21e92a38cfd8
  - [net,03/14] tcp: add data-races annotations around tp->reordering, tp->snd_cwnd
    https://git.kernel.org/netdev/net/c/829ba1f329cb
  - [net,04/14] tcp: annotate data-races around tp->snd_ssthresh
    https://git.kernel.org/netdev/net/c/fd571afb05eb
  - [net,05/14] tcp: annotate data-races around tp->delivered and tp->delivered_ce
    https://git.kernel.org/netdev/net/c/faa886ad3ce5
  - [net,06/14] tcp: add data-race annotations for TCP_NLA_SNDQ_SIZE
    https://git.kernel.org/netdev/net/c/124199444de4
  - [net,07/14] tcp: annotate data-races around tp->bytes_sent
    https://git.kernel.org/netdev/net/c/ee43e957ce2e
  - [net,08/14] tcp: annotate data-races around tp->bytes_retrans
    https://git.kernel.org/netdev/net/c/5efc7b9f7cbd
  - [net,09/14] tcp: annotate data-races around tp->dsack_dups
    https://git.kernel.org/netdev/net/c/a984705ca88b
  - [net,10/14] tcp: annotate data-races around tp->reord_seen
    https://git.kernel.org/netdev/net/c/62585690e6b2
  - [net,11/14] tcp: annotate data-races around tp->srtt_us
    https://git.kernel.org/netdev/net/c/290b693ce7c9
  - [net,12/14] tcp: annotate data-races around tp->timeout_rehash
    https://git.kernel.org/netdev/net/c/71c675358b71
  - [net,13/14] tcp: annotate data-races around (tp->write_seq - tp->snd_nxt)
    https://git.kernel.org/netdev/net/c/3a63b3d16056
  - [net,14/14] tcp: annotate data-races around tp->plb_rehash
    https://git.kernel.org/netdev/net/c/9e89b9d03a2d

You are awesome, thank you!

* [PATCH net v2 2/2] selftests/bpf: check epoll readiness after reuseport migration
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu
In-Reply-To: <20260418181333.1713389-1-jt26wzz@gmail.com>

After migrate_dance() moves established children to the target
listener, add it to an epoll set and verify that epoll_wait(..., 0)
reports it ready before accept().

This adds epoll coverage for the TCP_ESTABLISHED reuseport migration
case in migrate_reuseport.

Keep the check limited to TCP_ESTABLISHED cases. TCP_SYN_RECV and
TCP_NEW_SYN_RECV still depend on asynchronous handshake completion,
so a zero-timeout epoll_wait() would race there.

Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 .../bpf/prog_tests/migrate_reuseport.c        | 32 ++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
index 653b0a20f..580a53424 100644
--- a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
@@ -18,13 +18,16 @@
  *   9. call shutdown() for the second server
  *        and migrate the requests in the accept queue
  *        to the last server socket.
- *  10. call accept() for the last server socket.
+ *  10. for TCP_ESTABLISHED cases, call epoll_wait(..., 0)
+ *        for the last server socket.
+ *  11. call accept() for the last server socket.
  *
  * Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
  */
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
+#include <sys/epoll.h>
 
 #include "test_progs.h"
 #include "test_migrate_reuseport.skel.h"
@@ -522,6 +525,33 @@ static void run_test(struct migrate_reuseport_test_case *test_case,
 			goto close_clients;
 	}
 
+	/* Only TCP_ESTABLISHED has already-migrated accept-queue entries
+	 * here.  Later states still depend on follow-up handshake work.
+	 */
+	if (test_case->state == BPF_TCP_ESTABLISHED) {
+		struct epoll_event ev = {
+			.events = EPOLLIN,
+		};
+		int epfd;
+		int nfds;
+
+		epfd = epoll_create1(EPOLL_CLOEXEC);
+		if (!ASSERT_NEQ(epfd, -1, "epoll_create1"))
+			goto close_clients;
+
+		ev.data.fd = test_case->servers[MIGRATED_TO];
+		if (!ASSERT_OK(epoll_ctl(epfd, EPOLL_CTL_ADD,
+					 test_case->servers[MIGRATED_TO], &ev),
+			       "epoll_ctl"))
+			goto close_epfd;
+
+		nfds = epoll_wait(epfd, &ev, 1, 0);
+		ASSERT_EQ(nfds, 1, "epoll_wait");
+
+close_epfd:
+		close(epfd);
+	}
+
 	count_requests(test_case, skel);
 
 close_clients:
-- 
2.43.0



* [PATCH net v2 1/2] tcp: call sk_data_ready() after listener migration
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
	stable
In-Reply-To: <20260418181333.1713389-1-jt26wzz@gmail.com>

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters. A nonblocking accept() still works because it
checks the queue directly, but poll()/epoll_wait() waiters and
blocking accept() callers can remain asleep indefinitely.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired
in reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq via accept()
or listener shutdown, hit reqsk_put(), and drop that listener ref.
Since listeners are SOCK_RCU_FREE, wrap the post-queue_add()
dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also
covers the existing sock_net(nsk) access in that path.

The reqsk_timer_handler() path does not need the same changes for two
reasons: half-open requests become readable only after the final ACK,
where tcp_child_process() already wakes the listener; and once nreq is
visible via inet_ehash_insert(), the success path no longer touches
nsk directly.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..928654c34 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk)
 			if (nreq) {
 				refcount_set(&nreq->rsk_refcnt, 1);
 
+				rcu_read_lock();
 				if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
 					reqsk_migrate_reset(nreq);
 					__reqsk_free(nreq);
 				}
+				rcu_read_unlock();
 
 				/* inet_csk_reqsk_queue_add() has already
 				 * called inet_child_forget() on failure case.
-- 
2.43.0



* [PATCH net v2 0/2] tcp: fix listener wakeup after reuseport migration
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu

This series fixes a missing wakeup when inet_csk_listen_stop() migrates
an established child socket from a closing listener to another socket
in the same SO_REUSEPORT group after the child has already been queued
for accept.

The target listener receives the migrated accept-queue entry via
inet_csk_reqsk_queue_add(), but its waiters are not notified.
Nonblocking accept() still succeeds because it checks the accept queue
directly, but readiness-based waiters can remain asleep until another
connection generates a wakeup.
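
As a userspace illustration of the two ways a listener can learn about a
queued connection, the sketch below registers a listening socket with epoll
and compares a zero-timeout readiness check with a direct nonblocking
accept(). It is only an assumption-laden example: it uses Python's
select.epoll on a plain loopback socket, does not exercise SO_REUSEPORT
migration, and the function name listener_readiness() is made up for
illustration.

```python
import select
import socket


def listener_readiness():
    """Report epoll readiness of a listener before, while and after a
    connection sits in its accept queue."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
    srv.listen(8)
    srv.setblocking(False)            # accept() must never block in this demo

    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN)

    def readable(timeout):
        # True if epoll reports EPOLLIN for the listener within `timeout` seconds.
        return any(ev & select.EPOLLIN for _, ev in ep.poll(timeout))

    before = readable(0)              # accept queue is still empty
    cli = socket.create_connection(srv.getsockname())
    queued = readable(1.0)            # give the handshake time to complete
    still_queued = readable(0)        # level-triggered: a zero timeout suffices now
    conn, _ = srv.accept()            # nonblocking accept drains the queue
    after = readable(0)               # queue is empty again

    for s in (conn, cli, srv):
        s.close()
    ep.close()
    return before, queued, still_queued, after
```

In this ordinary (non-migrated) case the handshake path notifies the
listener, so the readiness view and the accept queue agree. The series
concerns the migration path, where the entry is queued without that
notification.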

Patch 1 notifies the target listener after a successful migration in
inet_csk_listen_stop() and protects the post-queue_add() nsk accesses
with rcu_read_lock()/rcu_read_unlock().

Patch 2 extends the existing migrate_reuseport BPF selftest with an
epoll readiness check for the TCP_ESTABLISHED migration case.

Testing:
- On a patched kernel booted under QEMU, the full migrate_reuseport
  selftest passes with SELFTEST_RC=0.

---

v2:
- wrap the post-queue_add() nsk dereferences with rcu_read_lock()/
  rcu_read_unlock() to prevent a potential UAF (Eric Dumazet)
- extend tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c
  instead of adding standalone net selftests (Kuniyuki Iwashima)
- limit the epoll readiness check to TCP_ESTABLISHED cases

v1:
  https://lore.kernel.org/netdev/20260418041633.691435-1-jt26wzz@gmail.com/

Zhenzhong Wu (2):
  tcp: call sk_data_ready() after listener migration
  selftests/bpf: check epoll readiness after reuseport migration

 net/ipv4/inet_connection_sock.c               |  3 ++
 .../bpf/prog_tests/migrate_reuseport.c        | 32 ++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)


base-commit: 52bcb57a4e8a0865a76c587c2451906342ae1b2d
-- 
2.43.0


* Re: [PATCH net] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Jakub Kicinski @ 2026-04-18 18:01 UTC (permalink / raw)
  To: daniel
  Cc: Jakub Kicinski, chester.a.unal, andrew, olteanv, davem, edumazet,
	pabeni, matthias.bgg, angelogioacchino.delregno, linux,
	ansuelsmth, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, frank-w, john
In-Reply-To: <79dc0ec5b6be698b14cb66339d6f63033ca2934a.1776397542.git.daniel@makrotopia.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: dsa: mt7530: fix .get_stats64 sleeping in atomic context

This commit addresses a sleep-in-atomic bug when reading MDIO-connected
switch port statistics by delegating the register reads to a delayed
workqueue. The get_stats64 callback now returns cached values protected
by a spinlock instead of synchronously reading from the MDIO bus.
MMIO-connected switches are unaffected and continue to read counters
directly.
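
The caching scheme summarized above can be modeled in userspace roughly as
follows. This is a sketch under stated assumptions, not the driver's code: a
threading.Lock stands in for the spinlock, poll_once() stands in for the
delayed work item, and the names PortStats, CachedPortStats and read_hw are
hypothetical.

```python
import threading
from dataclasses import dataclass, replace


@dataclass
class PortStats:
    rx_packets: int = 0
    tx_packets: int = 0


class CachedPortStats:
    """Slow reads happen in a poller; readers only copy a cached snapshot."""

    def __init__(self, read_hw, num_ports):
        self._read_hw = read_hw       # slow callback that may sleep (the MDIO read)
        self._lock = threading.Lock() # stands in for the stats spinlock
        self._cache = [PortStats() for _ in range(num_ports)]

    def poll_once(self):
        # Body of the periodic worker: read each port, then publish the result.
        for port in range(len(self._cache)):
            fresh = self._read_hw(port)   # slow read happens outside the lock
            with self._lock:              # lock covers only the short copy
                self._cache[port] = fresh

    def get_stats(self, port):
        # Fast path: never touches the hardware, returns a private copy.
        with self._lock:
            return replace(self._cache[port])
```

Keeping the slow read outside the locked region is what makes the reader
side safe to call from a context that must not sleep; the lock only guards
the copy in and out of the cache.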

> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index b9423389c2ef0..786d3a8492bcb 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c

[ ... ]

> @@ -951,6 +952,43 @@ static void mt7530_get_stats64(struct dsa_switch *ds, int port,
>  			       &storage->rx_crc_errors);
>  }
>  
> +static void mt7530_stats_poll(struct work_struct *work)
> +{
> +	struct mt7530_priv *priv = container_of(work, struct mt7530_priv,
> +						stats_work.work);
> +	struct rtnl_link_stats64 stats = {};
> +	struct dsa_port *dp;
> +	int port;
> +
> +	dsa_switch_for_each_user_port(dp, priv->ds) {
> +		port = dp->index;
> +
> +		mt7530_read_port_stats64(priv, port, &stats);
> +
> +		spin_lock(&priv->stats_lock);

Are we at risk of a deadlock here?

The get_stats64 callback can be called by dev_get_stats(), which runs in
softirq context.

If this delayed work runs in process context and is preempted by a softirq
that attempts to acquire the lock via dev_get_stats(), a deadlock could
occur. Both paths should likely use spin_lock_bh() and spin_unlock_bh().

> +		priv->ports[port].stats = stats;
> +		spin_unlock(&priv->stats_lock);
> +	}
> +
> +	schedule_delayed_work(&priv->stats_work,
> +			      MT7530_STATS_POLL_INTERVAL);
> +}
> +
> +static void mt7530_get_stats64(struct dsa_switch *ds, int port,
> +			       struct rtnl_link_stats64 *storage)
> +{
> +	struct mt7530_priv *priv = ds->priv;
> +
> +	if (priv->bus) {
> +		spin_lock(&priv->stats_lock);
> +		*storage = priv->ports[port].stats;
> +		spin_unlock(&priv->stats_lock);
> +		mod_delayed_work(system_wq, &priv->stats_work, 0);

Could this unconditional background refresh cause a denial of service on
the MDIO bus?

If an unprivileged user continually reads network statistics in a tight
loop, mt7530_stats_poll() will execute continuously, potentially starving
other MDIO tasks like PHY link-state monitoring.

Additionally, is it possible for this call to re-arm the workqueue during
teardown?

In mt7530_remove_common(), cancel_delayed_work_sync() is called before
dsa_unregister_switch(). If get_stats64() is called during or after
dsa_unregister_switch(), it would re-arm the workqueue after it was
explicitly cancelled, potentially causing a use-after-free when priv is
freed.

> +	} else {
> +		mt7530_read_port_stats64(priv, port, storage);
> +	}
> +}

[ ... ]

> @@ -3137,6 +3175,13 @@ mt753x_setup(struct dsa_switch *ds)
>  	if (ret && priv->irq_domain)
>  		mt7530_free_mdio_irq(priv);
>  
> +	if (!ret && priv->bus) {
> +		spin_lock_init(&priv->stats_lock);
> +		INIT_DELAYED_WORK(&priv->stats_work, mt7530_stats_poll);
> +		schedule_delayed_work(&priv->stats_work,
> +				      MT7530_STATS_POLL_INTERVAL);

If dsa_register_switch() encounters an error after this setup callback
completes successfully, the DSA core will attempt to clean up.

Since mt7530_switch_ops does not implement a teardown callback to cancel
this delayed work, will the work remain scheduled and eventually
dereference freed priv memory once the driver core frees it via devres?

> +	}
> +
>  	return ret;
>  }
-- 
pw-bot: cr


* Re: [PATCH nf,v3] netfilter: nat: use kfree_rcu to release ops
From: Jakub Kicinski @ 2026-04-18 17:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw, netdev
In-Reply-To: <20260417101132.379848-1-pablo@netfilter.org>

On Fri, 17 Apr 2026 12:11:31 +0200 Pablo Neira Ayuso wrote:
> Subject: [PATCH nf,v3] netfilter: nat: use kfree_rcu to release ops

Out of curiosity - why are you CCing netdev on this?


* Re: [PATCH net v2] ibmveth: Disable GSO for packets with small MSS
From: Jakub Kicinski @ 2026-04-18 17:54 UTC (permalink / raw)
  To: mmc
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, horms, bjking1,
	haren, ricklind, maddy, mpe, linuxppc-dev, stable, shaik.abdulla1,
	naveedaus
In-Reply-To: <20260417172910.81433-1-mmc@linux.ibm.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ibmveth: Disable GSO for packets with small MSS

This commit adds an ndo_features_check callback to the ibmveth driver to
disable segmentation offload when the MSS is less than 224 bytes. This
prevents physical adapters in SEA configurations from freezing when given
small MSS packets.
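
The decision the callback makes can be modeled as plain bit-mask arithmetic.
The sketch below is illustrative only: the feature bit values and the names
features_check() and GSO_MASK are assumptions for the example, not the
kernel's definitions. Only the 224-byte threshold comes from the patch
description.

```python
# Hypothetical feature bits, chosen only for this example.
F_SG = 1 << 0
F_HW_CSUM = 1 << 1
F_TSO = 1 << 2
F_TSO6 = 1 << 3

GSO_MASK = F_TSO | F_TSO6   # every segmentation-offload bit in this model
MIN_LSO_MSS = 224           # threshold stated in the patch description


def features_check(gso_size, features):
    """Return the feature mask to use for one packet.

    gso_size == 0 models a non-GSO packet, which is left untouched.
    """
    if gso_size != 0 and gso_size < MIN_LSO_MSS:
        # Clear only the segmentation bits; checksum and SG bits survive,
        # so the stack segments the packet in software instead.
        features &= ~GSO_MASK
    return features
```

For example, features_check(100, F_SG | F_TSO) yields F_SG, while a packet
with gso_size 1448 keeps its full mask.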

> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c

[ ... ]

> @@ -1756,6 +1756,25 @@ static int ibmveth_set_mac_addr(struct net_device *dev, void *p)
>  	return 0;
>  }
>  
> +static netdev_features_t ibmveth_features_check(struct sk_buff *skb,
> +						struct net_device *dev,
> +						netdev_features_t features)
> +{
> +	/* Some physical adapters do not support segmentation offload with
> +	 * MSS < 224. Disable GSO for such packets to avoid adapter freeze.
> +	 */
> +	if (skb_is_gso(skb)) {

The commit message mentions this matches the ibmvnic hardware errata from
commit f10b09ef687f. That commit disabled GSO for both gso_size < 224 and
gso_segs == 1.

By omitting the check for skb_shinfo(skb)->gso_segs == 1, could a
single-segment GSO packet still cause the adapter to freeze?

> +		if (skb_shinfo(skb)->gso_size < IBMVETH_MIN_LSO_MSS) {
> +			netdev_warn_once(dev,
> +					 "MSS %u too small for LSO, disabling GSO\n",
> +					 skb_shinfo(skb)->gso_size);
> +			features &= ~NETIF_F_GSO_MASK;
> +		}
> +	}
> +
> +	return features;

When a driver implements a custom ndo_features_check callback,
netif_skb_features() bypasses the standard dflt_features_check() function.
Does this mean vlan_features_check() is no longer called for this driver?

If vlan_features_check() is skipped, multi-tagged (QinQ) VLAN packets
might retain GSO and protocol-specific checksum offloads that the hardware
parser cannot handle.

Should this return vlan_features_check(skb, features) instead?

> +}
-- 
pw-bot: cr


* Re: [PATCH net-next] r8169: report per-queue statistics through netdev qstats
From: Jakub Kicinski @ 2026-04-18 17:50 UTC (permalink / raw)
  To: Gustavo Arantes
  Cc: Heiner Kallweit, nic_swsd, Andrew Lunn, David S . Miller,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260418021232.5425-1-dev.gustavoa@gmail.com>

On Fri, 17 Apr 2026 23:12:32 -0300 Gustavo Arantes wrote:
> r8169 maintains synchronized per-CPU software counters for packet and byte
> accounting, but does not expose them through the netdev qstats interface.
> 
> Add netdev_stat_ops callbacks and report the existing software counters
> through queue 0 for both Rx and Tx. Provide zero base stats so device-scope
> qstats report the packet and byte counters as supported and match the
> existing RTNL statistics.

It'd be good to include the motivation / backstory that led you to
implement this when you repost after the merge window.
The benefit of having qstats with a single queue may not be obvious.


* Re: [PATCH v12 net-next 0/5] psp: Add support for dev-assoc/disassoc
From: Jakub Kicinski @ 2026-04-18 17:43 UTC (permalink / raw)
  To: Wei Wang
  Cc: netdev, Daniel Zahka, Willem de Bruijn, David Wei, Andrew Lunn,
	David S . Miller, Eric Dumazet, Simon Horman, Paolo Abeni,
	Wei Wang
In-Reply-To: <20260418170056.3490525-1-weibunny.kernel@gmail.com>

On Sat, 18 Apr 2026 10:00:50 -0700 Wei Wang wrote:
> The main purpose of this feature is to associate virtual devices like
> veth or netkit with a real PSP device, so we could provide PSP
> functionality to the application running with virtual devices.

## Form letter - net-next-closed

We have already submitted our pull request with net-next material for v7.1,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after Apr 27th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer
pv-bot: closed



* [PATCH v12 net-next 5/5] selftest/net: psp: Add test for dev-assoc/disassoc
From: Wei Wang @ 2026-04-18 17:00 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Daniel Zahka, Willem de Bruijn, David Wei,
	Andrew Lunn, David S . Miller, Eric Dumazet, Simon Horman,
	Paolo Abeni
  Cc: Wei Wang
In-Reply-To: <20260418170056.3490525-1-weibunny.kernel@gmail.com>

From: Wei Wang <weibunny@fb.com>

Add a new parameter to NetDrvContEnv that installs an additional BPF
redirect program on nk_host to redirect traffic to psp_dev_local.
The topology looks like this:
  Host NS:  psp_dev_local <---> nk_host
                |                |
                |                | (netkit pair)
                |                |
  Remote NS: psp_dev_peer      Guest NS: nk_guest
             (responder)             (PSP tests)

Add the following tests for dev-assoc/dev-disassoc functionality:
1. Test the output of `./tools/net/ynl/pyynl/cli.py --spec
Documentation/netlink/specs/psp.yaml --dump dev-get` in both the default
and the guest netns.
2. Test the case where we associate netkit with psp_dev_local, and
send PSP traffic from nk_guest to psp_dev_peer in 2 different netns.
3. Test that the key rotation notification is also sent to the netns
of the associated dev.
4. Test that the dev change notification is also sent to the netns
of the associated dev.
5. Test dev-assoc/dev-disassoc without the nsid parameter.
6. Test the deletion of nk_guest in client netns, and proper cleanup in
the assoc-list for psp dev.
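
Several of the tests listed above reduce to the same question: does the
dev-get reply list a given (ifindex, nsid) pair in its assoc-list? A minimal
sketch of that lookup is shown below. The helper name assoc_list_contains()
is hypothetical and not part of this patch; the dictionary keys follow the
ones the tests read from the dev-get reply.

```python
def assoc_list_contains(dev_info, ifindex, nsid=None):
    """Return True if dev_info's assoc-list has an entry for ifindex.

    When nsid is given, the entry's nsid must match as well; when it is
    None, any nsid is accepted (the dev-assoc-without-nsid case).
    """
    # A missing 'assoc-list' key is treated like an empty list.
    for assoc in dev_info.get('assoc-list', []):
        if assoc['ifindex'] != ifindex:
            continue
        if nsid is None or assoc.get('nsid') == nsid:
            return True
    return False
```

With such a helper, a check like the one after dev-assoc could read
ksft_true(assoc_list_contains(dev_info, nk_guest_ifindex, nsid)), and the
post-disassoc check would assert the negation.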

Signed-off-by: Wei Wang <weibunny@fb.com>
---
 tools/testing/selftests/drivers/net/config    |   1 +
 .../selftests/drivers/net/lib/py/env.py       |  54 ++-
 tools/testing/selftests/drivers/net/psp.py    | 457 ++++++++++++++++--
 3 files changed, 478 insertions(+), 34 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/config b/tools/testing/selftests/drivers/net/config
index fd16994366f4..b8a559b360e4 100644
--- a/tools/testing/selftests/drivers/net/config
+++ b/tools/testing/selftests/drivers/net/config
@@ -8,5 +8,6 @@ CONFIG_NETCONSOLE=m
 CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETCONSOLE_EXTENDED_LOG=y
 CONFIG_NETDEVSIM=m
+CONFIG_NETKIT=y
 CONFIG_VLAN_8021Q=m
 CONFIG_XDP_SOCKETS=y
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 24ce122abd9c..cd3a08cbe968 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -2,6 +2,7 @@
 
 import ipaddress
 import os
+import re
 import time
 import json
 from pathlib import Path
@@ -336,7 +337,7 @@ class NetDrvContEnv(NetDrvEpEnv):
               +---------------+
     """
 
-    def __init__(self, src_path, rxqueues=1, **kwargs):
+    def __init__(self, src_path, rxqueues=1, install_tx_redirect_bpf=False, **kwargs):
         self.netns = None
         self._nk_host_ifname = None
         self._nk_guest_ifname = None
@@ -347,6 +348,8 @@ class NetDrvContEnv(NetDrvEpEnv):
         self._init_ns_attached = False
         self._old_fwd = None
         self._old_accept_ra = None
+        self._nk_host_tc_attached = False
+        self._nk_host_bpf_prog_pref = None
 
         super().__init__(src_path, **kwargs)
 
@@ -397,7 +400,13 @@ class NetDrvContEnv(NetDrvEpEnv):
         self._setup_ns()
         self._attach_bpf()
 
+        if install_tx_redirect_bpf:
+            self._attach_tx_redirect_bpf()
+
     def __del__(self):
+        if self._nk_host_tc_attached:
+            cmd(f"tc filter del dev {self._nk_host_ifname} ingress pref {self._nk_host_bpf_prog_pref}", fail=False)
+            self._nk_host_tc_attached = False
         if self._tc_attached:
             cmd(f"tc filter del dev {self.ifname} ingress pref {self._bpf_prog_pref}")
             self._tc_attached = False
@@ -505,3 +514,46 @@ class NetDrvContEnv(NetDrvEpEnv):
         value = ipv6_bytes + ifindex_bytes
         value_hex = ' '.join(f'{b:02x}' for b in value)
         bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")
+
+    def _attach_tx_redirect_bpf(self):
+        """
+        Attach BPF program on nk_host ingress to redirect TX traffic.
+
+        Packets from nk_guest destined for the nsim network arrive at nk_host
+        via the netkit pair. This BPF program redirects them to the physical
+        interface so they can reach the remote peer.
+        """
+        bpf_obj = self.test_dir / "nk_redirect.bpf.o"
+        if not bpf_obj.exists():
+            raise KsftSkipEx("BPF prog nk_redirect.bpf.o not found")
+
+        cmd(f"tc qdisc add dev {self._nk_host_ifname} clsact")
+
+        cmd(f"tc filter add dev {self._nk_host_ifname} ingress bpf obj {bpf_obj} sec tc/ingress direct-action")
+        self._nk_host_tc_attached = True
+
+        tc_info = cmd(f"tc filter show dev {self._nk_host_ifname} ingress").stdout
+        match = re.search(r'pref (\d+).*nk_redirect\.bpf.*id (\d+)', tc_info)
+        if not match:
+            raise Exception("Failed to get TX redirect BPF prog ID")
+        self._nk_host_bpf_prog_pref = int(match.group(1))
+        nk_host_bpf_prog_id = int(match.group(2))
+
+        prog_info = bpftool(f"prog show id {nk_host_bpf_prog_id}", json=True)
+        map_ids = prog_info.get("map_ids", [])
+
+        bss_map_id = None
+        for map_id in map_ids:
+            map_info = bpftool(f"map show id {map_id}", json=True)
+            if map_info.get("name").endswith("bss"):
+                bss_map_id = map_id
+
+        if bss_map_id is None:
+            raise Exception("Failed to find TX redirect BPF .bss map")
+
+        ipv6_addr = ipaddress.IPv6Address(self.nsim_v6_pfx)
+        ipv6_bytes = ipv6_addr.packed
+        ifindex_bytes = self.ifindex.to_bytes(4, byteorder='little')
+        value = ipv6_bytes + ifindex_bytes
+        value_hex = ' '.join(f'{b:02x}' for b in value)
+        bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")
diff --git a/tools/testing/selftests/drivers/net/psp.py b/tools/testing/selftests/drivers/net/psp.py
index 864d9fce1094..79da4d425c50 100755
--- a/tools/testing/selftests/drivers/net/psp.py
+++ b/tools/testing/selftests/drivers/net/psp.py
@@ -5,6 +5,7 @@
 
 import errno
 import fcntl
+import os
 import socket
 import struct
 import termios
@@ -14,9 +15,12 @@ from lib.py import defer
 from lib.py import ksft_run, ksft_exit, ksft_pr
 from lib.py import ksft_true, ksft_eq, ksft_ne, ksft_gt, ksft_raises
 from lib.py import ksft_not_none
-from lib.py import KsftSkipEx
-from lib.py import NetDrvEpEnv, PSPFamily, NlError
+from lib.py import ksft_variants, KsftNamedVariant
+from lib.py import KsftSkipEx, KsftFailEx
+from lib.py import NetDrvEpEnv, NetDrvContEnv, PSPFamily, NlError
+from lib.py import NetNSEnter
 from lib.py import bkg, rand_port, wait_port_listen
+from lib.py import ip
 
 
 def _get_outq(s):
@@ -117,11 +121,13 @@ def _get_stat(cfg, key):
 # Test case boiler plate
 #
 
-def _init_psp_dev(cfg):
+def _init_psp_dev(cfg, use_psp_ifindex=False):
     if not hasattr(cfg, 'psp_dev_id'):
         # Figure out which local device we are testing against
+        # For NetDrvContEnv: use psp_ifindex instead of ifindex
+        target_ifindex = cfg.psp_ifindex if use_psp_ifindex else cfg.ifindex
         for dev in cfg.pspnl.dev_get({}, dump=True):
-            if dev['ifindex'] == cfg.ifindex:
+            if dev['ifindex'] == target_ifindex:
                 cfg.psp_info = dev
                 cfg.psp_dev_id = cfg.psp_info['id']
                 break
@@ -394,6 +400,297 @@ def _data_basic_send(cfg, version, ipver):
     _close_psp_conn(cfg, s)
 
 
+def _data_basic_send_netkit_psp_assoc(cfg, version, ipver):
+    """
+    Test basic data send with netkit interface associated with PSP dev.
+    """
+
+    _init_psp_dev(cfg, True)
+    psp_dev_id_for_assoc = cfg.psp_dev_id
+
+    # Associate PSP device with nk_guest interface (in guest namespace)
+    nk_guest_dev = ip(f"link show dev {cfg._nk_guest_ifname}", json=True, ns=cfg.netns)[0]
+    nk_guest_ifindex = nk_guest_dev['ifindex']
+
+    cfg.pspnl.dev_assoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    # Check if assoc-list contains nk_guest
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id_for_assoc})
+
+    if 'assoc-list' in dev_info:
+        found = False
+        for assoc in dev_info['assoc-list']:
+            if assoc['ifindex'] == nk_guest_ifindex and assoc['nsid'] == cfg.psp_dev_peer_nsid:
+                found = True
+                break
+        ksft_true(found, "Associated device not found in dev_get() response")
+    else:
+        raise RuntimeError("No assoc-list in dev_get() response after association")
+
+    # Enter guest namespace (netns) to run PSP test
+    with NetNSEnter(cfg.netns.name):
+        cfg.pspnl = PSPFamily()
+
+        s = _make_psp_conn(cfg, version, ipver)
+
+        rx_assoc = cfg.pspnl.rx_assoc({"version": version,
+                                       "dev-id": cfg.psp_dev_id,
+                                       "sock-fd": s.fileno()})
+        rx = rx_assoc['rx-key']
+        tx = _spi_xchg(s, rx)
+
+        cfg.pspnl.tx_assoc({"dev-id": cfg.psp_dev_id,
+                            "version": version,
+                            "tx-key": tx,
+                            "sock-fd": s.fileno()})
+
+        data_len = _send_careful(cfg, s, 100)
+        _check_data_rx(cfg, data_len)
+        _close_psp_conn(cfg, s)
+
+    # Clean up - back in host namespace
+    cfg.pspnl = PSPFamily()
+    cfg.pspnl.dev_disassoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
+def _key_rotation_notify_multi_ns_netkit(cfg):
+    """ Test key rotation notifications across multiple namespaces using netkit """
+    _init_psp_dev(cfg, True)
+    psp_dev_id_for_assoc = cfg.psp_dev_id
+
+    # Associate PSP device with nk_guest interface (in guest namespace)
+    nk_guest_dev = ip(f"link show dev {cfg._nk_guest_ifname}", json=True, ns=cfg.netns)[0]
+    nk_guest_ifindex = nk_guest_dev['ifindex']
+
+    cfg.pspnl.dev_assoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    # Create listener in guest namespace; socket stays bound to that ns
+    with NetNSEnter(cfg.netns.name):
+        peer_pspnl = PSPFamily()
+        peer_pspnl.ntf_subscribe('use')
+
+    # Create listener in main namespace
+    main_pspnl = PSPFamily()
+    main_pspnl.ntf_subscribe('use')
+
+    # Trigger key rotation on the PSP device
+    cfg.pspnl.key_rotate({"id": psp_dev_id_for_assoc})
+
+    # Poll both sockets from main thread
+    for pspnl, label in [(main_pspnl, "main"), (peer_pspnl, "guest")]:
+        for i in range(100):
+            pspnl.check_ntf()
+
+            try:
+                msg = pspnl.async_msg_queue.get_nowait()
+                break
+            except Exception:
+                pass
+
+            time.sleep(0.1)
+        else:
+            raise KsftFailEx(f"No key rotation notification received in {label} namespace")
+
+        ksft_true(msg['msg'].get('id') == psp_dev_id_for_assoc,
+                  f"Key rotation notification for correct device not found in {label} namespace")
+
+    # Clean up
+    cfg.pspnl.dev_disassoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
+def _dev_change_notify_multi_ns_netkit(cfg):
+    """ Test dev_change notifications across multiple namespaces using netkit """
+    _init_psp_dev(cfg, True)
+    psp_dev_id_for_assoc = cfg.psp_dev_id
+
+    # Associate PSP device with nk_guest interface (in guest namespace)
+    nk_guest_dev = ip(f"link show dev {cfg._nk_guest_ifname}", json=True, ns=cfg.netns)[0]
+    nk_guest_ifindex = nk_guest_dev['ifindex']
+
+    cfg.pspnl.dev_assoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    # Create listener in guest namespace; socket stays bound to that ns
+    with NetNSEnter(cfg.netns.name):
+        peer_pspnl = PSPFamily()
+        peer_pspnl.ntf_subscribe('mgmt')
+
+    # Create listener in main namespace
+    main_pspnl = PSPFamily()
+    main_pspnl.ntf_subscribe('mgmt')
+
+    # Trigger dev_change by calling dev_set (notification is always sent)
+    cfg.pspnl.dev_set({'id': psp_dev_id_for_assoc, 'psp-versions-ena': cfg.psp_info['psp-versions-cap']})
+
+    # Poll both sockets from main thread
+    for pspnl, label in [(main_pspnl, "main"), (peer_pspnl, "guest")]:
+        for i in range(100):
+            pspnl.check_ntf()
+
+            try:
+                msg = pspnl.async_msg_queue.get_nowait()
+                break
+            except Exception:
+                pass
+
+            time.sleep(0.1)
+        else:
+            raise KsftFailEx(f"No dev_change notification received in {label} namespace")
+
+        ksft_true(msg['msg'].get('id') == psp_dev_id_for_assoc,
+                  f"Dev_change notification for correct device not found in {label} namespace")
+
+    # Clean up
+    cfg.pspnl.dev_disassoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
+def _psp_dev_get_check_netkit_psp_assoc(cfg):
+    """ Check psp dev-get output with netkit interface associated with PSP dev """
+
+    _init_psp_dev(cfg, True)
+    psp_dev_id_for_assoc = cfg.psp_dev_id
+
+    # Associate PSP device with nk_guest interface (in guest namespace)
+    nk_guest_dev = ip(f"link show dev {cfg._nk_guest_ifname}", json=True, ns=cfg.netns)[0]
+    nk_guest_ifindex = nk_guest_dev['ifindex']
+
+    cfg.pspnl.dev_assoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    # Check 1: In default netns, verify dev-get has correct ifindex and assoc-list
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id_for_assoc})
+
+    # Verify the PSP device has the correct ifindex
+    ksft_eq(dev_info['ifindex'], cfg.psp_ifindex)
+
+    # Verify assoc-list exists and contains the associated nk_guest with correct ifindex and nsid
+    ksft_true('assoc-list' in dev_info, "No assoc-list in dev_get() response after association")
+    found = False
+    for assoc in dev_info['assoc-list']:
+        if assoc['ifindex'] == nk_guest_ifindex and assoc['nsid'] == cfg.psp_dev_peer_nsid:
+            found = True
+            break
+    ksft_true(found, "Associated device not found in assoc-list with correct ifindex and nsid")
+
+    # Check 2: In guest netns, verify dev-get has assoc-list with nk_guest device
+    with NetNSEnter(cfg.netns.name):
+        peer_pspnl = PSPFamily()
+
+        # Dump all devices in the guest namespace
+        peer_devices = peer_pspnl.dev_get({}, dump=True)
+
+        # Find the device with by-association flag
+        peer_dev = None
+        for dev in peer_devices:
+            if dev.get('by-association'):
+                peer_dev = dev
+                break
+
+        ksft_not_none(peer_dev, "No PSP device found with by-association flag in guest netns")
+
+        # Verify assoc-list contains the nk_guest device
+        ksft_true('assoc-list' in peer_dev and len(peer_dev['assoc-list']) > 0,
+                  "Guest device should have assoc-list with local devices")
+
+        # Verify the assoc-list contains nk_guest ifindex with nsid=-1 (same namespace)
+        found = False
+        for assoc in peer_dev['assoc-list']:
+            if assoc['ifindex'] == nk_guest_ifindex:
+                ksft_eq(assoc['nsid'], -1,
+                        "nsid should be -1 (NETNSA_NSID_NOT_ASSIGNED) for same-namespace device")
+                found = True
+                break
+        ksft_true(found, "nk_guest ifindex not found in assoc-list")
+
+    # Clean up
+    cfg.pspnl.dev_disassoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
+def _dev_assoc_no_nsid(cfg):
+    """ Test dev-assoc and dev-disassoc without nsid attribute """
+    _init_psp_dev(cfg, True)
+    psp_dev_id = cfg.psp_dev_id
+
+    # Get nk_host's ifindex (in host namespace, same as caller)
+    nk_host_dev = ip(f"link show dev {cfg._nk_host_ifname}", json=True)[0]
+    nk_host_ifindex = nk_host_dev['ifindex']
+
+    # Associate without nsid - should look up ifindex in caller's netns
+    cfg.pspnl.dev_assoc({'id': psp_dev_id, 'ifindex': nk_host_ifindex})
+
+    # Verify assoc-list contains the device
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id})
+    ksft_true('assoc-list' in dev_info, "No assoc-list after association")
+    found = False
+    for assoc in dev_info['assoc-list']:
+        if assoc['ifindex'] == nk_host_ifindex:
+            found = True
+            break
+    ksft_true(found, "Associated device not found in assoc-list")
+
+    # Disassociate without nsid - should also use caller's netns
+    cfg.pspnl.dev_disassoc({'id': psp_dev_id, 'ifindex': nk_host_ifindex})
+
+    # Verify assoc-list no longer contains the device
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id})
+    found = False
+    if 'assoc-list' in dev_info:
+        for assoc in dev_info['assoc-list']:
+            if assoc['ifindex'] == nk_host_ifindex:
+                found = True
+                break
+    ksft_true(not found, "Device should not be in assoc-list after disassociation")
+
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
+def _psp_dev_assoc_cleanup_on_netkit_del(cfg):
+    """ Test that assoc-list is cleared when associated netkit interface is deleted """
+    _init_psp_dev(cfg, True)
+    psp_dev_id_for_assoc = cfg.psp_dev_id
+
+    # Associate PSP device with nk_guest interface (in guest namespace)
+    nk_guest_dev = ip(f"link show dev {cfg._nk_guest_ifname}", json=True, ns=cfg.netns)[0]
+    nk_guest_ifindex = nk_guest_dev['ifindex']
+
+    cfg.pspnl.dev_assoc({'id': psp_dev_id_for_assoc, 'ifindex': nk_guest_ifindex, 'nsid': cfg.psp_dev_peer_nsid})
+
+    # Verify assoc-list exists in default netns
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id_for_assoc})
+    ksft_true('assoc-list' in dev_info, "No assoc-list after association")
+    found = False
+    for assoc in dev_info['assoc-list']:
+        if assoc['ifindex'] == nk_guest_ifindex and assoc['nsid'] == cfg.psp_dev_peer_nsid:
+            found = True
+            break
+    ksft_true(found, "Associated device not found in assoc-list")
+
+    # Delete the netkit interface in the guest namespace
+    ip(f"link del {cfg._nk_guest_ifname}", ns=cfg.netns)
+
+    # Mark netkit as already deleted so cleanup won't try to delete it again
+    # (deleting nk_guest also removes nk_host since they're a pair)
+    cfg._nk_host_ifname = None
+    cfg._nk_guest_ifname = None
+
+    # Verify assoc-list is gone in default netns after netkit deletion
+    dev_info = cfg.pspnl.dev_get({'id': psp_dev_id_for_assoc})
+    ksft_true('assoc-list' not in dev_info or len(dev_info['assoc-list']) == 0,
+              "assoc-list should be empty after netkit deletion")
+
+    del cfg.psp_dev_id
+    del cfg.psp_info
+
+
 def __bad_xfer_do(cfg, s, tx, version='hdr0-aes-gcm-128'):
     # Make sure we accept the ACK for the SPI before we seal with the bad assoc
     _check_data_outq(s, 0)
@@ -571,33 +868,127 @@ def removal_device_bi(cfg):
         _close_conn(cfg, s)
 
 
-def psp_ip_ver_test_builder(name, test_func, psp_ver, ipver):
-    """Build test cases for each combo of PSP version and IP version"""
-    def test_case(cfg):
-        cfg.require_ipver(ipver)
-        test_func(cfg, psp_ver, ipver)
-
-    test_case.__name__ = f"{name}_v{psp_ver}_ip{ipver}"
-    return test_case
+@ksft_variants([
+    KsftNamedVariant(f"v{v}_ip{ip}", v, ip)
+    for v in range(4) for ip in ("4", "6")
+])
+def data_basic_send(cfg, version, ipver):
+    cfg.require_ipver(ipver)
+    _data_basic_send(cfg, version, ipver)
+
+
+@ksft_variants([
+    KsftNamedVariant(f"ip{ip}", ip)
+    for ip in ("4", "6")
+])
+def data_mss_adjust(cfg, ipver):
+    cfg.require_ipver(ipver)
+    _data_mss_adjust(cfg, ipver)
+
+
+@ksft_variants([
+    KsftNamedVariant(f"v{v}_ip6", v, "6")
+    for v in range(4)
+])
+def data_basic_send_netkit_psp_assoc(cfg, version, ipver):
+    cfg.require_ipver(ipver)
+    _data_basic_send_netkit_psp_assoc(cfg, version, ipver)
+
+
+
+def _get_nsid(ns_name):
+    """Get the nsid for a namespace."""
+    for entry in ip("netns list-id", json=True):
+        if entry.get("name") == str(ns_name):
+            return entry["nsid"]
+    raise KsftSkipEx(f"nsid not found for namespace {ns_name}")
+
+
+def _setup_psp_attributes(cfg):
+    """
+    Set up PSP-specific attributes on the environment.
+
+    This sets attributes needed for PSP tests based on whether we're using
+    netdevsim or a real NIC.
+    """
+    if cfg._ns is not None:
+        # netdevsim case: PSP device is the local dev (in host namespace)
+        cfg.psp_dev = cfg._ns.nsims[0].dev
+        cfg.psp_ifname = cfg.psp_dev['ifname']
+        cfg.psp_ifindex = cfg.psp_dev['ifindex']
+
+        # PSP peer device is the remote dev (in _netns, where psp_responder runs)
+        cfg.psp_dev_peer = cfg._ns_peer.nsims[0].dev
+        cfg.psp_dev_peer_ifname = cfg.psp_dev_peer['ifname']
+        cfg.psp_dev_peer_ifindex = cfg.psp_dev_peer['ifindex']
+    else:
+        # Real NIC case: PSP device is the local interface
+        cfg.psp_dev = cfg.dev
+        cfg.psp_ifname = cfg.ifname
+        cfg.psp_ifindex = cfg.ifindex
+
+        # PSP peer device is the remote interface
+        cfg.psp_dev_peer = cfg.remote_dev
+        cfg.psp_dev_peer_ifname = cfg.remote_ifname
+        cfg.psp_dev_peer_ifindex = cfg.remote_ifindex
+
+    # Get nsid for the guest namespace (netns) where nk_guest is
+    cfg.psp_dev_peer_nsid = _get_nsid(cfg.netns.name)
+
+
+def _setup_psp_routes(cfg):
+    """
+    Set up routes for cross-namespace connectivity.
+
+    Traffic flows:
+    1. remote (_netns) -> nk_guest (netns):
+       psp_dev_peer -> psp_dev_local -> BPF redirect -> nk_host -> nk_guest
+       Needs: route in _netns to nk_v6_pfx/64 via psp_dev_local
+
+    2. nk_guest (netns) -> remote (_netns):
+       nk_guest -> nk_host -> psp_dev_local -> psp_dev_peer
+       Needs: route in netns to dev_v6_pfx/64 via nk_host
+    """
+    # In _netns (remote namespace): add route to nk_guest prefix via psp_dev_local
+    # psp_dev_peer can reach psp_dev_local via the link, then traffic goes through BPF
+    ip(f"-6 route add {cfg.nk_v6_pfx}/64 via {cfg.nsim_v6_pfx}1 dev {cfg.psp_dev_peer_ifname}",
+       ns=cfg._netns)
+
+    # In netns (guest namespace): add route to remote peer prefix
+    # nk_guest default route goes to nk_host, but we need explicit route to dev_v6_pfx/64
+    ip(f"-6 route add {cfg.nsim_v6_pfx}/64 via fe80::1 dev {cfg._nk_guest_ifname}",
+       ns=cfg.netns)
 
 
-def ipver_test_builder(name, test_func, ipver):
-    """Build test cases for each IP version"""
-    def test_case(cfg):
-        cfg.require_ipver(ipver)
-        test_func(cfg, ipver)
+def main() -> None:
+    """ Ksft boiler plate main """
 
-    test_case.__name__ = f"{name}_ip{ipver}"
-    return test_case
+    # Use a different prefix for netkit guest to avoid conflict with dev prefix
+    nk_v6_pfx = "2001:db9::"
 
+    # Set LOCAL_PREFIX_V6 to a DIFFERENT prefix than the dev prefix to avoid BPF
+    # redirecting psp_responder traffic. The BPF only redirects traffic
+    # matching LOCAL_PREFIX_V6, so dev traffic (2001:db8::) won't be affected.
+    if "LOCAL_PREFIX_V6" not in os.environ:
+        os.environ["LOCAL_PREFIX_V6"] = nk_v6_pfx
 
-def main() -> None:
-    """ Ksft boiler plate main """
+    try:
+        env = NetDrvContEnv(__file__, install_tx_redirect_bpf=True)
+        has_cont = True
+    except KsftSkipEx:
+        env = NetDrvEpEnv(__file__)
+        has_cont = False
 
-    with NetDrvEpEnv(__file__) as cfg:
+    with env as cfg:
         cfg.pspnl = PSPFamily()
 
+        if has_cont:
+            cfg.nk_v6_pfx = nk_v6_pfx
+            _setup_psp_attributes(cfg)
+            _setup_psp_routes(cfg)
+
         # Set up responder and communication sock
+        # psp_responder runs in _netns (remote namespace with psp_dev_peer)
         responder = cfg.remote.deploy("psp_responder")
 
         cfg.comm_port = rand_port()
@@ -611,17 +1002,17 @@ def main() -> None:
                                                           cfg.comm_port),
                                                          timeout=1)
 
-                cases = [
-                    psp_ip_ver_test_builder(
-                        "data_basic_send", _data_basic_send, version, ipver
-                    )
-                    for version in range(0, 4)
-                    for ipver in ("4", "6")
-                ]
-                cases += [
-                    ipver_test_builder("data_mss_adjust", _data_mss_adjust, ipver)
-                    for ipver in ("4", "6")
-                ]
+                cases = [data_basic_send, data_mss_adjust]
+
+                if has_cont:
+                    cases += [
+                        data_basic_send_netkit_psp_assoc,
+                        _key_rotation_notify_multi_ns_netkit,
+                        _dev_change_notify_multi_ns_netkit,
+                        _psp_dev_get_check_netkit_psp_assoc,
+                        _dev_assoc_no_nsid,
+                        _psp_dev_assoc_cleanup_on_netkit_del,
+                    ]
 
                 ksft_run(cases=cases, globs=globals(),
                          case_pfx={"dev_", "data_", "assoc_", "removal_"},
-- 
2.52.0
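
Reviewer-style note, not part of the patch: the tests above repeat the
same scan over dev_info['assoc-list'] several times. A minimal sketch of
a helper that could hold that scan is shown here. The name
assoc_list_has() and the sample dictionaries are hypothetical; only the
'assoc-list', 'ifindex' and 'nsid' keys come from the tests themselves.

```python
def assoc_list_has(dev_info, ifindex, nsid=None):
    """Return True if dev_info['assoc-list'] has a matching entry.

    dev_info is the dict returned by a dev-get request. nsid is only
    compared when the caller passes one, which covers both the
    with-nsid and the without-nsid tests.
    """
    for assoc in dev_info.get('assoc-list', []):
        if assoc['ifindex'] != ifindex:
            continue
        if nsid is None or assoc.get('nsid') == nsid:
            return True
    return False


# Hypothetical sample replies, for illustration only.
with_assoc = {'id': 1, 'assoc-list': [{'ifindex': 7, 'nsid': 0}]}
without_assoc = {'id': 1}

print(assoc_list_has(with_assoc, 7, nsid=0))
print(assoc_list_has(without_assoc, 7))
```

A missing 'assoc-list' key is treated like an empty list, so the
post-disassociation checks would not need a separate membership test.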



* [PATCH v12 net-next 4/5] selftests/net: Add bpf skb forwarding program
From: Wei Wang @ 2026-04-18 17:00 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Daniel Zahka, Willem de Bruijn, David Wei,
	Andrew Lunn, David S . Miller, Eric Dumazet, Simon Horman,
	Paolo Abeni
  Cc: Wei Wang, Bobby Eshleman
In-Reply-To: <20260418170056.3490525-1-weibunny.kernel@gmail.com>

From: Wei Wang <weibunny@fb.com>

Add nk_redirect.bpf.c, a tc ingress BPF program that forwards IPv6 skbs
received on eth0 whose destination address matches a configured /64
prefix to a specified target ifindex. bpf_redirect_neigh() is used so
that a neighbor lookup is performed and the correct MAC addresses are
filled in.

Signed-off-by: Wei Wang <weibunny@fb.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Tested-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
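Note (below the cut line, not part of the commit): tc_redirect() only
compares the first 64 bits of the destination address against
ipv6_prefix. A minimal user-space sketch of that comparison in Python,
for illustration only. The helper name same_prefix64() and the sample
addresses are assumptions and do not appear in this patch.

```python
import ipaddress


def same_prefix64(addr: str, prefix: str) -> bool:
    """Return True if the upper 64 bits of addr and prefix are equal.

    Mirrors v6_p64_equal(), which compares s6_addr32[0] and
    s6_addr32[1], i.e. the first 8 bytes of each address.
    """
    a = ipaddress.IPv6Address(addr).packed
    p = ipaddress.IPv6Address(prefix).packed
    return a[:8] == p[:8]


# Sample values only: 2001:db9:: is the netkit prefix in the selftest,
# 2001:db8:: is the dev prefix that must not be redirected.
print(same_prefix64("2001:db9::2", "2001:db9::"))
print(same_prefix64("2001:db8::2", "2001:db9::"))
```

Anything that differs within the first four 16-bit groups falls
through to TC_ACT_OK in the BPF program; the lower 64 bits are ignored.
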
 .../drivers/net/hw/nk_redirect.bpf.c          | 60 +++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/net/hw/nk_redirect.bpf.c

diff --git a/tools/testing/selftests/drivers/net/hw/nk_redirect.bpf.c b/tools/testing/selftests/drivers/net/hw/nk_redirect.bpf.c
new file mode 100644
index 000000000000..7ac9ffd50f15
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/nk_redirect.bpf.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF program for redirecting traffic using bpf_redirect_neigh().
+ * Unlike bpf_redirect() which preserves L2 headers, bpf_redirect_neigh()
+ * performs neighbor lookup and fills in the correct L2 addresses for the
+ * target interface. This is necessary when redirecting across different
+ * device types (e.g., from netdevsim to netkit).
+ */
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
+#define TC_ACT_OK 0
+#define ETH_P_IPV6 0x86DD
+
+#define ctx_ptr(field)		((void *)(long)(field))
+
+#define v6_p64_equal(a, b)	(a.s6_addr32[0] == b.s6_addr32[0] && \
+				 a.s6_addr32[1] == b.s6_addr32[1])
+
+volatile __u32 redirect_ifindex;
+volatile __u8 ipv6_prefix[16];
+
+SEC("tc/ingress")
+int tc_redirect(struct __sk_buff *skb)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct in6_addr *match_prefix;
+	struct ipv6hdr *ip6h;
+	struct ethhdr *eth;
+
+	match_prefix = (struct in6_addr *)ipv6_prefix;
+
+	if (skb->protocol != bpf_htons(ETH_P_IPV6))
+		return TC_ACT_OK;
+
+	eth = data;
+	if ((void *)(eth + 1) > data_end)
+		return TC_ACT_OK;
+
+	ip6h = data + sizeof(struct ethhdr);
+	if ((void *)(ip6h + 1) > data_end)
+		return TC_ACT_OK;
+
+	if (!v6_p64_equal(ip6h->daddr, (*match_prefix)))
+		return TC_ACT_OK;
+
+	/*
+	 * Use bpf_redirect_neigh() to perform neighbor lookup and fill in
+	 * correct L2 addresses for the target interface.
+	 */
+	return bpf_redirect_neigh(redirect_ifindex, NULL, 0, 0);
+}
+
+char __license[] SEC("license") = "GPL";
-- 
2.52.0



* [PATCH v12 net-next 3/5] psp: add a new netdev event for dev unregister
From: Wei Wang @ 2026-04-18 17:00 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Daniel Zahka, Willem de Bruijn, David Wei,
	Andrew Lunn, David S . Miller, Eric Dumazet, Simon Horman,
	Paolo Abeni
  Cc: Wei Wang
In-Reply-To: <20260418170056.3490525-1-weibunny.kernel@gmail.com>

From: Wei Wang <weibunny@fb.com>

Register a netdev notifier upon the first dev-assoc operation and, on
NETDEV_UNREGISTER, remove the unregistering dev from
psd->assoc_dev_list.

Signed-off-by: Wei Wang <weibunny@fb.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
---
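Note (below the cut line, not part of the commit): the register-on-
first-use logic in psp_attach_netdev_notifier() is a double-checked
lock. A minimal Python sketch of the same pattern, under stated
assumptions: the class name LazyNotifier and the register callable
(returning 0 or a negative errno) are hypothetical stand-ins, not
kernel APIs.

```python
import threading


class LazyNotifier:
    """Run a registration callable at most once, on first use."""

    def __init__(self, register):
        self._register = register      # hypothetical: returns 0 or -errno
        self._lock = threading.Lock()
        self._registered = False

    def attach(self) -> int:
        # Fast path: already registered, no lock needed.
        if self._registered:
            return 0
        with self._lock:
            err = 0
            # Re-check under the lock; another caller may have won.
            if not self._registered:
                err = self._register()
                if err == 0:
                    self._registered = True
            return err


calls = []
notifier = LazyNotifier(lambda: calls.append(1) or 0)
notifier.attach()
notifier.attach()
print(len(calls))
```

As in the patch, a failed registration leaves the flag clear, so the
next dev-assoc attempt tries to register again.
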
 Documentation/netlink/specs/psp.yaml |  2 +-
 net/psp/psp-nl-gen.c                 |  2 +-
 net/psp/psp-nl-gen.h                 |  3 ++
 net/psp/psp.h                        |  1 +
 net/psp/psp_main.c                   | 76 ++++++++++++++++++++++++++++
 net/psp/psp_nl.c                     | 29 +++++++++++
 6 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
index 3d1b7223e084..538ed9184965 100644
--- a/Documentation/netlink/specs/psp.yaml
+++ b/Documentation/netlink/specs/psp.yaml
@@ -328,7 +328,7 @@ operations:
             - nsid
         reply:
           attributes: []
-        pre: psp-device-get-locked
+        pre: psp-device-get-locked-dev-assoc
         post: psp-device-unlock
     -
       name: dev-disassoc
diff --git a/net/psp/psp-nl-gen.c b/net/psp/psp-nl-gen.c
index 114299c64423..389a8480cc3d 100644
--- a/net/psp/psp-nl-gen.c
+++ b/net/psp/psp-nl-gen.c
@@ -135,7 +135,7 @@ static const struct genl_split_ops psp_nl_ops[] = {
 	},
 	{
 		.cmd		= PSP_CMD_DEV_ASSOC,
-		.pre_doit	= psp_device_get_locked,
+		.pre_doit	= psp_device_get_locked_dev_assoc,
 		.doit		= psp_nl_dev_assoc_doit,
 		.post_doit	= psp_device_unlock,
 		.policy		= psp_dev_assoc_nl_policy,
diff --git a/net/psp/psp-nl-gen.h b/net/psp/psp-nl-gen.h
index 4dd0f0f23053..24d51bff997f 100644
--- a/net/psp/psp-nl-gen.h
+++ b/net/psp/psp-nl-gen.h
@@ -21,6 +21,9 @@ int psp_device_get_locked_admin(const struct genl_split_ops *ops,
 				struct sk_buff *skb, struct genl_info *info);
 int psp_assoc_device_get_locked(const struct genl_split_ops *ops,
 				struct sk_buff *skb, struct genl_info *info);
+int psp_device_get_locked_dev_assoc(const struct genl_split_ops *ops,
+				    struct sk_buff *skb,
+				    struct genl_info *info);
 void
 psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
 		  struct genl_info *info);
diff --git a/net/psp/psp.h b/net/psp/psp.h
index 0f9c4e4e52cb..c82b21bae240 100644
--- a/net/psp/psp.h
+++ b/net/psp/psp.h
@@ -15,6 +15,7 @@ extern struct mutex psp_devs_lock;
 
 void psp_dev_free(struct psp_dev *psd);
 int psp_dev_check_access(struct psp_dev *psd, struct net *net, bool admin);
+int psp_attach_netdev_notifier(void);
 
 void psp_nl_notify_dev(struct psp_dev *psd, u32 cmd);
 
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
index 9049f1d2ff02..5a134b72f320 100644
--- a/net/psp/psp_main.c
+++ b/net/psp/psp_main.c
@@ -376,6 +376,82 @@ int psp_dev_rcv(struct sk_buff *skb, u16 dev_id, u8 generation, bool strip_icv)
 }
 EXPORT_SYMBOL(psp_dev_rcv);
 
+static void psp_dev_disassoc_one(struct psp_dev *psd, struct net_device *dev)
+{
+	struct psp_assoc_dev *entry, *tmp;
+
+	list_for_each_entry_safe(entry, tmp, &psd->assoc_dev_list, dev_list) {
+		if (entry->assoc_dev == dev) {
+			list_del(&entry->dev_list);
+			psd->assoc_dev_cnt--;
+			rcu_assign_pointer(entry->assoc_dev->psp_dev, NULL);
+			netdev_put(entry->assoc_dev, &entry->dev_tracker);
+			kfree(entry);
+			return;
+		}
+	}
+}
+
+static int psp_netdev_event(struct notifier_block *nb, unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct psp_dev *psd;
+
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+
+	rcu_read_lock();
+	psd = rcu_dereference(dev->psp_dev);
+	if (psd && psp_dev_tryget(psd)) {
+		rcu_read_unlock();
+		mutex_lock(&psd->lock);
+		psp_dev_disassoc_one(psd, dev);
+		mutex_unlock(&psd->lock);
+		psp_dev_put(psd);
+	} else {
+		rcu_read_unlock();
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block psp_netdev_notifier = {
+	.notifier_call = psp_netdev_event,
+};
+
+static DEFINE_MUTEX(psp_notifier_lock);
+static bool psp_notifier_registered;
+
+/**
+ * psp_attach_netdev_notifier() - register netdev notifier on first use
+ *
+ * Register the netdevice notifier when the first device association
+ * is created. In many installations no associations will be created and
+ * the notifier won't be needed.
+ *
+ * Must be called without psd->lock held, due to lock ordering:
+ * rtnl_lock -> psd->lock (the notifier callback runs under rtnl_lock
+ * and takes psd->lock).
+ */
+int psp_attach_netdev_notifier(void)
+{
+	int err = 0;
+
+	if (READ_ONCE(psp_notifier_registered))
+		return 0;
+
+	mutex_lock(&psp_notifier_lock);
+	if (!psp_notifier_registered) {
+		err = register_netdevice_notifier(&psp_netdev_notifier);
+		if (!err)
+			WRITE_ONCE(psp_notifier_registered, true);
+	}
+	mutex_unlock(&psp_notifier_lock);
+
+	return err;
+}
+
 static int __init psp_init(void)
 {
 	mutex_init(&psp_devs_lock);
diff --git a/net/psp/psp_nl.c b/net/psp/psp_nl.c
index 75ca32821d28..d622f91a979e 100644
--- a/net/psp/psp_nl.c
+++ b/net/psp/psp_nl.c
@@ -167,6 +167,22 @@ int psp_device_get_locked(const struct genl_split_ops *ops,
 	return __psp_device_get_locked(ops, skb, info, false);
 }
 
+/*
+ * Non-admin psp_device_get_locked() preceded by
+ * psp_attach_netdev_notifier(); only used for dev-assoc.
+ */
+int psp_device_get_locked_dev_assoc(const struct genl_split_ops *ops,
+				    struct sk_buff *skb, struct genl_info *info)
+{
+	int err;
+
+	err = psp_attach_netdev_notifier();
+	if (err)
+		return err;
+
+	return __psp_device_get_locked(ops, skb, info, false);
+}
+
 static struct net *psp_nl_resolve_assoc_dev_ns(struct psp_dev *psd,
 					       struct genl_info *info)
 {
@@ -532,6 +548,19 @@ int psp_nl_dev_assoc_doit(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	psp_assoc_dev->assoc_dev = assoc_dev;
+
+	/* Check for race with NETDEV_UNREGISTER. The cmpxchg above is a
+	 * full barrier, and the unregister path has synchronize_net()
+	 * between setting NETREG_UNREGISTERING and reading psp_dev in the
+	 * notifier. So at least one side would do the clean-up if we are in
+	 * the middle of unregistering assoc_dev.
+	 * And the clean-up is serialized by psd->lock.
+	 */
+	if (READ_ONCE(assoc_dev->reg_state) != NETREG_REGISTERED) {
+		err = -ENODEV;
+		goto rsp_err;
+	}
+
 	rsp = psp_nl_reply_new(info);
 	if (!rsp) {
 		err = -ENOMEM;
-- 
2.52.0



