Netdev List


* Re: [PATCH net] octeontx2-af: fix CGX debugfs RVU AF PCI reference leaks
From: Jakub Kicinski @ 2026-06-21 21:44 UTC (permalink / raw)
  To: Ratheesh Kannoth
  Cc: davem, hkelam, lcherian, linux-kernel, netdev, pabeni, sgoutham,
	andrew+netdev, edumazet, Yuho Choi
In-Reply-To: <20260617104525.1321395-1-rkannoth@marvell.com>

On Wed, 17 Jun 2026 16:15:25 +0530 Ratheesh Kannoth wrote:
> +		{
> +			struct rvu_cgx_lmac_dbgfs_ctx *ctx;
> +
> +			ctx = devm_kzalloc(rvu->dev, sizeof(*ctx), GFP_KERNEL);
> +			if (!ctx)
> +				continue;

In addition to Simon's nit - please don't create floating code blocks,
just add the var decl at the start of the function.
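
For illustration, a minimal user-space sketch of the requested shape: the
pointer is declared once at the top of the function instead of inside a
bare { } block in the loop body. The struct fields, the function name and
the use of calloc() in place of devm_kzalloc() are assumptions made for
this example, not the driver's actual code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-LMAC debugfs context. */
struct lmac_ctx {
	int cgx_id;
	int lmac_id;
};

/* Allocate one context per (cgx, lmac) pair; returns how many were created. */
static int setup_lmac_ctxs(struct lmac_ctx **slots, int cgx_cnt, int lmac_cnt)
{
	struct lmac_ctx *ctx;	/* declared at function scope, no floating block */
	int created = 0;
	int i, j;

	for (i = 0; i < cgx_cnt; i++) {
		for (j = 0; j < lmac_cnt; j++) {
			ctx = calloc(1, sizeof(*ctx));
			if (!ctx)
				continue;	/* skip this LMAC, as in the quoted hunk */

			ctx->cgx_id = i;
			ctx->lmac_id = j;
			slots[created++] = ctx;
		}
	}

	return created;
}
```

The loop body stays flat, and the allocation failure path is still a plain
continue.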
-- 
pw-bot: cr


* Re: [PATCH net v3 1/2] net: macb: give reasons for Tx SKB kfree
From: Jakub Kicinski @ 2026-06-21 21:40 UTC (permalink / raw)
  To: Théo Lebrun
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Haavard Skinnemoen, Jeff Garzik,
	Conor Dooley, Paolo Valerio, Nicolai Buchwitz, netdev,
	linux-kernel, Vladimir Kondratiev, Gregory CLEMENT,
	Benoît Monin, Tawfik Bayouk, Thomas Petazzoni,
	Maxime Chevallier, stable
In-Reply-To: <20260617-macb-drop-tx-v3-1-d4c7e57d890b@bootlin.com>

On Wed, 17 Jun 2026 11:17:29 +0200 Théo Lebrun wrote:
> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
> Cc: stable@vger.kernel.org

Interesting, did AI suggest this? It's fairly uncommon for drivers
to care about drop reasons, packet loss on egress ports is pretty
clearly attributed by tx_drops.

I don't think this belongs in net, net-next would be fine, if you think
it's necessary. Sashiko seems to point out a few more clear cut bugs.
-- 
pw-bot: cr


* Re: [PATCH net] net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone
From: patchwork-bot+netdevbpf @ 2026-06-21 21:40 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, kees, gustavoars,
	nathan, nick.desaulniers+lkml, morbo, justinstitt, linux-kernel,
	linux-hardening, llvm, write
In-Reply-To: <20260616100332.1308294-1-i.maximets@ovn.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 16 Jun 2026 12:03:29 +0200 you wrote:
> kmalloc_flex() in metadata_dst_alloc() sets __counted_by for the
> structure to the options_len, which is then initialized to zero.
> Later, we're initializing the structure by copying the tunnel info
> together with the options, and this triggers a warning for a potential
> memcpy overflow, since the compiler estimates that the options can't
> fit into the structure, even though the memory for them is actually
> allocated.
> 
> [...]

Here is the summary with links:
  - [net] net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone
    https://git.kernel.org/netdev/net/c/4c6d43db2a4d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v3] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
From: patchwork-bot+netdevbpf @ 2026-06-21 21:40 UTC (permalink / raw)
  To: Weiming Shi
  Cc: jmaloy, davem, edumazet, kuba, pabeni, horms, ying.xue, netdev,
	tipc-discussion, linux-kernel, xmei5
In-Reply-To: <20260617135744.3383175-3-bestswngs@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 17 Jun 2026 21:57:45 +0800 you wrote:
> bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(),
> but tipc_disc_rcv() still dereferences b->disc in RX softirq under
> rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
> 
> L2 bearers are safe thanks to the synchronize_net() in
> tipc_disable_l2_media(), but the UDP bearer defers that call to the
> cleanup_bearer() workqueue, so the discoverer is freed with no grace
> period:
> 
> [...]

Here is the summary with links:
  - [net,v3] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
    https://git.kernel.org/netdev/net/c/1579342d7113

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net] net: ethernet: mtk_ppe: Fix rhashtable leak in mtk_ppe_init error paths
From: patchwork-bot+netdevbpf @ 2026-06-21 21:40 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178167550101.2217645.14579307712717502425@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 17 Jun 2026 13:48:13 +0800 you wrote:
> In mtk_ppe_init(), when accounting is enabled, the error paths for
> dmam_alloc_coherent(mib) and devm_kzalloc(acct) failures return NULL
> directly, bypassing the err_free_l2_flows label that destroys the
> rhashtable initialized earlier.
> 
> While this leak only occurs during probe (not runtime) and the leaked
> memory is minimal (an empty rhash table), fixing it ensures proper
> error path cleanup consistency.
> 
> [...]

Here is the summary with links:
  - [net] net: ethernet: mtk_ppe: Fix rhashtable leak in mtk_ppe_init error paths
    https://git.kernel.org/netdev/net/c/41782770be56

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v2 0/4] net: pse-pd: decouple controller lookup from MDIO probe
From: Jakub Kicinski @ 2026-06-21 21:32 UTC (permalink / raw)
  To: Carlo Szelinsky
  Cc: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Paolo Abeni,
	Corey Leavitt, Jonas Jelonek, netdev, linux-kernel
In-Reply-To: <20260620112440.1734404-1-github@szelinsky.de>

On Sat, 20 Jun 2026 13:24:36 +0200 Carlo Szelinsky wrote:
> This is v2 of Corey's RFC [1]. Corey is busy at the moment, so I'm picking
> it up to unblock everyone. The design is unchanged. The main thing v2
> fixes is the SFP deadlock Jonas reported, plus a couple of smaller points
> from the review.

net-next is closed during the merge window. We can merge the first
patch, tho, if you repost it separately for net, since it's a fix.
-- 
pw-bot: defer


* Re: [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
From: patchwork-bot+netdevbpf @ 2026-06-21 21:30 UTC (permalink / raw)
  To: Ruoyu Wang
  Cc: taras.chornyi, andrew+netdev, davem, edumazet, kuba, pabeni,
	linux, oleksandr.mazur, yevhen.orlov, netdev, linux-kernel
In-Reply-To: <20260617193228.1653582-1-ruoyuw560@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 03:32:28 +0800 you wrote:
> prestera_port_sfp_bind() returns err after walking the ports node. If no
> child node matches the port's front-panel id, err is never assigned.
> 
> Initialize err to 0 because absence of a matching optional port device
> tree node is not an error. In that case no phylink is created and port
> creation should continue with port->phy_link left NULL. Errors from
> malformed matched nodes and phylink_create() still propagate.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
    https://git.kernel.org/netdev/net/c/62b01f72d93c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v3] net: dsa: Fix skb ownership in taggers
From: Jakub Kicinski @ 2026-06-21 21:27 UTC (permalink / raw)
  To: linusw
  Cc: Jakub Kicinski, andrew, olteanv, davem, edumazet, pabeni, horms,
	florian.fainelli, jonas.gorski, hauke, kurt, woojung.huh,
	UNGLinuxDriver, chester.a.unal, daniel, matthias.bgg,
	angelogioacchino.delregno, wei.fang, xiaoning.wang, clement.leger,
	george.mccollister, mmyangfl, netdev, sashiko-bot
In-Reply-To: <20260617-dsa-fix-free-skb-v3-1-cdd4e0778a39@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: dsa: Fix skb ownership in taggers

This patch moves SKB drop responsibility from dsa_user_xmit() and
dsa_switch_rcv() into the per-tagger ->xmit() and ->rcv() callbacks, so
that taggers consume the skb on their own error paths and the call sites
no longer free a returned NULL skb.

Closes: https://lore.kernel.org/r/20260610153952.1685895-1-kuba@kernel.org/

> diff --git a/net/dsa/tag.c b/net/dsa/tag.c
> index 79ad105902d9..cfc8f5a0cbd9 100644
> --- a/net/dsa/tag.c
> +++ b/net/dsa/tag.c
> @@ -84,10 +84,8 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  		nskb = cpu_dp->rcv(skb, dev);
>  	}
>  
> -	if (!nskb) {
> -		kfree_skb(skb);
> +	if (!nskb)
>  		return 0;
> -	}
>  
>  	skb = nskb;
>  	skb_push(skb, ETH_HLEN);

[High]
Does the new contract that taggers consume the skb on error cover the
inline METADATA_HW_PORT_MUX branch in dsa_switch_rcv() as well?  That
branch is not a tagger callback, it is open-coded in dsa_switch_rcv():

	if (md_dst && md_dst->type == METADATA_HW_PORT_MUX) {
		unsigned int port = md_dst->u.port_info.port_id;

		skb_dst_drop(skb);
		if (!skb_has_extensions(skb))
			skb->slow_gro = 0;

		skb->dev = dsa_conduit_find_user(dev, 0, port);
		if (likely(skb->dev)) {
			dsa_default_offload_fwd_mark(skb);
			nskb = skb;
		}
	} else {
		nskb = cpu_dp->rcv(skb, dev);
	}

	if (!nskb)
		return 0;

When dsa_conduit_find_user() returns NULL (for example a port_id that
does not match any registered DSA user, such as the CPU port index, an
invalid id from the conduit driver, or a port being torn down), nskb
stays NULL and skb is not freed anywhere, so the new

	if (!nskb)
		return 0;

path looks like it leaks the skb on every such packet.

Pre-patch, the removed kfree_skb(skb) at the !nskb site covered this
case.  Should the metadata-mux branch free the skb itself when
skb->dev is NULL, or should the unconditional kfree_skb(skb) at the
!nskb site be kept for this path?
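
As a rough user-space model of the question above (not the DSA code
itself), the sketch below contrasts a receive helper that returns NULL
without freeing the buffer against one that consumes the buffer before
returning NULL. All names here (fake_skb, find_user, mux_rcv_*) are
hypothetical, and the "owning" variant shows only the first of the two
options raised.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_skb {
	bool freed;
	void *dev;
};

static void fake_kfree_skb(struct fake_skb *skb)
{
	skb->freed = true;
}

/* Hypothetical lookup: only ports 1 and 2 map to a user device. */
static void *find_user(unsigned int port)
{
	static int devs[3];

	return (port >= 1 && port <= 2) ? &devs[port] : NULL;
}

/* Shape of the quoted post-patch branch: lookup failure frees nothing. */
static struct fake_skb *mux_rcv_leaky(struct fake_skb *skb, unsigned int port)
{
	skb->dev = find_user(port);
	return skb->dev ? skb : NULL;
}

/* Variant where the branch consumes the skb on its own error path. */
static struct fake_skb *mux_rcv_owning(struct fake_skb *skb, unsigned int port)
{
	skb->dev = find_user(port);
	if (!skb->dev) {
		fake_kfree_skb(skb);
		return NULL;
	}
	return skb;
}

/* Caller under the new contract: NULL means "already consumed". */
static int switch_rcv(struct fake_skb *skb, unsigned int port,
		      struct fake_skb *(*rcv)(struct fake_skb *, unsigned int))
{
	struct fake_skb *nskb = rcv(skb, port);

	if (!nskb)
		return 0;
	return 1;
}
```

With port 0 (no matching user), the leaky variant leaves the buffer
unfreed after switch_rcv() returns, while the owning variant has already
released it.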

[ ... ]
-- 
pw-bot: cr


* Re: [PATCH net v2 0/2] net: ethernet: sunplus: spl2sw: fix of_node refcount leaks
From: Jakub Kicinski @ 2026-06-21 20:22 UTC (permalink / raw)
  To: 呂芳騰
  Cc: Shitalkumar Gandhi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Shitalkumar Gandhi
In-Reply-To: <CAFnkrs=kE3thiFLaOULGv3n_KgR-r5T4vB7hxJsZL4iAihO31g@mail.gmail.com>

On Sun, 21 Jun 2026 12:38:06 +0800 呂芳騰 wrote:
> I'm sorry that I can't test the fix.
> I've left Sunplus and don't have the relevant hardware.

That makes things harder.. but you don't necessarily need HW to review
most of the patches. If you don't intend to serve as a maintainer of
the sunplus driver please send a patch to MAINTAINERS and step down.
Right now you are listed but don't seem to be fulfilling the duties.
Or please review the patches to the best of your ability without
testing.


* [PATCH net] selftests: drv-net: so_txtime: relax variance bounds
From: Willem de Bruijn @ 2026-06-21 20:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The net-next-hw spinners on netdev.bots.linux.dev observe failing
so-txtime-py tests. A review of stdout shows most failures to be
due to exceeding the 4ms grace period. All I saw were within 8ms.
So increase to that.

Double the bounds from 4 to 8ms. This is still small enough to
differentiate the delays programmed by the test, 10 and 20ms.

Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260610170651.1b644001@kernel.org/
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/drivers/net/so_txtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/so_txtime.c b/tools/testing/selftests/drivers/net/so_txtime.c
index 75f3beef13d9..55a386f3d1b9 100644
--- a/tools/testing/selftests/drivers/net/so_txtime.c
+++ b/tools/testing/selftests/drivers/net/so_txtime.c
@@ -37,7 +37,7 @@
 
 static int	cfg_clockid	= CLOCK_TAI;
 static uint16_t	cfg_port	= 8000;
-static int	cfg_variance_us	= 4000;
+static int	cfg_variance_us	= 8000;
 static bool	cfg_machine_slow;
 static uint64_t	cfg_start_time_ns;
 static int	cfg_mark;
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH nf-next v3] netfilter: TCPMSS: handle packets with unaligned MSS option
From: Pablo Neira Ayuso @ 2026-06-21 19:46 UTC (permalink / raw)
  To: Kacper Kokot
  Cc: netfilter-devel, kadlec, fmancera, fw, david.laight.linux,
	Phil Sutter, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, coreteam, netdev, linux-kernel
In-Reply-To: <20260621184934.75832-1-kacper.kokot.44@gmail.com>

On Sun, Jun 21, 2026 at 07:49:33PM +0100, Kacper Kokot wrote:
> RFC 9293 permits TCP options to begin on any octet boundary. Padding
> to a word boundary with NOPs is a sender convention, not a requirement,
> and robust receivers must handle unaligned options (MUST-64).
> 
> The xt_TCPMSS target's incremental checksum update assumes the MSS
> option is word-aligned. When it's not, the modified bytes straddle
> two checksum words and the resulting checksum is incorrect. The mangled
> packet may then fail checksum validation and be dropped downstream.
> That said, all mainstream stacks emit a word-aligned MSS, this change is
> motivated by spec conformance rather than a bug observed in the wild.
> 
> Extend the checksum update to handle unaligned MSS options. When the
> changed word is unaligned, the modified bytes b' and c' straddle two
> checksum words w1 and w2:
> 
>     | w1     | w2     |
> OLD |  a  b  |  c  d  |
> NEW |  a  b' |  c' d  |
> 
> The two-step update C' = C - w1 + w1' - w2 + w2' reduces algebraically
> to a single word incremental checksum update with byteswapped operands:
> 
>     C' = C - w1 - w2 + w1' + w2'
>        = C - (a * 2^8 + b)  - (c * 2^8 + d)
>            + (a * 2^8 + b') + (c' * 2^8 + d)
>        = C + 2^8 * (a - a + c' - c) + (b' - b + d - d)
>        = C + 2^8 * (c' - c) + (b' - b)
>        = C - (2^8 * c + b) + (2^8 * c' + b')
> 
> So the unaligned case adds no extra checksum operations.
> 
> Signed-off-by: Kacper Kokot <kacper.kokot.44@gmail.com>
> ---
> v3:
>  - Reframe as enhancement, not a fix (Pablo/Fernando)
>  - Rename subject to xt_TCPMSS, drop "fix" wording
>  - Reword commit message: packet may fail checksum validation and be
>    dropped downstream (Pablo)
>  - Target nf-next (Fernando)
>  - Use __be16 for csum_oldmss/csum_newmss (sparse warning from
>    kernel test robot)
>  - Reorder local variable declarations to reverse xmas tree (Fernando)
> 
> v2:
>  - Use get_unaligned_be16 (Fernando's suggestion)
>  - Fix alignment check expression (David)
>  - Mention it's a theoretical bug in the commit message
>  - Drop cc stable, the bug is only theoretical
> 
> diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
> index 80e1634bc51f..037add799d41 100644
> --- a/net/netfilter/xt_TCPMSS.c
> +++ b/net/netfilter/xt_TCPMSS.c
> @@ -116,9 +116,10 @@ tcpmss_mangle_packet(struct sk_buff *skb,
>  	opt = (u_int8_t *)tcph;
>  	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
>  		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> +			__be16 csum_oldmss, csum_newmss;
>  			u_int16_t oldmss;
>  
> -			oldmss = (opt[i+2] << 8) | opt[i+3];
> +			oldmss = get_unaligned_be16(&opt[i + 2]);
>  
>  			/* Never increase MSS, even when setting it, as
>  			 * doing so results in problems for hosts that rely
> @@ -130,8 +131,25 @@ tcpmss_mangle_packet(struct sk_buff *skb,
>  			opt[i+2] = (newmss & 0xff00) >> 8;
>  			opt[i+3] = newmss & 0x00ff;
>  
> +			csum_oldmss = htons(oldmss);
> +			csum_newmss = htons(newmss);
> +
> +			if (((char *)&opt[i + 2] - (char *)tcph) & 0x1) {
> +				/* MSS option is unaligned: the modified bytes
> +				 * straddle two checksum words. Byteswapping
> +				 * the operands lets a single incremental
> +				 * update produce the correct checksum delta
> +				 * (see commit message for the derivation).
> +				 */
> +				csum_oldmss = htons(swab16(oldmss));
> +				csum_newmss = htons(swab16(newmss));
> +			} else {
> +				csum_oldmss = htons(oldmss);
> +				csum_newmss = htons(newmss);
> +			}

After seeing this unaligned case in other areas of the Netfilter tree, I am
not sure it is worth adding workarounds everywhere in this codebase to
deal with updates that span two 16-bit words for such a hypothetical
case.

For now, patches that call get_unaligned_be16() for correctness are OK
IMO. This is to deal with arches which cannot cope with unaligned
access. It will still corrupt such a rare packet, but it addresses the
unaligned splats.

If we start seeing real stacks which produce unaligned options
like this, maybe by then we can revisit.

So I am leaning towards a small patch to introduce
get_unaligned_be16() and document that this corrupts packets with such
a rare unaligned TCP option.

IIRC, x86_64 has an inet checksum function that can deal with 1-byte
words, although other arches cannot do that and still need to
operate on 16-bit words. Given Linux is multi-arch, this all needs
to stick to 16-bit word arithmetic when mangling packets.

Maybe in the future the checksum functions in every arch will be updated
to deal with 1-byte word updates, and maybe real stacks will pop up
with such rare packets. But by then these ugly workarounds won't be
needed at all.

> +
>  			inet_proto_csum_replace2(&tcph->check, skb,
> -						 htons(oldmss), htons(newmss),
> +						 csum_oldmss, csum_newmss,
>  						 false);
>  			return 0;
>  		}
> -- 
> 2.43.0
> 
> 
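
For reference, the byteswap identity from the quoted commit message can be
checked in user space. The sketch below is not the xt_TCPMSS code: it uses
a plain RFC 1071 style checksum and an RFC 1624 style incremental update,
and the helper names (csum_full, csum_replace16, mangle_be16) are made up
for this example. It rewrites a big-endian 16-bit field at an arbitrary
byte offset and swaps the operands when the offset is odd.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full ones'-complement checksum over big-endian 16-bit words. */
static uint16_t csum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return (uint16_t)~csum_fold32(sum);
}

/* Incremental update: check' = ~(~check + ~old_val + new_val). */
static uint16_t csum_replace16(uint16_t check, uint16_t old_val,
			       uint16_t new_val)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_val;
	sum += new_val;
	return (uint16_t)~csum_fold32(sum);
}

static uint16_t swab16_demo(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Rewrite the 16-bit field at byte offset off and return the updated
 * checksum. At an odd offset the two bytes sit in different checksum
 * words, so the operands are byteswapped before the single update.
 */
static uint16_t mangle_be16(uint8_t *buf, size_t off, uint16_t new_val,
			    uint16_t check)
{
	uint16_t old_val = (uint16_t)((buf[off] << 8) | buf[off + 1]);

	buf[off] = (uint8_t)(new_val >> 8);
	buf[off + 1] = (uint8_t)(new_val & 0xff);

	if (off & 1) {
		old_val = swab16_demo(old_val);
		new_val = swab16_demo(new_val);
	}
	return csum_replace16(check, old_val, new_val);
}
```

Comparing the returned value against csum_full() over the modified buffer
confirms the identity for both even and odd offsets, as long as the
buffer is not all zeros (where the two ones'-complement zero encodings
can differ).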


* Re: [REGRESSION 6.16] r8169 RTL8168h/8111h fails to probe — "Unable to change power state from D3cold to D0" — bisected to 4d4c10f763d7
From: Mario Limonciello @ 2026-06-21 19:24 UTC (permalink / raw)
  To: Thorsten Leemhuis, Josh Perry, bhelgaas
  Cc: hkallweit1, nic_swsd, rafael, linux-pci, netdev, regressions
In-Reply-To: <e8acc151-19f3-4823-83a1-e0906dd9f0f0@leemhuis.info>



On 6/17/26 01:32, Thorsten Leemhuis wrote:
> On 6/12/26 03:07, Josh Perry wrote:
>> #regzbot introduced: 4d4c10f763d7
>>
>> Since v6.16 one of two onboard RTL8168h/8111h NICs on this board fails
>> to probe on boot; the device drops to D3cold and the driver can't bring
>> it back:
> 
> FWIW, that commit is 4d4c10f763d780 ("PCI: Explicitly put devices into
> D0 when initializing") [v6.16-rc1] from Mario, who is already CCed, but
> looks like might be on holiday or something due to inactivity on the
> lists in the recent days. So it might take a few days before this moves on.
> 
> Josh, this is not my area of expertise, but there are two things I guess
> might be helpful:
> 
> * retry with 7.1
> * upload "dmesg" and "sudo lspci -vvv" output from working and broken
> kernels somewhere (like bugzilla.kernel.org).

Yes; please retry with mainline.  We already had multiple regressions 
from that commit fixed, so if you bisected down to this commit then it's 
plausible that there is already a fix.

> 
> Ciao, Thorsten
> 
>>    r8169 0000:02:00.0 eth0: RTL8168h/8111h, 00:2b:67:48:40:01, XID 541,
>> IRQ 137
>>    r8169 0000:04:00.0: Unable to change power state from D3cold to D0,
>> device inaccessible
>>    r8169 0000:04:00.0: Mem-Wr-Inval unavailable
>>    r8169 0000:04:00.0: error -EIO: PCI read failed
>>    r8169 0000:04:00.0: probe with driver r8169 failed with error -5
>>
>> The board has two identical RTL8168h NICs (both XID 541): 0000:02:00.0
>> and 0000:04:00.0. Only 04:00.0 fails — its sibling 02:00.0, on a
>> different root port, probes and works normally on the very same kernel
>> and boot. The failing NIC then does not appear (no enp4s0), taking the
>> machine's WAN offline. This strongly suggests the problem is port/
>> topology-specific rather than device- or driver-specific: the upstream
>> port behind 04:00.0 is placed in D3cold and the endpoint cannot be
>> resumed to D0.
>>
>> Hardware: RTL8168h/8111h, XID 541, PCI 04:00.0 (onboard 1GbE).
>> Platform: Lenovo ThinkCentre M90n-1 (11AHS0B200), BIOS M2AKT49A
>> (2026-03-25, latest available). Firmware is current, so this is not a
>> platform-firmware issue.
>>
>> Bisection: v6.15 good, v6.16 bad (verified by booting both). I then
>> reverted 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when
>> initializing") together with its follow-up 907a7a2e5bf4 ("PCI/PM: Set up
>> runtime PM even for devices without PCI PM") on top of 6.16.7: the NIC
>> probes and links at 1Gbps/Full normally, with no workaround:
>>
>>    r8169 0000:04:00.0 eth1: RTL8168h/8111h, 00:2b:67:48:40:02, XID 541,
>> IRQ 138
>>    r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>
>> Workaround: booting an unmodified v6.16+ kernel with pcie_port_pm=off
>> also restores the NIC, which is consistent with the upstream port being
>> placed in D3cold and the device failing to resume to D0 after the
>> explicit-D0 init change.
>>
>> The follow-up 907a7a2e5bf4 does not fix this resume case: v6.18.33 is
>> still affected (retested today on current firmware).
>>
>> Happy to test patches or provide full dmesg / lspci.
>>
> 



* [PATCH net-next RESEND v3 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Anton Danilov @ 2026-06-21 19:04 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, davem, David Ahern, Eric Dumazet,
	Kuniyuki Iwashima, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Shuah Khan, linux-kselftest
In-Reply-To: <cover.1782067871.git.littlesmilingcloud@gmail.com>

Add a selftest to verify that FOU-encapsulated packets addressed to a
multicast destination are correctly resubmitted to the inner protocol
handler (GRE) via the UDP multicast delivery path.

The test creates two network namespaces connected by a veth pair with
a FOU/GRETAP tunnel using a multicast remote address (239.0.0.1).
Ping is sent through the tunnel and received packets are counted on
the receiver's tunnel interface.

A static neighbor entry is configured on the sender because ARP
replies from the receiver cannot traverse the unidirectional multicast
tunnel back to the sender.

The early demux optimization (net.ipv4.ip_early_demux) is disabled on
the receiver to force packets through __udp4_lib_mcast_deliver(),
which is the code path being tested.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 112 ++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 708d960ae07d..7e9ae937cffa 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -39,6 +39,7 @@ TEST_PROGS := \
 	fib_rule_tests.sh \
 	fib_tests.sh \
 	fin_ack_lat.sh \
+	fou_mcast_encap.sh \
 	fq_band_pktlimit.sh \
 	gre_gso.sh \
 	gre_ipv6_lladdr.sh \
diff --git a/tools/testing/selftests/net/fou_mcast_encap.sh b/tools/testing/selftests/net/fou_mcast_encap.sh
new file mode 100755
index 000000000000..8db9633f4c28
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.sh
@@ -0,0 +1,112 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that UDP encapsulation (FOU) correctly handles packet resubmit
+# when packets are delivered via the multicast UDP delivery path.
+#
+# When a FOU-encapsulated packet arrives with a multicast destination IP,
+# __udp4_lib_mcast_deliver() must resubmit it to the inner protocol
+# handler (e.g., GRE) rather than consuming it. This test verifies that
+# by creating a FOU/GRETAP tunnel with a multicast remote address and
+# sending ping through it.
+#
+# The early demux optimization can mask this issue by routing packets via
+# the unicast path (udp_unicast_rcv_skb), so we disable it to force
+# packets through __udp4_lib_mcast_deliver().
+
+source lib.sh
+
+NSENDER=""
+NRECV=""
+
+cleanup() {
+	cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup() {
+	setup_ns NSENDER NRECV
+
+	ip link add veth_s type veth peer name veth_r
+	ip link set veth_s netns "$NSENDER"
+	ip link set veth_r netns "$NRECV"
+
+	ip -n "$NSENDER" addr add 10.0.0.1/24 dev veth_s
+	ip -n "$NSENDER" link set veth_s up
+
+	ip -n "$NRECV" addr add 10.0.0.2/24 dev veth_r
+	ip -n "$NRECV" link set veth_r up
+
+	# Disable early demux to force multicast delivery path
+	ip netns exec "$NRECV" sysctl -wq net.ipv4.ip_early_demux=0
+
+	# Join multicast group on receiver
+	ip -n "$NRECV" addr add 239.0.0.1/32 dev veth_r autojoin
+
+	# Multicast routes
+	ip -n "$NRECV" route add 239.0.0.0/8 dev veth_r
+	ip -n "$NSENDER" route add 239.0.0.0/8 dev veth_s
+
+	# Sender: GRETAP with FOU encap (no FOU listener needed on TX side)
+	ip -n "$NSENDER" link add eoudp0 type gretap \
+		remote 239.0.0.1 local 10.0.0.1 \
+		encap fou encap-sport 4797 encap-dport 4797 \
+		key 239.0.0.1
+	ip -n "$NSENDER" link set eoudp0 up
+	ip -n "$NSENDER" addr add 192.168.99.1/24 dev eoudp0
+
+	# Receiver: FOU listener + GRETAP
+	ip netns exec "$NRECV" ip fou add port 4797 ipproto 47
+	ip -n "$NRECV" link add eoudp0 type gretap \
+		remote 239.0.0.1 local 10.0.0.2 \
+		encap fou encap-sport 4797 encap-dport 4797 \
+		key 239.0.0.1
+	ip -n "$NRECV" link set eoudp0 up
+	ip -n "$NRECV" addr add 192.168.99.2/24 dev eoudp0
+
+	# Static neigh entry on sender: ARP replies cannot traverse the
+	# multicast tunnel back, so pre-populate the neighbor cache.
+	local recv_mac
+	recv_mac=$(ip -n "$NRECV" link show eoudp0 | awk '/ether/{print $2}')
+	ip -n "$NSENDER" neigh add 192.168.99.2 lladdr "$recv_mac" dev eoudp0
+}
+
+get_rx_packets() {
+	ip -n "$NRECV" -s link show eoudp0 | awk '/RX:/{getline; print $2}'
+}
+
+test_fou_mcast_encap() {
+	local count=100
+	local rx_before
+	local rx_after
+	local rx_delta
+
+	# Warmup: let any initial broadcast/ARP traffic settle
+	ip netns exec "$NSENDER" ping -c 1 -W 1 192.168.99.2 >/dev/null 2>&1
+	sleep 1
+
+	rx_before=$(get_rx_packets)
+	ip netns exec "$NSENDER" ping -c $count -W 1 192.168.99.2 >/dev/null 2>&1
+	sleep 1
+	rx_after=$(get_rx_packets)
+
+	rx_delta=$((rx_after - rx_before))
+
+	if [ "$rx_delta" -ge "$count" ]; then
+		echo "PASS: received $rx_delta/$count packets via multicast FOU/GRETAP"
+		return "$ksft_pass"
+	elif [ "$rx_delta" -gt 0 ]; then
+		echo "FAIL: only $rx_delta/$count packets received (partial delivery)"
+		return "$ksft_fail"
+	else
+		echo "FAIL: 0/$count packets received (multicast encap resubmit broken)"
+		return "$ksft_fail"
+	fi
+}
+
+echo "TEST: FOU/GRETAP multicast encapsulation resubmit"
+
+setup
+test_fou_mcast_encap
+exit $?
-- 
2.47.3



* [PATCH net-next RESEND v3 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Anton Danilov @ 2026-06-21 19:04 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, davem, David Ahern, Eric Dumazet,
	Kuniyuki Iwashima, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Shuah Khan, linux-kselftest
In-Reply-To: <cover.1782067871.git.littlesmilingcloud@gmail.com>

When a UDP encapsulation socket (e.g., FOU) receives a multicast
packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
call consume_skb() when udp_queue_rcv_skb() returns a positive value.
A positive return value from udp_queue_rcv_skb() indicates that the
encap_rcv handler (e.g., fou_udp_recv) has consumed the UDP header
and wants the packet to be resubmitted to the IP protocol handler
for further processing (e.g., as a GRE packet).

The unicast path in udp_unicast_rcv_skb() handles this correctly by
returning -ret, which propagates up to ip_protocol_deliver_rcu() for
resubmission. However, the multicast path destroys the packet via
consume_skb() instead of resubmitting it, causing silent packet loss.

This affects any UDP encapsulation (FOU, GUE) combined with multicast
destination addresses.

Fix this by returning -ret instead of calling consume_skb() when the
return value is positive, matching the behavior of the unicast path.
This avoids growing the call stack compared to calling
ip_protocol_deliver_rcu() directly.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 net/ipv4/udp.c | 6 ++++--
 net/ipv6/udp.c | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 70f6cbd4ef73..b0910659391e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2475,6 +2475,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -2519,8 +2520,9 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udp_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udp_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			return -ret;
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15e032194ecc..ff2e389e286b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -998,8 +999,9 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udpv6_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udpv6_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			return -ret;
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
-- 
2.47.3
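
The return convention described in the commit message can be modelled in
user space roughly as follows. This is a simplified sketch, not kernel
code: struct pkt, queue_rcv() and protocol_deliver() are hypothetical
stand-ins for the skb, udp_queue_rcv_skb() and ip_protocol_deliver_rcu(),
and the only behaviour modelled is "a positive return asks for resubmit,
which the caller signals upward as a negative protocol number".

```c
#include <stdbool.h>

#define PROTO_UDP 17
#define PROTO_GRE 47

struct pkt {
	bool encap;		/* arrives on an encap (FOU-like) socket */
	int inner_proto;	/* protocol to resubmit to after decap */
	bool queued;		/* delivered to a plain UDP socket */
	bool freed;		/* dropped without further processing */
	bool inner_delivered;	/* reached the inner protocol handler */
};

/* Stand-in for the socket receive: > 0 means "resubmit as this protocol". */
static int queue_rcv(struct pkt *p)
{
	if (p->encap)
		return p->inner_proto;
	p->queued = true;
	return 0;
}

/* Old multicast behaviour: a positive return ends in the packet being freed. */
static int mcast_deliver_old(struct pkt *p)
{
	if (queue_rcv(p) > 0)
		p->freed = true;
	return 0;
}

/* Patched behaviour: propagate -ret so the caller can resubmit. */
static int mcast_deliver_new(struct pkt *p)
{
	int ret = queue_rcv(p);

	if (ret > 0)
		return -ret;
	return 0;
}

static int gre_rcv(struct pkt *p)
{
	p->inner_delivered = true;
	return 0;
}

/* Dispatcher: a negative return selects the next protocol handler. */
static void protocol_deliver(struct pkt *p, int proto,
			     int (*udp_rcv)(struct pkt *))
{
	int ret;

resubmit:
	if (proto == PROTO_UDP) {
		ret = udp_rcv(p);
	} else if (proto == PROTO_GRE) {
		ret = gre_rcv(p);
	} else {
		p->freed = true;
		return;
	}

	if (ret < 0) {
		proto = -ret;
		goto resubmit;
	}
}
```

Feeding an encapsulated packet through protocol_deliver() with
mcast_deliver_old ends with the packet freed and never seen by gre_rcv(),
while mcast_deliver_new lets the dispatcher loop once more and hand it to
the inner handler without deepening the call stack.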


* [PATCH net-next RESEND v3 0/2] udp: fix FOU/GUE over multicast
From: Anton Danilov @ 2026-06-21 19:04 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, davem, David Ahern, Eric Dumazet,
	Kuniyuki Iwashima, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Shuah Khan, linux-kselftest

This is a resend of the v3 series originally posted on 2026-05-05,
which did not receive review feedback during the previous net-next
window.  No changes since v3; rebased cleanly on current net-next
(after the net-next-7.2 merge).

v3 archive:
  https://lore.kernel.org/netdev/cover.1777934869.git.littlesmilingcloud@gmail.com/

UDP encapsulation (FOU, GUE) has never worked correctly with multicast
destination addresses. When a FOU-encapsulated packet arrives at a
multicast address, it enters __udp4_lib_mcast_deliver() which calls
consume_skb() on packets that need resubmission to the inner protocol
handler, silently dropping them instead.

The unicast delivery path handles this correctly by returning -ret,
but the multicast path was never updated to support UDP encapsulation
resubmit.

This causes silent packet loss for FOU/GRETAP tunnels configured with
multicast remote addresses. The loss ratio depends on the early demux
cache hit rate - packets that hit early demux bypass the multicast path
and work correctly, masking the issue.

Reproducing the issue:

  ip netns add ns_a && ip netns add ns_b
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns ns_a && ip link set veth1 netns ns_b

  ip -n ns_a addr add 10.0.0.1/24 dev veth0 && ip -n ns_a link set veth0 up
  ip -n ns_b addr add 10.0.0.2/24 dev veth1 && ip -n ns_b link set veth1 up

  # Multicast routes
  ip -n ns_a route add 239.0.0.0/8 dev veth0
  ip -n ns_b route add 239.0.0.0/8 dev veth1

  # Disable early demux to expose the issue (otherwise it's partially masked)
  ip netns exec ns_b sysctl -w net.ipv4.ip_early_demux=0

  # Join multicast group on receiver
  ip -n ns_b addr add 239.0.0.1/32 dev veth1 autojoin

  # Sender: GRETAP with FOU encap
  ip -n ns_a link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.1 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_a link set eoudp0 up
  ip -n ns_a addr add 192.168.99.1/24 dev eoudp0

  # Receiver: FOU listener + GRETAP
  ip netns exec ns_b ip fou add port 4797 ipproto 47
  ip -n ns_b link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.2 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_b link set eoudp0 up
  ip -n ns_b addr add 192.168.99.2/24 dev eoudp0

  # Static neigh: ARP replies can't traverse unidirectional mcast tunnel
  recv_mac=$(ip -n ns_b link show eoudp0 | awk '/ether/{print $2}')
  ip -n ns_a neigh add 192.168.99.2 lladdr $recv_mac dev eoudp0

  # Test: ping through the FOU/GRETAP tunnel
  ip netns exec ns_a ping -c 100 192.168.99.2
  # -> without this patch: 0 packets received on eoudp0
  # -> with this patch: all packets received on eoudp0

AI assistance (Claude, claude-opus-4-6) was used during root cause
analysis of the kernel source code (tracing the call chain from
udp_queue_rcv_skb through encap_rcv to ip_protocol_deliver_rcu,
comparing unicast/GSO/multicast paths) and during patch and selftest
authoring. The fix approach was identified by observing that the
unicast path (udp_unicast_rcv_skb) already handles encap resubmit
correctly via return -ret, while the multicast path did not.

Changes since v2:
  - Use return -ret instead of calling ip_protocol_deliver_rcu()
    directly, matching the unicast path and avoiding call stack
    growth with nested encapsulations (Kuniyuki Iwashima)
  - Only change the first-socket path; the clone loop is not
    reachable for tunnel sockets (no SO_REUSEADDR/SO_REUSEPORT)
  - Replace Python packet generator with ping through a properly
    configured FOU/GRETAP tunnel in the selftest
  - Add static neighbor entry in the test (ARP replies cannot
    traverse the unidirectional multicast tunnel)

Changes since v1 (RFC):
  - Moved inline Python packet generator into a separate helper
  - Fixed author email typo in Signed-off-by

Anton Danilov (2):
  udp: fix encapsulation packet resubmit in multicast deliver
  selftests: net: add FOU multicast encapsulation resubmit test

 net/ipv4/udp.c                                |   6 +-
 net/ipv6/udp.c                                |   6 +-
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 112 ++++++++++++++++++
 4 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

-- 
2.47.3

^ permalink raw reply

* [PATCH nf-next v3] netfilter: TCPMSS: handle packets with unaligned MSS option
From: Kacper Kokot @ 2026-06-21 18:49 UTC (permalink / raw)
  To: netfilter-devel
  Cc: pablo, kadlec, fmancera, fw, david.laight.linux, Kacper Kokot,
	Phil Sutter, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, coreteam, netdev, linux-kernel
In-Reply-To: <20260528223412.27311-1-kacper.kokot.44@gmail.com>

RFC 9293 permits TCP options to begin on any octet boundary. Padding
to a word boundary with NOPs is a sender convention, not a requirement,
and robust receivers must handle unaligned options (MUST-64).

The xt_TCPMSS target's incremental checksum update assumes the MSS
option is word-aligned. When it's not, the modified bytes straddle
two checksum words and the resulting checksum is incorrect. The mangled
packet may then fail checksum validation and be dropped downstream.
That said, all mainstream stacks emit a word-aligned MSS, so this change
is motivated by spec conformance rather than by a bug observed in the wild.

Extend the checksum update to handle unaligned MSS options. When the
changed word is unaligned, the modified bytes b' and c' straddle two
checksum words w1 and w2:

    | w1     | w2     |
OLD |  a  b  |  c  d  |
NEW |  a  b' |  c' d  |

The two-step update C' = C - w1 + w1' - w2 + w2' reduces algebraically
to a single word incremental checksum update with byteswapped operands:

    C' = C - w1 - w2 + w1' + w2'
       = C - (a * 2^8 + b)  - (c * 2^8 + d)
           + (a * 2^8 + b') + (c' * 2^8 + d)
       = C + 2^8 * (a - a + c' - c) + (b' - b + d - d)
       = C + 2^8 * (c' - c) + (b' - b)
       = C - (2^8 * c + b) + (2^8 * c' + b')

So the unaligned case adds no extra checksum operations.
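
The identity above can be checked in userspace with a minimal sketch. It
assumes a plain RFC 1624 style one's-complement update over a byte buffer
with no pseudo-header; csum_full(), csum_replace16(), swab16_host(),
mangle_field() and unaligned_update_matches() are hypothetical helpers
written for this illustration, not the kernel's
inet_proto_csum_replace2().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over big-endian 16-bit words; len is assumed even. */
static uint16_t csum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	return (uint16_t)~fold(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_replace16(uint16_t check, uint16_t oldw, uint16_t neww)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~oldw;
	sum += neww;
	return (uint16_t)~fold(sum);
}

static uint16_t swab16_host(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Rewrite the big-endian 16-bit field at byte offset off and return the
 * incrementally updated checksum. At an odd offset the two bytes straddle
 * two checksum words, so the operands are byteswapped first.
 */
static uint16_t mangle_field(uint8_t *buf, size_t off, uint16_t newval,
			     uint16_t check)
{
	uint16_t oldval = (uint16_t)((buf[off] << 8) | buf[off + 1]);

	buf[off] = (uint8_t)(newval >> 8);
	buf[off + 1] = (uint8_t)(newval & 0xff);

	if (off & 1)
		return csum_replace16(check, swab16_host(oldval),
				      swab16_host(newval));
	return csum_replace16(check, oldval, newval);
}

/* 20-byte header plus NOP, MSS(1460), NOP, NOP, EOL: the MSS value sits at
 * odd offset 23. Returns 1 when the incremental result equals a full
 * recomputation after clamping the MSS to 1400.
 */
int unaligned_update_matches(void)
{
	uint8_t pkt[28] = {
		0xc0, 0x01, 0x00, 0x50, 0x00, 0x00, 0x00, 0x01,
		0x00, 0x00, 0x00, 0x00, 0x70, 0x02, 0xff, 0xff,
		0x00, 0x00, 0x00, 0x00,
		0x01, 0x02, 0x04, 0x05, 0xb4, 0x01, 0x01, 0x00,
	};
	uint16_t before = csum_full(pkt, sizeof(pkt));
	uint16_t incremental = mangle_field(pkt, 23, 1400, before);

	return incremental == csum_full(pkt, sizeof(pkt));
}
```

The sketch keeps the checksum outside the buffer for simplicity; a real
TCP checksum also covers the pseudo-header, which does not affect the
delta argument because those words are unchanged.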

Signed-off-by: Kacper Kokot <kacper.kokot.44@gmail.com>
---
v3:
 - Reframe as enhancement, not a fix (Pablo/Fernando)
 - Rename subject to xt_TCPMSS, drop "fix" wording
 - Reword commit message: packet may fail checksum validation and be
   dropped downstream (Pablo)
 - Target nf-next (Fernando)
 - Use __be16 for csum_oldmss/csum_newmss (sparse warning from
   kernel test robot)
 - Reorder local variable declarations to reverse xmas tree (Fernando)

v2:
 - Use get_unaligned_be16 (Fernando's suggestion)
 - Fix alignment check expression (David)
 - Mention it's a theoretical bug in the commit message
 - Drop cc stable, the bug is only theoretical

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 80e1634bc51f..037add799d41 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -116,9 +116,10 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	opt = (u_int8_t *)tcph;
 	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
 		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
+			__be16 csum_oldmss, csum_newmss;
 			u_int16_t oldmss;
 
-			oldmss = (opt[i+2] << 8) | opt[i+3];
+			oldmss = get_unaligned_be16(&opt[i + 2]);
 
 			/* Never increase MSS, even when setting it, as
 			 * doing so results in problems for hosts that rely
@@ -130,8 +131,25 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 			opt[i+2] = (newmss & 0xff00) >> 8;
 			opt[i+3] = newmss & 0x00ff;
 
+			csum_oldmss = htons(oldmss);
+			csum_newmss = htons(newmss);
+
+			if (((char *)&opt[i + 2] - (char *)tcph) & 0x1) {
+				/* MSS option is unaligned: the modified bytes
+				 * straddle two checksum words. Byteswapping
+				 * the operands lets a single incremental
+				 * update produce the correct checksum delta
+				 * (see commit message for the derivation).
+				 */
+				csum_oldmss = htons(swab16(oldmss));
+				csum_newmss = htons(swab16(newmss));
+			} else {
+				csum_oldmss = htons(oldmss);
+				csum_newmss = htons(newmss);
+			}
+
 			inet_proto_csum_replace2(&tcph->check, skb,
-						 htons(oldmss), htons(newmss),
+						 csum_oldmss, csum_newmss,
 						 false);
 			return 0;
 		}
-- 
2.43.0


^ permalink raw reply related

* [PATCH iproute2] ip: return correct status from help command
From: Rose Wright @ 2026-06-21 18:03 UTC (permalink / raw)
  To: netdev; +Cc: stephen, Rose Wright

Currently, "ip help" or "ip -help" always returns an error code because
usage() also serves as the fall-through for a bare "ip" invocation, so it
always prints to stderr and exits with -1.

This is a minor bug that breaks "ip help | grep" and other scripts that
rely on standard exit codes. The fix passes the exit status to usage() as
a parameter and prints to stdout instead of stderr when the status is 0.

Signed-off-by: Rose Wright <rosesophiewright@gmail.com>
---
 ip/ip.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/ip/ip.c b/ip/ip.c
index e4b71bde..ee4595e8 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -52,11 +52,12 @@ const char *get_ip_lib_dir(void)
 	return lib_dir;
 }
 
-static void usage(void) __attribute__((noreturn));
+static void usage(int status) __attribute__((noreturn));
 
-static void usage(void)
+static void usage(int status)
 {
-	fprintf(stderr,
+	FILE *out = status == 0 ? stdout : stderr;
+	fprintf(out,
 		"Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
 		"       ip [ -force ] -batch filename\n"
 		"where  OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp | link |\n"
@@ -72,12 +73,12 @@ static void usage(void)
 		"                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] [filename] |\n"
 		"                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] | -a[ll] |\n"
 		"                    -c[olor]}\n");
-	exit(-1);
+	exit(status);
 }
 
 static int do_help(int argc, char **argv)
 {
-	usage();
+	usage(0);
 	return 0;
 }
 
@@ -279,7 +280,7 @@ int main(int argc, char **argv)
 			rcvbuf = size;
 		} else if (matches_color(opt, &color)) {
 		} else if (matches(opt, "-help") == 0) {
-			usage();
+			usage(0);
 		} else if (matches(opt, "-netns") == 0) {
 			NEXT_ARG();
 			if (netns_switch(argv[1]))
@@ -321,5 +322,5 @@ int main(int argc, char **argv)
 		return do_cmd(argv[1], argc-1, argv+1, true);
 
 	rtnl_close(&rth);
-	usage();
+	usage(-1);
 }
-- 
2.54.0


^ permalink raw reply related

* [PATCH stable 6.6.y v4 4/4] selftests/bpf: Update comments find_equal_scalars->sync_linked_regs
From: Zhenzhong Wu @ 2026-06-21 17:27 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird
In-Reply-To: <20260621172735.409355-1-jt26wzz@gmail.com>

From: Eduard Zingerman <eddyz87@gmail.com>

[ Upstream commit cfbf25481d6dec0089c99c9d33a2ea634fe8f008 ]

find_equal_scalars() is renamed to sync_linked_regs();
this commit updates existing references in the selftests comments.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-5-eddyz87@gmail.com
[ zhenzhong: only two pre-existing comments still needed updating in 6.6.y. ]
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 tools/testing/selftests/bpf/progs/verifier_spill_fill.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index 1f71f596d33f..07a2527a8f47 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -392,7 +392,7 @@ __naked void spill_32bit_of_64bit_fail(void)
 	*(u32*)(r10 - 8) = r1;				\
 	/* 32-bit fill r2 from stack. */		\
 	r2 = *(u32*)(r10 - 8);				\
-	/* Compare r2 with another register to trigger find_equal_scalars.\
+	/* Compare r2 with another register to trigger sync_linked_regs.\
 	 * Having one random bit is important here, otherwise the verifier cuts\
 	 * the corners. If the ID was mistakenly preserved on spill, this would\
 	 * cause the verifier to think that r1 is also equal to zero in one of\
@@ -431,7 +431,7 @@ __naked void spill_16bit_of_32bit_fail(void)
 	*(u16*)(r10 - 8) = r1;				\
 	/* 16-bit fill r2 from stack. */		\
 	r2 = *(u16*)(r10 - 8);				\
-	/* Compare r2 with another register to trigger find_equal_scalars.\
+	/* Compare r2 with another register to trigger sync_linked_regs.\
 	 * Having one random bit is important here, otherwise the verifier cuts\
 	 * the corners. If the ID was mistakenly preserved on spill, this would\
 	 * cause the verifier to think that r1 is also equal to zero in one of\
-- 
2.43.0


^ permalink raw reply related

* [PATCH stable 6.6.y v4 3/4] selftests/bpf: Tests for per-insn sync_linked_regs() precision tracking
From: Zhenzhong Wu @ 2026-06-21 17:27 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird
In-Reply-To: <20260621172735.409355-1-jt26wzz@gmail.com>

From: Eduard Zingerman <eddyz87@gmail.com>

[ Upstream commit bebc17b1c03b224a0b4aec6a171815e39f8ba9bc ]

Add a few test cases to verify precision tracking for scalars gaining
range because of sync_linked_regs():
- check what happens when more than 6 registers might gain range in
  sync_linked_regs();
- check if precision is propagated correctly when operand of
  conditional jump gained range in sync_linked_regs() and one of
  linked registers is marked precise;
- check if precision is propagated correctly when operand of
  conditional jump gained range in sync_linked_regs() and the
  other operand of the conditional jump is marked precise;
- add a minimized reproducer for precision tracking bug reported in [0];
- Check that mark_chain_precision() for one of the conditional jump
  operands does not trigger equal scalars precision propagation.

[0] https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-4-eddyz87@gmail.com
[ zhenzhong: keep the linked_regs_broken_link_2 reject check, but
  drop the mark_precise log expectations because 6.6.y does not derive
  the scalar-vs-scalar range for that non-constant JMP_X comparison. ]
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 .../selftests/bpf/progs/verifier_scalar_ids.c | 162 ++++++++++++++++++
 1 file changed, 162 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index f70392bf696c..2eb85eb3a06c 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -47,6 +47,72 @@ __naked void linked_regs_bpf_k(void)
 	: __clobber_all);
 }
 
+/* Registers r{0,1,2} share same ID when 'if r1 > ...' insn is processed,
+ * check that verifier marks r{1,2} as precise while backtracking
+ * 'if r1 > ...' with r0 already marked.
+ */
+SEC("socket")
+__success __log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+__msg("frame0: regs=r0 stack= before 5: (2d) if r1 > r3 goto pc+0")
+__msg("frame0: parent state regs=r0,r1,r2,r3 stack=:")
+__msg("frame0: regs=r0,r1,r2,r3 stack= before 4: (b7) r3 = 7")
+__naked void linked_regs_bpf_x_src(void)
+{
+	asm volatile (
+	/* r0 = random number up to 0xff */
+	"call %[bpf_ktime_get_ns];"
+	"r0 &= 0xff;"
+	/* tie r0.id == r1.id == r2.id */
+	"r1 = r0;"
+	"r2 = r0;"
+	"r3 = 7;"
+	"if r1 > r3 goto +0;"
+	/* force r0 to be precise, this eventually marks r1 and r2 as
+	 * precise as well because of shared IDs
+	 */
+	"r4 = r10;"
+	"r4 += r0;"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm(bpf_ktime_get_ns)
+	: __clobber_all);
+}
+
+/* Registers r{0,1,2} share same ID when 'if r1 > r3' insn is processed,
+ * check that verifier marks r{0,1,2} as precise while backtracking
+ * 'if r1 > r3' with r3 already marked.
+ */
+SEC("socket")
+__success __log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+__msg("frame0: regs=r3 stack= before 5: (2d) if r1 > r3 goto pc+0")
+__msg("frame0: parent state regs=r0,r1,r2,r3 stack=:")
+__msg("frame0: regs=r0,r1,r2,r3 stack= before 4: (b7) r3 = 7")
+__naked void linked_regs_bpf_x_dst(void)
+{
+	asm volatile (
+	/* r0 = random number up to 0xff */
+	"call %[bpf_ktime_get_ns];"
+	"r0 &= 0xff;"
+	/* tie r0.id == r1.id == r2.id */
+	"r1 = r0;"
+	"r2 = r0;"
+	"r3 = 7;"
+	"if r1 > r3 goto +0;"
+	/* force r0 to be precise, this eventually marks r1 and r2 as
+	 * precise as well because of shared IDs
+	 */
+	"r4 = r10;"
+	"r4 += r3;"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm(bpf_ktime_get_ns)
+	: __clobber_all);
+}
+
 /* Same as linked_regs_bpf_k, but break one of the
  * links, note that r1 is absent from regs=... in __msg below.
  */
@@ -280,6 +346,102 @@ __naked void precision_two_ids(void)
 	: __clobber_all);
 }
 
+SEC("socket")
+__success __log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+/* check that r0 and r6 have different IDs after 'if',
+ * collect_linked_regs() can't tie more than 6 registers for a single insn.
+ */
+__msg("8: (25) if r0 > 0x7 goto pc+0         ; R0=scalar(id=1")
+__msg("9: (bf) r6 = r6                       ; R6_w=scalar(id=2")
+/* check that r{0-5} are marked precise after 'if' */
+__msg("frame0: regs=r0 stack= before 8: (25) if r0 > 0x7 goto pc+0")
+__msg("frame0: parent state regs=r0,r1,r2,r3,r4,r5 stack=:")
+__naked void linked_regs_too_many_regs(void)
+{
+	asm volatile (
+	/* r0 = random number up to 0xff */
+	"call %[bpf_ktime_get_ns];"
+	"r0 &= 0xff;"
+	/* tie r{0-6} IDs */
+	"r1 = r0;"
+	"r2 = r0;"
+	"r3 = r0;"
+	"r4 = r0;"
+	"r5 = r0;"
+	"r6 = r0;"
+	/* propagate range for r{0-6} */
+	"if r0 > 7 goto +0;"
+	/* make r6 appear in the log */
+	"r6 = r6;"
+	/* force r0 to be precise,
+	 * this would cause r{0-4} to be precise because of shared IDs
+	 */
+	"r7 = r10;"
+	"r7 += r0;"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm(bpf_ktime_get_ns)
+	: __clobber_all);
+}
+
+SEC("socket")
+__failure __log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+__msg("div by zero")
+__naked void linked_regs_broken_link_2(void)
+{
+	asm volatile (
+	"call %[bpf_get_prandom_u32];"
+	"r7 = r0;"
+	"r8 = r0;"
+	"call %[bpf_get_prandom_u32];"
+	"if r0 > 1 goto +0;"
+	/* r7.id == r8.id,
+	 * thus r7 precision implies r8 precision,
+	 * which implies r0 precision because of the conditional below.
+	 */
+	"if r8 >= r0 goto 1f;"
+	/* break id relation between r7 and r8 */
+	"r8 += r8;"
+	/* make r7 precise */
+	"if r7 == 0 goto 1f;"
+	"r0 /= 0;"
+"1:"
+	"r0 = 42;"
+	"exit;"
+	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
+/* Check that mark_chain_precision() for one of the conditional jump
+ * operands does not trigger equal scalars precision propagation.
+ */
+SEC("socket")
+__success __log_level(2)
+__msg("3: (25) if r1 > 0x100 goto pc+0")
+__msg("frame0: regs=r1 stack= before 2: (bf) r1 = r0")
+__naked void cjmp_no_linked_regs_trigger(void)
+{
+	asm volatile (
+	/* r0 = random number up to 0xff */
+	"call %[bpf_ktime_get_ns];"
+	"r0 &= 0xff;"
+	/* tie r0.id == r1.id */
+	"r1 = r0;"
+	/* the jump below would be predicted, thus r1 would be marked precise,
+	 * this should not imply precision mark for r0
+	 */
+	"if r1 > 256 goto +0;"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm(bpf_ktime_get_ns)
+	: __clobber_all);
+}
+
 /* Verify that check_ids() is used by regsafe() for scalars.
  *
  * r9 = ... some pointer with range X ...
-- 
2.43.0

^ permalink raw reply related

* [PATCH stable 6.6.y v4 2/4] bpf: Remove mark_precise_scalar_ids()
From: Zhenzhong Wu @ 2026-06-21 17:27 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird
In-Reply-To: <20260621172735.409355-1-jt26wzz@gmail.com>

From: Eduard Zingerman <eddyz87@gmail.com>

[ Upstream commit 842edb5507a1038e009d27e69d13b94b6f085763 ]

Function mark_precise_scalar_ids() is superseded by
bt_sync_linked_regs() and equal scalars tracking in jump history.
mark_precise_scalar_ids() propagates precision over registers sharing
the same ID on parent/child state boundaries, while jump history records
allow bt_sync_linked_regs() to propagate the same information with
instruction-level granularity, which is strictly more precise.

This commit removes mark_precise_scalar_ids() and updates test cases
in progs/verifier_scalar_ids to reflect new verifier behavior.

The tests are updated in the following manner:
- mark_precise_scalar_ids() propagated precision regardless of
  presence of conditional jumps, while new jump history based logic
  only kicks in when conditional jumps are present.
  Hence test cases are augmented with conditional jumps to still
  trigger precision propagation.
- As equal scalars tracking no longer relies on parent/child state
  boundaries, some test cases are no longer interesting;
  such test cases are removed, namely:
  - precision_same_state and precision_cross_state are superseded by
    linked_regs_bpf_k;
  - precision_same_state_broken_link and equal_scalars_broken_link
    are superseded by linked_regs_broken_link.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-3-eddyz87@gmail.com
[ zhenzhong: backport to 6.6.y after adapting the first linked-regs
  history commit to the older scalar-id verifier layout. ]
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 kernel/bpf/verifier.c                         | 115 ------------
 .../selftests/bpf/progs/verifier_scalar_ids.c | 171 ++++++------------
 .../testing/selftests/bpf/verifier/precise.c  |   2 +-
 3 files changed, 56 insertions(+), 232 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2268f095203e..f638b2d3a42f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4265,96 +4265,6 @@ static void mark_all_scalars_imprecise(struct bpf_verifier_env *env, struct bpf_
 	}
 }
 
-static bool idset_contains(struct bpf_idset *s, u32 id)
-{
-	u32 i;
-
-	for (i = 0; i < s->count; ++i)
-		if (s->ids[i] == id)
-			return true;
-
-	return false;
-}
-
-static int idset_push(struct bpf_idset *s, u32 id)
-{
-	if (WARN_ON_ONCE(s->count >= ARRAY_SIZE(s->ids)))
-		return -EFAULT;
-	s->ids[s->count++] = id;
-	return 0;
-}
-
-static void idset_reset(struct bpf_idset *s)
-{
-	s->count = 0;
-}
-
-/* Collect a set of IDs for all registers currently marked as precise in env->bt.
- * Mark all registers with these IDs as precise.
- */
-static int mark_precise_scalar_ids(struct bpf_verifier_env *env, struct bpf_verifier_state *st)
-{
-	struct bpf_idset *precise_ids = &env->idset_scratch;
-	struct backtrack_state *bt = &env->bt;
-	struct bpf_func_state *func;
-	struct bpf_reg_state *reg;
-	DECLARE_BITMAP(mask, 64);
-	int i, fr;
-
-	idset_reset(precise_ids);
-
-	for (fr = bt->frame; fr >= 0; fr--) {
-		func = st->frame[fr];
-
-		bitmap_from_u64(mask, bt_frame_reg_mask(bt, fr));
-		for_each_set_bit(i, mask, 32) {
-			reg = &func->regs[i];
-			if (!reg->id || reg->type != SCALAR_VALUE)
-				continue;
-			if (idset_push(precise_ids, reg->id))
-				return -EFAULT;
-		}
-
-		bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr));
-		for_each_set_bit(i, mask, 64) {
-			if (i >= func->allocated_stack / BPF_REG_SIZE)
-				break;
-			if (!is_spilled_scalar_reg(&func->stack[i]))
-				continue;
-			reg = &func->stack[i].spilled_ptr;
-			if (!reg->id)
-				continue;
-			if (idset_push(precise_ids, reg->id))
-				return -EFAULT;
-		}
-	}
-
-	for (fr = 0; fr <= st->curframe; ++fr) {
-		func = st->frame[fr];
-
-		for (i = BPF_REG_0; i < BPF_REG_10; ++i) {
-			reg = &func->regs[i];
-			if (!reg->id)
-				continue;
-			if (!idset_contains(precise_ids, reg->id))
-				continue;
-			bt_set_frame_reg(bt, fr, i);
-		}
-		for (i = 0; i < func->allocated_stack / BPF_REG_SIZE; ++i) {
-			if (!is_spilled_scalar_reg(&func->stack[i]))
-				continue;
-			reg = &func->stack[i].spilled_ptr;
-			if (!reg->id)
-				continue;
-			if (!idset_contains(precise_ids, reg->id))
-				continue;
-			bt_set_frame_slot(bt, fr, i);
-		}
-	}
-
-	return 0;
-}
-
 /*
  * __mark_chain_precision() backtracks BPF program instruction sequence and
  * chain of verifier states making sure that register *regno* (if regno >= 0)
@@ -4487,31 +4397,6 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 				bt->frame, last_idx, first_idx, subseq_idx);
 		}
 
-		/* If some register with scalar ID is marked as precise,
-		 * make sure that all registers sharing this ID are also precise.
-		 * This is needed to estimate effect of sync_linked_regs().
-		 * Do this at the last instruction of each state,
-		 * bpf_reg_state::id fields are valid for these instructions.
-		 *
-		 * Allows to track precision in situation like below:
-		 *
-		 *     r2 = unknown value
-		 *     ...
-		 *   --- state #0 ---
-		 *     ...
-		 *     r1 = r2                 // r1 and r2 now share the same ID
-		 *     ...
-		 *   --- state #1 {r1.id = A, r2.id = A} ---
-		 *     ...
-		 *     if (r2 > 10) goto exit; // sync_linked_regs() assigns range to r1
-		 *     ...
-		 *   --- state #2 {r1.id = A, r2.id = A} ---
-		 *     r3 = r10
-		 *     r3 += r1                // need to mark both r1 and r2
-		 */
-		if (mark_precise_scalar_ids(env, st))
-			return -EFAULT;
-
 		if (last_idx < 0) {
 			/* we are at the entry into subprog, which
 			 * is expected for global funcs, but only if
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 22a6cf6e8255..f70392bf696c 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -5,54 +5,27 @@
 #include "bpf_misc.h"
 
 /* Check that precision marks propagate through scalar IDs.
- * Registers r{0,1,2} have the same scalar ID at the moment when r0 is
- * marked to be precise, this mark is immediately propagated to r{1,2}.
+ * Registers r{0,1,2} have the same scalar ID.
+ * Range information is propagated for scalars sharing same ID.
+ * Check that precision mark for r0 causes precision marks for r{1,2}
+ * when range information is propagated for 'if <reg> <op> <const>' insn.
  */
 SEC("socket")
 __success __log_level(2)
-__msg("frame0: regs=r0,r1,r2 stack= before 4: (bf) r3 = r10")
-__msg("frame0: regs=r0,r1,r2 stack= before 3: (bf) r2 = r0")
-__msg("frame0: regs=r0,r1 stack= before 2: (bf) r1 = r0")
-__msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
-__msg("frame0: regs=r0 stack= before 0: (85) call bpf_ktime_get_ns")
-__flag(BPF_F_TEST_STATE_FREQ)
-__naked void precision_same_state(void)
-{
-	asm volatile (
-	/* r0 = random number up to 0xff */
-	"call %[bpf_ktime_get_ns];"
-	"r0 &= 0xff;"
-	/* tie r0.id == r1.id == r2.id */
-	"r1 = r0;"
-	"r2 = r0;"
-	/* force r0 to be precise, this immediately marks r1 and r2 as
-	 * precise as well because of shared IDs
-	 */
-	"r3 = r10;"
-	"r3 += r0;"
-	"r0 = 0;"
-	"exit;"
-	:
-	: __imm(bpf_ktime_get_ns)
-	: __clobber_all);
-}
-
-/* Same as precision_same_state, but mark propagates through state /
- * parent state boundary.
- */
-SEC("socket")
-__success __log_level(2)
-__msg("frame0: last_idx 6 first_idx 5 subseq_idx -1")
-__msg("frame0: regs=r0,r1,r2 stack= before 5: (bf) r3 = r10")
+/* first 'if' branch */
+__msg("6: (0f) r3 += r0")
+__msg("frame0: regs=r0 stack= before 4: (25) if r1 > 0x7 goto pc+0")
 __msg("frame0: parent state regs=r0,r1,r2 stack=:")
-__msg("frame0: regs=r0,r1,r2 stack= before 4: (05) goto pc+0")
 __msg("frame0: regs=r0,r1,r2 stack= before 3: (bf) r2 = r0")
-__msg("frame0: regs=r0,r1 stack= before 2: (bf) r1 = r0")
-__msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
-__msg("frame0: parent state regs=r0 stack=:")
-__msg("frame0: regs=r0 stack= before 0: (85) call bpf_ktime_get_ns")
+/* second 'if' branch */
+__msg("from 4 to 5: ")
+__msg("6: (0f) r3 += r0")
+__msg("frame0: regs=r0 stack= before 5: (bf) r3 = r10")
+__msg("frame0: regs=r0 stack= before 4: (25) if r1 > 0x7 goto pc+0")
+/* parent state already has r{0,1,2} as precise */
+__msg("frame0: parent state regs= stack=:")
 __flag(BPF_F_TEST_STATE_FREQ)
-__naked void precision_cross_state(void)
+__naked void linked_regs_bpf_k(void)
 {
 	asm volatile (
 	/* r0 = random number up to 0xff */
@@ -61,9 +34,8 @@ __naked void precision_cross_state(void)
 	/* tie r0.id == r1.id == r2.id */
 	"r1 = r0;"
 	"r2 = r0;"
-	/* force checkpoint */
-	"goto +0;"
-	/* force r0 to be precise, this immediately marks r1 and r2 as
+	"if r1 > 7 goto +0;"
+	/* force r0 to be precise, this eventually marks r1 and r2 as
 	 * precise as well because of shared IDs
 	 */
 	"r3 = r10;"
@@ -75,59 +47,18 @@ __naked void precision_cross_state(void)
 	: __clobber_all);
 }
 
-/* Same as precision_same_state, but break one of the
+/* Same as linked_regs_bpf_k, but break one of the
  * links, note that r1 is absent from regs=... in __msg below.
  */
 SEC("socket")
 __success __log_level(2)
-__msg("frame0: regs=r0,r2 stack= before 5: (bf) r3 = r10")
-__msg("frame0: regs=r0,r2 stack= before 4: (b7) r1 = 0")
-__msg("frame0: regs=r0,r2 stack= before 3: (bf) r2 = r0")
-__msg("frame0: regs=r0 stack= before 2: (bf) r1 = r0")
-__msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
-__msg("frame0: regs=r0 stack= before 0: (85) call bpf_ktime_get_ns")
-__flag(BPF_F_TEST_STATE_FREQ)
-__naked void precision_same_state_broken_link(void)
-{
-	asm volatile (
-	/* r0 = random number up to 0xff */
-	"call %[bpf_ktime_get_ns];"
-	"r0 &= 0xff;"
-	/* tie r0.id == r1.id == r2.id */
-	"r1 = r0;"
-	"r2 = r0;"
-	/* break link for r1, this is the only line that differs
-	 * compared to the previous test
-	 */
-	"r1 = 0;"
-	/* force r0 to be precise, this immediately marks r1 and r2 as
-	 * precise as well because of shared IDs
-	 */
-	"r3 = r10;"
-	"r3 += r0;"
-	"r0 = 0;"
-	"exit;"
-	:
-	: __imm(bpf_ktime_get_ns)
-	: __clobber_all);
-}
-
-/* Same as precision_same_state_broken_link, but with state /
- * parent state boundary.
- */
-SEC("socket")
-__success __log_level(2)
-__msg("frame0: regs=r0,r2 stack= before 6: (bf) r3 = r10")
-__msg("frame0: regs=r0,r2 stack= before 5: (b7) r1 = 0")
-__msg("frame0: parent state regs=r0,r2 stack=:")
-__msg("frame0: regs=r0,r1,r2 stack= before 4: (05) goto pc+0")
-__msg("frame0: regs=r0,r1,r2 stack= before 3: (bf) r2 = r0")
-__msg("frame0: regs=r0,r1 stack= before 2: (bf) r1 = r0")
-__msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
+__msg("7: (0f) r3 += r0")
+__msg("frame0: regs=r0 stack= before 6: (bf) r3 = r10")
 __msg("frame0: parent state regs=r0 stack=:")
-__msg("frame0: regs=r0 stack= before 0: (85) call bpf_ktime_get_ns")
+__msg("frame0: regs=r0 stack= before 5: (25) if r0 > 0x7 goto pc+0")
+__msg("frame0: parent state regs=r0,r2 stack=:")
 __flag(BPF_F_TEST_STATE_FREQ)
-__naked void precision_cross_state_broken_link(void)
+__naked void linked_regs_broken_link(void)
 {
 	asm volatile (
 	/* r0 = random number up to 0xff */
@@ -136,18 +67,13 @@ __naked void precision_cross_state_broken_link(void)
 	/* tie r0.id == r1.id == r2.id */
 	"r1 = r0;"
 	"r2 = r0;"
-	/* force checkpoint, although link between r1 and r{0,2} is
-	 * broken by the next statement current precision tracking
-	 * algorithm can't react to it and propagates mark for r1 to
-	 * the parent state.
-	 */
-	"goto +0;"
 	/* break link for r1, this is the only line that differs
-	 * compared to precision_cross_state()
+	 * compared to the previous test
 	 */
 	"r1 = 0;"
-	/* force r0 to be precise, this immediately marks r1 and r2 as
-	 * precise as well because of shared IDs
+	"if r0 > 7 goto +0;"
+	/* force r0 to be precise,
+	 * this eventually marks r2 as precise because of shared IDs
 	 */
 	"r3 = r10;"
 	"r3 += r0;"
@@ -164,10 +90,16 @@ __naked void precision_cross_state_broken_link(void)
  */
 SEC("socket")
 __success __log_level(2)
-__msg("11: (0f) r2 += r1")
+__msg("12: (0f) r2 += r1")
 /* Current state */
-__msg("frame2: last_idx 11 first_idx 10 subseq_idx -1")
-__msg("frame2: regs=r1 stack= before 10: (bf) r2 = r10")
+__msg("frame2: last_idx 12 first_idx 11 subseq_idx -1 ")
+__msg("frame2: regs=r1 stack= before 11: (bf) r2 = r10")
+__msg("frame2: parent state regs=r1 stack=")
+__msg("frame1: parent state regs= stack=")
+__msg("frame0: parent state regs= stack=")
+/* Parent state */
+__msg("frame2: last_idx 10 first_idx 10 subseq_idx 11 ")
+__msg("frame2: regs=r1 stack= before 10: (25) if r1 > 0x7 goto pc+0")
 __msg("frame2: parent state regs=r1 stack=")
 /* frame1.r{6,7} are marked because mark_precise_scalar_ids()
  * looks for all registers with frame2.r1.id in the current state
@@ -192,7 +124,7 @@ __msg("frame1: regs=r1 stack= before 4: (85) call pc+1")
 __msg("frame0: parent state regs=r1,r6 stack=")
 /* Parent state */
 __msg("frame0: last_idx 3 first_idx 1 subseq_idx 4")
-__msg("frame0: regs=r0,r1,r6 stack= before 3: (bf) r6 = r0")
+__msg("frame0: regs=r1,r6 stack= before 3: (bf) r6 = r0")
 __msg("frame0: regs=r0,r1 stack= before 2: (bf) r1 = r0")
 __msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
 __flag(BPF_F_TEST_STATE_FREQ)
@@ -230,7 +162,8 @@ static __naked __noinline __used
 void precision_many_frames__bar(void)
 {
 	asm volatile (
-	/* force r1 to be precise, this immediately marks:
+	"if r1 > 7 goto +0;"
+	/* force r1 to be precise, this eventually marks:
 	 * - bar frame r1
 	 * - foo frame r{1,6,7}
 	 * - main frame r{1,6}
@@ -247,14 +180,16 @@ void precision_many_frames__bar(void)
  */
 SEC("socket")
 __success __log_level(2)
+__msg("11: (0f) r2 += r1")
 /* foo frame */
-__msg("frame1: regs=r1 stack=-8,-16 before 9: (bf) r2 = r10")
+__msg("frame1: regs=r1 stack= before 10: (bf) r2 = r10")
+__msg("frame1: regs=r1 stack= before 9: (25) if r1 > 0x7 goto pc+0")
 __msg("frame1: regs=r1 stack=-8,-16 before 8: (7b) *(u64 *)(r10 -16) = r1")
 __msg("frame1: regs=r1 stack=-8 before 7: (7b) *(u64 *)(r10 -8) = r1")
 __msg("frame1: regs=r1 stack= before 4: (85) call pc+2")
 /* main frame */
-__msg("frame0: regs=r0,r1 stack=-8 before 3: (7b) *(u64 *)(r10 -8) = r1")
-__msg("frame0: regs=r0,r1 stack= before 2: (bf) r1 = r0")
+__msg("frame0: regs=r1 stack=-8 before 3: (7b) *(u64 *)(r10 -8) = r1")
+__msg("frame0: regs=r1 stack= before 2: (bf) r1 = r0")
 __msg("frame0: regs=r0 stack= before 1: (57) r0 &= 255")
 __flag(BPF_F_TEST_STATE_FREQ)
 __naked void precision_stack(void)
@@ -283,7 +218,8 @@ void precision_stack__foo(void)
 	 */
 	"*(u64*)(r10 - 8) = r1;"
 	"*(u64*)(r10 - 16) = r1;"
-	/* force r1 to be precise, this immediately marks:
+	"if r1 > 7 goto +0;"
+	/* force r1 to be precise, this eventually marks:
 	 * - foo frame r1,fp{-8,-16}
 	 * - main frame r1,fp{-8}
 	 */
@@ -299,15 +235,17 @@ void precision_stack__foo(void)
 SEC("socket")
 __success __log_level(2)
 /* r{6,7} */
-__msg("11: (0f) r3 += r7")
-__msg("frame0: regs=r6,r7 stack= before 10: (bf) r3 = r10")
+__msg("12: (0f) r3 += r7")
+__msg("frame0: regs=r7 stack= before 11: (bf) r3 = r10")
+__msg("frame0: regs=r7 stack= before 9: (25) if r7 > 0x7 goto pc+0")
 /* ... skip some insns ... */
 __msg("frame0: regs=r6,r7 stack= before 3: (bf) r7 = r0")
 __msg("frame0: regs=r0,r6 stack= before 2: (bf) r6 = r0")
 /* r{8,9} */
-__msg("12: (0f) r3 += r9")
-__msg("frame0: regs=r8,r9 stack= before 11: (0f) r3 += r7")
+__msg("13: (0f) r3 += r9")
+__msg("frame0: regs=r9 stack= before 12: (0f) r3 += r7")
 /* ... skip some insns ... */
+__msg("frame0: regs=r9 stack= before 10: (25) if r9 > 0x7 goto pc+0")
 __msg("frame0: regs=r8,r9 stack= before 7: (bf) r9 = r0")
 __msg("frame0: regs=r0,r8 stack= before 6: (bf) r8 = r0")
 __flag(BPF_F_TEST_STATE_FREQ)
@@ -328,8 +266,9 @@ __naked void precision_two_ids(void)
 	"r9 = r0;"
 	/* clear r0 id */
 	"r0 = 0;"
-	/* force checkpoint */
-	"goto +0;"
+	/* propagate equal scalars precision */
+	"if r7 > 7 goto +0;"
+	"if r9 > 7 goto +0;"
 	"r3 = r10;"
 	/* force r7 to be precise, this also marks r6 */
 	"r3 += r7;"
diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
index b0b1bcc668ad..59a020c35647 100644
--- a/tools/testing/selftests/bpf/verifier/precise.c
+++ b/tools/testing/selftests/bpf/verifier/precise.c
@@ -106,7 +106,7 @@
 	mark_precise: frame0: regs=r2 stack= before 22\
 	mark_precise: frame0: parent state regs=r2 stack=:\
 	mark_precise: frame0: last_idx 20 first_idx 20\
-	mark_precise: frame0: regs=r2,r9 stack= before 20\
+	mark_precise: frame0: regs=r2 stack= before 20\
 	mark_precise: frame0: parent state regs=r2,r9 stack=:\
 	mark_precise: frame0: last_idx 19 first_idx 17\
 	mark_precise: frame0: regs=r2,r9 stack= before 19\
-- 
2.43.0


^ permalink raw reply related

* [PATCH stable 6.6.y v4 1/4] bpf: Track equal scalars history on per-instruction level
From: Zhenzhong Wu @ 2026-06-21 17:27 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird,
	Hao Sun
In-Reply-To: <20260621172735.409355-1-jt26wzz@gmail.com>

From: Eduard Zingerman <eddyz87@gmail.com>

[ Upstream commit 4bf79f9be434e000c8e12fe83b2f4402480f1460 ]

Use bpf_verifier_state->jmp_history to track which registers were
updated by find_equal_scalars() (renamed to collect_linked_regs())
when conditional jump was verified. Use recorded information in
backtrack_insn() to propagate precision.

E.g. for the following program:

            while verifying instructions
  1: r1 = r0              |
  2: if r1 < 8  goto ...  | push r0,r1 as linked registers in jmp_history
  3: if r0 > 16 goto ...  | push r0,r1 as linked registers in jmp_history
  4: r2 = r10             |
  5: r2 += r0             v mark_chain_precision(r0)

            while doing mark_chain_precision(r0)
  5: r2 += r0             | mark r0 precise
  4: r2 = r10             |
  3: if r0 > 16 goto ...  | mark r0,r1 as precise
  2: if r1 < 8  goto ...  | mark r0,r1 as precise
  1: r1 = r0              v

Technically, do this as follows:
- Use 10 bits to identify each register that gains range because of
  sync_linked_regs():
  - 3 bits for frame number;
  - 6 bits for register or stack slot number;
  - 1 bit to indicate if register is spilled.
- Use u64 as a vector of 6 such records + 4 bits for vector length.
- Augment struct bpf_jmp_history_entry with a field 'linked_regs'
  representing such vector.
- When doing check_cond_jmp_op() remember up to 6 registers that
  gain range because of sync_linked_regs() in such a vector.
- Don't propagate range information and reset IDs for registers that
  don't fit in 6-value vector.
- Push a pair {instruction index, linked registers vector}
  to bpf_verifier_state->jmp_history.
- When doing backtrack_insn() check if any of recorded linked
  registers is currently marked precise, if so mark all linked
  registers as precise.
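
The packing scheme in the list above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
names (struct entry, vec_pack(), vec_unpack()) are hypothetical and are
not the identifiers used by the patch, and the clamp in vec_unpack() is
an extra safety check added for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Widths from the list above; all names here are illustrative only. */
#define FRAMENO_BITS	3
#define SLOT_BITS	6
#define ENTRY_BITS	(FRAMENO_BITS + SLOT_BITS + 1)	/* 10 */
#define SIZE_BITS	4
#define MAX_ENTRIES	6	/* 6 * 10 + 4 == 64 bits */

struct entry {
	uint8_t frameno;	/* 0..7 */
	uint8_t slot;		/* register or stack slot number, 0..63 */
	bool is_reg;
};

struct entry_vec {
	int cnt;
	struct entry e[MAX_ENTRIES];
};

static uint64_t vec_pack(const struct entry_vec *v)
{
	uint64_t val = 0;
	int i;

	assert(v->cnt >= 0 && v->cnt <= MAX_ENTRIES);
	for (i = 0; i < v->cnt; i++) {
		uint64_t rec = 0;

		rec |= (uint64_t)(v->e[i].frameno & 0x7);
		rec |= (uint64_t)(v->e[i].slot & 0x3f) << FRAMENO_BITS;
		rec |= (uint64_t)(v->e[i].is_reg ? 1 : 0) << (FRAMENO_BITS + SLOT_BITS);
		val = (val << ENTRY_BITS) | rec;
	}
	/* The low 4 bits hold the number of records. */
	return (val << SIZE_BITS) | (uint64_t)v->cnt;
}

/* Records come back in reverse push order; for a set that is harmless. */
static void vec_unpack(uint64_t val, struct entry_vec *v)
{
	int i;

	v->cnt = (int)(val & 0xf);
	if (v->cnt > MAX_ENTRIES)	/* sketch-only guard against bad input */
		v->cnt = MAX_ENTRIES;
	val >>= SIZE_BITS;
	for (i = 0; i < v->cnt; i++) {
		v->e[i].frameno = val & 0x7;
		v->e[i].slot = (val >> FRAMENO_BITS) & 0x3f;
		v->e[i].is_reg = (val >> (FRAMENO_BITS + SLOT_BITS)) & 0x1;
		val >>= ENTRY_BITS;
	}
}
```

Six 10-bit records plus the 4-bit length fill the u64 exactly, which is
why the vector is capped at six entries.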

This also requires fixes for two test_verifier tests:
- precise: test 1
- precise: test 2

Both tests contain the following instruction sequence:

19: (bf) r2 = r9                      ; R2=scalar(id=3) R9=scalar(id=3)
20: (a5) if r2 < 0x8 goto pc+1        ; R2=scalar(id=3,umin=8)
21: (95) exit
22: (07) r2 += 1                      ; R2_w=scalar(id=3+1,...)
23: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0
24: (07) r1 += -8                     ; R1_w=fp-8
25: (b7) r3 = 0                       ; R3_w=0
26: (85) call bpf_probe_read_kernel#113

The call to bpf_probe_read_kernel() at (26) forces r2 to be precise.
Previously, this forced all registers with same id to become precise
immediately when mark_chain_precision() is called.
After this change, the precision is propagated to registers sharing
same id only when 'if' instruction is backtracked.
Hence verification log for both tests is changed:
regs=r2,r9 -> regs=r2 for instructions 25..20.

Fixes: 904e6ddf4133 ("bpf: Use scalar ids in mark_chain_precision()")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-2-eddyz87@gmail.com
Closes: https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
[ zhenzhong: backport to 6.6.y verifier layout and adapt
  sync_linked_regs() to the pre-BPF_ADD_CONST scalar-id code. ]
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 include/linux/bpf_verifier.h                  |   4 +
 kernel/bpf/verifier.c                         | 253 ++++++++++++++++--
 .../bpf/progs/verifier_subprog_precision.c    |   2 +-
 .../testing/selftests/bpf/verifier/precise.c  |   2 +-
 4 files changed, 237 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index dba211d3bb9a..9a3b93c24f19 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -345,6 +345,10 @@ struct bpf_jmp_history_entry {
 	u32 prev_idx : 22;
 	/* special flags, e.g., whether insn is doing register stack spill/load */
 	u32 flags : 10;
+	/* additional registers that need precision tracking when this
+	 * jump is backtracked, vector of six 10-bit records
+	 */
+	u64 linked_regs;
 };
 
 /* Maximum number of register states that can exist at once */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0d90236d0ad9..2268f095203e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3461,9 +3461,87 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
 	return env->insn_aux_data[insn_idx].jmp_point;
 }
 
+#define LR_FRAMENO_BITS	3
+#define LR_SPI_BITS	6
+#define LR_ENTRY_BITS	(LR_SPI_BITS + LR_FRAMENO_BITS + 1)
+#define LR_SIZE_BITS	4
+#define LR_FRAMENO_MASK	((1ull << LR_FRAMENO_BITS) - 1)
+#define LR_SPI_MASK	((1ull << LR_SPI_BITS)     - 1)
+#define LR_SIZE_MASK	((1ull << LR_SIZE_BITS)    - 1)
+#define LR_SPI_OFF	LR_FRAMENO_BITS
+#define LR_IS_REG_OFF	(LR_SPI_BITS + LR_FRAMENO_BITS)
+#define LINKED_REGS_MAX	6
+
+struct linked_reg {
+	u8 frameno;
+	union {
+		u8 spi;
+		u8 regno;
+	};
+	bool is_reg;
+};
+
+struct linked_regs {
+	int cnt;
+	struct linked_reg entries[LINKED_REGS_MAX];
+};
+
+static struct linked_reg *linked_regs_push(struct linked_regs *s)
+{
+	if (s->cnt < LINKED_REGS_MAX)
+		return &s->entries[s->cnt++];
+
+	return NULL;
+}
+
+/* Use u64 as a vector of 6 10-bit values, use first 4-bits to track
+ * number of elements currently in stack.
+ * Pack one history entry for linked registers as 10 bits in the following format:
+ * - 3-bits frameno
+ * - 6-bits spi_or_reg
+ * - 1-bit  is_reg
+ */
+static u64 linked_regs_pack(struct linked_regs *s)
+{
+	u64 val = 0;
+	int i;
+
+	for (i = 0; i < s->cnt; ++i) {
+		struct linked_reg *e = &s->entries[i];
+		u64 tmp = 0;
+
+		tmp |= e->frameno;
+		tmp |= e->spi << LR_SPI_OFF;
+		tmp |= (e->is_reg ? 1 : 0) << LR_IS_REG_OFF;
+
+		val <<= LR_ENTRY_BITS;
+		val |= tmp;
+	}
+	val <<= LR_SIZE_BITS;
+	val |= s->cnt;
+	return val;
+}
+
+static void linked_regs_unpack(u64 val, struct linked_regs *s)
+{
+	int i;
+
+	s->cnt = val & LR_SIZE_MASK;
+	val >>= LR_SIZE_BITS;
+
+	for (i = 0; i < s->cnt; ++i) {
+		struct linked_reg *e = &s->entries[i];
+
+		e->frameno =  val & LR_FRAMENO_MASK;
+		e->spi     = (val >> LR_SPI_OFF) & LR_SPI_MASK;
+		e->is_reg  = (val >> LR_IS_REG_OFF) & 0x1;
+		val >>= LR_ENTRY_BITS;
+	}
+}
+
 /* for any branch, call, exit record the history of jmps in the given state */
 static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
-			    int insn_flags)
+			    int insn_flags, u64 linked_regs)
 {
 	u32 cnt = cur->jmp_history_cnt;
 	struct bpf_jmp_history_entry *p;
@@ -3479,6 +3557,10 @@ static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_st
 			  "verifier insn history bug: insn_idx %d cur flags %x new flags %x\n",
 			  env->insn_idx, env->cur_hist_ent->flags, insn_flags);
 		env->cur_hist_ent->flags |= insn_flags;
+		WARN_ONCE(env->cur_hist_ent->linked_regs != 0,
+			  "verifier insn history bug: insn_idx %d linked_regs != 0: %#llx\n",
+			  env->insn_idx, env->cur_hist_ent->linked_regs);
+		env->cur_hist_ent->linked_regs = linked_regs;
 		return 0;
 	}
 
@@ -3493,6 +3575,7 @@ static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_st
 	p->idx = env->insn_idx;
 	p->prev_idx = env->prev_insn_idx;
 	p->flags = insn_flags;
+	p->linked_regs = linked_regs;
 	cur->jmp_history_cnt = cnt;
 	env->cur_hist_ent = p;
 
@@ -3668,6 +3751,11 @@ static inline bool bt_is_reg_set(struct backtrack_state *bt, u32 reg)
 	return bt->reg_masks[bt->frame] & (1 << reg);
 }
 
+static inline bool bt_is_frame_reg_set(struct backtrack_state *bt, u32 frame, u32 reg)
+{
+	return bt->reg_masks[frame] & (1 << reg);
+}
+
 static inline bool bt_is_frame_slot_set(struct backtrack_state *bt, u32 frame, u32 slot)
 {
 	return bt->stack_masks[frame] & (1ull << slot);
@@ -3717,6 +3805,42 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
 	}
 }
 
+/* If any register R in hist->linked_regs is marked as precise in bt,
+ * do bt_set_frame_{reg,slot}(bt, R) for all registers in hist->linked_regs.
+ */
+static void bt_sync_linked_regs(struct backtrack_state *bt, struct bpf_jmp_history_entry *hist)
+{
+	struct linked_regs linked_regs;
+	bool some_precise = false;
+	int i;
+
+	if (!hist || hist->linked_regs == 0)
+		return;
+
+	linked_regs_unpack(hist->linked_regs, &linked_regs);
+	for (i = 0; i < linked_regs.cnt; ++i) {
+		struct linked_reg *e = &linked_regs.entries[i];
+
+		if ((e->is_reg && bt_is_frame_reg_set(bt, e->frameno, e->regno)) ||
+		    (!e->is_reg && bt_is_frame_slot_set(bt, e->frameno, e->spi))) {
+			some_precise = true;
+			break;
+		}
+	}
+
+	if (!some_precise)
+		return;
+
+	for (i = 0; i < linked_regs.cnt; ++i) {
+		struct linked_reg *e = &linked_regs.entries[i];
+
+		if (e->is_reg)
+			bt_set_frame_reg(bt, e->frameno, e->regno);
+		else
+			bt_set_frame_slot(bt, e->frameno, e->spi);
+	}
+}
+
 static bool calls_callback(struct bpf_verifier_env *env, int insn_idx);
 
 /* For given verifier state backtrack_insn() is called from the last insn to
@@ -3756,6 +3880,12 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 		print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
 	}
 
+	/* If there is a history record that some registers gained range at this insn,
+	 * propagate precision marks to those registers, so that bt_is_reg_set()
+	 * accounts for these registers.
+	 */
+	bt_sync_linked_regs(bt, hist);
+
 	if (class == BPF_ALU || class == BPF_ALU64) {
 		if (!bt_is_reg_set(bt, dreg))
 			return 0;
@@ -3985,7 +4115,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			 */
 			bt_set_reg(bt, dreg);
 			bt_set_reg(bt, sreg);
-			 /* else dreg <cond> K
+		} else if (BPF_SRC(insn->code) == BPF_K) {
+			 /* dreg <cond> K
 			  * Only dreg still needs precision before
 			  * this insn, so for the K-based conditional
 			  * there is nothing new to be marked.
@@ -4003,6 +4134,10 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			/* to be analyzed */
 			return -ENOTSUPP;
 	}
+	/* Propagate precision marks to linked registers, to account for
+	 * registers marked as precise in this function.
+	 */
+	bt_sync_linked_regs(bt, hist);
 	return 0;
 }
 
@@ -4354,7 +4489,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 
 		/* If some register with scalar ID is marked as precise,
 		 * make sure that all registers sharing this ID are also precise.
-		 * This is needed to estimate effect of find_equal_scalars().
+		 * This is needed to estimate effect of sync_linked_regs().
 		 * Do this at the last instruction of each state,
 		 * bpf_reg_state::id fields are valid for these instructions.
 		 *
@@ -4368,7 +4503,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 		 *     ...
 		 *   --- state #1 {r1.id = A, r2.id = A} ---
 		 *     ...
-		 *     if (r2 > 10) goto exit; // find_equal_scalars() assigns range to r1
+		 *     if (r2 > 10) goto exit; // sync_linked_regs() assigns range to r1
 		 *     ...
 		 *   --- state #2 {r1.id = A, r2.id = A} ---
 		 *     r3 = r10
@@ -4736,7 +4871,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	}
 
 	if (insn_flags)
-		return push_jmp_history(env, env->cur_state, insn_flags);
+		return push_jmp_history(env, env->cur_state, insn_flags, 0);
 	return 0;
 }
 
@@ -5032,7 +5167,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 		insn_flags = 0; /* we are not restoring spilled register */
 	}
 	if (insn_flags)
-		return push_jmp_history(env, env->cur_state, insn_flags);
+		return push_jmp_history(env, env->cur_state, insn_flags, 0);
 	return 0;
 }
 
@@ -13540,7 +13675,7 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 		ptr_reg = dst_reg;
 	else
 		/* Make sure ID is cleared otherwise dst_reg min/max could be
-		 * incorrectly propagated into other registers by find_equal_scalars()
+		 * incorrectly propagated into other registers by sync_linked_regs()
 		 */
 		dst_reg->id = 0;
 	if (BPF_SRC(insn->code) == BPF_X) {
@@ -13700,7 +13835,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 					 */
 					if (need_id)
 						/* Assign src and dst registers the same ID
-						 * that will be used by find_equal_scalars()
+						 * that will be used by sync_linked_regs()
 						 * to propagate min/max range.
 						 */
 						src_reg->id = ++env->id_gen;
@@ -13746,7 +13881,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 						copy_register_state(dst_reg, src_reg);
 						/* Make sure ID is cleared if src_reg is not in u32
 						 * range otherwise dst_reg min/max could be incorrectly
-						 * propagated into src_reg by find_equal_scalars()
+						 * propagated into src_reg by sync_linked_regs()
 						 */
 						if (!is_src_reg_u32)
 							dst_reg->id = 0;
@@ -14564,19 +14699,75 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 	return true;
 }
 
-static void find_equal_scalars(struct bpf_verifier_state *vstate,
-			       struct bpf_reg_state *known_reg)
+static void __collect_linked_regs(struct linked_regs *reg_set, struct bpf_reg_state *reg,
+				  u32 id, u32 frameno, u32 spi_or_reg, bool is_reg)
 {
-	struct bpf_func_state *state;
+	struct linked_reg *e;
+
+	if (reg->type != SCALAR_VALUE || reg->id != id)
+		return;
+
+	e = linked_regs_push(reg_set);
+	if (e) {
+		e->frameno = frameno;
+		e->is_reg = is_reg;
+		e->regno = spi_or_reg;
+	} else {
+		reg->id = 0;
+	}
+}
+
+/* For all R being scalar registers or spilled scalar registers
+ * in verifier state, save R in linked_regs if R->id == id.
+ * If there are too many Rs sharing same id, reset id for leftover Rs.
+ */
+static void collect_linked_regs(struct bpf_verifier_state *vstate, u32 id,
+				struct linked_regs *linked_regs)
+{
+	struct bpf_func_state *func;
 	struct bpf_reg_state *reg;
+	int i, j;
 
-	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
-		if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) {
+	for (i = vstate->curframe; i >= 0; i--) {
+		func = vstate->frame[i];
+		for (j = 0; j < BPF_REG_FP; j++) {
+			reg = &func->regs[j];
+			__collect_linked_regs(linked_regs, reg, id, i, j, true);
+		}
+		for (j = 0; j < func->allocated_stack / BPF_REG_SIZE; j++) {
+			if (!is_spilled_reg(&func->stack[j]))
+				continue;
+			reg = &func->stack[j].spilled_ptr;
+			__collect_linked_regs(linked_regs, reg, id, i, j, false);
+		}
+	}
+}
+
+/* For all R in linked_regs, copy known_reg range into R
+ * if R->id == known_reg->id.
+ */
+static void sync_linked_regs(struct bpf_verifier_state *vstate, struct bpf_reg_state *known_reg,
+			     struct linked_regs *linked_regs)
+{
+	struct bpf_reg_state *reg;
+	struct linked_reg *e;
+	int i;
+
+	for (i = 0; i < linked_regs->cnt; ++i) {
+		e = &linked_regs->entries[i];
+		reg = e->is_reg ? &vstate->frame[e->frameno]->regs[e->regno]
+				: &vstate->frame[e->frameno]->stack[e->spi].spilled_ptr;
+		if (reg->type != SCALAR_VALUE || reg == known_reg)
+			continue;
+		if (reg->id != known_reg->id)
+			continue;
+		{
 			s32 saved_subreg_def = reg->subreg_def;
+
 			copy_register_state(reg, known_reg);
 			reg->subreg_def = saved_subreg_def;
 		}
-	}));
+	}
 }
 
 static int check_cond_jmp_op(struct bpf_verifier_env *env,
@@ -14587,6 +14778,7 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	struct bpf_reg_state *regs = this_branch->frame[this_branch->curframe]->regs;
 	struct bpf_reg_state *dst_reg, *other_branch_regs, *src_reg = NULL;
 	struct bpf_reg_state *eq_branch_regs;
+	struct linked_regs linked_regs = {};
 	u8 opcode = BPF_OP(insn->code);
 	bool is_jmp32;
 	int pred = -1;
@@ -14704,6 +14896,21 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		return 0;
 	}
 
+	/* Push scalar registers sharing same ID to jump history,
+	 * do this before creating 'other_branch', so that both
+	 * 'this_branch' and 'other_branch' share this history
+	 * if parent state is created.
+	 */
+	if (BPF_SRC(insn->code) == BPF_X && src_reg->type == SCALAR_VALUE && src_reg->id)
+		collect_linked_regs(this_branch, src_reg->id, &linked_regs);
+	if (dst_reg->type == SCALAR_VALUE && dst_reg->id)
+		collect_linked_regs(this_branch, dst_reg->id, &linked_regs);
+	if (linked_regs.cnt > 1) {
+		err = push_jmp_history(env, this_branch, 0, linked_regs_pack(&linked_regs));
+		if (err)
+			return err;
+	}
+
 	other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
 				  false);
 	if (!other_branch)
@@ -14746,8 +14953,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 						    src_reg, dst_reg, opcode);
 			if (src_reg->id &&
 			    !WARN_ON_ONCE(src_reg->id != other_branch_regs[insn->src_reg].id)) {
-				find_equal_scalars(this_branch, src_reg);
-				find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
+				sync_linked_regs(this_branch, src_reg, &linked_regs);
+				sync_linked_regs(other_branch, &other_branch_regs[insn->src_reg],
+						 &linked_regs);
 			}
 
 		}
@@ -14759,8 +14967,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 
 	if (dst_reg->type == SCALAR_VALUE && dst_reg->id &&
 	    !WARN_ON_ONCE(dst_reg->id != other_branch_regs[insn->dst_reg].id)) {
-		find_equal_scalars(this_branch, dst_reg);
-		find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
+		sync_linked_regs(this_branch, dst_reg, &linked_regs);
+		sync_linked_regs(other_branch, &other_branch_regs[insn->dst_reg],
+				 &linked_regs);
 	}
 
 	/* if one pointer register is compared to another pointer
@@ -16182,7 +16391,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 		 *
 		 * First verification path is [1-6]:
 		 * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
-		 * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
+		 * - at (5) r6 would be marked <= X, sync_linked_regs() would also mark
 		 *   r7 <= X, because r6 and r7 share same id.
 		 * Next verification path is [1-4, 6].
 		 *
@@ -16915,7 +17124,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * the current state.
 			 */
 			if (is_jmp_point(env, env->insn_idx))
-				err = err ? : push_jmp_history(env, cur, 0);
+				err = err ? : push_jmp_history(env, cur, 0, 0);
 			err = err ? : propagate_precision(env, &sl->state);
 			if (err)
 				return err;
@@ -17181,7 +17390,7 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (is_jmp_point(env, env->insn_idx)) {
-			err = push_jmp_history(env, state, 0);
+			err = push_jmp_history(env, state, 0, 0);
 			if (err)
 				return err;
 		}
diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
index 4b8b0f45d17d..a188e26f04da 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
@@ -141,7 +141,7 @@ __msg("mark_precise: frame0: last_idx 14 first_idx 9")
 __msg("mark_precise: frame0: regs=r6 stack= before 13: (bf) r1 = r7")
 __msg("mark_precise: frame0: regs=r6 stack= before 12: (27) r6 *= 4")
 __msg("mark_precise: frame0: regs=r6 stack= before 11: (25) if r6 > 0x3 goto pc+4")
-__msg("mark_precise: frame0: regs=r6 stack= before 10: (bf) r6 = r0")
+__msg("mark_precise: frame0: regs=r0,r6 stack= before 10: (bf) r6 = r0")
 __msg("mark_precise: frame0: regs=r0 stack= before 9: (85) call bpf_loop")
 /* State entering callback body popped from states stack */
 __msg("from 9 to 17: frame1:")
diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
index 8a2ff81d8350..b0b1bcc668ad 100644
--- a/tools/testing/selftests/bpf/verifier/precise.c
+++ b/tools/testing/selftests/bpf/verifier/precise.c
@@ -44,7 +44,7 @@
 	mark_precise: frame0: regs=r2 stack= before 23\
 	mark_precise: frame0: regs=r2 stack= before 22\
 	mark_precise: frame0: regs=r2 stack= before 20\
-	mark_precise: frame0: parent state regs=r2 stack=:\
+	mark_precise: frame0: parent state regs=r2,r9 stack=:\
 	mark_precise: frame0: last_idx 19 first_idx 10\
 	mark_precise: frame0: regs=r2,r9 stack= before 19\
 	mark_precise: frame0: regs=r9 stack= before 18\
-- 
2.43.0


^ permalink raw reply related

* [PATCH stable 6.6.y v4 0/4] bpf: linked scalar precision fixes
From: Zhenzhong Wu @ 2026-06-21 17:27 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird

Hi,

This v4 targets 6.6.y and keeps the v3 backport strategy: use the full
upstream linked-scalar precision-tracking series, instead of the earlier
d028f87517d6/9e314f5d8682 not-equal refinement backport path.

The original observed failure was found in Rust/Aya-generated eBPF
around helper calls. Rust match lowering can keep a helper return value
and a scalar filled through a by-reference helper argument in the same
enum-style control flow. That makes it easy for the verifier-visible
scalar values to become linked by scalar id.

The relevant verifier-log bytecode from the original reproducer is
below. The later instructions only store r7 into a map so user space can
observe which branch the verifier kept.

  15: (85) call bpf_get_func_ret#184    ; R0_w=scalar() fp-8_w=mmmmmmmm
  16: (79) r7 = *(u64 *)(r10 -8)        ; R7_w=scalar() R10=fp0
  17: (15) if r0 == 0x0 goto pc+1       ; R0_w=scalar()
  18: (bf) r7 = r0                      ; R0=scalar(id=1) R7=scalar(id=1)
  19: (55) if r0 != 0x0 goto pc+6       ; R0=0
  20: (67) r7 <<= 32                    ; R7_w=0
  21: (77) r7 >>= 32                    ; R7_w=0
  22: (b7) r1 = 1                       ; R1_w=1
  23: (55) if r7 != 0xf goto pc+1

The important verifier state shape is:

  1. The program checks "if r0 == 0". The jump target is the success
     path, and the fallthrough path is the failure path.

  2. On the failure path, "r7 = r0" gives r0 and r7 the same scalar id.
     The real success path skips that assignment, so r7 is independent
     there.

  3. At the later "if r0 != 0" check, an affected verifier can explore
     an impossible continuation where r0 is zero and r7 is narrowed
     through the shared scalar id as well.

  4. That impossible continuation reaches the return-value comparison
     with the wrong r7 value. When the real success path is analyzed
     later, state pruning can consider it safe against the earlier
     cached verifier state and skip the real continuation.

The root cause is verifier scalar state tracking, not helper-specific
behavior. A helper return value in r0 and another scalar can become
linked by scalar id on one branch. The real success path can skip that
assignment, so the two verifier states are not equivalent.

The relevant pruning point is that regsafe()/states_equal() accepted
the real success-path state against an earlier cached state where r0 was
an imprecise scalar and r7 constraints were loose enough to cover the
current r7. In the impossible path, r0 and r7 are linked by scalar id
after instruction 18. In the real success path, instruction 18 is
skipped and that scalar-id relationship does not exist. These states
should therefore not be treated as equivalent for pruning.

The upstream linked-scalar precision series fixes that root cause by
recording, in jmp_history, which linked registers were synchronized at
each conditional jump and by using that per-instruction history during
precision backtracking. This covers both the original r0 == 0 /
r0 != 0 shape and the r0 == 1 / r0 != 1 shape used by the separate
runtime selftest.
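
As a rough illustration of that per-instruction propagation, here is a
small user-space model. It is a sketch under simplifying assumptions,
not the verifier code: registers of a single frame are represented as a
bitmask, stack slots are ignored, the effect of assignments and ALU
instructions on precision marks is left out, and the names regmask_t,
sync_linked_mask() and backtrack_linked() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit N set means register rN; a single frame and no stack slots. */
typedef uint32_t regmask_t;

/* If any linked register is already precise, all of them become precise. */
static regmask_t sync_linked_mask(regmask_t precise, regmask_t linked)
{
	return (precise & linked) ? (precise | linked) : precise;
}

/* linked_hist[i] is the set recorded at instruction i, or 0 if none. */
static regmask_t backtrack_linked(regmask_t precise,
				  const regmask_t *linked_hist, int n)
{
	int i;

	assert(n >= 0);
	/* Walk from the last instruction back to the first. */
	for (i = n - 1; i >= 0; i--)
		precise = sync_linked_mask(precise, linked_hist[i]);
	return precise;
}
```

With the set {r0, r7} recorded at one conditional jump, backtracking
from a precise r7 ends with both r0 and r7 marked, while backtracking
from an unrelated register such as r3 leaves only r3 marked.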

A Rust/Aya-specific runtime reproducer/selftest discussed in the v2
thread has been submitted separately to bpf-next for review:

  https://lore.kernel.org/bpf/20260611160749.391279-1-jt26wzz@gmail.com/

That reproducer keeps the same helper-return/control-flow shape but
shifts the success value to 1 before branching. This avoids depending
on the not-equal-zero refinement and exercises linked scalar precision
during state pruning directly. It uses bpf_skb_load_bytes() in the
normal tc test-run path and does not require fexit attach or
bpf_get_func_ret(). It is not included in this stable series because
per review feedback it should go through bpf-next first before being
considered for stable.

Targeted results for that separate helper-status runtime reproducer are:

  v6.6.142 + reproducer:                                  FAIL
  v6.6.142 + v2 d028/9e backport path + reproducer:      FAIL
  v6.6.142 + this linked-scalars series + reproducer:    PASS
  bpf-next + reproducer:                                  PASS

Changes since v3:
  - add the tools/testing/selftests/bpf/verifier/precise.c expected-log
    update for "precise: test 1";
  - drop the v3-only collect_linked_regs() singleton cleanup and use
    the final upstream linked_regs.cnt > 1 history-recording check in
    patch 1.

v3:
  https://lore.kernel.org/r/cover.1781194510.git.jt26wzz@gmail.com/

v2:
  https://lore.kernel.org/r/20260607170959.823755-1-jt26wzz@gmail.com/

RFC v1:
  https://lore.kernel.org/r/20260601180400.1381736-1-jt26wzz@gmail.com/

Backport details:

This series is based on v6.6.142 / stable/linux-6.6.y commit
924b4a879cbb ("Linux 6.6.142"). I would like it applied to 6.6.y first.
The same issue is reproducible on 6.1.y, 5.15.y, and 5.10.y, but those
trees need separate older-layout adaptations.

Instead of backporting the d028f87517d6 not-equal refinement plus the
9e314f5d8682 range-combining prerequisite, this series backports the
full upstream linked-scalar precision-tracking series:

  4bf79f9be434
    bpf: Track equal scalars history on per-instruction level
  842edb5507a1
    bpf: Remove mark_precise_scalar_ids()
  bebc17b1c03b
    selftests/bpf: Tests for per-insn sync_linked_regs() precision
    tracking
  cfbf25481d6d
    selftests/bpf: Update comments find_equal_scalars->sync_linked_regs

Upstream series:
  https://lore.kernel.org/r/20240718202357.1746514-1-eddyz87@gmail.com/

Patches 1 and 2 are the verifier changes from that upstream series. The
main 6.6.y-specific verifier adaptation is in patch 1: 6.6.y does not
yet have the newer BPF_ADD_CONST scalar-id representation, so
sync_linked_regs() is adapted to the older scalar-id layout. Patch 1
otherwise keeps the upstream linked_regs.cnt > 1 history-recording
condition.

Patch 1 also carries the matching upstream test_verifier expected-log
change for tools/testing/selftests/bpf/verifier/precise.c that v3
missed. On 6.6.y this is a one-line update to the "precise: test 1"
parent-state log, from regs=r2 to regs=r2,r9. Patch 2 then follows on
top of the adapted layout.

Patches 3 and 4 bring the upstream verifier selftests and comment
updates. Patch 3 has one 6.6.y-specific log adaptation:
linked_regs_broken_link_2 keeps the "div by zero" reject check, but
drops the upstream mark_precise log expectations because 6.6.y does not
derive the scalar-vs-scalar range for that non-constant JMP_X
comparison. Patch 4 only updates the two pre-existing comments that are
present in 6.6.y.

Relevant selftest results on 6.6.y with this v4 backport:

  test_verifier:
    788 PASSED, 0 SKIPPED, 0 FAILED

  test_progs -t verifier_scalar_ids:
    all 18 verifier_scalar_ids subtests passed

Thanks to Shung-Hsi Yu for reviewing v2/v3 and suggesting the upstream
linked-scalar precision series as the preferred backport direction.

Eduard Zingerman (4):
  bpf: Track equal scalars history on per-instruction level
  bpf: Remove mark_precise_scalar_ids()
  selftests/bpf: Tests for per-insn sync_linked_regs() precision
    tracking
  selftests/bpf: Update comments find_equal_scalars->sync_linked_regs

 include/linux/bpf_verifier.h                  |   4 +
 kernel/bpf/verifier.c                         | 364 +++++++++++-------
 .../selftests/bpf/progs/verifier_scalar_ids.c | 253 ++++++++----
 .../selftests/bpf/progs/verifier_spill_fill.c |   4 +-
 .../bpf/progs/verifier_subprog_precision.c    |   2 +-
 .../testing/selftests/bpf/verifier/precise.c  |   4 +-
 6 files changed, 415 insertions(+), 216 deletions(-)

base-commit: 924b4a879cbb75aef37c160b955b92f6894b11a4
-- 
2.43.0

* Re: [PATCH net 0/2] nfc: llcp: fix OOB reads and integer bugs in TLV parsers
From: David Heidelberg @ 2026-06-21 16:52 UTC (permalink / raw)
  To: Muhammad Bilal, netdev
  Cc: linux-kernel, oe-linux-nfc, david+nfc, davem, edumazet, kuba,
	pabeni, horms, stable
In-Reply-To: <20260519011937.12903-1-meatuni001@gmail.com>

On 19/05/2026 03:19, Muhammad Bilal wrote:
> This series fixes memory safety bugs in the NFC LLCP TLV parsing code,
> reachable from a remote NFC peer via crafted LLCP frames.
> 
> Patch 1 fixes nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv():
>    - u8 offset wraps to zero after 255 (widened to u16)
>    - OOB read of TLV header on truncated buffer
>    - OOB read of value field via attacker-controlled length byte
> 
> Patch 2 fixes nfc_llcp_recv_snl():
>    - OOB read of TLV header when tlv_len - offset == 1
>    - OOB read of SDREQ value via attacker-controlled length
>    - SIZE_MAX underflow when length == 0 in service_name_len,
>      bypassing the sn_len == 0 guard in nfc_llcp_sock_from_sn()
> 
> Previously reported to security@kernel.org on 2026-05-15. Willy Tarreau
> advised posting to public lists as NFC is currently orphaned.
> 
> Muhammad Bilal (2):
>    nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
>    nfc: llcp: add missing bounds checks in nfc_llcp_recv_snl()
> 
>   net/nfc/llcp_commands.c | 28 ++++++++++++++++++++++++++--
>   net/nfc/llcp_core.c     | 23 +++++++++++++++++++++--
>   2 files changed, 47 insertions(+), 4 deletions(-)
> 
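
For illustration only, the three classes of check listed in the quoted cover letter (a wide offset, a header bounds check, a value bounds check) can be sketched in userspace C. This is not the code from the patches: tlv_count_checked() and its return convention are hypothetical names chosen for this sketch, and it assumes the usual one-byte type, one-byte length TLV layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): walk a buffer of
 * type/length/value records and return how many well-formed TLVs it
 * holds, or -1 if a header or value would extend past the buffer.
 */
int tlv_count_checked(const uint8_t *buf, size_t len)
{
	size_t offset = 0;	/* wider than u8, so it cannot wrap at 255 */
	int count = 0;

	assert(buf != NULL || len == 0);

	while (offset < len) {
		uint8_t vlen;

		/* Both header bytes (type, length) must be present. */
		if (len - offset < 2)
			return -1;

		vlen = buf[offset + 1];

		/* The peer-controlled length must fit in what remains. */
		if (len - offset - 2 < vlen)
			return -1;

		offset += 2 + (size_t)vlen;
		count++;
	}

	return count;
}
```

With a u8 offset, a 300-byte buffer of zero-length TLVs would wrap the offset back to 0 and never terminate; with the wider type the walk ends after 150 records.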

Hello Muhammad,

could I ask you to rebase the patches against for-next [1]?

Thank you very much for your work!
David

[1] https://codeberg.org/linux-nfc/linux

* Re: [PATCH net] nfc: nci: validate packet length when parsing NCI 2.x RF interfaces
From: David Heidelberg @ 2026-06-21 16:46 UTC (permalink / raw)
  To: Zijing Yin
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, oe-linux-nfc, netdev, linux-kernel, stable
In-Reply-To: <20260611162718.2301552-1-yzjaurora@gmail.com>

On 11/06/2026 18:27, Zijing Yin wrote:
> nci_core_init_rsp_packet_v2() parses the variable-length list of
> supported RF interfaces carried in an NCI 2.x CORE_INIT_RSP without ever
> validating the controller-supplied lengths against the size of the
> received packet.
> 
> Each list entry is a (RF interface, RF extension count, RF extensions[])
> tuple. The loop walks the list using the per-entry extension count
> (rf_extension_cnt, up to 255) taken straight from the packet, so a
> malformed CORE_INIT_RSP can advance the read pointer far past the end of
> the skb data buffer. The stored interface count is clamped to
> NCI_MAX_SUPPORTED_RF_INTERFACES so the write side is bounded, but the
> read side runs off the end of the buffer.
> 
> A malformed CORE_INIT_RSP from the controller, also reachable from user
> space through the virtual NCI device (CONFIG_NFC_VIRTUAL_NCI) once the
> device has entered NCI 2.x mode, therefore makes the parser read past the
> end of the response buffer while walking the interface list, copying the
> out-of-bounds bytes into ndev->supported_rf_interfaces[].
> 
> Reject responses shorter than the fixed part of the structure, and make
> sure each interface entry and its extension bytes lie within the received
> packet before dereferencing them. A truncated or malformed list is
> treated as a syntax error, which fails the CORE_INIT request instead of
> reading out of bounds.
> 
> Fixes: bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
> ---
>   net/nfc/nci/rsp.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
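
As a rough illustration of the approach described in the quoted commit message (check each interface entry and its extension bytes against the received length before reading them), here is a minimal userspace sketch. It is not the patch itself: parse_rf_interfaces(), RF_IFACES_MAX and the -1 error return are hypothetical stand-ins, and the entry layout (interface byte, extension count byte, extension bytes) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NCI_MAX_SUPPORTED_RF_INTERFACES. */
#define RF_IFACES_MAX 4

/*
 * Hypothetical parser (not from the patch): copy up to RF_IFACES_MAX
 * interface bytes into out[] and return how many were stored, or -1 if
 * any entry or its extensions would lie outside data[0..len).
 */
int parse_rf_interfaces(const uint8_t *data, size_t len,
			uint8_t num_ifaces, uint8_t *out)
{
	const uint8_t *p = data;
	const uint8_t *end = data + len;
	int stored = 0;

	assert(data != NULL && out != NULL);

	for (uint8_t i = 0; i < num_ifaces; i++) {
		uint8_t ext_cnt;

		/* Entry header: interface byte plus extension count. */
		if ((size_t)(end - p) < 2)
			return -1;

		/* Write side stays clamped, as in the original code. */
		if (stored < RF_IFACES_MAX)
			out[stored++] = p[0];

		ext_cnt = p[1];
		p += 2;

		/* Read side: the extensions must fit in the packet too. */
		if ((size_t)(end - p) < ext_cnt)
			return -1;
		p += ext_cnt;
	}

	return stored;
}
```

A caller would treat -1 the way the commit message describes a truncated list: as a syntax error that fails the request.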

Hello Zijing,

in the meantime, an earlier patch [1] got merged, which also seems to address the issue.

Could you check whether the issue is now fixed for you?

The NFC tree is here [2].

Thank you, and sorry for the late reply.
David

[1] https://lore.kernel.org/all/20260527052625.3309581-1-yun.zhou@windriver.com/
[2] https://codeberg.org/linux-nfc/linux

* Re: [PATCH net v4] nfc: llcp: bound SNL TLV parsing to the skb and add length checks
From: David Heidelberg @ 2026-06-21 16:34 UTC (permalink / raw)
  To: Doruk Tan Ozturk, oe-linux-nfc
  Cc: david.laight.linux, horms, netdev, linux-kernel
In-Reply-To: <20260609202543.42282-1-doruk@0sec.ai>


On Tue, 09 Jun 2026 22:25:43 +0200, Doruk Tan Ozturk wrote:
 > nfc: llcp: bound SNL TLV parsing to the skb and add length checks

Applied, thanks!

[1/1] nfc: llcp: bound SNL TLV parsing to the skb and add length checks
       commit: ed85d4cbbfaa4e630c5aa0d607348b42620d976b

Best regards,
-- 
David Heidelberg <david@ixit.cz>
