Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v11 2/4] arm64: dts: s32: set Ethernet channel irqs
From: Jared Kangas @ 2026-04-21 20:07 UTC (permalink / raw)
  To: jan.petrous
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Chester Lin,
	Matthias Brugger, Ghennadi Procopciuc, NXP S32 Linux Team,
	Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Frank Li, netdev,
	linux-stm32, linux-arm-kernel, linux-kernel, imx, devicetree,
	rmk+kernel, vladimir.oltean, boon.khai.ng
In-Reply-To: <20260312-dwmac_multi_irq-v11-2-09621ccb040b@oss.nxp.com>

On Thu, Mar 12, 2026 at 09:55:28AM +0100, Jan Petrous via B4 Relay wrote:
> From: "Jan Petrous (OSS)" <jan.petrous@oss.nxp.com>
> 
> The GMAC Ethernet controller found on S32G2/S32G3 and S32R45
> contains up to 5 RX and 5 TX channels.
> It can operate in two interrupt modes:
> 
>   1) Sharing IRQ mode: only MAC IRQ line is used
>      for all channels.
> 
>   2) Multiple IRQ mode: every channel uses two IRQ lines,
>      one for RX and second for TX.
> 
> Specify all IRQ twins for all channels.
> 
> Reviewed-by: Matthias Brugger <mbrugger@suse.com>
> Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
> ---

Tested the new channels on an S32G-VNP-RDB3 while testing patch 4/4.

Tested-by: Jared Kangas <jkangas@redhat.com>


^ permalink raw reply

* [PATCH iproute2] lib: add input validation for time, rate, and size parsing functions
From: Stephen Hemminger @ 2026-04-21 20:23 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

The parsing functions get_time(), get_time64(), get_rate(), get_rate64(),
and get_size64() use strtod() to convert user input but don't validate
the parsed values. This allows negative numbers and overflow values to
be passed through, which can cause unexpected behavior or security issues
when these values reach the kernel as unsigned integers.

Add validation to reject:
- Negative values (which make no sense for time, rate, or size)
- Overflow conditions (when strtod() returns HUGE_VAL with ERANGE)
- Empty strings (already checked, but now with explicit comments)

This prevents potential issues and provides clearer error reporting.
Fixing it in iproute2 does not mean that the kernel is off the hook
for handling negative values. Checks are still needed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/utils.c      | 16 ++++++++++++++--
 lib/utils_math.c | 25 ++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/lib/utils.c b/lib/utils.c
index 1215fe31..50602e59 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -1647,7 +1647,13 @@ int get_time(unsigned int *time, const char *str)
 
 	t = strtod(str, &p);
 	if (p == str)
-		return -1;
+		return -1;	/* empty string */
+
+	if (t < 0)
+		return -1;	/* negative value */
+
+	if (t == HUGE_VAL && errno == ERANGE)
+		return -1;	/* out of range */
 
 	if (*p) {
 		if (strcasecmp(p, "s") == 0 || strcasecmp(p, "sec") == 0 ||
@@ -1693,7 +1699,13 @@ int get_time64(__s64 *time, const char *str)
 
 	nsec = strtod(str, &p);
 	if (p == str)
-		return -1;
+		return -1;	/* empty string */
+
+	if (nsec < 0)
+		return -1;	/* negative value */
+
+	if (nsec == HUGE_VAL && errno == ERANGE)
+		return -1;	/* out of range */
 
 	if (*p) {
 		if (strcasecmp(p, "s") == 0 ||
diff --git a/lib/utils_math.c b/lib/utils_math.c
index fd2ddc7c..4b0ac266 100644
--- a/lib/utils_math.c
+++ b/lib/utils_math.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <math.h>
 #include <limits.h>
+#include <errno.h>
 #include <asm/types.h>
 
 #include "utils.h"
@@ -42,7 +43,13 @@ int get_rate(unsigned int *rate, const char *str)
 	const struct rate_suffix *s;
 
 	if (p == str)
-		return -1;
+		return -1;	/* empty string */
+
+	if (bps < 0)
+		return -1;	/* negative */
+
+	if (bps == HUGE_VAL && errno == ERANGE)
+		return -1;	/* out of range */
 
 	for (s = suffixes; s->name; ++s) {
 		if (strcasecmp(s->name, p) == 0) {
@@ -70,7 +77,13 @@ int get_rate64(__u64 *rate, const char *str)
 	const struct rate_suffix *s;
 
 	if (p == str)
-		return -1;
+		return -1;	/* empty string */
+
+	if (bps < 0)
+		return -1;	/* negative */
+
+	if (bps == HUGE_VAL && errno == ERANGE)
+		return -1;	/* out of range */
 
 	for (s = suffixes; s->name; ++s) {
 		if (strcasecmp(s->name, p) == 0) {
@@ -95,7 +108,13 @@ int get_size64(__u64 *size, const char *str)
 
 	sz = strtod(str, &p);
 	if (p == str)
-		return -1;
+		return -1;	/* empty string */
+
+	if (sz < 0)
+		return -1;	/* negative */
+
+	if (sz == HUGE_VAL && errno == ERANGE)
+		return -1;	/* out of range */
 
 	if (*p) {
 		if (strcasecmp(p, "kb") == 0 || strcasecmp(p, "k") == 0)
-- 
2.53.0


^ permalink raw reply related

* [PATCH net v3] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
From: Daniel Borkmann @ 2026-04-21 20:24 UTC (permalink / raw)
  To: kuba
  Cc: edumazet, dsahern, tom, willemdebruijn.kernel, idosch,
	justin.iurman, pabeni, netdev

Commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and
Destination options") added net.ipv6.max_{hbh,dst}_opts_{cnt,len}
and applied them in ip6_parse_tlv(), the generic TLV walker
invoked from ipv6_destopt_rcv() and ipv6_parse_hopopts().

ip6_tnl_parse_tlv_enc_lim() does not go through ip6_parse_tlv();
it has its own hand-rolled TLV scanner inside its NEXTHDR_DEST
branch which looks for IPV6_TLV_TNL_ENCAP_LIMIT. That inner
loop is bounded only by optlen, which can be up to 2048 bytes.
Stuffing the Destination Options header with 2046 Pad1 (type=0)
entries advances the scanner a single byte at a time, yielding
~2000 TLV iterations per extension header.

Reusing max_dst_opts_cnt to bound the TLV iterations, matching
the semantics from 47d3d7ac656a, would require duplicating
ip6_parse_tlv() to also validate Pad1/PadN payload. It would
also mandate enforcing max_dst_opts_len, since otherwise an
attacker shifts the axis to few options with a giant PadN and
recovers the original DoS. Allowing up to 8 options before the
tunnel encapsulation limit TLV is liberal enough; in practice
encap limit is the first TLV. Thus, go with a hard-coded limit
IP6_TUNNEL_MAX_DEST_TLVS (8).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
  v2->v3:
   - Hard code limit of 8 instead of max_dst_opts_cnt (Ido)
  v1->v2:
   - Use abs() given max_dst_opts_cnt's negative meaning (Justin)
   - Remove unlikely (Justin)

 net/ipv6/ip6_tunnel.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 907c6a2af331..4546a60942ab 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -62,6 +62,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");

+#define IP6_TUNNEL_MAX_DEST_TLVS    8
+
 #define IP6_TUNNEL_HASH_SIZE_SHIFT  5
 #define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)

@@ -430,11 +432,15 @@ __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw)
 				break;
 		}
 		if (nexthdr == NEXTHDR_DEST) {
+			int tlv_cnt = 0;
 			u16 i = 2;

 			while (1) {
 				struct ipv6_tlv_tnl_enc_lim *tel;

+				if (unlikely(tlv_cnt++ >= IP6_TUNNEL_MAX_DEST_TLVS))
+					break;
+
 				/* No more room for encapsulation limit */
 				if (i + sizeof(*tel) > optlen)
 					break;
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH net 1/4] rxrpc: Fix memory leaks in rxkad_verify_response()
From: Simon Horman @ 2026-04-21 20:32 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260420145900.1223732-2-dhowells@redhat.com>

On Mon, Apr 20, 2026 at 03:58:54PM +0100, David Howells wrote:
> Fix rxkad_verify_response() to free ticket by using a __free() construct
> rather than explicitly freeing it.
> 
> Also fix rxkad_verify_response() to free the server key by using a __free()
> construct.
> 
> Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure")
> Fixes: ec832bd06d6f ("rxrpc: Don't retain the server key in the connection")
> Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
> Signed-off-by: David Howells <dhowells@redhat.com>

...

> index eb7f2769d2b1..0acdc46f42c2 100644
> --- a/net/rxrpc/rxkad.c
> +++ b/net/rxrpc/rxkad.c

...

> @@ -1160,16 +1159,15 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
>  	}
>  
>  	ret = -ENOMEM;
> -	response = kzalloc_obj(struct rxkad_response, GFP_NOFS);
> +	struct rxkad_response *response __free(kfree) =
> +		kzalloc_obj(struct rxkad_response, GFP_NOFS);
>  	if (!response)
>  		goto temporary_error;
>  

Hi David,

This goto, combined with the use of __free in the declaration
of ticket below results in a compile error for x86_64 allmodconfig
with clang 21.1.8.

  net/rxrpc/rxkad.c:1165:3: error: cannot jump from this goto statement to its label
   1165 |                 goto temporary_error;
        |                 ^
  net/rxrpc/rxkad.c:1192:8: note: jump bypasses initialization of variable with __attribute__((cleanup))
   1192 |         void *ticket __free(kfree) = kmalloc(ticket_len, GFP_NOFS);
        |               ^

Moreover, the use of this construct is discouraged in Networking code:

  1.7.3. Using device-managed and cleanup.h constructs¶

  Netdev remains skeptical about promises of all “auto-cleanup” APIs,
  including even devm_ helpers, historically. They are not the preferred
  style of implementation, merely an acceptable one.

  Use of guard() is discouraged within any function longer than 20 lines,
  scoped_guard() is considered more readable. Using normal lock/unlock is
  still (weakly) preferred.

  Low level cleanup constructs (such as __free()) can be used when building
  APIs and helpers, especially scoped iterators. However, direct use of
  __free() within networking core and drivers is discouraged. Similar
  guidance applies to declaring variables mid-function.

  https://docs.kernel.org/process/maintainer-netdev.html#using-device-managed-and-cleanup-h-constructs

And to round things out, Sashiko also points out problems with
the use of __free() in this patch.

...

>  
>  	/* extract the kerberos ticket and decrypt and decode it */
>  	ret = -ENOMEM;
> -	ticket = kmalloc(ticket_len, GFP_NOFS);
> +	void *ticket __free(kfree) = kmalloc(ticket_len, GFP_NOFS);
>  	if (!ticket)
> -		goto temporary_error_free_resp;
> +		goto temporary_error;

...

>  temporary_error:
>  	/* Ignore the response packet if we got a temporary error such as
>  	 * ENOMEM.  We just want to send the challenge again.  Note that we
>  	 * also come out this way if the ticket decryption fails.
>  	 */
> -	key_put(server_key);
>  	return ret;
>  }
>  
> 

-- 
pw-bot: changes-requested

^ permalink raw reply

* Re: [PATCH net 2/4] rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
From: Simon Horman @ 2026-04-21 20:38 UTC (permalink / raw)
  To: dhowells
  Cc: 'Simon Horman', netdev, marc.dionne, kuba, davem,
	edumazet, pabeni, linux-afs, linux-kernel, jaltman, stable
In-Reply-To: <20260420145900.1223732-3-dhowells@redhat.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
Subject: rxrpc: Fix conn-level packet handling to unshare RESPONSE packets

This commit modifies the RxRPC connection-level packet processor to unshare
RESPONSE packets before decryption. It introduces a new rxrpc_unshare_skb()
helper to safely unshare skbuffs passed via a double pointer and replaces
direct calls to skb_unshare().

Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com

> diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
> index 9a41ec708aeb9..3d56a5d233697 100644
> --- a/net/rxrpc/conn_event.c
> +++ b/net/rxrpc/conn_event.c

[ ... ]

> @@ -371,7 +379,7 @@ static void rxrpc_do_process_connection(struct rxrpc_connection *conn)
>  	 * connection that each one has when we've finished with it */
>  	while ((skb = skb_dequeue(&conn->rx_queue))) {
>  		rxrpc_see_skb(skb, rxrpc_skb_see_conn_work);
> -		ret = rxrpc_process_event(conn, skb);
> +		ret = rxrpc_process_event(conn, &skb);
>  		switch (ret) {
>  		case -ENOMEM:
>  		case -EAGAIN:

Looking at the code immediately following this hunk in
rxrpc_do_process_connection():

		switch (ret) {
		case -ENOMEM:
		case -EAGAIN:
			skb_queue_head(&conn->rx_queue, skb);
			rxrpc_queue_conn(conn, rxrpc_conn_queue_retry_work);
			break;

If rxrpc_process_event() fails with -ENOMEM because rxrpc_unshare_skb()
failed, rxrpc_unshare_skb() will have set *_skb = NULL.

Does this mean skb is NULL when we reach skb_queue_head()?

Since skb_queue_head() dereferences the skb parameter, this appears to
result in a NULL pointer dereference. Furthermore, since skb_unshare()
frees the original skb on allocation failure, the packet is already gone
and requeuing it wouldn't be possible anyway.

> diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c
> index 6979569319252..0592ce644fc3d 100644
> --- a/net/rxrpc/io_thread.c
> +++ b/net/rxrpc/io_thread.c

[ ... ]

> @@ -249,19 +249,10 @@ static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff **_skb)
>  		 * decryption.
>  		 */
>  		if (sp->hdr.securityIndex != 0) {
> -			skb = skb_unshare(skb, GFP_ATOMIC);
> -			if (!skb) {
> -				rxrpc_eaten_skb(*_skb, rxrpc_skb_eaten_by_unshare_nomem);
> -				*_skb = NULL;
> +			skb = rxrpc_unshare_skb(_skb, GFP_ATOMIC);
> +			if (!skb)
>  				return just_discard;
> -			}

This isn't a new bug introduced by this patch, but since we are modifying
the unshare path here: when rxrpc_unshare_skb() fails, it sets *_skb = NULL
and returns just_discard.

Back in rxrpc_io_thread(), the packet is processed like this:

net/rxrpc/io_thread.c:rxrpc_io_thread() {
...
		while ((skb = __skb_dequeue(&rx_queue))) {
			struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
			switch (skb->mark) {
			case RXRPC_SKB_MARK_PACKET:
				skb->priority = 0;
				if (!rxrpc_input_packet(local, &skb))
					rxrpc_reject_packet(local, skb);
				trace_rxrpc_rx_done(skb->mark, skb->priority);
...

If rxrpc_input_packet() sets skb to NULL and returns false (just_discard),
does this cause a NULL pointer dereference when trace_rxrpc_rx_done() tries
to access skb->mark?

^ permalink raw reply

* Re: [PATCH net 00/18] Remove a number of ISA and PCMCIA Ethernet drivers
From: Byron Stanoszek @ 2026-04-21 20:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	linux-kernel, netdev, linux-doc
In-Reply-To: <20260421-v7-0-0-net-next-driver-removal-v1-v1-0-69517c689d1f@lunn.ch>

On Tue, 21 Apr 2026, Andrew Lunn wrote:

> These old drivers have not been much of a Maintenance burden until
> recently. Now there are more newbies using AI and fuzzers finding
> issues, resulting in more work for Maintainers. Fixing these old
> drivers make little sense, if it is not clear they have users.
>
>      drivers: net: 3com: 3c59x: Remove this driver

Hi Andrew,

I happen to still use this driver on several hundred industrial PC
installations that were outfitted with 3com 3C905-B & CX cards 15+ years ago.
The old hardware still runs, therefore those cards haven't needed to be
replaced. I keep these systems up to date with the latest Linux kernel roughly
once per year.

I understand the maintenance burden, but I would be delighted to continue
receiving bug fixes for this driver via the mainline Linux kernel if you are
still willing to continue to support it.

Thanks and best regards,
  -Byron

^ permalink raw reply

* Re: [syzbot] [kvm?] [net?] [virt?] BUG: sleeping function called from invalid context in vhost_get_avail_idx
From: Michael S. Tsirkin @ 2026-04-21 20:54 UTC (permalink / raw)
  To: Kohei Enju; +Cc: syzbot, jasowang, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <aeerTzpq8B-WTKeC@x1>

On Wed, Apr 22, 2026 at 02:11:01AM +0900, Kohei Enju wrote:
> On 04/20 15:09, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    8541d8f725c6 Merge tag 'mtd/for-7.1' of git://git.kernel.o..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=136454ce580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=7e54da1916e8d11f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6985cb8e543ea90ba8ee
> > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15d264ce580000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143ec1ba580000
> > 
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-8541d8f7.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/22dfea2c37c2/vmlinux-8541d8f7.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/e2f93ad68fe3/bzImage-8541d8f7.xz
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+6985cb8e543ea90ba8ee@syzkaller.appspotmail.com
> > 
> > BUG: sleeping function called from invalid context at drivers/vhost/vhost.c:1527
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6110, name: vhost-6109
> > preempt_count: 1, expected: 0
> > RCU nest depth: 0, expected: 0
> > 2 locks held by vhost-6109/6110:
> >  #0: ffff888055624cb0 (&vq->mutex/1){+.+.}-{4:4}, at: handle_tx+0x2d/0x160 drivers/vhost/net.c:971
> >  #1: ffff888055620248 (&vq->mutex){+.+.}-{4:4}, at: vhost_net_busy_poll+0x9c/0x730 drivers/vhost/net.c:554
> > Preemption disabled at:
> > [<ffffffff88f1a006>] vhost_net_busy_poll+0x1c6/0x730 drivers/vhost/net.c:563
> 
> I think the blamed commit may be commit 030881372460 ("vhost_net: basic
> polling support"), since it introduced preempt_{disable,enable}() around
> the busy-poll loop, which calls a sleepable function inside the loop.
> 
> Also, from the changelog of the series,
> 
> https://lore.kernel.org/netdev/1448435489-5949-4-git-send-email-jasowang@redhat.com/T/#u
> 
>   Changes from RFC V1:
>   ...
>   - Disable preemption during busy looping to make sure local_clock() was
>     correctly used.
> 
> So my understanding is that preempt_disable() was introduced to keep
> local_clock() based timeout accounting on a single CPU, rather than as a
> requirement of busy polling itself.
> 
> If my understanding is correct, migrate_disable() is sufficient here
> instead of preempt_disable(), avoiding sleepable accesses from a
> preempt-disabled context.
> 
> #syz test
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 80965181920c..c6536cad9c4f 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -560,7 +560,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>         busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
>                                      tvq->busyloop_timeout;
> 
> -       preempt_disable();
> +       migrate_disable();
>         endtime = busy_clock() + busyloop_timeout;
> 
>         while (vhost_can_busy_poll(endtime)) {
> @@ -577,7 +577,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>                 cpu_relax();
>         }
> 
> -       preempt_enable();
> +       migrate_enable();
> 
>         if (poll_rx || sock_has_rx_data(sock))
>                 vhost_net_busy_poll_try_queue(net, vq);



Makes sense but this stipped up the bot. Try again?

> 
> > CPU: 0 UID: 0 PID: 6110 Comm: vhost-6109 Not tainted syzkaller #0 PREEMPT(full) 
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > Call Trace:
> >  <TASK>
> >  __dump_stack lib/dump_stack.c:94 [inline]
> >  dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
> >  __might_resched.cold+0x1ec/0x232 kernel/sched/core.c:9162
> >  __might_fault+0x8b/0x140 mm/memory.c:7322
> >  vhost_get_avail_idx+0x31c/0x4f0 drivers/vhost/vhost.c:1527
> >  vhost_vq_avail_empty drivers/vhost/vhost.c:3206 [inline]
> >  vhost_vq_avail_empty+0xa9/0xe0 drivers/vhost/vhost.c:3199
> >  vhost_net_busy_poll+0x297/0x730 drivers/vhost/net.c:574
> >  vhost_net_tx_get_vq_desc drivers/vhost/net.c:610 [inline]
> >  get_tx_bufs.constprop.0+0x338/0x600 drivers/vhost/net.c:650
> >  handle_tx_copy+0x28c/0x12e0 drivers/vhost/net.c:778
> >  handle_tx+0x139/0x160 drivers/vhost/net.c:985
> >  vhost_run_work_list+0x183/0x220 drivers/vhost/vhost.c:454
> >  vhost_task_fn+0x156/0x430 kernel/vhost_task.c:49
> >  ret_from_fork+0x72b/0xd50 arch/x86/kernel/process.c:158
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >  </TASK>
> > 
> > 
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > 
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > 
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> > 
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> > 
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> > 
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> > 
> > If you want to undo deduplication, reply with:
> > #syz undup


^ permalink raw reply

* Re: [PATCH net 2/4] rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
From: David Howells @ 2026-04-21 20:58 UTC (permalink / raw)
  To: Simon Horman
  Cc: dhowells, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, jaltman, stable
In-Reply-To: <20260421203833.745240-1-horms@kernel.org>

Simon Horman <horms@kernel.org> wrote:

> This isn't a new bug introduced by this patch, but since we are modifying
> the unshare path here: when rxrpc_unshare_skb() fails, it sets *_skb = NULL
> and returns just_discard.

Is there a way to avoid having skb_unshare() eat your ref to the source skbuff
if it fails?

David


^ permalink raw reply

* Re: [PATCH] dt-bindings: Fix phandle-array constraints, again
From: Rob Herring (Arm) @ 2026-04-21 21:33 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Sylwester Nawrocki, ath11k, devicetree, linux-pci,
	Mathieu Poirier, Conor Dooley, Xu Yang, Lorenzo Pieralisi,
	Andrew Lunn, linux-spi, Patrice Chotard, Rao Mandadapu,
	Mark Brown, linux-sound, Maarten Lankhorst, Bjorn Andersson,
	linux-remoteproc, Maxime Coquelin, Thomas Zimmermann, Sibi Sankar,
	linux-wireless, Johannes Berg, Yang Xiwen, Chaitanya Chundru,
	Alex Elder, Manivannan Sadhasivam, Ulf Hansson, Stephan Gerhold,
	linux-kernel, Eric Dumazet, Paolo Abeni, Jakub Kicinski,
	linux-arm-msm, linux-usb, Bjorn Helgaas, Jeff Johnson, linux-mmc,
	Krzysztof Wilczyński, Krzysztof Kozlowski, Peng Fan,
	David S. Miller, Greg Kroah-Hartman, ath10k, netdev,
	Maxime Ripard
In-Reply-To: <20260421195836.1547469-1-robh@kernel.org>


On Tue, 21 Apr 2026 14:55:25 -0500, Rob Herring (Arm) wrote:
> The unfortunately named 'phandle-array' property type is really a matrix
> with phandle and fixed arg cells entries. A matrix property should have 2
> levels of items constraints.
> 
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> ---
> Can someone from QCom provide some descriptions for 'qcom,smem-states'
> properties.
> ---
>  .../display/rockchip/rockchip,rk3399-cdn-dp.yaml         | 2 ++
>  .../bindings/mmc/hisilicon,hi3798cv200-dw-mshc.yaml      | 7 ++++---
>  Documentation/devicetree/bindings/net/qcom,bam-dmux.yaml | 6 ++++++
>  Documentation/devicetree/bindings/net/qcom,ipa.yaml      | 6 ++++++
>  .../devicetree/bindings/net/wireless/qcom,ath10k.yaml    | 5 ++++-
>  .../devicetree/bindings/net/wireless/qcom,ath11k.yaml    | 5 ++++-
>  .../bindings/net/wireless/qcom,ipq5332-wifi.yaml         | 9 +++++++++
>  .../devicetree/bindings/pci/toshiba,tc9563.yaml          | 5 +++--
>  .../bindings/remoteproc/qcom,msm8916-mss-pil.yaml        | 3 +++
>  .../bindings/remoteproc/qcom,msm8996-mss-pil.yaml        | 3 +++
>  .../devicetree/bindings/remoteproc/qcom,pas-common.yaml  | 4 ++++
>  .../bindings/remoteproc/qcom,qcs404-cdsp-pil.yaml        | 4 ++++
>  .../bindings/remoteproc/qcom,sc7180-mss-pil.yaml         | 3 +++
>  .../bindings/remoteproc/qcom,sc7280-adsp-pil.yaml        | 3 +++
>  .../bindings/remoteproc/qcom,sc7280-mss-pil.yaml         | 3 +++
>  .../bindings/remoteproc/qcom,sc7280-wpss-pil.yaml        | 3 +++
>  .../bindings/remoteproc/qcom,sdm845-adsp-pil.yaml        | 3 +++
>  .../devicetree/bindings/remoteproc/qcom,wcnss-pil.yaml   | 3 +++
>  Documentation/devicetree/bindings/sound/samsung,tm2.yaml | 2 ++
>  .../devicetree/bindings/spi/st,stm32mp25-ospi.yaml       | 5 +++--
>  .../devicetree/bindings/usb/chipidea,usb2-common.yaml    | 2 ++
>  Documentation/devicetree/bindings/usb/ci-hdrc-usb2.yaml  | 7 ++++---
>  22 files changed, 81 insertions(+), 12 deletions(-)
> 

My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:
./Documentation/devicetree/bindings/remoteproc/qcom,qcs404-cdsp-pil.yaml:99:1: [warning] too many blank lines (2 > 1) (empty-lines)
./Documentation/devicetree/bindings/remoteproc/qcom,pas-common.yaml:67:1: [warning] too many blank lines (2 > 1) (empty-lines)

dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/i2c/i2c-demux-pinctrl.example.dtb: i2c-mux3 (i2c-demux-pinctrl): i2c-parent:0: [2, 3, 4] is too long
	from schema $id: http://devicetree.org/schemas/i2c/i2c-demux-pinctrl.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/sound/samsung,tm2.example.dtb: sound (samsung,tm2-audio): i2s-controller: [[4294967295], [0], [4294967295], [0]] is too long
	from schema $id: http://devicetree.org/schemas/sound/samsung,tm2.yaml

doc reference errors (make refcheckdocs):

See https://patchwork.kernel.org/project/devicetree/patch/20260421195836.1547469-1-robh@kernel.org

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.


^ permalink raw reply

* Re: [PATCH net 12/18] drivers: net: cirrus: mac89x0: Remove this driver
From: Daniel Palmer @ 2026-04-21 21:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	linux-kernel, netdev, linux-doc
In-Reply-To: <20260421-v7-0-0-net-next-driver-removal-v1-v1-12-69517c689d1f@lunn.ch>

Hi Andrew,

On Wed, 22 Apr 2026 at 05:01, Andrew Lunn <andrew@lunn.ch> wrote:
>
> The mac89x0 was written by Russell Nelson in 1996. It is an MAC
> device, so unlikely to be used with modern kernels.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

I think I might be using this with a mac and I'm running mainline.
FWIW the m68k mac stuff still works perfectly fine with modern kernels
and there are people using it.

^ permalink raw reply

* Re: [PATCH net 06/18] drivers: net: amd: Remove hplance and mvme147
From: Daniel Palmer @ 2026-04-21 21:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	linux-kernel, netdev, linux-doc
In-Reply-To: <20260421-v7-0-0-net-next-driver-removal-v1-v1-6-69517c689d1f@lunn.ch>

Hi Andrew,

On Wed, 22 Apr 2026 at 04:39, Andrew Lunn <andrew@lunn.ch> wrote:
>
> These drivers use the 7990 core with wrappers for the HP300 and
> Motorola MVME147 SBC circa 1998. It is unlikely they are used with a
> modern kernel.

I have an MVME147 blinking away running mainline using the mvme147 driver.
I think some of these need to be CC'd to the specific arch lists so
the few people using them get a chance to pipe up.
I think I'm the last person to have touched the mvme147, I don't mind
being a maintainer for it.

Cheers,

Daniel

^ permalink raw reply

* Re: [PATCH net 00/18] Remove a number of ISA and PCMCIA Ethernet drivers
From: Daniel Palmer @ 2026-04-21 22:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	linux-kernel, netdev, linux-doc
In-Reply-To: <20260421-v7-0-0-net-next-driver-removal-v1-v1-0-69517c689d1f@lunn.ch>

Hi Andrew,

On Wed, 22 Apr 2026 at 04:32, Andrew Lunn <andrew@lunn.ch> wrote:
>
> These old drivers have not been much of a Maintenance burden until
> recently. Now there are more newbies using AI and fuzzers finding
> issues, resulting in more work for Maintainers. Fixing these old
> drivers make little sense, if it is not clear they have users.
>
> These are all ISA and PCMCIA Ethernet devices, mostly from the last
> century, a couple from 2001 or 2002. It seems unlikely they are still
> used. However, remove them one patch at a time so they can be brought
> back if somebody still has the hardware, runs modern kernels and wants
> to take up the roll of driver Maintainer.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

I replied to the single patches for the ones I think I have off the
top of my head but maybe I have a few other these.
I think on x86 people running machines that have real ISA slots and a
modern kernel are going to be fairly rare but I think there is non-x86
hardware that has some of these and people are still using them.
I think the pcnet one is used in some PC emulators too?

This is just my opinion but I think it'd be good to give people a bit
more time before they get removed. I thought I was the only person
using the 68000 (a cpu from 1979) code in the m68k tree for years
until out of the blue a month or so ago someone popped up and asked if
I could fix a bug in it since I'd fixed a different bug in it
recently.

Maybe we could add a special thing in the maintainers for "this is
code only crazy people use" and have a rule to ignore untested AI
generated patches for it? :)

Thanks,

Daniel

^ permalink raw reply

* [PATCH bpf-next v3 0/9] Refactor verifier object relationship tracking
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team

Hi all,

This patchset cleans up dynptr handling, refactors object relationship
tracking in the verifier by introducing parent_id, and fixes dynptr
use-after-free bugs where file/skb dynptrs are not invalidated when
the parent referenced object is freed.

* Motivation *

In BPF qdisc programs, an skb can be freed through kfuncs. However,
since dynptr does not track the parent referenced object (e.g., skb),
the verifier does not invalidate the dynptr after the skb is freed,
resulting in use-after-free. The same issue also affects file dynptr.

The figure below shows the current state of object tracking. The
verifier tracks objects using three fields: id for nullness tracking,
ref_obj_id for lifetime tracking, and dynptr_id for tracking the parent
dynptr of a slice (PTR_TO_MEM only). While dynptr_id links slices to
their parent dynptr, there is no field that links a dynptr back to its
parent skb. When the skb is freed via release_reference(ref_obj_id=1),
only objects with ref_obj_id=1 are invalidated. Since skb dynptr is
non-referenced (ref_obj_id=0), the dynptr and its derived slices remain
accessible.

Current: object (id, ref_obj_id, dynptr_id)
  id         = unique id of the object (for nullness tracking)
  ref_obj_id = id of the referenced object (for lifetime tracking)
  dynptr_id  = id of the parent dynptr (only for PTR_TO_MEM slices)

                      skb (0,1,0)
                             ^
                             ! No link from dynptr to skb
                             +-------------------------------+
                             |           bpf_dynptr_clone    |
                 dynptr A (2,0,0)                dynptr C (4,0,0)
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
              slice B (3,0,2)                 slice D (5,0,4)

* Why not simply use ref_obj_id to track the parent? *

A natural first approach is to link dynptr to its parent by sharing
the parent's ref_obj_id and propagating it to slices. Now, releasing
the skb via release_reference(ref_obj_id=1) correctly invalidates all
derived objects.

Attempted fix: share parent's ref_obj_id

                      skb (0,1,0)
                             ^
                             +-------------------------------+
                             |           bpf_dynptr_clone    |
                 dynptr A (2,1,0)                dynptr C (4,1,0)
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
              slice B (3,1,2)                 slice D (5,1,4)

However, this approach does not generalize to all dynptr types.
Referenced dynptrs such as file dynptr acquire their own ref_obj_id to
track the dynptr's lifetime. Since ref_obj_id is already
used for the dynptr's own reference, it cannot also be used to point to
the parent file object. While it is possible to add specialized handling
for individual dynptr types [0], it adds complexity and does not
generalize.

An alternative approach is to avoid introducing a new field and instead
repurpose ref_obj_id as parent_id by folding lifetime tracking into id
[1]. In this design, each object is represented as (id, ref_obj_id)
where id is used for both nullness and lifetime tracking, and ref_obj_id
tracks the parent object's id.

Attempted: object (id, ref_obj_id)
  id         = id of the object (for nullness and lifetime tracking)
  ref_obj_id = id of the parent object
  '          = id is referenced

                        skb (1',0)
                             ^
        bpf_dynptr_from_skb  +-------------------------------+
                             |      bpf_dynptr_clone(A, C)   |
                 dynptr A (2,1')                 dynptr C (4,1')
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
                slice B (3,2)                   slice D (5,4)

However, this design cannot express the relationship between referenced
socket pointers and their casted counterparts. After pointer casting,
the original and casted pointers need the same lifetime (same ref_obj_id
in the current design) but different nullness (different id). The casted
pointer may be NULL even if the original is valid. With id serving as
the only field for both nullness and lifetime, and ref_obj_id repurposed
as parent, there is no way to express "different identity, same
lifetime."

Referenced socket pointer (expressed using current design):

                                C = ptr_casting_function(A)
                ptr A (1,1,0)                     ptr C (2,1,0)
                         ^                                 ^
                         |                                 |
                        ptr C may be NULL even if ptr A is valid
                        but they have the same lifetime

* New Design: parent_id *

To track precise object relationships, u32 parent_id is added to
bpf_reg_state. A child object's parent_id points to the parent
object's id. This replaces the PTR_TO_MEM-specific dynptr_id, and
does not increase the size of bpf_reg_state on 64-bit machines as
there is existing padding.

After: object (id, ref_obj_id, parent_id)
  id         = unique id of the object (for nullness tracking)
  ref_obj_id = id of the referenced object; objects with the same
               ref_obj_id share the same lifetime
  parent_id  = id of the parent object; points to parent's id
               (for object relationship tracking)

                          skb (1,1,0)
                               ^
          bpf_dynptr_from_skb  +-------------------------------+
                               |      bpf_dynptr_clone(A, C)   |
                 dynptr A (2,0,1)               dynptr C (4,0,1)
                           ^                              ^
        bpf_dynptr_slice   |                              |
                           |                              |
              slice B (3,0,2)                slice D (5,0,4)
                       ^
  bpf_dynptr_from_mem  |
  (NOT allowed yet)    |
         dynptr E (6,0,3)

With parent_id, the verifier can precisely track object trees. When the
skb is freed, the verifier traverses the tree rooted at skb (id=1) and
invalidates all descendants — dynptr A, dynptr C, and their slices.
When dynptr A is destroyed by overwriting the stack slot, only dynptr A
and its children (slice B, dynptr E) are invalidated; skb, dynptr C,
and slice D remain valid.

For referenced dynptr (e.g., file dynptr), the original and its clones
share the same ref_obj_id so they are all invalidated together when any
one of them is released. For non-referenced dynptr (e.g., skb dynptr),
clones live independently since they have ref_obj_id=0.

To avoid recursive call chains when releasing objects (e.g.,
release_reference() -> unmark_stack_slots_dynptr() ->
release_reference()), release_reference() now uses stack-based DFS to
find and invalidate all registers and stack slots with matching id or
ref_obj_id and all descendants whose parent_id matches. Currently, it
skips id == 0, which could be a valid id (e.g., pkt pointer by reading
ctx). Future work may start assigning > 0 id to them. This does not
affect the current use cases where skb and file parents are both given
id > 0.

* Preserving reg->id after null-check *

For parent_id tracking to work, child objects need to refer to the
parent's id. This requires two preparatory changes: assigning reg->id
when reading referenced kptrs from program context (patch 2), and
preserving reg->id of pointer objects after null-check (patch 3).
Previously, null-check would clear reg->id, making it impossible for
children to reference the parent afterward. The latter causes a slight
increase in verified states for some programs. One selftest object
sees +19 states (+5.01%). For Meta BPF objects, the increase is
also minor, with the largest being +34 states (+3.63%).

* Object relationship in different scenarios (for reference) *

The figures below show how the new design handles all four combinations
of referenced/non-referenced dynptr with referenced/non-referenced
parent. The relationship between slices and dynptrs is omitted as it
is the same across all cases. The main difference is how cloned dynptrs
are represented. Since bpf_dynptr_clone() does not initialize a new
reference, clones of referenced dynptrs share the same ref_obj_id and
must be invalidated together. For non-referenced dynptrs, the original
and clones live independently.

(1) Non-referenced dynptr with referenced parent (e.g., skb in Qdisc):

                          skb (1,1,0)
                               ^
          bpf_dynptr_from_skb  +-------------------------------+
                               |      bpf_dynptr_clone(A, C)   |
                 dynptr A (2,0,1)                dynptr C (4,0,1)

(2) Non-referenced dynptr with non-referenced parent (e.g., skb in TC,
    always valid):

      bpf_dynptr_from_skb
                                  bpf_dynptr_clone(A, C)
             dynptr A (1,0,0)                  dynptr C (2,0,0)

                         dynptr A and C live independently

(3) Referenced dynptr with referenced parent:

                     file (1,1,0)
                           ^ ^
     bpf_dynptr_from_file  | +-------------------------------+
                           |       bpf_dynptr_clone(A, C)    |
             dynptr A (2,3,1)                  dynptr C (4,3,1)
                         ^                                 ^
                         |                                 |
                         dynptr A and C have the same lifetime

(4) Referenced dynptr with non-referenced parent:

 bpf_ringbuf_reserve_dynptr
                                  bpf_dynptr_clone(A, C)
             dynptr A (1,1,0)                  dynptr C (2,1,0)
                         ^                                 ^
                         |                                 |
                         dynptr A and C have the same lifetime

[0] https://lore.kernel.org/bpf/20250414161443.1146103-2-memxor@gmail.com/
[1] https://github.com/ameryhung/bpf/commits/obj_relationship_v2_no_parent_id/

Changelog:

v2 -> v3
  - Rebase to bpf-next/master
  - Update veristat numbers
  - Update commit msg to explain multiple dropped checks (Mykyta, Andrii)
  - Reuse idmap as idstack in release_reference() and check for
    duplicate id (Mykyta, Andrii)
  - Change to use RUN_TEST for qdisc dynptr selftest (Eduard) 
  Link: https://lore.kernel.org/bpf/20260307064439.3247440-1-ameryhung@gmail.com/  

v1 -> v2
  - Redesign: Use object (id, ref_obj_id, parent_id) instead of
    (id, ref_obj_id) as it cannot express ptr casting without
    introducing specialized code to handle the case
  - Use stack-based DFS to release objects to avoid recursion (Andrii)
  - Keep reg->id after null check
  - Add dynptr cleanup
  - Fix dynptr kfunc arg type determination
  - Add a file dynptr UAF selftest
  Link: https://lore.kernel.org/bpf/20260202214817.2853236-1-ameryhung@gmail.com/

---

Amery Hung (9):
  bpf: Unify dynptr handling in the verifier
  bpf: Assign reg->id when getting referenced kptr from ctx
  bpf: Preserve reg->id of pointer objects after null-check
  bpf: Refactor object relationship tracking and fix dynptr UAF bug
  bpf: Remove redundant dynptr arg check for helper
  selftests/bpf: Test creating dynptr from dynptr data and slice
  selftests/bpf: Test using dynptr after freeing the underlying object
  selftests/bpf: Test using slice after invalidating dynptr clone
  selftests/bpf: Test using file dynptr after the reference on file is
    dropped

 include/linux/bpf_verifier.h                  |  34 +-
 kernel/bpf/log.c                              |   4 +-
 kernel/bpf/states.c                           |   9 +-
 kernel/bpf/verifier.c                         | 461 ++++++------------
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |   8 +
 ..._qdisc_dynptr_use_after_invalidate_clone.c |  75 +++
 .../progs/bpf_qdisc_fail__invalid_dynptr.c    |  68 +++
 ...f_qdisc_fail__invalid_dynptr_cross_frame.c |  74 +++
 .../bpf_qdisc_fail__invalid_dynptr_slice.c    |  70 +++
 .../testing/selftests/bpf/progs/dynptr_fail.c |  48 +-
 .../selftests/bpf/progs/file_reader_fail.c    |  60 +++
 .../selftests/bpf/progs/user_ringbuf_fail.c   |   4 +-
 12 files changed, 593 insertions(+), 322 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_dynptr_use_after_invalidate_clone.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_cross_frame.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_slice.c

-- 
2.52.0

^ permalink raw reply

* [PATCH bpf-next v3 1/9] bpf: Unify dynptr handling in the verifier
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

Simplify dynptr checking for helper and kfunc by unifying it. Remember
the initialized dynptr (i.e.,g !(arg_type |= MEM_UNINIT)) pass to a
dynptr kfunc during process_dynptr_func() so that we can easily
retrieve the information for verification later. By saving it in
meta->dynptr, there is no need to call dynptr helpers such as
dynptr_id(), dynptr_ref_obj_id() and dynptr_type() in check_func_arg().

Remove and open code the helpers in process_dynptr_func() when
saving id, ref_obj_id, and type. It is okay to drop spi < 0 check as
is_dynptr_reg_valid_init() has made sure the dynptr is valid.

Besides, since dynptr ref_obj_id information is now pass around in
meta->bpf_dynptr_desc, drop the check in helper_multiple_ref_obj_use.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 include/linux/bpf_verifier.h |  12 ++-
 kernel/bpf/verifier.c        | 178 +++++++----------------------------
 2 files changed, 41 insertions(+), 149 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index b148f816f25b..dc0cff59246d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -1319,6 +1319,12 @@ struct bpf_map_desc {
 	int uid;
 };
 
+struct bpf_dynptr_desc {
+	enum bpf_dynptr_type type;
+	u32 id;
+	u32 ref_obj_id;
+};
+
 struct bpf_kfunc_call_arg_meta {
 	/* In parameters */
 	struct btf *btf;
@@ -1359,16 +1365,12 @@ struct bpf_kfunc_call_arg_meta {
 	struct {
 		struct btf_field *field;
 	} arg_rbtree_root;
-	struct {
-		enum bpf_dynptr_type type;
-		u32 id;
-		u32 ref_obj_id;
-	} initialized_dynptr;
 	struct {
 		u8 spi;
 		u8 frameno;
 	} iter;
 	struct bpf_map_desc map;
+	struct bpf_dynptr_desc dynptr;
 	u64 mem_size;
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 185210b73385..41e4ea41c72e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -232,6 +232,7 @@ static void bpf_map_key_store(struct bpf_insn_aux_data *aux, u64 state)
 
 struct bpf_call_arg_meta {
 	struct bpf_map_desc map;
+	struct bpf_dynptr_desc dynptr;
 	bool raw_mode;
 	bool pkt_access;
 	u8 release_regno;
@@ -240,7 +241,6 @@ struct bpf_call_arg_meta {
 	int mem_size;
 	u64 msize_max_value;
 	int ref_obj_id;
-	int dynptr_id;
 	int func_id;
 	struct btf *btf;
 	u32 btf_id;
@@ -434,11 +434,6 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_skc_to_tcp_request_sock;
 }
 
-static bool is_dynptr_ref_function(enum bpf_func_id func_id)
-{
-	return func_id == BPF_FUNC_dynptr_data;
-}
-
 static bool is_sync_callback_calling_kfunc(u32 btf_id);
 static bool is_async_callback_calling_kfunc(u32 btf_id);
 static bool is_callback_calling_kfunc(u32 btf_id);
@@ -507,8 +502,6 @@ static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
 		ref_obj_uses++;
 	if (is_acquire_function(func_id, map))
 		ref_obj_uses++;
-	if (is_dynptr_ref_function(func_id))
-		ref_obj_uses++;
 
 	return ref_obj_uses > 1;
 }
@@ -7433,7 +7426,8 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
  * and checked dynamically during runtime.
  */
 static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn_idx,
-			       enum bpf_arg_type arg_type, int clone_ref_obj_id)
+			       enum bpf_arg_type arg_type, int clone_ref_obj_id,
+			       struct bpf_dynptr_desc *dynptr)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno);
 	int err;
@@ -7499,6 +7493,20 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
 		}
 
 		err = mark_dynptr_read(env, reg);
+
+		if (dynptr) {
+			struct bpf_func_state *state = bpf_func(env, reg);
+			int spi;
+
+			if (reg->type != CONST_PTR_TO_DYNPTR) {
+				spi = dynptr_get_spi(env, reg);
+				reg = &state->stack[spi].spilled_ptr;
+			}
+
+			dynptr->id = reg->id;
+			dynptr->type = reg->dynptr.type;
+			dynptr->ref_obj_id = reg->ref_obj_id;
+		}
 	}
 	return err;
 }
@@ -8263,72 +8271,6 @@ static int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	}
 }
 
-static struct bpf_reg_state *get_dynptr_arg_reg(struct bpf_verifier_env *env,
-						const struct bpf_func_proto *fn,
-						struct bpf_reg_state *regs)
-{
-	struct bpf_reg_state *state = NULL;
-	int i;
-
-	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
-		if (arg_type_is_dynptr(fn->arg_type[i])) {
-			if (state) {
-				verbose(env, "verifier internal error: multiple dynptr args\n");
-				return NULL;
-			}
-			state = &regs[BPF_REG_1 + i];
-		}
-
-	if (!state)
-		verbose(env, "verifier internal error: no dynptr arg found\n");
-
-	return state;
-}
-
-static int dynptr_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
-{
-	struct bpf_func_state *state = bpf_func(env, reg);
-	int spi;
-
-	if (reg->type == CONST_PTR_TO_DYNPTR)
-		return reg->id;
-	spi = dynptr_get_spi(env, reg);
-	if (spi < 0)
-		return spi;
-	return state->stack[spi].spilled_ptr.id;
-}
-
-static int dynptr_ref_obj_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
-{
-	struct bpf_func_state *state = bpf_func(env, reg);
-	int spi;
-
-	if (reg->type == CONST_PTR_TO_DYNPTR)
-		return reg->ref_obj_id;
-	spi = dynptr_get_spi(env, reg);
-	if (spi < 0)
-		return spi;
-	return state->stack[spi].spilled_ptr.ref_obj_id;
-}
-
-static enum bpf_dynptr_type dynptr_get_type(struct bpf_verifier_env *env,
-					    struct bpf_reg_state *reg)
-{
-	struct bpf_func_state *state = bpf_func(env, reg);
-	int spi;
-
-	if (reg->type == CONST_PTR_TO_DYNPTR)
-		return reg->dynptr.type;
-
-	spi = bpf_get_spi(reg->var_off.value);
-	if (spi < 0) {
-		verbose(env, "verifier internal error: invalid spi when querying dynptr type\n");
-		return BPF_DYNPTR_TYPE_INVALID;
-	}
-
-	return state->stack[spi].spilled_ptr.dynptr.type;
-}
-
 static int check_reg_const_str(struct bpf_verifier_env *env,
 			       struct bpf_reg_state *reg, u32 regno)
 {
@@ -8683,7 +8625,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 					 true, meta);
 		break;
 	case ARG_PTR_TO_DYNPTR:
-		err = process_dynptr_func(env, regno, insn_idx, arg_type, 0);
+		err = process_dynptr_func(env, regno, insn_idx, arg_type, 0, &meta->dynptr);
 		if (err)
 			return err;
 		break;
@@ -9342,7 +9284,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 			if (ret)
 				return ret;
 
-			ret = process_dynptr_func(env, regno, -1, arg->arg_type, 0);
+			ret = process_dynptr_func(env, regno, -1, arg->arg_type, 0, NULL);
 			if (ret)
 				return ret;
 		} else if (base_type(arg->arg_type) == ARG_PTR_TO_BTF_ID) {
@@ -10429,52 +10371,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			}
 		}
 		break;
-	case BPF_FUNC_dynptr_data:
-	{
-		struct bpf_reg_state *reg;
-		int id, ref_obj_id;
-
-		reg = get_dynptr_arg_reg(env, fn, regs);
-		if (!reg)
-			return -EFAULT;
-
-
-		if (meta.dynptr_id) {
-			verifier_bug(env, "meta.dynptr_id already set");
-			return -EFAULT;
-		}
-		if (meta.ref_obj_id) {
-			verifier_bug(env, "meta.ref_obj_id already set");
-			return -EFAULT;
-		}
-
-		id = dynptr_id(env, reg);
-		if (id < 0) {
-			verifier_bug(env, "failed to obtain dynptr id");
-			return id;
-		}
-
-		ref_obj_id = dynptr_ref_obj_id(env, reg);
-		if (ref_obj_id < 0) {
-			verifier_bug(env, "failed to obtain dynptr ref_obj_id");
-			return ref_obj_id;
-		}
-
-		meta.dynptr_id = id;
-		meta.ref_obj_id = ref_obj_id;
-
-		break;
-	}
 	case BPF_FUNC_dynptr_write:
 	{
-		enum bpf_dynptr_type dynptr_type;
-		struct bpf_reg_state *reg;
+		enum bpf_dynptr_type dynptr_type = meta.dynptr.type;
 
-		reg = get_dynptr_arg_reg(env, fn, regs);
-		if (!reg)
-			return -EFAULT;
-
-		dynptr_type = dynptr_get_type(env, reg);
 		if (dynptr_type == BPF_DYNPTR_TYPE_INVALID)
 			return -EFAULT;
 
@@ -10665,10 +10565,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		return -EFAULT;
 	}
 
-	if (is_dynptr_ref_function(func_id))
-		regs[BPF_REG_0].dynptr_id = meta.dynptr_id;
-
-	if (is_ptr_cast_function(func_id) || is_dynptr_ref_function(func_id)) {
+	if (is_ptr_cast_function(func_id)) {
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
 	} else if (is_acquire_function(func_id, meta.map.ptr)) {
@@ -10682,6 +10579,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		regs[BPF_REG_0].ref_obj_id = id;
 	}
 
+	if (func_id == BPF_FUNC_dynptr_data) {
+		regs[BPF_REG_0].dynptr_id = meta.dynptr.id;
+		regs[BPF_REG_0].ref_obj_id = meta.dynptr.ref_obj_id;
+	}
+
 	err = do_refine_retval_range(env, regs, fn->ret_type, func_id, &meta);
 	if (err)
 		return err;
@@ -12260,7 +12162,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				meta->release_regno = regno;
 			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_clone] &&
 				   (dynptr_arg_type & MEM_UNINIT)) {
-				enum bpf_dynptr_type parent_type = meta->initialized_dynptr.type;
+				enum bpf_dynptr_type parent_type = meta->dynptr.type;
 
 				if (parent_type == BPF_DYNPTR_TYPE_INVALID) {
 					verifier_bug(env, "no dynptr type for parent of clone");
@@ -12268,29 +12170,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				}
 
 				dynptr_arg_type |= (unsigned int)get_dynptr_type_flag(parent_type);
-				clone_ref_obj_id = meta->initialized_dynptr.ref_obj_id;
+				clone_ref_obj_id = meta->dynptr.ref_obj_id;
 				if (dynptr_type_refcounted(parent_type) && !clone_ref_obj_id) {
 					verifier_bug(env, "missing ref obj id for parent of clone");
 					return -EFAULT;
 				}
 			}
 
-			ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type, clone_ref_obj_id);
+			ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type, clone_ref_obj_id,
+						  &meta->dynptr);
 			if (ret < 0)
 				return ret;
-
-			if (!(dynptr_arg_type & MEM_UNINIT)) {
-				int id = dynptr_id(env, reg);
-
-				if (id < 0) {
-					verifier_bug(env, "failed to obtain dynptr id");
-					return id;
-				}
-				meta->initialized_dynptr.id = id;
-				meta->initialized_dynptr.type = dynptr_get_type(env, reg);
-				meta->initialized_dynptr.ref_obj_id = dynptr_ref_obj_id(env, reg);
-			}
-
 			break;
 		}
 		case KF_ARG_PTR_TO_ITER:
@@ -12894,7 +12784,7 @@ static int check_special_kfunc(struct bpf_verifier_env *env, struct bpf_kfunc_ca
 		}
 	} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice] ||
 		   meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
-		enum bpf_type_flag type_flag = get_dynptr_type_flag(meta->initialized_dynptr.type);
+		enum bpf_type_flag type_flag = get_dynptr_type_flag(meta->dynptr.type);
 
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 
@@ -12918,11 +12808,11 @@ static int check_special_kfunc(struct bpf_verifier_env *env, struct bpf_kfunc_ca
 			}
 		}
 
-		if (!meta->initialized_dynptr.id) {
+		if (!meta->dynptr.id) {
 			verifier_bug(env, "no dynptr id");
 			return -EFAULT;
 		}
-		regs[BPF_REG_0].dynptr_id = meta->initialized_dynptr.id;
+		regs[BPF_REG_0].dynptr_id = meta->dynptr.id;
 
 		/* we don't need to set BPF_REG_0's ref obj id
 		 * because packet slices are not refcounted (see
@@ -13110,7 +13000,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (meta.release_regno) {
 		struct bpf_reg_state *reg = &regs[meta.release_regno];
 
-		if (meta.initialized_dynptr.ref_obj_id) {
+		if (meta.dynptr.ref_obj_id) {
 			err = unmark_stack_slots_dynptr(env, reg);
 		} else {
 			err = release_reference(env, reg->ref_obj_id);
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 2/9] bpf: Assign reg->id when getting referenced kptr from ctx
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

Assign reg->id when getting referenced kptr from read program context
to be consistent with R0 of KF_ACQUIRE kfunc. skb dynptr will track the
referenced skb in qdisc programs using a new field reg->parent_id in
a later patch.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 kernel/bpf/verifier.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 41e4ea41c72e..93003a2a96b0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6448,8 +6448,6 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			} else {
 				mark_reg_known_zero(env, regs,
 						    value_regno);
-				if (type_may_be_null(info.reg_type))
-					regs[value_regno].id = ++env->id_gen;
 				/* A load of ctx field could have different
 				 * actual load size with the one encoded in the
 				 * insn. When the dst is PTR, it is for sure not
@@ -6459,8 +6457,11 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				if (base_type(info.reg_type) == PTR_TO_BTF_ID) {
 					regs[value_regno].btf = info.btf;
 					regs[value_regno].btf_id = info.btf_id;
+					regs[value_regno].id = info.ref_obj_id;
 					regs[value_regno].ref_obj_id = info.ref_obj_id;
 				}
+				if (type_may_be_null(info.reg_type) && !regs[value_regno].id)
+					regs[value_regno].id = ++env->id_gen;
 			}
 			regs[value_regno].type = info.reg_type;
 		}
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 3/9] bpf: Preserve reg->id of pointer objects after null-check
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

Preserve reg->id of pointer objects after null-checking the register so
that children objects derived from it can still refer to it in the new
object relationship tracking mechanism introduced in a later patch. This
change incurs a slight increase in the number of states in one selftest
bpf object, rbtree_search.bpf.o. For Meta bpf objects, the increase of
states is also negligible.

Selftest BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  --------------  ----------  ----------  -------------
rbtree_search                  6820       7326   +506 (+7.42%)         379         398   +19 (+5.01%)

Meta BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  --------------  ----------  ----------  -------------
ned_imex_be_tclass               52         57     +5 (+9.62%)           5           6   +1 (+20.00%)
ned_imex_be_tclass               52         57     +5 (+9.62%)           5           6   +1 (+20.00%)
ned_skop_auto_flowlabel         523        526     +3 (+0.57%)          39          40    +1 (+2.56%)
ned_skop_mss                    289        292     +3 (+1.04%)          20          20    +0 (+0.00%)
ned_skopt_bet_classifier         78         82     +4 (+5.13%)           8           8    +0 (+0.00%)
dctcp_update_alpha              252        320   +68 (+26.98%)          21          27   +6 (+28.57%)
dctcp_update_alpha              252        320   +68 (+26.98%)          21          27   +6 (+28.57%)
ned_ts_func                     119        126     +7 (+5.88%)           6           7   +1 (+16.67%)
tw_egress                      1119       1128     +9 (+0.80%)          95          96    +1 (+1.05%)
tw_ingress                     1128       1137     +9 (+0.80%)          95          96    +1 (+1.05%)
tw_tproxy_router               4380       4465    +85 (+1.94%)         114         118    +4 (+3.51%)
tw_tproxy_router4              3093       3170    +77 (+2.49%)          83          88    +5 (+6.02%)
ttls_tc_ingress               34656      35717  +1061 (+3.06%)         936         970   +34 (+3.63%)
tw_twfw_egress               222327     222338    +11 (+0.00%)       10563       10564    +1 (+0.01%)
tw_twfw_ingress               78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_tc_eg                222839     222859    +20 (+0.01%)       10584       10585    +1 (+0.01%)
tw_twfw_tc_in                 78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_egress                 8080       8085     +5 (+0.06%)         456         456    +0 (+0.00%)
tw_twfw_ingress                8053       8056     +3 (+0.04%)         454         454    +0 (+0.00%)
tw_twfw_tc_eg                  8154       8174    +20 (+0.25%)         456         457    +1 (+0.22%)
tw_twfw_tc_in                  8060       8063     +3 (+0.04%)         455         455    +0 (+0.00%)
tw_twfw_egress               222327     222338    +11 (+0.00%)       10563       10564    +1 (+0.01%)
tw_twfw_ingress               78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_tc_eg                222839     222859    +20 (+0.01%)       10584       10585    +1 (+0.01%)
tw_twfw_tc_in                 78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_egress                 8080       8085     +5 (+0.06%)         456         456    +0 (+0.00%)
tw_twfw_ingress                8053       8056     +3 (+0.04%)         454         454    +0 (+0.00%)
tw_twfw_tc_eg                  8154       8174    +20 (+0.25%)         456         457    +1 (+0.22%)
tw_twfw_tc_in                  8060       8063     +3 (+0.04%)         455         455    +0 (+0.00%)

Looking into rbtree_search, the reason for such increase is that the
verifier has to explore the main loop shown below for one more iteration
until state pruning decides the current state is safe.

long rbtree_search(void *ctx)
{
	...
	bpf_spin_lock(&glock0);
	rb_n = bpf_rbtree_root(&groot0);
	while (can_loop) {
		if (!rb_n) {
			bpf_spin_unlock(&glock0);
			return __LINE__;
		}

		n = rb_entry(rb_n, struct node_data, r0);
		if (lookup_key == n->key0)
			break;
		if (nr_gc < NR_NODES)
			gc_ns[nr_gc++] = rb_n;
		if (lookup_key < n->key0)
			rb_n = bpf_rbtree_left(&groot0, rb_n);
		else
			rb_n = bpf_rbtree_right(&groot0, rb_n);
	}
	...
}

Below is what the verifier sees at the start of each iteration
(65: may_goto) after preserving id of rb_n. Without id of rb_n, the
verifier stops exploring the loop at iter 16.

           rb_n  gc_ns[15]
iter 15    257   257

iter 16    290   257    rb_n: idmap add 257->290
                        gc_ns[15]: check 257 != 290 --> state not equal

iter 17    325   257    rb_n: idmap add 290->325
                        gc_ns[15]: idmap add 257->257 --> state safe

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 kernel/bpf/verifier.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 93003a2a96b0..0313b7d5f6c9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15886,15 +15886,10 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 
 		mark_ptr_not_null_reg(reg);
 
-		if (!reg_may_point_to_spin_lock(reg)) {
-			/* For not-NULL ptr, reg->ref_obj_id will be reset
-			 * in release_reference().
-			 *
-			 * reg->id is still used by spin_lock ptr. Other
-			 * than spin_lock ptr type, reg->id can be reset.
-			 */
-			reg->id = 0;
-		}
+		/*
+		 * reg->id is preserved for object relationship tracking
+		 * and spin_lock lock state tracking
+		 */
 	}
 }
 
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 4/9] bpf: Refactor object relationship tracking and fix dynptr UAF bug
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

Refactor object relationship tracking in the verifier and fix a dynptr
use-after-free bug where file/skb dynptrs are not invalidated when the
parent referenced object is freed.

Add parent_id to bpf_reg_state to precisely track child-parent
relationships. A child object's parent_id points to the parent object's
id. This replaces the PTR_TO_MEM-specific dynptr_id and does not
increase the size of bpf_reg_state on 64-bit machines as there is
existing padding.

When calling dynptr constructors (i.e., process_dynptr_func() with
MEM_UNINIT argument), track the parent's id if the parent is a
referenced object. This only applies to file dynptr and skb dynptr,
so only pass parent reg->id to kfunc constructors.

For release_reference(), invalidating an object now also invalidates
all descendants by traversing the object tree. This is done using
stack-based DFS to avoid recursive call chains of release_reference() ->
unmark_stack_slots_dynptr() -> release_reference(). Referenced objects
encountered during tree traversal cannot be indirectly released. They
require an explicit helper/kfunc call to release the acquired resources.

While the new design changes how object relationships are tracked in
the verifier, it does not change the verifier's behavior. Here is the
implication for dynptr, pointer casting, and owning/non-owning
references:

Dynptr:

When initializing a dynptr, referenced dynptrs acquire a reference for
ref_obj_id. If the dynptr has a referenced parent, parent_id tracks the
parent's id. When cloning, ref_obj_id and parent_id are copied from the
original. Releasing a referenced dynptr via release_reference(ref_obj_id)
invalidates all clones and derived slices. For non-referenced dynptrs,
only the specific dynptr and its children are invalidated.

Pointer casting:

Referenced socket pointers and their casted counterparts share the same
lifetime but have different nullness — they have different id but the
same ref_obj_id.

Owning to non-owning reference conversion:

After converting owning to non-owning by clearing ref_obj_id (e.g.,
object(id=1, ref_obj_id=1) -> object(id=1, ref_obj_id=0)), the
verifier only needs to release the reference state, so it calls
release_reference_nomark() instead of release_reference().

Note that the error message "reference has not been acquired before" in
the helper and kfunc release paths is removed. This message was already
unreachable. The verifier only calls release_reference() after
confirming meta.ref_obj_id is valid, so the condition could never
trigger in practice (no selftest exercises it either). With the
refactor, release_reference() can now be called with non-acquired ids
and have different error conditions. Report directly in
release_reference() instead.

Fixes: 870c28588afa ("bpf: net_sched: Add basic bpf qdisc kfuncs")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 include/linux/bpf_verifier.h |  22 ++-
 kernel/bpf/log.c             |   4 +-
 kernel/bpf/states.c          |   9 +-
 kernel/bpf/verifier.c        | 264 +++++++++++++++++------------------
 4 files changed, 152 insertions(+), 147 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index dc0cff59246d..1314299c3763 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -65,7 +65,6 @@ struct bpf_reg_state {
 
 		struct { /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 			u32 mem_size;
-			u32 dynptr_id; /* for dynptr slices */
 		};
 
 		/* For dynptr stack slots */
@@ -193,6 +192,13 @@ struct bpf_reg_state {
 	 * allowed and has the same effect as bpf_sk_release(sk).
 	 */
 	u32 ref_obj_id;
+	/* Tracks the parent object this register was derived from.
+	 * Used for cascading invalidation: when the parent object is
+	 * released or invalidated, all registers with matching parent_id
+	 * are also invalidated. For example, a slice from bpf_dynptr_data()
+	 * gets parent_id set to the dynptr's id.
+	 */
+	u32 parent_id;
 	/* Inside the callee two registers can be both PTR_TO_STACK like
 	 * R1=fp-8 and R2=fp-8, but one of them points to this function stack
 	 * while another to the caller's stack. To differentiate them 'frameno'
@@ -508,7 +514,7 @@ struct bpf_verifier_state {
 	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
 	     iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
 
-#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
+#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __stack, __mask, __expr)   \
 	({                                                               \
 		struct bpf_verifier_state *___vstate = __vst;            \
 		int ___i, ___j;                                          \
@@ -516,6 +522,7 @@ struct bpf_verifier_state {
 			struct bpf_reg_state *___regs;                   \
 			__state = ___vstate->frame[___i];                \
 			___regs = __state->regs;                         \
+			__stack = NULL;                                  \
 			for (___j = 0; ___j < MAX_BPF_REG; ___j++) {     \
 				__reg = &___regs[___j];                  \
 				(void)(__expr);                          \
@@ -523,14 +530,19 @@ struct bpf_verifier_state {
 			bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \
 				if (!__reg)                              \
 					continue;                        \
+				__stack = &__state->stack[___j];         \
 				(void)(__expr);                          \
 			}                                                \
 		}                                                        \
 	})
 
 /* Invoke __expr over regsiters in __vst, setting __state and __reg */
-#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
-	bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr)
+#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr)		\
+	({									\
+		struct bpf_stack_state * ___stack;                        	\
+		bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, ___stack,\
+						1 << STACK_SPILL, __expr);	\
+	})
 
 /* linked list of verifier states used to prune search */
 struct bpf_verifier_state_list {
@@ -1323,6 +1335,7 @@ struct bpf_dynptr_desc {
 	enum bpf_dynptr_type type;
 	u32 id;
 	u32 ref_obj_id;
+	u32 parent_id;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -1334,6 +1347,7 @@ struct bpf_kfunc_call_arg_meta {
 	const char *func_name;
 	/* Out parameters */
 	u32 ref_obj_id;
+	u32 id;
 	u8 release_regno;
 	bool r0_rdonly;
 	u32 ret_btf_id;
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 011e4ec25acd..d8dd372e45cd 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -667,6 +667,8 @@ static void print_reg_state(struct bpf_verifier_env *env,
 		verbose(env, "%+d", reg->delta);
 	if (reg->ref_obj_id)
 		verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+	if (reg->parent_id)
+		verbose_a("parent_id=%d", reg->parent_id);
 	if (type_is_non_owning_ref(reg->type))
 		verbose_a("%s", "non_own_ref");
 	if (type_is_map_ptr(t)) {
@@ -770,8 +772,6 @@ void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_verifie
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_id=%d", reg->ref_obj_id);
-			if (reg->dynptr_id)
-				verbose_a("dynptr_id=%d", reg->dynptr_id);
 			verbose(env, ")");
 			break;
 		case STACK_ITER:
diff --git a/kernel/bpf/states.c b/kernel/bpf/states.c
index 8478d2c6ed5b..72bd3bcda5fb 100644
--- a/kernel/bpf/states.c
+++ b/kernel/bpf/states.c
@@ -494,7 +494,8 @@ static bool regs_exact(const struct bpf_reg_state *rold,
 {
 	return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
 	       check_ids(rold->id, rcur->id, idmap) &&
-	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
+	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap) &&
+	       check_ids(rold->parent_id, rcur->parent_id, idmap);
 }
 
 enum exact_level {
@@ -619,7 +620,8 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 		       range_within(rold, rcur) &&
 		       tnum_in(rold->var_off, rcur->var_off) &&
 		       check_ids(rold->id, rcur->id, idmap) &&
-		       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
+		       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap) &&
+		       check_ids(rold->parent_id, rcur->parent_id, idmap);
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET:
 		/* We must have at least as much range as the old ptr
@@ -799,7 +801,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 			cur_reg = &cur->stack[spi].spilled_ptr;
 			if (old_reg->dynptr.type != cur_reg->dynptr.type ||
 			    old_reg->dynptr.first_slot != cur_reg->dynptr.first_slot ||
-			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) ||
+			    !check_ids(old_reg->parent_id, cur_reg->parent_id, idmap))
 				return false;
 			break;
 		case STACK_ITER:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0313b7d5f6c9..908a3af0e7c4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -201,7 +201,7 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id);
-static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static int release_reference(struct bpf_verifier_env *env, int id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
 static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
 static int ref_set_non_owning(struct bpf_verifier_env *env,
@@ -241,6 +241,7 @@ struct bpf_call_arg_meta {
 	int mem_size;
 	u64 msize_max_value;
 	int ref_obj_id;
+	u32 id;
 	int func_id;
 	struct btf *btf;
 	u32 btf_id;
@@ -603,14 +604,14 @@ static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type)
 	}
 }
 
-static bool dynptr_type_refcounted(enum bpf_dynptr_type type)
+static bool dynptr_type_referenced(enum bpf_dynptr_type type)
 {
 	return type == BPF_DYNPTR_TYPE_RINGBUF || type == BPF_DYNPTR_TYPE_FILE;
 }
 
 static void __mark_dynptr_reg(struct bpf_reg_state *reg,
 			      enum bpf_dynptr_type type,
-			      bool first_slot, int dynptr_id);
+			      bool first_slot, int id);
 
 
 static void mark_dynptr_stack_regs(struct bpf_verifier_env *env,
@@ -635,11 +636,12 @@ static int destroy_if_dynptr_stack_slot(struct bpf_verifier_env *env,
 				        struct bpf_func_state *state, int spi);
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-				   enum bpf_arg_type arg_type, int insn_idx, int clone_ref_obj_id)
+				   enum bpf_arg_type arg_type, int insn_idx, int parent_id,
+				   struct bpf_dynptr_desc *dynptr)
 {
 	struct bpf_func_state *state = bpf_func(env, reg);
+	int spi, i, err, ref_obj_id = 0;
 	enum bpf_dynptr_type type;
-	int spi, i, err;
 
 	spi = dynptr_get_spi(env, reg);
 	if (spi < 0)
@@ -673,82 +675,56 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 	mark_dynptr_stack_regs(env, &state->stack[spi].spilled_ptr,
 			       &state->stack[spi - 1].spilled_ptr, type);
 
-	if (dynptr_type_refcounted(type)) {
-		/* The id is used to track proper releasing */
-		int id;
-
-		if (clone_ref_obj_id)
-			id = clone_ref_obj_id;
-		else
-			id = acquire_reference(env, insn_idx);
-
-		if (id < 0)
-			return id;
-
-		state->stack[spi].spilled_ptr.ref_obj_id = id;
-		state->stack[spi - 1].spilled_ptr.ref_obj_id = id;
+	if (dynptr->type == BPF_DYNPTR_TYPE_INVALID) { /* dynptr constructors */
+		if (dynptr_type_referenced(type)) {
+			ref_obj_id = acquire_reference(env, insn_idx);
+			if (ref_obj_id < 0)
+				return ref_obj_id;
+		}
+	} else { /* bpf_dynptr_clone() */
+		ref_obj_id = dynptr->ref_obj_id;
+		parent_id = dynptr->parent_id;
 	}
 
+	state->stack[spi].spilled_ptr.ref_obj_id = ref_obj_id;
+	state->stack[spi - 1].spilled_ptr.ref_obj_id = ref_obj_id;
+	state->stack[spi].spilled_ptr.parent_id = parent_id;
+	state->stack[spi - 1].spilled_ptr.parent_id = parent_id;
+
 	return 0;
 }
 
-static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
+static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state,
+			      struct bpf_stack_state *stack)
 {
 	int i;
 
 	for (i = 0; i < BPF_REG_SIZE; i++) {
-		state->stack[spi].slot_type[i] = STACK_INVALID;
-		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
+		stack[0].slot_type[i] = STACK_INVALID;
+		stack[1].slot_type[i] = STACK_INVALID;
 	}
 
-	bpf_mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
-	bpf_mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
+	bpf_mark_reg_not_init(env, &stack[0].spilled_ptr);
+	bpf_mark_reg_not_init(env, &stack[1].spilled_ptr);
 }
 
 static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct bpf_func_state *state = bpf_func(env, reg);
-	int spi, ref_obj_id, i;
+	int spi;
 
 	spi = dynptr_get_spi(env, reg);
 	if (spi < 0)
 		return spi;
 
-	if (!dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
-		invalidate_dynptr(env, state, spi);
-		return 0;
-	}
-
-	ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
-
-	/* If the dynptr has a ref_obj_id, then we need to invalidate
-	 * two things:
-	 *
-	 * 1) Any dynptrs with a matching ref_obj_id (clones)
-	 * 2) Any slices derived from this dynptr.
+	/*
+	 * For referenced dynptr, the clones share the same ref_obj_id and will be
+	 * invalidated too. For non-referenced dynptr, only the dynptr and slices
+	 * derived from it will be invalidated.
 	 */
-
-	/* Invalidate any slices associated with this dynptr */
-	WARN_ON_ONCE(release_reference(env, ref_obj_id));
-
-	/* Invalidate any dynptr clones */
-	for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
-		if (state->stack[i].spilled_ptr.ref_obj_id != ref_obj_id)
-			continue;
-
-		/* it should always be the case that if the ref obj id
-		 * matches then the stack slot also belongs to a
-		 * dynptr
-		 */
-		if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
-			verifier_bug(env, "misconfigured ref_obj_id");
-			return -EFAULT;
-		}
-		if (state->stack[i].spilled_ptr.dynptr.first_slot)
-			invalidate_dynptr(env, state, i);
-	}
-
-	return 0;
+	reg = &state->stack[spi].spilled_ptr;
+	return release_reference(env, dynptr_type_referenced(reg->dynptr.type) ?
+				      reg->ref_obj_id : reg->id);
 }
 
 static void __mark_reg_unknown(const struct bpf_verifier_env *env,
@@ -765,10 +741,6 @@ static void mark_reg_invalid(const struct bpf_verifier_env *env, struct bpf_reg_
 static int destroy_if_dynptr_stack_slot(struct bpf_verifier_env *env,
 				        struct bpf_func_state *state, int spi)
 {
-	struct bpf_func_state *fstate;
-	struct bpf_reg_state *dreg;
-	int i, dynptr_id;
-
 	/* We always ensure that STACK_DYNPTR is never set partially,
 	 * hence just checking for slot_type[0] is enough. This is
 	 * different for STACK_SPILL, where it may be only set for
@@ -781,9 +753,9 @@ static int destroy_if_dynptr_stack_slot(struct bpf_verifier_env *env,
 	if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
 		spi = spi + 1;
 
-	if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
+	if (dynptr_type_referenced(state->stack[spi].spilled_ptr.dynptr.type)) {
 		int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
-		int ref_cnt = 0;
+		int i, ref_cnt = 0;
 
 		/*
 		 * A referenced dynptr can be overwritten only if there is at
@@ -808,29 +780,8 @@ static int destroy_if_dynptr_stack_slot(struct bpf_verifier_env *env,
 	mark_stack_slot_scratched(env, spi);
 	mark_stack_slot_scratched(env, spi - 1);
 
-	/* Writing partially to one dynptr stack slot destroys both. */
-	for (i = 0; i < BPF_REG_SIZE; i++) {
-		state->stack[spi].slot_type[i] = STACK_INVALID;
-		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
-	}
-
-	dynptr_id = state->stack[spi].spilled_ptr.id;
-	/* Invalidate any slices associated with this dynptr */
-	bpf_for_each_reg_in_vstate(env->cur_state, fstate, dreg, ({
-		/* Dynptr slices are only PTR_TO_MEM_OR_NULL and PTR_TO_MEM */
-		if (dreg->type != (PTR_TO_MEM | PTR_MAYBE_NULL) && dreg->type != PTR_TO_MEM)
-			continue;
-		if (dreg->dynptr_id == dynptr_id)
-			mark_reg_invalid(env, dreg);
-	}));
-
-	/* Do not release reference state, we are destroying dynptr on stack,
-	 * not using some helper to release it. Just reset register.
-	 */
-	bpf_mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
-	bpf_mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
-
-	return 0;
+	/* Invalidate the dynptr and any derived slices */
+	return release_reference(env, state->stack[spi].spilled_ptr.id);
 }
 
 static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
@@ -1449,15 +1400,15 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx)
 	return;
 }
 
-static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id)
+static struct bpf_reference_state *find_reference_state(struct bpf_verifier_state *state, int ptr_id)
 {
 	int i;
 
 	for (i = 0; i < state->acquired_refs; i++)
 		if (state->refs[i].id == ptr_id)
-			return true;
+			return &state->refs[i];
 
-	return false;
+	return NULL;
 }
 
 static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr)
@@ -1764,6 +1715,7 @@ static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
 	       offsetof(struct bpf_reg_state, var_off) - sizeof(reg->type));
 	reg->id = 0;
 	reg->ref_obj_id = 0;
+	reg->parent_id = 0;
 	___mark_reg_known(reg, imm);
 }
 
@@ -1801,7 +1753,7 @@ static void mark_reg_known_zero(struct bpf_verifier_env *env,
 }
 
 static void __mark_dynptr_reg(struct bpf_reg_state *reg, enum bpf_dynptr_type type,
-			      bool first_slot, int dynptr_id)
+			      bool first_slot, int id)
 {
 	/* reg->type has no meaning for STACK_DYNPTR, but when we set reg for
 	 * callback arguments, it does need to be CONST_PTR_TO_DYNPTR, so simply
@@ -1810,7 +1762,7 @@ static void __mark_dynptr_reg(struct bpf_reg_state *reg, enum bpf_dynptr_type ty
 	__mark_reg_known_zero(reg);
 	reg->type = CONST_PTR_TO_DYNPTR;
 	/* Give each dynptr a unique id to uniquely associate slices to it. */
-	reg->id = dynptr_id;
+	reg->id = id;
 	reg->dynptr.type = type;
 	reg->dynptr.first_slot = first_slot;
 }
@@ -2451,6 +2403,7 @@ void bpf_mark_reg_unknown_imprecise(struct bpf_reg_state *reg)
 	reg->type = SCALAR_VALUE;
 	reg->id = 0;
 	reg->ref_obj_id = 0;
+	reg->parent_id = 0;
 	reg->var_off = tnum_unknown;
 	reg->frameno = 0;
 	reg->precise = false;
@@ -7427,7 +7380,7 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
  * and checked dynamically during runtime.
  */
 static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn_idx,
-			       enum bpf_arg_type arg_type, int clone_ref_obj_id,
+			       enum bpf_arg_type arg_type, int parent_id,
 			       struct bpf_dynptr_desc *dynptr)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno);
@@ -7470,7 +7423,8 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
 				return err;
 		}
 
-		err = mark_stack_slots_dynptr(env, reg, arg_type, insn_idx, clone_ref_obj_id);
+		err = mark_stack_slots_dynptr(env, reg, arg_type, insn_idx, parent_id,
+					      dynptr);
 	} else /* OBJ_RELEASE and None case from above */ {
 		/* For the reg->type == PTR_TO_STACK case, bpf_dynptr is never const */
 		if (reg->type == CONST_PTR_TO_DYNPTR && (arg_type & OBJ_RELEASE)) {
@@ -7507,6 +7461,7 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
 			dynptr->id = reg->id;
 			dynptr->type = reg->dynptr.type;
 			dynptr->ref_obj_id = reg->ref_obj_id;
+			dynptr->parent_id = reg->parent_id;
 		}
 	}
 	return err;
@@ -8461,7 +8416,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			 */
 			if (reg->type == PTR_TO_STACK) {
 				spi = dynptr_get_spi(env, reg);
-				if (spi < 0 || !state->stack[spi].spilled_ptr.ref_obj_id) {
+				if (spi < 0 || !state->stack[spi].spilled_ptr.id) {
 					verbose(env, "arg %d is an unacquired reference\n", regno);
 					return -EINVAL;
 				}
@@ -8489,6 +8444,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			return -EACCES;
 		}
 		meta->ref_obj_id = reg->ref_obj_id;
+		meta->id = reg->id;
 	}
 
 	switch (base_type(arg_type)) {
@@ -9111,26 +9067,75 @@ static int release_reference_nomark(struct bpf_verifier_state *state, int ref_ob
 	return -EINVAL;
 }
 
-/* The pointer with the specified id has released its reference to kernel
- * resources. Identify all copies of the same pointer and clear the reference.
- *
- * This is the release function corresponding to acquire_reference(). Idempotent.
- */
-static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
+static int idstack_push(struct bpf_idmap *idmap, u32 id)
+{
+	int i;
+
+	if (!id)
+		return 0;
+
+	for (i = 0; i < idmap->cnt; i++)
+		if (idmap->map[i].old == id)
+			return 0;
+
+	if (WARN_ON_ONCE(idmap->cnt >= BPF_ID_MAP_SIZE))
+		return -EFAULT;
+
+	idmap->map[idmap->cnt++].old = id;
+	return 0;
+}
+
+static int idstack_pop(struct bpf_idmap *idmap)
 {
+	if (!idmap->cnt)
+		return 0;
+
+	return idmap->map[--idmap->cnt].old;
+}
+
+/* Release id and objects referencing the id iteratively in a DFS manner */
+static int release_reference(struct bpf_verifier_env *env, int id)
+{
+	u32 mask = (1 << STACK_SPILL) | (1 << STACK_DYNPTR);
 	struct bpf_verifier_state *vstate = env->cur_state;
+	struct bpf_idmap *idstack = &env->idmap_scratch;
+	struct bpf_stack_state *stack;
 	struct bpf_func_state *state;
 	struct bpf_reg_state *reg;
-	int err;
+	int root_id = id, err;
 
-	err = release_reference_nomark(vstate, ref_obj_id);
-	if (err)
-		return err;
+	idstack->cnt = 0;
+	idstack_push(idstack, id);
 
-	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
-		if (reg->ref_obj_id == ref_obj_id)
-			mark_reg_invalid(env, reg);
-	}));
+	if (find_reference_state(vstate, id))
+		WARN_ON_ONCE(release_reference_nomark(vstate, id));
+
+	while ((id = idstack_pop(idstack))) {
+		bpf_for_each_reg_in_vstate_mask(vstate, state, reg, stack, mask, ({
+			if (reg->id != id && reg->parent_id != id && reg->ref_obj_id != id)
+				continue;
+
+			if (reg->ref_obj_id && id != root_id) {
+				struct bpf_reference_state *ref_state;
+
+				ref_state = find_reference_state(env->cur_state, reg->ref_obj_id);
+				verbose(env, "Unreleased reference id=%d alloc_insn=%d when releasing id=%d\n",
+					ref_state->id, ref_state->insn_idx, root_id);
+				return -EINVAL;
+			}
+
+			if (reg->id != id) {
+				err = idstack_push(idstack, reg->id);
+				if (err)
+					return err;
+			}
+
+			if (!stack || stack->slot_type[BPF_REG_SIZE - 1] == STACK_SPILL)
+				mark_reg_invalid(env, reg);
+			else if (stack->slot_type[BPF_REG_SIZE - 1] == STACK_DYNPTR)
+				invalidate_dynptr(env, state, stack);
+		}));
+	}
 
 	return 0;
 }
@@ -10298,11 +10303,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			 */
 			err = 0;
 		}
-		if (err) {
-			verbose(env, "func %s#%d reference has not been acquired before\n",
-				func_id_name(func_id), func_id);
+		if (err)
 			return err;
-		}
 	}
 
 	switch (func_id) {
@@ -10580,10 +10582,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		regs[BPF_REG_0].ref_obj_id = id;
 	}
 
-	if (func_id == BPF_FUNC_dynptr_data) {
-		regs[BPF_REG_0].dynptr_id = meta.dynptr.id;
-		regs[BPF_REG_0].ref_obj_id = meta.dynptr.ref_obj_id;
-	}
+	if (func_id == BPF_FUNC_dynptr_data)
+		regs[BPF_REG_0].parent_id = meta.dynptr.id;
 
 	err = do_refine_retval_range(env, regs, fn->ret_type, func_id, &meta);
 	if (err)
@@ -12009,6 +12009,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EFAULT;
 			}
 			meta->ref_obj_id = reg->ref_obj_id;
+			meta->id = reg->id;
 			if (is_kfunc_release(meta))
 				meta->release_regno = regno;
 		}
@@ -12145,7 +12146,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_DYNPTR:
 		{
 			enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR;
-			int clone_ref_obj_id = 0;
 
 			if (is_kfunc_arg_uninit(btf, &args[i]))
 				dynptr_arg_type |= MEM_UNINIT;
@@ -12171,15 +12171,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				}
 
 				dynptr_arg_type |= (unsigned int)get_dynptr_type_flag(parent_type);
-				clone_ref_obj_id = meta->dynptr.ref_obj_id;
-				if (dynptr_type_refcounted(parent_type) && !clone_ref_obj_id) {
-					verifier_bug(env, "missing ref obj id for parent of clone");
-					return -EFAULT;
-				}
 			}
 
-			ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type, clone_ref_obj_id,
-						  &meta->dynptr);
+			ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type,
+						  meta->ref_obj_id ? meta->id : 0, &meta->dynptr);
 			if (ret < 0)
 				return ret;
 			break;
@@ -12813,12 +12808,7 @@ static int check_special_kfunc(struct bpf_verifier_env *env, struct bpf_kfunc_ca
 			verifier_bug(env, "no dynptr id");
 			return -EFAULT;
 		}
-		regs[BPF_REG_0].dynptr_id = meta->dynptr.id;
-
-		/* we don't need to set BPF_REG_0's ref obj id
-		 * because packet slices are not refcounted (see
-		 * dynptr_type_refcounted)
-		 */
+		regs[BPF_REG_0].parent_id = meta->dynptr.id;
 	} else {
 		return 0;
 	}
@@ -12953,6 +12943,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (rcu_lock) {
 		env->cur_state->active_rcu_locks++;
 	} else if (rcu_unlock) {
+		struct bpf_stack_state *stack;
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
 		u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
@@ -12962,7 +12953,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return -EINVAL;
 		}
 		if (--env->cur_state->active_rcu_locks == 0) {
-			bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
+			bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, stack, clear_mask, ({
 				if (reg->type & MEM_RCU) {
 					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
 					reg->type |= PTR_UNTRUSTED;
@@ -13005,9 +12996,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			err = unmark_stack_slots_dynptr(env, reg);
 		} else {
 			err = release_reference(env, reg->ref_obj_id);
-			if (err)
-				verbose(env, "kfunc %s#%d reference has not been acquired before\n",
-					func_name, meta.func_id);
 		}
 		if (err)
 			return err;
@@ -13024,7 +13012,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return err;
 		}
 
-		err = release_reference(env, release_ref_obj_id);
+		err = release_reference_nomark(env->cur_state, release_ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, meta.func_id);
@@ -13114,7 +13102,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 			/* Ensures we don't access the memory after a release_reference() */
 			if (meta.ref_obj_id)
-				regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
+				regs[BPF_REG_0].parent_id = meta.ref_obj_id;
 
 			if (is_kfunc_rcu_protected(&meta))
 				regs[BPF_REG_0].type |= MEM_RCU;
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 5/9] bpf: Remove redundant dynptr arg check for helper
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

unmark_stack_slots_dynptr() already makes sure that CONST_PTR_TO_DYNPTR
cannot be released. process_dynptr_func() also prevents passing
uninitialized dynptr to helpers expecting initialized dynptr. Now that
unmark_stack_slots_dynptr() also error returned from
release_reference(), there should be no reason to keep these redundant
checks.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 kernel/bpf/verifier.c                         | 21 +------------------
 .../testing/selftests/bpf/progs/dynptr_fail.c |  6 +++---
 .../selftests/bpf/progs/user_ringbuf_fail.c   |  4 ++--
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 908a3af0e7c4..3ab9bc2fe0e3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8405,26 +8405,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 
 skip_type_check:
 	if (arg_type_is_release(arg_type)) {
-		if (arg_type_is_dynptr(arg_type)) {
-			struct bpf_func_state *state = bpf_func(env, reg);
-			int spi;
-
-			/* Only dynptr created on stack can be released, thus
-			 * the get_spi and stack state checks for spilled_ptr
-			 * should only be done before process_dynptr_func for
-			 * PTR_TO_STACK.
-			 */
-			if (reg->type == PTR_TO_STACK) {
-				spi = dynptr_get_spi(env, reg);
-				if (spi < 0 || !state->stack[spi].spilled_ptr.id) {
-					verbose(env, "arg %d is an unacquired reference\n", regno);
-					return -EINVAL;
-				}
-			} else {
-				verbose(env, "cannot release unowned const bpf_dynptr\n");
-				return -EINVAL;
-			}
-		} else if (!reg->ref_obj_id && !bpf_register_is_null(reg)) {
+		if (!arg_type_is_dynptr(arg_type) && !reg->ref_obj_id && !bpf_register_is_null(reg)) {
 			verbose(env, "R%d must be referenced when passed to release function\n",
 				regno);
 			return -EINVAL;
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
index b62773ce5219..b5fbc9b5c484 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -136,7 +136,7 @@ int ringbuf_missing_release_callback(void *ctx)
 
 /* Can't call bpf_ringbuf_submit/discard_dynptr on a non-initialized dynptr */
 SEC("?raw_tp")
-__failure __msg("arg 1 is an unacquired reference")
+__failure __msg("Expected an initialized dynptr as arg #0")
 int ringbuf_release_uninit_dynptr(void *ctx)
 {
 	struct bpf_dynptr ptr;
@@ -650,7 +650,7 @@ int invalid_offset(void *ctx)
 
 /* Can't release a dynptr twice */
 SEC("?raw_tp")
-__failure __msg("arg 1 is an unacquired reference")
+__failure __msg("Expected an initialized dynptr as arg #0")
 int release_twice(void *ctx)
 {
 	struct bpf_dynptr ptr;
@@ -677,7 +677,7 @@ static int release_twice_callback_fn(__u32 index, void *data)
  * within a callback function, fails
  */
 SEC("?raw_tp")
-__failure __msg("arg 1 is an unacquired reference")
+__failure __msg("Expected an initialized dynptr as arg #0")
 int release_twice_callback(void *ctx)
 {
 	struct bpf_dynptr ptr;
diff --git a/tools/testing/selftests/bpf/progs/user_ringbuf_fail.c b/tools/testing/selftests/bpf/progs/user_ringbuf_fail.c
index 54de0389f878..c0d0422b8030 100644
--- a/tools/testing/selftests/bpf/progs/user_ringbuf_fail.c
+++ b/tools/testing/selftests/bpf/progs/user_ringbuf_fail.c
@@ -146,7 +146,7 @@ try_discard_dynptr(struct bpf_dynptr *dynptr, void *context)
  * not be able to read past the end of the pointer.
  */
 SEC("?raw_tp")
-__failure __msg("cannot release unowned const bpf_dynptr")
+__failure __msg("CONST_PTR_TO_DYNPTR cannot be released")
 int user_ringbuf_callback_discard_dynptr(void *ctx)
 {
 	bpf_user_ringbuf_drain(&user_ringbuf, try_discard_dynptr, NULL, 0);
@@ -166,7 +166,7 @@ try_submit_dynptr(struct bpf_dynptr *dynptr, void *context)
  * not be able to read past the end of the pointer.
  */
 SEC("?raw_tp")
-__failure __msg("cannot release unowned const bpf_dynptr")
+__failure __msg("CONST_PTR_TO_DYNPTR cannot be released")
 int user_ringbuf_callback_submit_dynptr(void *ctx)
 {
 	bpf_user_ringbuf_drain(&user_ringbuf, try_submit_dynptr, NULL, 0);
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 6/9] selftests/bpf: Test creating dynptr from dynptr data and slice
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

The verifier currently does not allow creating dynptr from dynptr data
or slice. Add a selftest to test this explicitly.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../testing/selftests/bpf/progs/dynptr_fail.c | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
index b5fbc9b5c484..43beb70f50ee 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -705,6 +705,48 @@ int dynptr_from_mem_invalid_api(void *ctx)
 	return 0;
 }
 
+/* Cannot create dynptr from dynptr data */
+SEC("?raw_tp")
+__failure __msg("Unsupported reg type mem for bpf_dynptr_from_mem data")
+int dynptr_from_dynptr_data(void *ctx)
+{
+	struct bpf_dynptr ptr, ptr2;
+	__u8 *data;
+
+	if (get_map_val_dynptr(&ptr))
+		return 0;
+
+	data = bpf_dynptr_data(&ptr, 0, sizeof(__u32));
+	if (!data)
+		return 0;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(data, sizeof(__u32), 0, &ptr2);
+
+	return 0;
+}
+
+/* Cannot create dynptr from dynptr slice */
+SEC("?tc")
+__failure __msg("Unsupported reg type mem for bpf_dynptr_from_mem data")
+int dynptr_from_dynptr_slice(struct __sk_buff *skb)
+{
+	struct bpf_dynptr ptr, ptr2;
+	struct ethhdr *hdr;
+	char buffer[sizeof(*hdr)] = {};
+
+	bpf_dynptr_from_skb(skb, 0, &ptr);
+
+	hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer));
+	if (!hdr)
+		return SK_DROP;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(hdr, sizeof(*hdr), 0, &ptr2);
+
+	return SK_PASS;
+}
+
 SEC("?tc")
 __failure __msg("cannot overwrite referenced dynptr") __log_level(2)
 int dynptr_pruning_overwrite(struct __sk_buff *ctx)
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 7/9] selftests/bpf: Test using dynptr after freeing the underlying object
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

Make sure the verifier invalidates the dynptr and dynptr slice derived
from an skb after the skb is freed.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |  6 ++
 .../progs/bpf_qdisc_fail__invalid_dynptr.c    | 68 +++++++++++++++++
 ...f_qdisc_fail__invalid_dynptr_cross_frame.c | 74 +++++++++++++++++++
 .../bpf_qdisc_fail__invalid_dynptr_slice.c    | 70 ++++++++++++++++++
 4 files changed, 218 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_cross_frame.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_slice.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index 730357cd0c9a..65277c8fc887 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -8,6 +8,9 @@
 #include "bpf_qdisc_fifo.skel.h"
 #include "bpf_qdisc_fq.skel.h"
 #include "bpf_qdisc_fail__incompl_ops.skel.h"
+#include "bpf_qdisc_fail__invalid_dynptr.skel.h"
+#include "bpf_qdisc_fail__invalid_dynptr_slice.skel.h"
+#include "bpf_qdisc_fail__invalid_dynptr_cross_frame.skel.h"
 
 #define LO_IFINDEX 1
 
@@ -223,6 +226,9 @@ void test_ns_bpf_qdisc(void)
 		test_qdisc_attach_to_non_root();
 	if (test__start_subtest("incompl_ops"))
 		test_incompl_ops();
+	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr);
+	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr_cross_frame);
+	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr_slice);
 }
 
 void serial_test_bpf_qdisc_default(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr.c
new file mode 100644
index 000000000000..3a20811e3feb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int proto;
+
+SEC("struct_ops")
+__failure __msg("Expected an initialized dynptr as arg #")
+int BPF_PROG(invalid_dynptr, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct bpf_dynptr ptr;
+	struct ethhdr *hdr;
+
+	bpf_dynptr_from_skb((struct __sk_buff *)skb, 0, &ptr);
+
+	bpf_qdisc_skb_drop(skb, to_free);
+
+	hdr = bpf_dynptr_slice(&ptr, 0, NULL, sizeof(*hdr));
+	if (!hdr)
+		return NET_XMIT_DROP;
+
+	proto = hdr->h_proto;
+
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops")
+__auxiliary
+struct sk_buff *BPF_PROG(bpf_qdisc_test_dequeue, struct Qdisc *sch)
+{
+	return NULL;
+}
+
+SEC("struct_ops")
+__auxiliary
+int BPF_PROG(bpf_qdisc_test_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_reset, struct Qdisc *sch)
+{
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_destroy, struct Qdisc *sch)
+{
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops test = {
+	.enqueue   = (void *)invalid_dynptr,
+	.dequeue   = (void *)bpf_qdisc_test_dequeue,
+	.init      = (void *)bpf_qdisc_test_init,
+	.reset     = (void *)bpf_qdisc_test_reset,
+	.destroy   = (void *)bpf_qdisc_test_destroy,
+	.id        = "bpf_qdisc_test",
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_cross_frame.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_cross_frame.c
new file mode 100644
index 000000000000..2e23b8593af9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_cross_frame.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int proto;
+
+static __noinline int free_skb(struct sk_buff *skb)
+{
+	bpf_kfree_skb(skb);
+	return 0;
+}
+
+SEC("struct_ops")
+__failure __msg("invalid mem access 'scalar'")
+int BPF_PROG(invalid_dynptr_cross_frame, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct bpf_dynptr ptr;
+	struct ethhdr *hdr;
+
+	bpf_dynptr_from_skb((struct __sk_buff *)skb, 0, &ptr);
+
+	hdr = bpf_dynptr_slice(&ptr, 0, NULL, sizeof(*hdr));
+	if (!hdr)
+		return NET_XMIT_DROP;
+
+	free_skb(skb);
+
+	proto = hdr->h_proto;
+
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops")
+__auxiliary
+struct sk_buff *BPF_PROG(bpf_qdisc_test_dequeue, struct Qdisc *sch)
+{
+	return NULL;
+}
+
+SEC("struct_ops")
+__auxiliary
+int BPF_PROG(bpf_qdisc_test_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_reset, struct Qdisc *sch)
+{
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_destroy, struct Qdisc *sch)
+{
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops test = {
+	.enqueue   = (void *)invalid_dynptr_cross_frame,
+	.dequeue   = (void *)bpf_qdisc_test_dequeue,
+	.init      = (void *)bpf_qdisc_test_init,
+	.reset     = (void *)bpf_qdisc_test_reset,
+	.destroy   = (void *)bpf_qdisc_test_destroy,
+	.id        = "bpf_qdisc_test",
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_slice.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_slice.c
new file mode 100644
index 000000000000..731216c4e45a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fail__invalid_dynptr_slice.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int proto;
+
+SEC("struct_ops")
+__failure __msg("invalid mem access 'scalar'")
+int BPF_PROG(invalid_dynptr_slice, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct bpf_dynptr ptr;
+	struct ethhdr *hdr;
+
+	bpf_dynptr_from_skb((struct __sk_buff *)skb, 0, &ptr);
+
+	hdr = bpf_dynptr_slice(&ptr, 0, NULL, sizeof(*hdr));
+	if (!hdr) {
+		bpf_qdisc_skb_drop(skb, to_free);
+		return NET_XMIT_DROP;
+	}
+
+	bpf_qdisc_skb_drop(skb, to_free);
+
+	proto = hdr->h_proto;
+
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops")
+__auxiliary
+struct sk_buff *BPF_PROG(bpf_qdisc_test_dequeue, struct Qdisc *sch)
+{
+	return NULL;
+}
+
+SEC("struct_ops")
+__auxiliary
+int BPF_PROG(bpf_qdisc_test_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_reset, struct Qdisc *sch)
+{
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_destroy, struct Qdisc *sch)
+{
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops test = {
+	.enqueue   = (void *)invalid_dynptr_slice,
+	.dequeue   = (void *)bpf_qdisc_test_dequeue,
+	.init      = (void *)bpf_qdisc_test_init,
+	.reset     = (void *)bpf_qdisc_test_reset,
+	.destroy   = (void *)bpf_qdisc_test_destroy,
+	.id        = "bpf_qdisc_test",
+};
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 8/9] selftests/bpf: Test using slice after invalidating dynptr clone
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

The parent object of a cloned dynptr is skb not the original dynptr.
Invalidate the original dynptr should not prevent the program from
using the slice derived from the cloned dynptr.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |  2 +
 ..._qdisc_dynptr_use_after_invalidate_clone.c | 75 +++++++++++++++++++
 2 files changed, 77 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_dynptr_use_after_invalidate_clone.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index 65277c8fc887..77f1c0550c9b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -11,6 +11,7 @@
 #include "bpf_qdisc_fail__invalid_dynptr.skel.h"
 #include "bpf_qdisc_fail__invalid_dynptr_slice.skel.h"
 #include "bpf_qdisc_fail__invalid_dynptr_cross_frame.skel.h"
+#include "bpf_qdisc_dynptr_use_after_invalidate_clone.skel.h"
 
 #define LO_IFINDEX 1
 
@@ -229,6 +230,7 @@ void test_ns_bpf_qdisc(void)
 	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr);
 	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr_cross_frame);
 	RUN_TESTS(bpf_qdisc_fail__invalid_dynptr_slice);
+	RUN_TESTS(bpf_qdisc_dynptr_use_after_invalidate_clone);
 }
 
 void serial_test_bpf_qdisc_default(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_dynptr_use_after_invalidate_clone.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_dynptr_use_after_invalidate_clone.c
new file mode 100644
index 000000000000..cca2accf081d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_dynptr_use_after_invalidate_clone.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int proto;
+
+SEC("struct_ops")
+__success
+int BPF_PROG(dynptr_use_after_invalidate_clone, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct bpf_dynptr ptr, ptr_clone;
+	struct ethhdr *hdr;
+
+	bpf_dynptr_from_skb((struct __sk_buff *)skb, 0, &ptr);
+
+	bpf_dynptr_clone(&ptr, &ptr_clone);
+
+	hdr = bpf_dynptr_slice(&ptr_clone, 0, NULL, sizeof(*hdr));
+	if (!hdr) {
+		bpf_qdisc_skb_drop(skb, to_free);
+		return NET_XMIT_DROP;
+	}
+
+	*(int *)&ptr = 0;
+
+	proto = hdr->h_proto;
+
+	bpf_qdisc_skb_drop(skb, to_free);
+
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops")
+__auxiliary
+struct sk_buff *BPF_PROG(bpf_qdisc_test_dequeue, struct Qdisc *sch)
+{
+	return NULL;
+}
+
+SEC("struct_ops")
+__auxiliary
+int BPF_PROG(bpf_qdisc_test_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_reset, struct Qdisc *sch)
+{
+}
+
+SEC("struct_ops")
+__auxiliary
+void BPF_PROG(bpf_qdisc_test_destroy, struct Qdisc *sch)
+{
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops test = {
+	.enqueue   = (void *)dynptr_use_after_invalidate_clone,
+	.dequeue   = (void *)bpf_qdisc_test_dequeue,
+	.init      = (void *)bpf_qdisc_test_init,
+	.reset     = (void *)bpf_qdisc_test_reset,
+	.destroy   = (void *)bpf_qdisc_test_destroy,
+	.id        = "bpf_qdisc_test",
+};
+
-- 
2.52.0


^ permalink raw reply related

* [PATCH bpf-next v3 9/9] selftests/bpf: Test using file dynptr after the reference on file is dropped
From: Amery Hung @ 2026-04-21 22:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team
In-Reply-To: <20260421221016.2967924-1-ameryhung@gmail.com>

File dynptr and slice should be invalidated when the parent file's
reference is dropped in the program. Without the verifier tracking
dyntpr's parent referenced object, the dynptr would continute to be
incorrectly used even if the underlying file is being tear down or gone.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../selftests/bpf/progs/file_reader_fail.c    | 60 +++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/file_reader_fail.c b/tools/testing/selftests/bpf/progs/file_reader_fail.c
index 32fe28ed2439..a7102737abfe 100644
--- a/tools/testing/selftests/bpf/progs/file_reader_fail.c
+++ b/tools/testing/selftests/bpf/progs/file_reader_fail.c
@@ -50,3 +50,63 @@ int xdp_no_dynptr_type(struct xdp_md *xdp)
 	bpf_dynptr_file_discard(&dynptr);
 	return 0;
 }
+
+SEC("lsm/file_open")
+__failure
+__msg("Expected an initialized dynptr as arg #2")
+int use_file_dynptr_after_put_file(void *ctx)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct file *file = bpf_get_task_exe_file(task);
+	struct bpf_dynptr dynptr;
+	char buf[64];
+
+	if (!file)
+		return 0;
+
+	if (bpf_dynptr_from_file(file, 0, &dynptr))
+		goto out;
+
+	bpf_put_file(file);
+
+	/* this should fail - dynptr is invalid after file ref is dropped */
+	bpf_dynptr_read(buf, sizeof(buf), &dynptr, 0, 0);
+	return 0;
+
+out:
+	bpf_dynptr_file_discard(&dynptr);
+	bpf_put_file(file);
+	return 0;
+}
+
+SEC("lsm/file_open")
+__failure
+__msg("invalid mem access 'scalar'")
+int use_file_dynptr_slice_after_put_file(void *ctx)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct file *file = bpf_get_task_exe_file(task);
+	struct bpf_dynptr dynptr;
+	char *data;
+
+	if (!file)
+		return 0;
+
+	if (bpf_dynptr_from_file(file, 0, &dynptr))
+		goto out;
+
+	data = bpf_dynptr_data(&dynptr, 0, 1);
+	if (!data)
+		goto out;
+
+	bpf_put_file(file);
+
+	/* this should fail - data slice is invalid after file ref is dropped */
+	*data = 'x';
+	return 0;
+
+out:
+	bpf_dynptr_file_discard(&dynptr);
+	bpf_put_file(file);
+	return 0;
+}
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH net v8 03/15] net: cache snapshot entries for ndo_set_rx_mode_async
From: Stanislav Fomichev @ 2026-04-21 22:19 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, davem, edumazet, kuba
In-Reply-To: <12244c7a-07c5-4d75-98aa-3fe35c149272@redhat.com>

On 04/21, Paolo Abeni wrote:
> On 4/21/26 11:52 AM, Paolo Abeni wrote:
> > On 4/16/26 8:57 PM, Stanislav Fomichev wrote:
> >> Add a per-device netdev_hw_addr_list cache (rx_mode_addr_cache) that
> >> allows __hw_addr_list_snapshot() and __hw_addr_list_reconcile() to
> >> reuse previously allocated entries instead of hitting GFP_ATOMIC on
> >> every snapshot cycle.
> >>
> >> snapshot pops entries from the cache when available, falling back to
> >> __hw_addr_create(). reconcile splices both snapshot lists back into
> >> the cache via __hw_addr_splice(). The cache is flushed in
> >> free_netdev().
> >>
> >> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
> >> (cherry picked from commit ba3ab1832a511f660fdc6231245b14bf610c05bd)
> > 
> > Are you backporting from 7.2 via time machine??? :-P
> > 
> >> @@ -611,8 +633,8 @@ void __hw_addr_list_reconcile(struct netdev_hw_addr_list *real_list,
> >>  		}
> >>  	}
> >>  
> >> -	__hw_addr_flush(work);
> >> -	__hw_addr_flush(ref);
> >> +	__hw_addr_splice(cache, work);
> >> +	__hw_addr_splice(cache, ref);
> > 
> > I think here sashiko has a point, with the cache size being unbounded. I
> > guess syzkaller or the like will find a way to make it grow too much.
> > 
> > What about hard-limit it to some reasonable value?!?
> 
> There are a few more remarks from sashiko at the driver level, but
> AFAICS are all pre-existing issues.
> 
> I think even the above one it's better handled as a follow-up, so I'm
> applying the series as-is (I'll just drop the cherry-pick statement above).

I have a follow-up series to add retries here, but not sure it's the
right direction. I can send it once net-next opens just to opinions.

^ permalink raw reply

* Re: [PATCH net v2 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
From: Stanislav Fomichev @ 2026-04-21 22:20 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf, netdev, Jason Xing
In-Reply-To: <CAL+tcoCwEeOriG3y-EZm6TfDenr3uCEDXUJ8MtrW9x=Jx40ohQ@mail.gmail.com>

On 04/21, Jason Xing wrote:
> On Tue, Apr 21, 2026 at 3:34 AM Stanislav Fomichev <sdf.kernel@gmail.com> wrote:
> >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > skb_checksum_help() is a common helper that writes the folded
> > > 16-bit checksum back via skb->data + csum_start + csum_offset,
> > > i.e. it relies on the skb's linear head and fails (with WARN_ONCE
> > > and -EINVAL) when skb_headlen() is 0.
> > >
> > > AF_XDP generic xmit takes two very different paths depending on the
> > > netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
> > > skip the "copy payload into a linear head" step on purpose as a
> > > performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
> > > pages as frags and never calls skb_put(), so skb_headlen() stays 0
> > > for the whole skb. For these skbs there is simply no linear area for
> > > skb_checksum_help() to write the csum into - the sw-csum fallback is
> > > structurally inapplicable.
> > >
> > > The patch tries to catch this and reject the combination with error at
> > > setup time. Rejecting at bind() converts this silent per-packet failure
> > > into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
> > > launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
> > > because they do not call skb_checksum_help().
> > >
> > > Without the patch, every descriptor carrying 'XDP_TX_METADATA |
> > > XDP_TXMD_FLAGS_CHECKSUM' produces:
> > > 1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
> > > 2) sendmsg() returning -EINVAL without consuming the descriptor
> > >    (invalid_descs is not incremented),
> > > 3) a wedged TX ring: __xsk_generic_xmit() does not advance the
> > >     consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
> > >     the same descriptor and re-hits the same WARN until the socket
> > >     is closed.
> > >
> > > Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
> > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/xdp/xsk_buff_pool.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > > index 37b7a68b89b3..c2521b6547e3 100644
> > > --- a/net/xdp/xsk_buff_pool.c
> > > +++ b/net/xdp/xsk_buff_pool.c
> > > @@ -169,6 +169,9 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> > >       if (force_zc && force_copy)
> > >               return -EINVAL;
> > >
> > > +     if (pool->tx_sw_csum && (netdev->priv_flags & IFF_TX_SKB_NO_LINEAR))
> > > +             return -EOPNOTSUPP;
> > > +
> > >       if (xsk_get_pool_from_qid(netdev, queue_id))
> > >               return -EBUSY;
> > >
> > > --
> > > 2.41.3
> > >
> >
> > Wondering whether a better fixes tag is commit 11614723af26 ("xsk: Add option
> > to calculate TX checksum in SW")?
> >
> > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> 
> Thanks for the check. But not really. It is the commit 30c3055f9c0d
> that brings the csum support of IFF_TX_SKB_NO_LINEAR case where this
> issue can be triggered (because this mode no longer puts data into skb
> linear area).

Ah, right, good point, it's the IFF_TX_SKB_NO_LINEAR path.

^ permalink raw reply

* Re: [PATCH net v2 8/8] xsk: fix u64 descriptor address truncation on 32-bit architectures
From: Stanislav Fomichev @ 2026-04-21 22:23 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, bpf, netdev, Jason Xing
In-Reply-To: <CAL+tcoCxQ8cU+wr7eJsMH+FPMcp+vQ4J4BdSdsVqyTRqN8q9LQ@mail.gmail.com>

On 04/21, Jason Xing wrote:
> On Tue, Apr 21, 2026 at 3:49 AM Stanislav Fomichev <sdf.kernel@gmail.com> wrote:
> >
> > On 04/20, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> > > descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> > > uintptr_t cast:
> > >
> > >     skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> > >
> > > On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> > > the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> > > mode the chunk offset is encoded in bits 48-63 of the descriptor
> > > address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> > > lost entirely. The completion queue then returns a truncated address to
> > > userspace, making buffer recycling impossible.
> > >
> > > Fix this by handling the 32-bit case directly in
> > > xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an xsk_addrs
> > > struct (the same path already used for multi-descriptor SKBs) to store
> > > the full u64 address.
> >
> > Is it easier to make XSK `depends on 64BIT` to avoid dealing with that? Does
> 
> Of course, it would be super easy. Actually the initial version looks
> like this. One line of coder is simply enough.
> 
> > anybody seriously run af_xdp on 32 bit systems?
> 
> But my worry as you guess is if there exists a 32 bit system? I doubt
> it. That's why I put some effort into adding the compatibility code to
> cover the case. Good news is that it doesn't add any side effects of
> the 64 bit system since they are protected under IS_ENABLED condition.

If someone complains, we can follow up? af_xdp is all u64 everywhere,
including uapi, I doubt someone is using it on 32 bit systems. We don't
test this part on 32 bit systems on nipa either..

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox