[PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg

* [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg
@ 2026-06-15  2:19 Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

All fixes come from patches previously sent by Weiming Shi, Zhang Cen,
Kuniyuki Iwashima and Sechang Lim, and have already been reviewed by John,
Jakub and me.

https://lore.kernel.org/bpf/20260610081218.506709-2-rhkrqnwk98@gmail.com/
https://lore.kernel.org/bpf/20260520102715.3033936-1-rollkingzzc@gmail.com/
https://lore.kernel.org/bpf/20260424191602.1522411-3-bestswngs@gmail.com/
https://lore.kernel.org/bpf/20260423155807.1245644-2-bestswngs@gmail.com/
https://lore.kernel.org/bpf/20260221233234.3814768-4-kuniyu@google.com/

The automated reviewer (sashiko) may still flag a few other potential
issues on top of this series. After looking into them, each one falls
into one of three groups: it is already covered by the patches here; it
is the BPF program's own responsibility (e.g. initializing the payload it
pushes) and is intentionally left out; or it is only reachable under very
narrow conditions that require a specially crafted BPF program and an
unusual sk_msg ring state, so it is not practical to trigger and is left
out of this series.

I'm collecting these fixes together because the same problems have been
re-sent many times in slightly different forms, and I hope this series
can be prioritized for merging so the duplicates can finally settle. With
so many AI-generated patches floating around for these spots, leaving
them unmerged just keeps wasting maintainer review cycles on the same
issues.

v3->v4: Carry Kuniyuki Iwashima's reviewed-by tag.
        Drop the __GFP_ZERO patch; initializing the pushed payload is the
        BPF program's responsibility, not the kernel's (per maintainer
        feedback).
        https://lore.kernel.org/bpf/20260612130919.299124-1-jiayuan.chen@linux.dev/
v2->v3: Target bpf-next and carry Emil's reviewed-by tag.
        Use reverse xmas tree style as suggested by Cong
        (not all code matches reverse xmas tree due to variable
        dependencies).
v1->v2: Fix a problem introduced while resolving a conflict.

Kuniyuki Iwashima (1):
  sockmap: Fix use-after-free in udp_bpf_recvmsg()

Sechang Lim (2):
  bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
  selftests/bpf: add test for bpf_msg_pop_data() overflow

Weiming Shi (2):
  bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()

Zhang Cen (1):
  bpf, sockmap: keep sk_msg copy state in sync

 net/core/filter.c                             | 97 +++++++++++++++++--
 net/ipv4/udp_bpf.c                            |  9 ++
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 48 +++++++++
 .../bpf/progs/test_sockmap_msg_pop_data.c     | 27 ++++++
 4 files changed, 173 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_msg_pop_data.c

-- 
2.43.0


* [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  2:32   ` sashiko-bot
  2026-06-15  2:19 ` [PATCH bpf-next v4 2/6] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Weiming Shi <bestswngs@gmail.com>

When the scatterlist ring is full or nearly full, bpf_msg_push_data()
enters a copy fallback path and computes copy + len for the page
allocation size. Since len comes from BPF with arg3_type = ARG_ANYTHING
and both are u32, a crafted len can wrap the sum to a small value,
causing an undersized allocation followed by an out-of-bounds memcpy.

 BUG: unable to handle page fault for address: ffffed104089a402
 Oops: Oops: 0000 [#1] SMP KASAN NOPTI
 Call Trace:
  __asan_memcpy (mm/kasan/shadow.c:105)
  bpf_msg_push_data (net/core/filter.c:2852 net/core/filter.c:2788)
  bpf_prog_9ed8b5711920a7d7+0x2e/0x36
  sk_psock_msg_verdict (net/core/skmsg.c:934)
  tcp_bpf_sendmsg (net/ipv4/tcp_bpf.c:421 net/ipv4/tcp_bpf.c:584)
  __sys_sendto (net/socket.c:2206)
  do_syscall_64 (arch/x86/entry/syscall_64.c:94)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Add an overflow check before the allocation.
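
The wrap-around test can be tried in isolation. The following is a
minimal userspace sketch, not the kernel code: push_len_overflows() and
fallback_alloc_size() are hypothetical names, and -1 stands in for the
-EINVAL the patch returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when copy + len wraps around in 32-bit arithmetic.
 * Unsigned addition is modulo 2^32, so a wrapped sum is always
 * smaller than either operand.
 */
static bool push_len_overflows(uint32_t copy, uint32_t len)
{
	return (uint32_t)(copy + len) < copy;
}

/* Hypothetical helper: size of the fallback buffer, or -1 on overflow. */
static int64_t fallback_alloc_size(uint32_t copy, uint32_t len)
{
	if (push_len_overflows(copy, len))
		return -1;	/* the patch returns -EINVAL at this point */
	return (int64_t)copy + len;
}
```

With copy = 4096 and len = 0xfffff001 the 32-bit sum is 1, which is the
kind of small value that would otherwise size the allocation.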

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Tested-by: Xiang Mei <xmei5@asu.edu>
Tested-by: Xinyu Ma <mmmxny@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/filter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714f..3c8f1cedb217f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2829,6 +2829,9 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	if (!space || (space == 1 && start != offset))
 		copy = msg->sg.data[i].length;
 
+	if (unlikely(copy + len < copy))
+		return -EINVAL;
+
 	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
 			   get_order(copy + len));
 	if (unlikely(!page))
-- 
2.43.0



* [PATCH bpf-next v4 2/6] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  2:49   ` bot+bpf-ci
  2026-06-15  2:19 ` [PATCH bpf-next v4 3/6] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Weiming Shi <bestswngs@gmail.com>

When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by `start` (absolute message
byte offset) instead of `start - offset` (byte position within the
element). This makes rsge.offset overshoot by `offset` bytes, pointing
to the wrong location within the page or beyond its boundary. Consumers
of the corrupted entry either silently read wrong data or trigger an
out-of-bounds access.
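
The offset arithmetic can be illustrated with a simplified model. This is
a sketch under assumptions: struct seg is a stand-in for a scatterlist
element, and split_seg() and elem_start are hypothetical names
(elem_start plays the role of `offset` in the helper).

```c
#include <stdint.h>

/* Simplified stand-in for a scatterlist element (not the kernel struct). */
struct seg {
	uint32_t offset;	/* byte offset into the backing page */
	uint32_t length;	/* bytes covered by this element */
};

/* Split *head at absolute message offset 'start'. 'elem_start' is the
 * absolute message offset at which *head begins. The caller guarantees
 * elem_start <= start < elem_start + head->length.
 */
static struct seg split_seg(struct seg *head, uint32_t start,
			    uint32_t elem_start)
{
	uint32_t within = start - elem_start;	/* position inside the element */
	struct seg tail = *head;

	head->length = within;
	tail.length -= within;
	tail.offset += within;	/* the bug advanced by 'start' instead */
	return tail;
}
```

For an element that begins at message offset 4096 with page offset 128,
splitting at start = 4296 moves the tail 200 bytes into the element, so
its page offset becomes 328 rather than 128 + 4296.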

 BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
 Read of size 32752 at addr ffff8881042f0010 by task poc/130
 Call Trace:
  __asan_memcpy (mm/kasan/shadow.c:105)
  bpf_msg_pull_data (net/core/filter.c:2728)
  bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
  sk_psock_msg_verdict (net/core/skmsg.c:934)
  tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
  sock_sendmsg_nosec (net/socket.c:727)

Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Reported-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3c8f1cedb217f..3e555f276ba80 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2872,7 +2872,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 
 		psge->length = start - offset;
 		rsge.length -= psge->length;
-		rsge.offset += start;
+		rsge.offset += start - offset;
 
 		sk_msg_iter_var_next(i);
 		sg_unmark_end(psge);
-- 
2.43.0



* [PATCH bpf-next v4 3/6] bpf, sockmap: keep sk_msg copy state in sync
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 2/6] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg() Jiayuan Chen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Zhang Cen <rollkingzzc@gmail.com>

SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
with this bit set are copied before data/data_end are exposed to SK_MSG
BPF programs for direct packet access.

bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data()
rewrite the sk_msg scatterlist ring by collapsing, splitting, and
shifting entries. These operations move msg->sg.data[] entries, but the
parallel copy bitmap can be left behind on the old slot. A copied entry
can then return to msg->sg.start with its copy bit clear and be exposed
as directly writable packet data.

This corruption path requires an attached SK_MSG BPF program that calls
the mutating helpers; ordinary sockmap/TLS traffic that never runs
push/pop/pull helper sequences is not affected.

Keep msg->sg.copy synchronized with scatterlist entry moves, preserve
the copy bit when an entry is split, clear it when a helper replaces an
entry with a private page, and clear slots vacated by pull-data
compaction.
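
The invariant is that a slot's copy bit travels with its payload. Below
is a toy model of that rule, assuming a small ring with a parallel
bitmap. All names (toy_ring, ring_move, RING_SLOTS) are hypothetical, and
unlike sk_msg_sg_move() in this patch the sketch also clears the vacated
slot itself instead of leaving that to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8	/* illustrative; not the kernel's ring size */

/* Toy ring: one payload per slot plus a parallel "copy" bit per slot. */
struct toy_ring {
	int data[RING_SLOTS];
	uint32_t copy;	/* bit i set => slot i must be copied before exposure */
};

static bool ring_is_copy(const struct toy_ring *r, uint32_t i)
{
	return (r->copy >> i) & 1u;
}

static void ring_set_copy(struct toy_ring *r, uint32_t i, bool on)
{
	if (on)
		r->copy |= 1u << i;
	else
		r->copy &= ~(1u << i);
}

/* Move slot src to dst (dst != src) and carry the copy bit along with
 * the payload, then clear the vacated slot so no stale bit stays behind.
 */
static void ring_move(struct toy_ring *r, uint32_t dst, uint32_t src)
{
	r->data[dst] = r->data[src];
	ring_set_copy(r, dst, ring_is_copy(r, src));
	r->data[src] = 0;
	ring_set_copy(r, src, false);
}
```

Moving only data[] without the bitmap update is the failure mode
described above: the payload lands in a slot whose bit still describes
whatever lived there before.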

Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Cc: stable@vger.kernel.org
Co-developed-by: Han Guidong <2045gemini@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/filter.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3e555f276ba80..f605ab528b1af 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2654,6 +2654,38 @@ static void sk_msg_reset_curr(struct sk_msg *msg)
 	}
 }
 
+static bool sk_msg_elem_is_copy(const struct sk_msg *msg, u32 i)
+{
+	return test_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__clear_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_set_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__set_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_copy_range(struct sk_msg *msg, u32 start, u32 end)
+{
+	while (start != end) {
+		sk_msg_clear_elem_copy(msg, start);
+		sk_msg_iter_var_next(start);
+	}
+}
+
+static void sk_msg_sg_move(struct sk_msg *msg, u32 dst, u32 src)
+{
+	msg->sg.data[dst] = msg->sg.data[src];
+	if (sk_msg_elem_is_copy(msg, src))
+		sk_msg_set_elem_copy(msg, dst);
+	else
+		sk_msg_clear_elem_copy(msg, dst);
+}
+
 static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
 	.func           = bpf_msg_cork_bytes,
 	.gpl_only       = false,
@@ -2692,7 +2724,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	 * account for the headroom.
 	 */
 	bytes_sg_total = start - offset + bytes;
-	if (!test_bit(i, msg->sg.copy) && bytes_sg_total <= len)
+	if (!sk_msg_elem_is_copy(msg, i) && bytes_sg_total <= len)
 		goto out;
 
 	/* At this point we need to linearize multiple scatterlist
@@ -2738,6 +2770,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	} while (i != last_sge);
 
 	sg_set_page(&msg->sg.data[first_sge], page, copy, 0);
+	sk_msg_clear_elem_copy(msg, first_sge);
 
 	/* To repair sg ring we need to shift entries. If we only
 	 * had a single entry though we can just replace it and
@@ -2747,8 +2780,14 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	shift = last_sge > first_sge ?
 		last_sge - first_sge - 1 :
 		NR_MSG_FRAG_IDS - first_sge + last_sge - 1;
-	if (!shift)
+	if (!shift) {
+		sk_msg_clear_elem_copy(msg, msg->sg.end);
 		goto out;
+	}
+
+	i = first_sge;
+	sk_msg_iter_var_next(i);
+	sk_msg_clear_copy_range(msg, i, last_sge);
 
 	i = first_sge;
 	sk_msg_iter_var_next(i);
@@ -2762,16 +2801,18 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 		if (move_from == msg->sg.end)
 			break;
 
-		msg->sg.data[i] = msg->sg.data[move_from];
+		sk_msg_sg_move(msg, i, move_from);
 		msg->sg.data[move_from].length = 0;
 		msg->sg.data[move_from].page_link = 0;
 		msg->sg.data[move_from].offset = 0;
+		sk_msg_clear_elem_copy(msg, move_from);
 		sk_msg_iter_var_next(i);
 	} while (1);
 
 	msg->sg.end = msg->sg.end - shift > msg->sg.end ?
 		      msg->sg.end - shift + NR_MSG_FRAG_IDS :
 		      msg->sg.end - shift;
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 out:
 	sk_msg_reset_curr(msg);
 	msg->data = sg_virt(&msg->sg.data[first_sge]) + start - offset;
@@ -2792,8 +2833,10 @@ static const struct bpf_func_proto bpf_msg_pull_data_proto = {
 BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	   u32, len, u64, flags)
 {
+	bool sge_copy = false, nsge_copy = false, nnsge_copy = false;
 	struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
 	u32 new, i = 0, l = 0, space, copy = 0, offset = 0;
+	bool rsge_copy = false;
 	u8 *raw, *to, *from;
 	struct page *page;
 
@@ -2869,6 +2912,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 			sk_msg_iter_var_prev(i);
 		psge = sk_msg_elem(msg, i);
 		rsge = sk_msg_elem_cpy(msg, i);
+		rsge_copy = sk_msg_elem_is_copy(msg, i);
 
 		psge->length = start - offset;
 		rsge.length -= psge->length;
@@ -2894,23 +2938,34 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Shift one or two slots as needed */
 	sge = sk_msg_elem_cpy(msg, new);
 	sg_unmark_end(&sge);
+	sge_copy = sk_msg_elem_is_copy(msg, new);
 
 	nsge = sk_msg_elem_cpy(msg, i);
+	nsge_copy = sk_msg_elem_is_copy(msg, i);
 	if (rsge.length) {
 		sk_msg_iter_var_next(i);
 		nnsge = sk_msg_elem_cpy(msg, i);
+		nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		sk_msg_iter_next(msg, end);
 	}
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sge = nsge;
+		sge_copy = nsge_copy;
 		sk_msg_iter_var_next(i);
 		if (rsge.length) {
 			nsge = nnsge;
+			nsge_copy = nnsge_copy;
 			nnsge = sk_msg_elem_cpy(msg, i);
+			nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		} else {
 			nsge = sk_msg_elem_cpy(msg, i);
+			nsge_copy = sk_msg_elem_is_copy(msg, i);
 		}
 	}
 
@@ -2918,13 +2973,18 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Place newly allocated data buffer */
 	sk_mem_charge(msg->sk, len);
 	msg->sg.size += len;
-	__clear_bit(new, msg->sg.copy);
+	sk_msg_clear_elem_copy(msg, new);
 	sg_set_page(&msg->sg.data[new], page, len + copy, 0);
 	if (rsge.length) {
 		get_page(sg_page(&rsge));
 		sk_msg_iter_var_next(new);
 		msg->sg.data[new] = rsge;
+		if (rsge_copy)
+			sk_msg_set_elem_copy(msg, new);
+		else
+			sk_msg_clear_elem_copy(msg, new);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 
 	sk_msg_reset_curr(msg);
 	sk_msg_compute_data_pointers(msg);
@@ -2950,27 +3010,38 @@ static void sk_msg_shift_left(struct sk_msg *msg, int i)
 	do {
 		prev = i;
 		sk_msg_iter_var_next(i);
-		msg->sg.data[prev] = msg->sg.data[i];
+		sk_msg_sg_move(msg, prev, i);
 	} while (i != msg->sg.end);
 
 	sk_msg_iter_prev(msg, end);
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 static void sk_msg_shift_right(struct sk_msg *msg, int i)
 {
 	struct scatterlist tmp, sge;
+	bool tmp_copy, sge_copy;
 
 	sk_msg_iter_next(msg, end);
 	sge = sk_msg_elem_cpy(msg, i);
+	sge_copy = sk_msg_elem_is_copy(msg, i);
 	sk_msg_iter_var_next(i);
 	tmp = sk_msg_elem_cpy(msg, i);
+	tmp_copy = sk_msg_elem_is_copy(msg, i);
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sk_msg_iter_var_next(i);
 		sge = tmp;
+		sge_copy = tmp_copy;
 		tmp = sk_msg_elem_cpy(msg, i);
+		tmp_copy = sk_msg_elem_is_copy(msg, i);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
@@ -3027,8 +3098,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	 */
 	if (start != offset) {
 		struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
+		bool sge_copy = sk_msg_elem_is_copy(msg, i);
 		int a = start - offset;
 		int b = sge->length - pop - a;
+		u32 sge_idx = i;
 
 		sk_msg_iter_var_next(i);
 
@@ -3041,6 +3114,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				sg_set_page(nsge,
 					    sg_page(sge),
 					    b, sge->offset + pop + a);
+				if (sge_copy)
+					sk_msg_set_elem_copy(msg, i);
+				else
+					sk_msg_clear_elem_copy(msg, i);
 			} else {
 				struct page *page, *orig;
 				u8 *to, *from;
@@ -3057,6 +3134,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				memcpy(to, from, a);
 				memcpy(to + a, from + a + pop, b);
 				sg_set_page(sge, page, a + b, 0);
+				sk_msg_clear_elem_copy(msg, sge_idx);
 				put_page(orig);
 			}
 			pop = 0;
-- 
2.43.0



* [PATCH bpf-next v4 4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg()
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (2 preceding siblings ...)
  2026-06-15  2:19 ` [PATCH bpf-next v4 3/6] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  2:37   ` sashiko-bot
  2026-06-15  2:19 ` [PATCH bpf-next v4 5/6] bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check Jiayuan Chen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Kuniyuki Iwashima <kuniyu@google.com>

syzbot reported use-after-free of struct sk_msg in sk_msg_recvmsg(). [0]

sk_msg_recvmsg() peeks sk_msg from psock->ingress_msg under a lock,
but its processing is lockless.

Thus, sk_msg_recvmsg() must be serialised by callers, otherwise
multiple threads could touch the same sk_msg.

For example, TCP uses lock_sock(), and AF_UNIX uses unix_sk(sk)->iolock.

Initially, udp_bpf_recvmsg() had used lock_sock(), but the cited
commit removed it.

Let's serialise sk_msg_recvmsg() with lock_sock() in udp_bpf_recvmsg().

Note that holding spin_lock_bh(&sk->sk_receive_queue.lock) is not
an option due to copy_page_to_iter() in sk_msg_recvmsg().

[0]:
BUG: KASAN: slab-use-after-free in sk_msg_recvmsg+0xb54/0xc30 net/core/skmsg.c:428
Read of size 4 at addr ffff88814cdcf000 by task syz.0.24/6020

CPU: 1 UID: 0 PID: 6020 Comm: syz.0.24 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xba/0x230 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 sk_msg_recvmsg+0xb54/0xc30 net/core/skmsg.c:428
 udp_bpf_recvmsg+0x4bd/0xe00 net/ipv4/udp_bpf.c:84
 inet_recvmsg+0x260/0x270 net/ipv4/af_inet.c:891
 sock_recvmsg_nosec net/socket.c:1078 [inline]
 sock_recvmsg+0x1a8/0x270 net/socket.c:1100
 ____sys_recvmsg+0x1e6/0x4a0 net/socket.c:2812
 ___sys_recvmsg+0x215/0x590 net/socket.c:2854
 do_recvmmsg+0x334/0x800 net/socket.c:2949
 __sys_recvmmsg net/socket.c:3023 [inline]
 __do_sys_recvmmsg net/socket.c:3046 [inline]
 __se_sys_recvmmsg net/socket.c:3039 [inline]
 __x64_sys_recvmmsg+0x198/0x250 net/socket.c:3039
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb319f9aeb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb31ad97028 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007fb31a216090 RCX: 00007fb319f9aeb9
RDX: 0000000000000001 RSI: 0000200000000400 RDI: 0000000000000004
RBP: 00007fb31a008c1f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000040000021 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb31a216128 R14: 00007fb31a216090 R15: 00007ffe21dd0a98
 </TASK>

Allocated by task 6019:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x3d1/0x6e0 mm/slub.c:5780
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 alloc_sk_msg net/core/skmsg.c:510 [inline]
 sk_psock_skb_ingress_self+0x60/0x350 net/core/skmsg.c:612
 sk_psock_verdict_apply net/core/skmsg.c:1038 [inline]
 sk_psock_verdict_recv+0x7d9/0x8d0 net/core/skmsg.c:1236
 udp_read_skb+0x73e/0x7e0 net/ipv4/udp.c:2045
 sk_psock_verdict_data_ready+0x12d/0x550 net/core/skmsg.c:1257
 __udp_enqueue_schedule_skb+0xc54/0x10b0 net/ipv4/udp.c:1789
 __udp_queue_rcv_skb net/ipv4/udp.c:2346 [inline]
 udp_queue_rcv_one_skb+0xac5/0x19c0 net/ipv4/udp.c:2475
 __udp4_lib_mcast_deliver+0xc06/0xcf0 net/ipv4/udp.c:2585
 __udp4_lib_rcv+0x10f6/0x2620 net/ipv4/udp.c:2724
 ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
 ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
 NF_HOOK+0x336/0x3c0 include/linux/netfilter.h:318
 dst_input include/net/dst.h:474 [inline]
 ip_sublist_rcv_finish+0x221/0x2a0 net/ipv4/ip_input.c:584
 ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
 ip_sublist_rcv+0x5c6/0xa70 net/ipv4/ip_input.c:644
 ip_list_rcv+0x3f1/0x450 net/ipv4/ip_input.c:678
 __netif_receive_skb_list_ptype net/core/dev.c:6195 [inline]
 __netif_receive_skb_list_core+0x7e5/0x810 net/core/dev.c:6242
 __netif_receive_skb_list net/core/dev.c:6294 [inline]
 netif_receive_skb_list_internal+0x995/0xcf0 net/core/dev.c:6385
 netif_receive_skb_list+0x54/0x410 net/core/dev.c:6437
 xdp_recv_frames net/bpf/test_run.c:269 [inline]
 xdp_test_run_batch net/bpf/test_run.c:350 [inline]
 bpf_test_run_xdp_live+0x1946/0x1cf0 net/bpf/test_run.c:379
 bpf_prog_test_run_xdp+0x81c/0x1160 net/bpf/test_run.c:1396
 bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
 __sys_bpf+0x5cb/0x920 kernel/bpf/syscall.c:6182
 __do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6021:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2540 [inline]
 slab_free mm/slub.c:6674 [inline]
 kfree+0x1be/0x650 mm/slub.c:6882
 kfree_sk_msg include/linux/skmsg.h:385 [inline]
 sk_msg_recvmsg+0xaa8/0xc30 net/core/skmsg.c:483
 udp_bpf_recvmsg+0x4bd/0xe00 net/ipv4/udp_bpf.c:84
 inet_recvmsg+0x260/0x270 net/ipv4/af_inet.c:891
 sock_recvmsg_nosec net/socket.c:1078 [inline]
 sock_recvmsg+0x1a8/0x270 net/socket.c:1100
 ____sys_recvmsg+0x1e6/0x4a0 net/socket.c:2812
 ___sys_recvmsg+0x215/0x590 net/socket.c:2854
 do_recvmmsg+0x334/0x800 net/socket.c:2949
 __sys_recvmmsg net/socket.c:3023 [inline]
 __do_sys_recvmmsg net/socket.c:3046 [inline]
 __se_sys_recvmmsg net/socket.c:3039 [inline]
 __x64_sys_recvmmsg+0x198/0x250 net/socket.c:3039
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy")
Reported-by: syzbot+9307c991a6d07ce6e6d8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69922ac9.a70a0220.2c38d7.00e0.GAE@google.com/
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/ipv4/udp_bpf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 9f33b07b14813..ad57c4c9eaab6 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -50,7 +50,9 @@ static int udp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	ret = udp_msg_has_data(sk, psock);
 	if (!ret) {
+		release_sock(sk);
 		wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
+		lock_sock(sk);
 		ret = udp_msg_has_data(sk, psock);
 	}
 	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
@@ -79,6 +81,7 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		goto out;
 	}
 
+	lock_sock(sk);
 msg_bytes_ready:
 	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
 	if (!copied) {
@@ -90,11 +93,17 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		if (data) {
 			if (psock_has_data(psock))
 				goto msg_bytes_ready;
+
+			release_sock(sk);
+
 			ret = sk_udp_recvmsg(sk, msg, len, flags);
 			goto out;
 		}
 		copied = -EAGAIN;
 	}
+
+	release_sock(sk);
+
 	ret = copied;
 out:
 	sk_psock_put(sk, psock);
-- 
2.43.0



* [PATCH bpf-next v4 5/6] bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (3 preceding siblings ...)
  2026-06-15  2:19 ` [PATCH bpf-next v4 4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg() Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  2:19 ` [PATCH bpf-next v4 6/6] selftests/bpf: add test for bpf_msg_pop_data() overflow Jiayuan Chen
  2026-06-15  4:40 ` [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Sechang Lim <rhkrqnwk98@gmail.com>

start and len are u32, so

	u64 last = start + len;

evaluates start + len in 32-bit and wraps before storing it in last.
The bounds check

	if (start >= offset + l || last > msg->sg.size)
		return -EINVAL;

can then be passed with an out-of-range start/len, after which the pop
loop runs off the end of the scatterlist and sk_msg_shift_left() calls
put_page() on the empty msg->sg.end slot:

  Oops: general protection fault, probably for non-canonical address
  0xdffffc0000000001: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
  RIP: 0010:sk_msg_shift_left net/core/filter.c:2957 [inline]
  RIP: 0010:____bpf_msg_pop_data net/core/filter.c:3103 [inline]
  RIP: 0010:bpf_msg_pop_data+0x753/0x1a10 net/core/filter.c:2984
  Call Trace:
   <TASK>
   bpf_prog_4cc92c278f4d5d56+0x1b1/0x1e8
   bpf_prog_run_pin_on_cpu+0x107/0x320 include/linux/filter.h:746
   sk_psock_msg_verdict+0x357/0x7f0 net/core/skmsg.c:934
   tcp_bpf_send_verdict net/ipv4/tcp_bpf.c:420 [inline]
   tcp_bpf_sendmsg+0x766/0x1ae0 net/ipv4/tcp_bpf.c:583
   __sock_sendmsg+0x153/0x1c0 net/socket.c:802
   __sys_sendto+0x326/0x430 net/socket.c:2265
   __x64_sys_sendto+0xe3/0x100 net/socket.c:2268
   do_syscall_64+0x14c/0x480
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>

Widen the addition with a (u64) cast so the bound is evaluated in
64-bit and a len near U32_MAX no longer wraps below msg->sg.size.

While here, change pop from int to u32. It counts bytes against the
unsigned scatterlist lengths and can never be negative, so the signed
type only invites sign-confusion in the pop loop.
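
The difference between the two forms of the addition can be reproduced
in userspace. This is a sketch, assuming the usual ABIs where uint32_t is
unsigned int; last_wrapping(), last_widened() and range_ok() are
hypothetical names, and range_ok() models only the `last > msg->sg.size`
half of the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: the addition is done in 32 bits and wraps before the
 * result is widened for the assignment.
 */
static uint64_t last_wrapping(uint32_t start, uint32_t len)
{
	uint64_t last = start + len;

	return last;
}

/* Fixed form: one operand is widened first, so the sum is 64-bit. */
static uint64_t last_widened(uint32_t start, uint32_t len)
{
	return (uint64_t)start + len;
}

/* Passes when the end of the range lies within 'size' bytes. */
static bool range_ok(uint64_t last, uint32_t size)
{
	return last <= size;
}
```

With start = 1 and len = 0xffffffff the wrapping form yields 0, which
passes the size check for any message, while the widened form yields
0x100000000 and is rejected.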

Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index f605ab528b1af..73f05907839d7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3048,8 +3048,8 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	   u32, len, u64, flags)
 {
 	u32 i = 0, l = 0, space, offset = 0;
-	u64 last = start + len;
-	int pop;
+	u64 last = (u64)start + len;
+	u32 pop;
 
 	if (unlikely(flags))
 		return -EINVAL;
-- 
2.43.0



* [PATCH bpf-next v4 6/6] selftests/bpf: add test for bpf_msg_pop_data() overflow
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (4 preceding siblings ...)
  2026-06-15  2:19 ` [PATCH bpf-next v4 5/6] bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check Jiayuan Chen
@ 2026-06-15  2:19 ` Jiayuan Chen
  2026-06-15  4:40 ` [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-06-15  2:19 UTC (permalink / raw)
  To: bpf; +Cc: jiayuan.chen

From: Sechang Lim <rhkrqnwk98@gmail.com>

Add a test in sockmap_basic.c that calls bpf_msg_pop_data() with a length
close to U32_MAX, which overflows the start + len bounds check. The sk_msg
program records the return value over a sendmsg and the test checks that
the call is rejected with -EINVAL.

Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 48 +++++++++++++++++++
 .../bpf/progs/test_sockmap_msg_pop_data.c     | 27 +++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_msg_pop_data.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index d2846579285f2..cb3229711f93a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -14,6 +14,7 @@
 #include "test_sockmap_pass_prog.skel.h"
 #include "test_sockmap_drop_prog.skel.h"
 #include "test_sockmap_change_tail.skel.h"
+#include "test_sockmap_msg_pop_data.skel.h"
 #include "bpf_iter_sockmap.skel.h"
 
 #include "sockmap_helpers.h"
@@ -666,6 +667,51 @@ static void test_sockmap_skb_verdict_change_tail(void)
 	test_sockmap_change_tail__destroy(skel);
 }
 
+static void test_sockmap_msg_verdict_pop_data(void)
+{
+	struct test_sockmap_msg_pop_data *skel;
+	int err, map, verdict;
+	int c1 = -1, p1 = -1, sent;
+	int zero = 0;
+	char *buf;
+	const size_t len = 32 * 1024;
+
+	skel = test_sockmap_msg_pop_data__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	verdict = bpf_program__fd(skel->progs.prog_msg_pop_data);
+	map = bpf_map__fd(skel->maps.sock_map);
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	err = create_pair(AF_INET, SOCK_STREAM, &c1, &p1);
+	if (!ASSERT_OK(err, "create_pair"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem"))
+		goto out_close;
+
+	buf = calloc(len, 1);
+	if (!ASSERT_OK_PTR(buf, "calloc"))
+		goto out_close;
+
+	sent = xsend(c1, buf, len, 0);
+	ASSERT_EQ(sent, (ssize_t)len, "xsend");
+	ASSERT_EQ(skel->data->pop_data_ret, -EINVAL, "pop_data_rejects overflow");
+
+	free(buf);
+
+out_close:
+	close(c1);
+	close(p1);
+out:
+	test_sockmap_msg_pop_data__destroy(skel);
+}
+
 static void test_sockmap_skb_verdict_peek_helper(int map)
 {
 	int err, c1, p1, zero = 0, sent, recvd, avail;
@@ -1373,6 +1419,8 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_fionread(false);
 	if (test__start_subtest("sockmap skb_verdict change tail"))
 		test_sockmap_skb_verdict_change_tail();
+	if (test__start_subtest("sockmap msg_verdict pop_data overflow"))
+		test_sockmap_msg_verdict_pop_data();
 	if (test__start_subtest("sockmap skb_verdict msg_f_peek"))
 		test_sockmap_skb_verdict_peek();
 	if (test__start_subtest("sockmap skb_verdict msg_f_peek with link"))
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_msg_pop_data.c b/tools/testing/selftests/bpf/progs/test_sockmap_msg_pop_data.c
new file mode 100644
index 0000000000000..301e65b95256c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_msg_pop_data.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+} sock_map SEC(".maps");
+
+#define POP_START 0x48a3
+#define POP_LEN   0xfffffffd
+
+long pop_data_ret = 1;
+
+SEC("sk_msg")
+int prog_msg_pop_data(struct sk_msg_md *msg)
+{
+	if (msg->size <= POP_START)
+		return SK_PASS;
+
+	pop_data_ret = bpf_msg_pop_data(msg, POP_START, POP_LEN, 0);
+	return SK_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.0


* Re: [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  2026-06-15  2:19 ` [PATCH bpf-next v4 1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
@ 2026-06-15  2:32   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15  2:32 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Appending to the end of a full or nearly full message scatterlist reads an out-of-bounds or unrelated `length`, causing a heap buffer overflow or massive information leak.
- [High] Allocating a new page without `__GFP_ZERO` in `bpf_msg_push_data()` leaks uninitialized kernel heap memory.
--

commit d474d4a6051d7bd721dc6d3aa5d5f0dff7f6ec9d
Author: Weiming Shi <bestswngs@gmail.com>

bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()

When the scatterlist ring is full or nearly full, bpf_msg_push_data()
enters a copy fallback path and computes copy + len for the page
allocation size. Since len comes from BPF with arg3_type = ARG_ANYTHING
and both are u32, a crafted len can wrap the sum to a small value,
causing an undersized allocation followed by an out-of-bounds memcpy.
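
As a rough userspace sketch of the idiom (not the kernel code itself):
for unsigned operands, a wrapped sum is always smaller than either
operand, which is what the added `copy + len < copy` test relies on.
The helper names sum_wraps_u32() and requested_size() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when copy + len does not fit in 32 bits. */
static bool sum_wraps_u32(uint32_t copy, uint32_t len)
{
	return (uint32_t)(copy + len) < copy;
}

/* Size that an unchecked caller would request for the allocation. */
static uint32_t requested_size(uint32_t copy, uint32_t len)
{
	return (uint32_t)(copy + len);
}

/* Small demonstration with illustrative values. */
static void demo_push_wrap(void)
{
	/* 4096 + 0xfffff001 wraps to 1: a one-byte request for 4096+ bytes. */
	assert(sum_wraps_u32(4096u, 0xfffff001u));
	assert(requested_size(4096u, 0xfffff001u) == 1u);

	/* Ordinary sizes are unaffected by the check. */
	assert(!sum_wraps_u32(4096u, 512u));
}
```

The values 4096 and 0xfffff001 are made up for the example; any pair
whose sum exceeds U32_MAX behaves the same way.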

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 57b00c6cc8cc1..4b159045881dd 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2829,6 +2829,9 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	if (!space || (space == 1 && start != offset))
>  		copy = msg->sg.data[i].length;

[Severity: Critical]
This is a pre-existing issue, but does bpf_msg_push_data() read an
out-of-bounds or unrelated length when appending to the end of a full or
nearly full message scatterlist?

If appending exactly at the end of the message (start == offset + l), the loop
terminates with i == msg->sg.end. If there is exactly one available slot
(space == 1), the condition space == 1 && start != offset evaluates to true,
causing this line to execute.

Since i == msg->sg.end, this reads the length of an unused, empty slot (stale
data). This stale value dictates the allocation size and is added to the new
scatterlist element's length (msg->sg.data[new].length = len + copy). This
causes the scatterlist element to advertise a length much larger than the
actually initialized data, triggering massive out-of-bounds reads during
transmission and leaking thousands of bytes of kernel memory.

Additionally, if the scatterlist ring is entirely full (space == 0),
msg->sg.end equals msg->sg.start. Thus, copy incorrectly reads the length of
the first element in the ring, while the subsequent code consolidates the last
element. This results in allocating a page sized for the first element but
memcpy-ing the last element's data into it. If the last element is larger than
the first, this causes a deterministic heap buffer overflow.

>  
> +	if (unlikely(copy + len < copy))
> +		return -EINVAL;
> +
>  	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>  			   get_order(copy + len));

[Severity: High]
This isn't a bug introduced by this patch, but does alloc_pages() here in
bpf_msg_push_data() leak uninitialized kernel heap memory?

This allocates a new page but intentionally skips initializing the newly
inserted len bytes region, copying only the front and back portions of the old
scatterlist element. Because the allocation does not include the __GFP_ZERO
flag, the inserted space contains whatever data previously occupied that kernel
heap page.

If the BPF program does not subsequently overwrite this region before the
message is transmitted, the uninitialized kernel memory is sent over the
network or exposed to user space.

>  	if (unlikely(!page))

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615021959.140010-1-jiayuan.chen@linux.dev?part=1

* Re: [PATCH bpf-next v4 4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg()
  2026-06-15  2:19 ` [PATCH bpf-next v4 4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg() Jiayuan Chen
@ 2026-06-15  2:37   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15  2:37 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [High] Infinite tight loop (soft lockup) holding lock_sock if data is pending only in ingress_skb.
- [High] Permanent bricking of socket receive path in `__sk_msg_recvmsg` on 0-byte datagrams or failed copies.
- [High] Delegating blocking recvmsg to `sk_udp_recvmsg` misses BPF wakeups, causing indefinite hangs.
- [High] `vsock_msg_wait_data` sleeps while holding `lock_sock(sk)`, leading to a deadlock.
--

commit a60377195171f0aa2922ffec3bb6a4d9b4e06c24
Author: Kuniyuki Iwashima <kuniyu@google.com>

sockmap: Fix use-after-free in udp_bpf_recvmsg()

syzbot reported use-after-free of struct sk_msg in sk_msg_recvmsg().
sk_msg_recvmsg() peeks sk_msg from psock->ingress_msg under a lock,
but its processing is lockless.

Link: https://lore.kernel.org/netdev/69922ac9.a70a0220.2c38d7.00e0.GAE@google.com/

> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> --- a/net/ipv4/udp_bpf.c
> +++ b/net/ipv4/udp_bpf.c

[ ... ]

>  static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  			   int flags)
>  {
>  	struct sk_psock *psock;
>  	int copied, ret;

[ ... ]

>  	psock = sk_psock_get(sk);
>  	if (unlikely(!psock))
>  		return sk_udp_recvmsg(sk, msg, len, flags);
>  
>  	if (!psock_has_data(psock)) {
>  		ret = sk_udp_recvmsg(sk, msg, len, flags);
>  		goto out;
>  	}

[Severity: High]
This is a pre-existing issue, but does delegating a blocking recvmsg to
sk_udp_recvmsg here cause us to miss BPF wakeups?

If psock_has_data(psock) is initially false, udp_bpf_recvmsg() bypasses its
BPF-aware wait loop and delegates the read entirely to the native
sk_udp_recvmsg().

If the socket is blocking, sk_udp_recvmsg() sleeps waiting for packets on
sk->sk_receive_queue. If a BPF program subsequently redirects data to the
socket, the data is placed in psock->ingress_skb or psock->ingress_msg, and
a wakeup is issued via sk_data_ready().

This wakes up the native UDP loop, which only checks sk->sk_receive_queue,
finds it empty, and goes back to sleep. Can this cause the process to hang
indefinitely despite data being correctly available in the psock?

> +	lock_sock(sk);
>  msg_bytes_ready:
>  	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
>  	if (!copied) {
> @@ -90,11 +93,17 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  		if (data) {
>  			if (psock_has_data(psock))
>  				goto msg_bytes_ready;

[Severity: High]
This is a pre-existing issue, but can this loop cause a soft lockup if data
is pending only in ingress_skb?

If data has been enqueued to psock->ingress_skb but not yet moved to
psock->ingress_msg by the backlog worker, sk_msg_recvmsg() returns 0.
The code then calls udp_msg_wait_data().

Since psock_has_data() evaluates to true when ingress_skb is not empty,
udp_msg_wait_data() returns 1 immediately without sleeping. The loop then
jumps back to msg_bytes_ready and repeats.

On a non-preemptible kernel, if the worker is scheduled on the same CPU,
could this starve the worker and form a busy-wait loop in kernel space?

While reviewing the rx paths associated with this change, I noticed a
couple of other related issues in the subsystem.

> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c

[ ... ]

>  static int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock,
>  			    struct msghdr *msg, int len, int flags,
>  			    struct sk_msg *msg_rx)
>  {

[ ... ]

> 			if (!copy) {
> 				copied = copied ? copied : -EFAULT;
> 				goto out;
> 			}

[Severity: High]
This is a pre-existing issue, but does returning -EFAULT here permanently
brick the socket receive path for 0-byte datagrams?

When a 0-byte UDP packet is redirected, skb_to_sgvec yields an empty
scatterlist, making copy evaluate to 0.

Because the error path jumps to out, it bypasses advancing the read pointer
(msg_rx->sg.start = i) and skips dequeuing the message.

On the next recvmsg call, won't the function peek the same 0-byte message
and immediately return -EFAULT again?

> diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
> --- a/net/vmw_vsock/vsock_bpf.c
> +++ b/net/vmw_vsock/vsock_bpf.c

[ ... ]

>  static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>  			     size_t len, int flags)
>  {
>  	struct sk_psock *psock;

[ ... ]

> 	lock_sock(sk);
> 	vsk = vsock_sk(sk);
> ...
> 	while (copied == 0) {
> 		long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>
> 		if (!vsock_msg_wait_data(sk, psock, timeo)) {

[Severity: High]
This is a pre-existing issue, but does vsock_msg_wait_data sleep while
holding lock_sock(sk)?

In vsock_bpf_recvmsg(), the socket lock is held when calling
vsock_msg_wait_data(). Inside vsock_msg_wait_data(), wait_woken() is called
to sleep without releasing the socket lock.

Meanwhile, the virtio transport receive path (virtio_transport_recv_pkt)
must acquire lock_sock(sk) to deliver incoming packets.

Could this result in a deadlock since the transport worker is blocked trying
to acquire lock_sock(sk) while the user process sleeps holding it?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615021959.140010-1-jiayuan.chen@linux.dev?part=4

* Re: [PATCH bpf-next v4 2/6] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-06-15  2:19 ` [PATCH bpf-next v4 2/6] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
@ 2026-06-15  2:49   ` bot+bpf-ci
  0 siblings, 0 replies; 11+ messages in thread
From: bot+bpf-ci @ 2026-06-15  2:49 UTC (permalink / raw)
  To: jiayuan.chen, bpf
  Cc: jiayuan.chen, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

> diff --git a/net/core/filter.c b/net/core/filter.c
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2872,7 +2872,7 @@ struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
>
>  		psge->length = start - offset;
>  		rsge.length -= psge->length;
> -		rsge.offset += start;
> +		rsge.offset += start - offset;
>
>  		sk_msg_iter_var_next(i);
>  		sg_unmark_end(psge);

This looks correct. Would it be helpful to include a test case or
reproducer with this fix? Jiayuan Chen noted in the review discussion
that it took significant effort to reproduce this issue, and having test
coverage would help prevent regressions in the future.
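
To make the arithmetic in the hunk concrete, here is a minimal userspace
model, offered as a sketch under assumptions rather than as the kernel
code. struct sge and split_sge() are hypothetical stand-ins: `offset` in
the struct is the position inside the backing page, while `elem_start`
plays the role of the kernel's running message offset for the element.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct sge {
	uint32_t offset;	/* position inside the backing page */
	uint32_t length;	/* bytes covered by this entry */
};

/*
 * Split *left at message position `start`. The element itself begins at
 * message position `elem_start`, so the split point inside the element
 * is start - elem_start, not start.
 */
static void split_sge(struct sge *left, uint32_t elem_start, uint32_t start,
		      struct sge *right)
{
	uint32_t delta = start - elem_start;

	*right = *left;
	left->length = delta;
	right->length -= delta;
	right->offset += delta;	/* adding `start` here would overshoot */
}

/* Small demonstration with illustrative values. */
static void demo_split(void)
{
	struct sge left = { .offset = 100, .length = 50 };
	struct sge right;

	/* Element covers message bytes 1000..1049; split at 1020. */
	split_sge(&left, 1000, 1020, &right);
	assert(left.offset == 100 && left.length == 20);
	assert(right.offset == 120 && right.length == 30);
}
```

In this example the pre-fix form (`offset += start`) would have produced
a page offset of 1120 instead of 120, pointing well past the data the
right-hand half is supposed to cover.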


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27520503350

* Re: [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg
  2026-06-15  2:19 [PATCH bpf-next v4 0/6] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (5 preceding siblings ...)
  2026-06-15  2:19 ` [PATCH bpf-next v4 6/6] selftests/bpf: add test for bpf_msg_pop_data() overflow Jiayuan Chen
@ 2026-06-15  4:40 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-15  4:40 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 15 Jun 2026 10:19:53 +0800 you wrote:
> All fixes are from previous patches sent by Weiming Shi, Zhang Cen,
> Kuniyuki and Sechang Lim, which have already been reviewed by me and John and Jakub.
> 
> https://lore.kernel.org/bpf/20260610081218.506709-2-rhkrqnwk98@gmail.com/
> https://lore.kernel.org/bpf/20260520102715.3033936-1-rollkingzzc@gmail.com/
> https://lore.kernel.org/bpf/20260424191602.1522411-3-bestswngs@gmail.com/
> https://lore.kernel.org/bpf/20260423155807.1245644-2-bestswngs@gmail.com/
> https://lore.kernel.org/bpf/20260221233234.3814768-4-kuniyu@google.com/
> 
> [...]

Here is the summary with links:
  - [bpf-next,v4,1/6] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
    https://git.kernel.org/bpf/bpf-next/c/0c0a8ed85349
  - [bpf-next,v4,2/6] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
    https://git.kernel.org/bpf/bpf-next/c/f3f34ca45b96
  - [bpf-next,v4,3/6] bpf, sockmap: keep sk_msg copy state in sync
    https://git.kernel.org/bpf/bpf-next/c/2ccbc9a38746
  - [bpf-next,v4,4/6] sockmap: Fix use-after-free in udp_bpf_recvmsg()
    https://git.kernel.org/bpf/bpf-next/c/c010995b29c8
  - [bpf-next,v4,5/6] bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
    https://git.kernel.org/bpf/bpf-next/c/a48802fb2cd2
  - [bpf-next,v4,6/6] selftests/bpf: add test for bpf_msg_pop_data() overflow
    https://git.kernel.org/bpf/bpf-next/c/70b139d0483c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


