[PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg

* [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg
@ 2026-06-11  1:35 Jiayuan Chen
  2026-06-11  1:35 ` [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:35 UTC (permalink / raw)
  To: bpf


All fixes are from previous patches sent by Weiming Shi and Zhang Cen,
which have already been reviewed by me and John.

https://lore.kernel.org/bpf/20260520102715.3033936-1-rollkingzzc@gmail.com/
https://lore.kernel.org/bpf/20260424190310.1520555-2-bestswngs@gmail.com/
https://lore.kernel.org/bpf/20260424191602.1522411-3-bestswngs@gmail.com/
https://lore.kernel.org/bpf/20260423155807.1245644-2-bestswngs@gmail.com/

Weiming Shi (3):
  bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data

Zhang Cen (1):
  bpf, sockmap: keep sk_msg copy state in sync

 net/core/filter.c | 95 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 7 deletions(-)

-- 
2.43.0



* [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
@ 2026-06-11  1:35 ` Jiayuan Chen
  2026-06-11  1:49   ` sashiko-bot
  2026-06-11  1:35 ` [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:35 UTC (permalink / raw)
  To: bpf

From: Weiming Shi <bestswngs@gmail.com>

When the scatterlist ring is full or nearly full, bpf_msg_push_data()
enters a copy fallback path and computes copy + len for the page
allocation size. Since len comes from BPF with arg3_type = ARG_ANYTHING
and both are u32, a crafted len can wrap the sum to a small value,
causing an undersized allocation followed by an out-of-bounds memcpy.

 BUG: unable to handle page fault for address: ffffed104089a402
 Oops: Oops: 0000 [#1] SMP KASAN NOPTI
 Call Trace:
  __asan_memcpy (mm/kasan/shadow.c:105)
  bpf_msg_push_data (net/core/filter.c:2852 net/core/filter.c:2788)
  bpf_prog_9ed8b5711920a7d7+0x2e/0x36
  sk_psock_msg_verdict (net/core/skmsg.c:934)
  tcp_bpf_sendmsg (net/ipv4/tcp_bpf.c:421 net/ipv4/tcp_bpf.c:584)
  __sys_sendto (net/socket.c:2206)
  do_syscall_64 (arch/x86/entry/syscall_64.c:94)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Add an overflow check before the allocation.
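
As a rough userspace illustration of the check described above (not the
kernel code itself), the sketch below models copy and len as uint32_t.
The names push_size_overflows() and push_alloc_size() are hypothetical,
and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when copy + len wraps around in 32-bit unsigned arithmetic.
 * Same test as the patch: a wrapped sum is smaller than either operand.
 */
static bool push_size_overflows(uint32_t copy, uint32_t len)
{
	return (uint32_t)(copy + len) < copy;
}

/* Hypothetical stand-in for computing the size handed to get_order().
 * Returns 0 and stores the sum, or -1 where the helper returns -EINVAL.
 */
static int push_alloc_size(uint32_t copy, uint32_t len, uint32_t *size)
{
	assert(size);
	if (push_size_overflows(copy, len))
		return -1;
	*size = copy + len;
	return 0;
}
```

With copy = 4096 and len = 0xFFFFF001 the 32-bit sum wraps to 1, so the
unchecked code would size the allocation for 1 byte while the later
memcpy still uses the full lengths.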

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Tested-by: Xiang Mei <xmei5@asu.edu>
Tested-by: Xinyu Ma <mmmxny@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 net/core/filter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714f..3c8f1cedb217f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2829,6 +2829,9 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	if (!space || (space == 1 && start != offset))
 		copy = msg->sg.data[i].length;
 
+	if (unlikely(copy + len < copy))
+		return -EINVAL;
+
 	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
 			   get_order(copy + len));
 	if (unlikely(!page))
-- 
2.43.0



* [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  2026-06-11  1:35 ` [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
@ 2026-06-11  1:35 ` Jiayuan Chen
  2026-06-11  1:56   ` sashiko-bot
  2026-06-11  1:35 ` [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Jiayuan Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:35 UTC (permalink / raw)
  To: bpf

From: Weiming Shi <bestswngs@gmail.com>

When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by `start` (absolute message
byte offset) instead of `start - offset` (byte position within the
element). This makes rsge.offset overshoot by `offset` bytes, pointing
to the wrong location within the page or beyond its boundary. Consumers
of the corrupted entry either silently read wrong data or trigger an
out-of-bounds access.
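
The arithmetic can be sketched in userspace with a simplified element
type. struct sge_model and split_sge() are hypothetical names used only
for this illustration; elem_start plays the role of the `offset`
variable in bpf_msg_push_data().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct scatterlist: only the fields used here. */
struct sge_model {
	uint32_t offset;	/* byte offset of the data within its page */
	uint32_t length;	/* number of data bytes in this element */
};

/* Split *head at absolute message offset `start`. `elem_start` is the
 * message offset at which this element begins. *head keeps the first
 * part; the returned element describes the tail.
 */
static struct sge_model split_sge(struct sge_model *head,
				  uint32_t start, uint32_t elem_start)
{
	struct sge_model tail = *head;
	uint32_t within;

	assert(start >= elem_start && start - elem_start <= head->length);
	within = start - elem_start;	/* position inside the element */
	head->length = within;
	tail.length -= within;
	tail.offset += within;		/* pre-patch code added `start` here */
	return tail;
}
```

For an element that begins at message offset 3000 and is split at
start = 3100, the tail must begin 100 bytes into the element. Adding
`start` instead would move the tail's page offset forward by 3100.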

 BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
 Read of size 32752 at addr ffff8881042f0010 by task poc/130
 Call Trace:
  __asan_memcpy (mm/kasan/shadow.c:105)
  bpf_msg_pull_data (net/core/filter.c:2728)
  bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
  sk_psock_msg_verdict (net/core/skmsg.c:934)
  tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
  sock_sendmsg_nosec (net/socket.c:727)

Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Reported-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3c8f1cedb217f..3e555f276ba80 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2872,7 +2872,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 
 		psge->length = start - offset;
 		rsge.length -= psge->length;
-		rsge.offset += start;
+		rsge.offset += start - offset;
 
 		sk_msg_iter_var_next(i);
 		sg_unmark_end(psge);
-- 
2.43.0



* [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
  2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  2026-06-11  1:35 ` [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
  2026-06-11  1:35 ` [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
@ 2026-06-11  1:35 ` Jiayuan Chen
  2026-06-11  1:45   ` sashiko-bot
  2026-06-11  2:11   ` bot+bpf-ci
  2026-06-11  1:35 ` [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
  2026-06-11  1:40 ` [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  4 siblings, 2 replies; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:35 UTC (permalink / raw)
  To: bpf

From: Weiming Shi <bestswngs@gmail.com>

bpf_msg_push_data() allocates pages via alloc_pages() without
__GFP_ZERO. In the non-copy path, the entire page of uninitialized
heap content is added directly to the sk_msg scatterlist, which is
then transmitted over TCP to userspace via tcp_bpf_push(). In the
copy path, a gap of len bytes between the front and back memcpy
regions is similarly left uninitialized.

This leads to a kernel heap information leak: stale page content
including kernel pointers from the direct-map and vmemmap regions
is transmitted to userspace, which can be used to defeat KASLR.

Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
page is always zeroed before it enters the scatterlist.
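
A minimal userspace sketch of the copy-path layout, assuming calloc() as
a stand-in for a __GFP_ZERO page allocation. build_push_buffer() is a
hypothetical name and is not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build the copy-path buffer: `front` bytes of old data, a gap of `len`
 * bytes for the BPF program to fill, then the rest of the old data.
 * calloc() plays the role of __GFP_ZERO, so the gap reads as zeros even
 * if the program never writes to it. The caller frees the result.
 */
static uint8_t *build_push_buffer(const uint8_t *old, size_t old_len,
				  size_t front, size_t len)
{
	uint8_t *raw;

	assert(old);
	assert(front <= old_len);
	assert(len <= SIZE_MAX - old_len);
	raw = calloc(1, old_len + len);
	if (!raw)
		return NULL;
	memcpy(raw, old, front);
	memcpy(raw + front + len, old + front, old_len - front);
	return raw;
}
```

With a non-zeroing allocator the same two memcpy calls would leave the
len-byte gap holding whatever the memory contained before.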

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Tested-by: Xiang Mei <xmei5@asu.edu>
Tested-by: Xinyu Ma <mmmxny@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3e555f276ba80..982d59cf659f5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2716,7 +2716,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	if (unlikely(bytes_sg_total > copy))
 		return -EINVAL;
 
-	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
+	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
 			   get_order(copy));
 	if (unlikely(!page))
 		return -ENOMEM;
-- 
2.43.0



* [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync
  2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (2 preceding siblings ...)
  2026-06-11  1:35 ` [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Jiayuan Chen
@ 2026-06-11  1:35 ` Jiayuan Chen
  2026-06-11  1:47   ` sashiko-bot
  2026-06-11  1:40 ` [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
  4 siblings, 1 reply; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:35 UTC (permalink / raw)
  To: bpf

From: Zhang Cen <rollkingzzc@gmail.com>

SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
with this bit set are copied before data/data_end are exposed to SK_MSG
BPF programs for direct packet access.

bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data()
rewrite the sk_msg scatterlist ring by collapsing, splitting, and
shifting entries. These operations move msg->sg.data[] entries, but the
parallel copy bitmap can be left behind on the old slot. A copied entry
can then return to msg->sg.start with its copy bit clear and be exposed
as directly writable packet data.

This corruption path requires an attached SK_MSG BPF program that calls
the mutating helpers; ordinary sockmap/TLS traffic that never runs
push/pop/pull helper sequences is not affected.

Keep msg->sg.copy synchronized with scatterlist entry moves, preserve
the copy bit when an entry is split, clear it when a helper replaces an
entry with a private page, and clear slots vacated by pull-data
compaction.
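
The invariant can be modelled in userspace with a small ring and a
parallel bitmap. Everything below (struct ring_model, RING_SLOTS, the
ring_*() helpers) is hypothetical and much simpler than struct sk_msg.
Unlike sk_msg_sg_move() in the diff, this model also clears the source
slot itself, which the patch leaves to the callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u	/* hypothetical size, not NR_MSG_FRAG_IDS */

struct ring_model {
	uint32_t length[RING_SLOTS];	/* stand-in for msg->sg.data[] */
	uint32_t copy;			/* one bit per slot, like msg->sg.copy */
};

static bool ring_is_copy(const struct ring_model *r, uint32_t i)
{
	return (r->copy >> i) & 1u;
}

static void ring_set_copy(struct ring_model *r, uint32_t i, bool on)
{
	if (on)
		r->copy |= 1u << i;
	else
		r->copy &= ~(1u << i);
}

/* Move an entry together with its copy bit, then clear the vacated slot
 * so a stale bit cannot be picked up by a later occupant.
 */
static void ring_move(struct ring_model *r, uint32_t dst, uint32_t src)
{
	assert(dst < RING_SLOTS && src < RING_SLOTS && dst != src);
	r->length[dst] = r->length[src];
	ring_set_copy(r, dst, ring_is_copy(r, src));
	r->length[src] = 0;
	ring_set_copy(r, src, false);
}
```

If only length[] were moved, the copy bit would stay on the old slot and
the moved entry would appear directly writable, which is the mismatch
described above.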

Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Cc: stable@vger.kernel.org
Co-developed-by: Han Guidong <2045gemini@gmail.com>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
---
 net/core/filter.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 982d59cf659f5..95961b4e66a6d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2654,6 +2654,38 @@ static void sk_msg_reset_curr(struct sk_msg *msg)
 	}
 }
 
+static bool sk_msg_elem_is_copy(const struct sk_msg *msg, u32 i)
+{
+	return test_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__clear_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_set_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__set_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_copy_range(struct sk_msg *msg, u32 start, u32 end)
+{
+	while (start != end) {
+		sk_msg_clear_elem_copy(msg, start);
+		sk_msg_iter_var_next(start);
+	}
+}
+
+static void sk_msg_sg_move(struct sk_msg *msg, u32 dst, u32 src)
+{
+	msg->sg.data[dst] = msg->sg.data[src];
+	if (sk_msg_elem_is_copy(msg, src))
+		sk_msg_set_elem_copy(msg, dst);
+	else
+		sk_msg_clear_elem_copy(msg, dst);
+}
+
 static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
 	.func           = bpf_msg_cork_bytes,
 	.gpl_only       = false,
@@ -2692,7 +2724,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	 * account for the headroom.
 	 */
 	bytes_sg_total = start - offset + bytes;
-	if (!test_bit(i, msg->sg.copy) && bytes_sg_total <= len)
+	if (!sk_msg_elem_is_copy(msg, i) && bytes_sg_total <= len)
 		goto out;
 
 	/* At this point we need to linearize multiple scatterlist
@@ -2738,6 +2770,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	} while (i != last_sge);
 
 	sg_set_page(&msg->sg.data[first_sge], page, copy, 0);
+	sk_msg_clear_elem_copy(msg, first_sge);
 
 	/* To repair sg ring we need to shift entries. If we only
 	 * had a single entry though we can just replace it and
@@ -2747,8 +2780,14 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	shift = last_sge > first_sge ?
 		last_sge - first_sge - 1 :
 		NR_MSG_FRAG_IDS - first_sge + last_sge - 1;
-	if (!shift)
+	if (!shift) {
+		sk_msg_clear_elem_copy(msg, msg->sg.end);
 		goto out;
+	}
+
+	i = first_sge;
+	sk_msg_iter_var_next(i);
+	sk_msg_clear_copy_range(msg, i, last_sge);
 
 	i = first_sge;
 	sk_msg_iter_var_next(i);
@@ -2762,16 +2801,18 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 		if (move_from == msg->sg.end)
 			break;
 
-		msg->sg.data[i] = msg->sg.data[move_from];
+		sk_msg_sg_move(msg, i, move_from);
 		msg->sg.data[move_from].length = 0;
 		msg->sg.data[move_from].page_link = 0;
 		msg->sg.data[move_from].offset = 0;
+		sk_msg_clear_elem_copy(msg, move_from);
 		sk_msg_iter_var_next(i);
 	} while (1);
 
 	msg->sg.end = msg->sg.end - shift > msg->sg.end ?
 		      msg->sg.end - shift + NR_MSG_FRAG_IDS :
 		      msg->sg.end - shift;
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 out:
 	sk_msg_reset_curr(msg);
 	msg->data = sg_virt(&msg->sg.data[first_sge]) + start - offset;
@@ -2794,6 +2835,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 {
 	struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
 	u32 new, i = 0, l = 0, space, copy = 0, offset = 0;
+	bool sge_copy = false, nsge_copy = false, nnsge_copy = false;
+	bool rsge_copy = false;
 	u8 *raw, *to, *from;
 	struct page *page;
 
@@ -2869,6 +2912,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 			sk_msg_iter_var_prev(i);
 		psge = sk_msg_elem(msg, i);
 		rsge = sk_msg_elem_cpy(msg, i);
+		rsge_copy = sk_msg_elem_is_copy(msg, i);
 
 		psge->length = start - offset;
 		rsge.length -= psge->length;
@@ -2894,23 +2938,34 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Shift one or two slots as needed */
 	sge = sk_msg_elem_cpy(msg, new);
 	sg_unmark_end(&sge);
+	sge_copy = sk_msg_elem_is_copy(msg, new);
 
 	nsge = sk_msg_elem_cpy(msg, i);
+	nsge_copy = sk_msg_elem_is_copy(msg, i);
 	if (rsge.length) {
 		sk_msg_iter_var_next(i);
 		nnsge = sk_msg_elem_cpy(msg, i);
+		nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		sk_msg_iter_next(msg, end);
 	}
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sge = nsge;
+		sge_copy = nsge_copy;
 		sk_msg_iter_var_next(i);
 		if (rsge.length) {
 			nsge = nnsge;
+			nsge_copy = nnsge_copy;
 			nnsge = sk_msg_elem_cpy(msg, i);
+			nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		} else {
 			nsge = sk_msg_elem_cpy(msg, i);
+			nsge_copy = sk_msg_elem_is_copy(msg, i);
 		}
 	}
 
@@ -2918,13 +2973,18 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Place newly allocated data buffer */
 	sk_mem_charge(msg->sk, len);
 	msg->sg.size += len;
-	__clear_bit(new, msg->sg.copy);
+	sk_msg_clear_elem_copy(msg, new);
 	sg_set_page(&msg->sg.data[new], page, len + copy, 0);
 	if (rsge.length) {
 		get_page(sg_page(&rsge));
 		sk_msg_iter_var_next(new);
 		msg->sg.data[new] = rsge;
+		if (rsge_copy)
+			sk_msg_set_elem_copy(msg, new);
+		else
+			sk_msg_clear_elem_copy(msg, new);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 
 	sk_msg_reset_curr(msg);
 	sk_msg_compute_data_pointers(msg);
@@ -2950,27 +3010,38 @@ static void sk_msg_shift_left(struct sk_msg *msg, int i)
 	do {
 		prev = i;
 		sk_msg_iter_var_next(i);
-		msg->sg.data[prev] = msg->sg.data[i];
+		sk_msg_sg_move(msg, prev, i);
 	} while (i != msg->sg.end);
 
 	sk_msg_iter_prev(msg, end);
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 static void sk_msg_shift_right(struct sk_msg *msg, int i)
 {
 	struct scatterlist tmp, sge;
+	bool tmp_copy, sge_copy;
 
 	sk_msg_iter_next(msg, end);
 	sge = sk_msg_elem_cpy(msg, i);
+	sge_copy = sk_msg_elem_is_copy(msg, i);
 	sk_msg_iter_var_next(i);
 	tmp = sk_msg_elem_cpy(msg, i);
+	tmp_copy = sk_msg_elem_is_copy(msg, i);
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sk_msg_iter_var_next(i);
 		sge = tmp;
+		sge_copy = tmp_copy;
 		tmp = sk_msg_elem_cpy(msg, i);
+		tmp_copy = sk_msg_elem_is_copy(msg, i);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
@@ -3027,8 +3098,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	 */
 	if (start != offset) {
 		struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
+		u32 sge_idx = i;
 		int a = start - offset;
 		int b = sge->length - pop - a;
+		bool sge_copy = sk_msg_elem_is_copy(msg, sge_idx);
 
 		sk_msg_iter_var_next(i);
 
@@ -3041,6 +3114,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				sg_set_page(nsge,
 					    sg_page(sge),
 					    b, sge->offset + pop + a);
+				if (sge_copy)
+					sk_msg_set_elem_copy(msg, i);
+				else
+					sk_msg_clear_elem_copy(msg, i);
 			} else {
 				struct page *page, *orig;
 				u8 *to, *from;
@@ -3057,6 +3134,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				memcpy(to, from, a);
 				memcpy(to + a, from + a + pop, b);
 				sg_set_page(sge, page, a + b, 0);
+				sk_msg_clear_elem_copy(msg, sge_idx);
 				put_page(orig);
 			}
 			pop = 0;
-- 
2.43.0



* Re: [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg
  2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
                   ` (3 preceding siblings ...)
  2026-06-11  1:35 ` [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
@ 2026-06-11  1:40 ` Jiayuan Chen
  4 siblings, 0 replies; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:40 UTC (permalink / raw)
  To: bpf


On 6/11/26 9:35 AM, Jiayuan Chen wrote:
> All fixes are from previous patches sent by Weiming Shi and Zhang Cen,
> which have already been reviewed by me and John.
>
> https://lore.kernel.org/bpf/20260520102715.3033936-1-rollkingzzc@gmail.com/
> https://lore.kernel.org/bpf/20260424190310.1520555-2-bestswngs@gmail.com/
> https://lore.kernel.org/bpf/20260424191602.1522411-3-bestswngs@gmail.com/
> https://lore.kernel.org/bpf/20260423155807.1245644-2-bestswngs@gmail.com/

Note:

Because there were many similar patches for these issues, I picked the
earliest submitted patch for each one and kept the original authors' credit.


> Weiming Shi (3):
>    bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
>    bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
>    bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
>
> Zhang Cen (1):
>    bpf, sockmap: keep sk_msg copy state in sync
>
>   net/core/filter.c | 95 +++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 88 insertions(+), 7 deletions(-)
>


* Re: [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
  2026-06-11  1:35 ` [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Jiayuan Chen
@ 2026-06-11  1:45   ` sashiko-bot
  2026-06-11  1:47     ` Jiayuan Chen
  2026-06-11  2:11   ` bot+bpf-ci
  1 sibling, 1 reply; 12+ messages in thread
From: sashiko-bot @ 2026-06-11  1:45 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch intended to fix a heap information leak in `bpf_msg_push_data()` but mistakenly modified `bpf_msg_pull_data()` instead. The uninitialized memory exposure in `bpf_msg_push_data()` remains active.
--

commit 6cfebe959c2ffe911ff763388e942ea998f90682
Author: Weiming Shi <bestswngs@gmail.com>

bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data

The commit intends to fix a kernel heap information leak in bpf_msg_push_data
by zero-initializing the allocated page before adding it to the scatterlist.
It adds the __GFP_ZERO flag to the alloc_pages call to prevent stale kernel
pointers from leaking to userspace.

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3e555f276ba80..982d59cf659f5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2716,7 +2716,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
>  	if (unlikely(bytes_sg_total > copy))
>  		return -EINVAL;
>  
> -	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
>  			   get_order(copy));

[Severity: High]
This isn't a bug introduced by this patch, but this patch completely fails to
fix the issue described in the commit message.

The commit message states the fix is for bpf_msg_push_data, but the diff
applies the __GFP_ZERO flag to bpf_msg_pull_data instead.

In bpf_msg_pull_data, the bytes exposed to the scatterlist are already fully
initialized via memcpy, making the __GFP_ZERO flag unnecessary overhead.

Meanwhile, the alloc_pages call in bpf_msg_push_data still lacks the __GFP_ZERO
flag, leaving the original information leak fully exploitable:

net/core/filter.c:bpf_msg_push_data() {
        ...
        page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
                           get_order(copy + len));
        ...
}

Could the __GFP_ZERO flag be moved to the correct alloc_pages call in
bpf_msg_push_data?

>  	if (unlikely(!page))
>  		return -ENOMEM;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611013547.247039-1-jiayuan.chen@linux.dev?part=3


* Re: [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
  2026-06-11  1:45   ` sashiko-bot
@ 2026-06-11  1:47     ` Jiayuan Chen
  0 siblings, 0 replies; 12+ messages in thread
From: Jiayuan Chen @ 2026-06-11  1:47 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf


On 6/11/26 9:45 AM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] The patch intended to fix a heap information leak in `bpf_msg_push_data()` but mistakenly modified `bpf_msg_pull_data()` instead. The uninitialized memory exposure in `bpf_msg_push_data()` remains active.
> --
>
> commit 6cfebe959c2ffe911ff763388e942ea998f90682
> Author: Weiming Shi <bestswngs@gmail.com>
>
> bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
>
> The commit intends to fix a kernel heap information leak in bpf_msg_push_data
> by zero-initializing the allocated page before adding it to the scatterlist.
> It adds the __GFP_ZERO flag to the alloc_pages call to prevent stale kernel
> pointers from leaking to userspace.
>
> Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 3e555f276ba80..982d59cf659f5 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -2716,7 +2716,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
>>   	if (unlikely(bytes_sg_total > copy))
>>   		return -EINVAL;
>>   
>> -	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
>>   			   get_order(copy));
> [Severity: High]
> This isn't a bug introduced by this patch, but this patch completely fails to
> fix the issue described in the commit message.


You are right.

Something went wrong when I resolved the conflict.


>
> The commit message states the fix is for bpf_msg_push_data, but the diff
> applies the __GFP_ZERO flag to bpf_msg_pull_data instead.
>
> In bpf_msg_pull_data, the bytes exposed to the scatterlist are already fully
> initialized via memcpy, making the __GFP_ZERO flag unnecessary overhead.
>
> Meanwhile, the alloc_pages call in bpf_msg_push_data still lacks the __GFP_ZERO
> flag, leaving the original information leak fully exploitable:
>
> net/core/filter.c:bpf_msg_push_data() {
>          ...
>          page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>                             get_order(copy + len));
>          ...
> }
>
> Could the __GFP_ZERO flag be moved to the correct alloc_pages call in
> bpf_msg_push_data?
>
>>   	if (unlikely(!page))
>>   		return -ENOMEM;


* Re: [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync
  2026-06-11  1:35 ` [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
@ 2026-06-11  1:47   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-11  1:47 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Uninitialized memory disclosure in bpf_msg_push_data() due to missing __GFP_ZERO.
--

commit 4e239ebdfdc9495774d38523350637e9a010cf74
Author: Zhang Cen <rollkingzzc@gmail.com>

bpf, sockmap: keep sk_msg copy state in sync

This commit fixes an issue where modifying helpers like bpf_msg_push_data()
rewrite the scatterlist ring but leave the copy bitmap unsynchronized. It
keeps the msg->sg.copy bitmap synchronized with scatterlist entry moves to
prevent data corruption.

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 982d59cf659f5..95961b4e66a6d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
[ ... ]
> @@ -2918,13 +2973,18 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	/* Place newly allocated data buffer */
>  	sk_mem_charge(msg->sk, len);
>  	msg->sg.size += len;
> -	__clear_bit(new, msg->sg.copy);
> +	sk_msg_clear_elem_copy(msg, new);
>  	sg_set_page(&msg->sg.data[new], page, len + copy, 0);

[Severity: High]
This is a pre-existing issue, but does this allocation lack initialization?

When the page is allocated earlier in bpf_msg_push_data() via alloc_pages()
without the __GFP_ZERO flag:

    page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
                       get_order(copy + len));

The newly inserted payload space of size len is not zeroed. Then it is
added to the scatterlist here via sg_set_page().

Could this lead to an uninitialized memory disclosure over the network?
If copy is 0, the entire page might be left uninitialized. If copy is
greater than 0, the gap of len bytes between the copied front and back
portions might be uninitialized.

Because the BPF verifier cannot enforce that a BPF program initializes this
newly inserted space before the packet is sent, could the kernel transmit
uninitialized page memory?

>  	if (rsge.length) {
>  		get_page(sg_page(&rsge));
>  		sk_msg_iter_var_next(new);
>  		msg->sg.data[new] = rsge;
> +		if (rsge_copy)
> +			sk_msg_set_elem_copy(msg, new);
> +		else
> +			sk_msg_clear_elem_copy(msg, new);
>  	}
> +	sk_msg_clear_elem_copy(msg, msg->sg.end);
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611013547.247039-1-jiayuan.chen@linux.dev?part=4


* Re: [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  2026-06-11  1:35 ` [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
@ 2026-06-11  1:49   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-11  1:49 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Out-of-bounds memory access and scatterlist corruption when appending to a full or nearly full ring buffer.
--

commit 98c76ef1269e7a7079e13de8f22677af2c26b2dd
Author: Weiming Shi <bestswngs@gmail.com>

bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()

This commit adds an overflow check to bpf_msg_push_data() when calculating
the page allocation size. It prevents an integer overflow where a crafted
length could wrap the copy + len sum, leading to an undersized allocation
and a subsequent out-of-bounds memory access.

Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9590877b0714f..3c8f1cedb217f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2829,6 +2829,9 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	if (!space || (space == 1 && start != offset))
>  		copy = msg->sg.data[i].length;

[Severity: Critical]
This is a pre-existing issue, but can this read the length of the wrong
scatterlist element in bpf_msg_push_data()?

When appending data to a socket message (start == msg->sg.size), the loop
terminates with i pointing to msg->sg.end. If the scatterlist ring is full
(!space), msg->sg.end wraps around and equals msg->sg.start. This evaluates:

    copy = msg->sg.data[i].length;

which incorrectly reads the length of the first element in the ring buffer
instead of the last.

This allocates a page based on the wrong length, but the coalescing step
steps i back and copies bytes from the actual last element:

    memcpy(raw, from, front);

If the last element is significantly larger than the first, does this cause
an out-of-bounds memcpy that can corrupt adjacent kernel heap memory? Also, if
there is exactly one free slot (space == 1), does copy read the uninitialized
length of that free slot?

>  
> +	if (unlikely(copy + len < copy))
> +		return -EINVAL;
> +
>  	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>  			   get_order(copy + len));
>  	if (unlikely(!page))

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611013547.247039-1-jiayuan.chen@linux.dev?part=1


* Re: [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-06-11  1:35 ` [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
@ 2026-06-11  1:56   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-11  1:56 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] `bpf_msg_push_data` incorrectly reads the length of an uninitialized ring buffer slot when appending data to a full `sk_msg`, leading to ring buffer destruction and an infinite loop/crash during message free.
- [High] bpf_msg_push_data allocates uninitialized memory causing a kernel info leak.
--

commit d70a24eef21929f8e8a694f6c0b71f98f2d76089
Author: Weiming Shi <bestswngs@gmail.com>

bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()

When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by start (absolute message
byte offset) instead of start - offset (byte position within the
element). This makes rsge.offset overshoot by offset bytes, pointing
to the wrong location within the page or beyond its boundary. Consumers
of the corrupted entry either silently read wrong data or trigger an
out-of-bounds access.

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3c8f1cedb217f..3e555f276ba80 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2828,8 +2828,6 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	if (!space || (space == 1 && start != offset))
>  		copy = msg->sg.data[i].length;

[Severity: Critical]
This is a pre-existing issue, but does this incorrectly read the length of an
uninitialized ring buffer slot when appending data to a full sk_msg?

When bpf_msg_push_data() is called to append data exactly at the end of a
full sk_msg (where space == 0), the scatterlist search loop finishes with
i == msg->sg.end. It then seems to read copy = msg->sg.data[i].length,
accessing an uninitialized length from the unused slot at msg->sg.end.

If this garbage copy value is > 0, the function allocates an oversized page
and sets the new scatterlist element's length to len + copy, but
msg->sg.size is only incremented by len. Could this cause msg->sg.size to
underflow during __sk_msg_free(), leading to an infinite loop and double
free?

If the garbage copy value is 0, the code later executes
sk_msg_iter_next(msg, end). Doesn't this increment msg->sg.end and wrap it
to msg->sg.start, destroying the ring buffer state and leading to
memory leaks or corruption?

>  
>  	if (unlikely(copy + len < copy))
>  		return -EINVAL;
>  
>  	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>  			   get_order(copy + len));

[Severity: High]
This isn't a bug introduced by this patch, but could this allocation be missing
the __GFP_ZERO flag?

When bpf_msg_push_data() inserts new space into a socket message, it
copies existing data to the front and back of the new page, but appears to
leave the newly inserted len bytes uninitialized.

In the non-copy path, the entire page is added to the scatterlist
uninitialized. Could this uninitialized kernel memory be exposed to the
sk_msg scatterlist and transmitted to the network or read by userspace via
tcp_bpf_push(), causing a kernel information leak?

It appears a later patch in the series (6cfebe959c2ffe911ff763388e942ea998f90682)
claims to fix this in its commit message, but it mistakenly patches
bpf_msg_pull_data() instead of bpf_msg_push_data(), leaving this issue
unfixed.

>  	if (unlikely(!page))
>  		return -ENOMEM;

[ ... ]

> @@ -2872,7 +2872,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  
>  		psge->length = start - offset;
>  		rsge.length -= psge->length;
> -		rsge.offset += start;
> +		rsge.offset += start - offset;
>  
>  		sk_msg_iter_var_next(i);
>  		sg_unmark_end(psge);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611013547.247039-1-jiayuan.chen@linux.dev?part=2


* Re: [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
  2026-06-11  1:35 ` [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Jiayuan Chen
  2026-06-11  1:45   ` sashiko-bot
@ 2026-06-11  2:11   ` bot+bpf-ci
  1 sibling, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-06-11  2:11 UTC (permalink / raw)
  To: jiayuan.chen, bpf
  Cc: ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai


> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3e555f276ba8..982d59cf659f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2716,7 +2716,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
>  	if (unlikely(bytes_sg_total > copy))
>  		return -EINVAL;
>
> -	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
>  			   get_order(copy));
>  	if (unlikely(!page))
>  		return -ENOMEM;

The hunk header shows this change is inside bpf_msg_pull_data
(BPF_CALL_4 at line 2716). However, the commit subject, body, and Fixes:
tag all reference bpf_msg_push_data, not bpf_msg_pull_data. Does this patch
modify the wrong function?

The commit message describes the heap info leak in bpf_msg_push_data: in
the copy path "a gap of len bytes between the front and back memcpy regions
is left uninitialized", and in the non-copy path "the entire page of
uninitialized heap content is added directly to the sk_msg scatterlist".

Looking at bpf_msg_push_data in net/core/filter.c, its alloc_pages() call
at line 2835 is still:

    page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
                       get_order(copy + len));

without __GFP_ZERO. The copy path (lines 2840-2865) writes raw[0..front)
and raw[front+len .. front+len+back) via two memcpy calls, leaving
raw[front .. front+len) uninitialized. This page is then placed into the
scatterlist via sg_set_page at line 2922 and transmitted. This is the leak
described in the commit message.

The function actually modified, bpf_msg_pull_data, does not have this
issue. Its memcpy loop (lines 2726-2738) fully overwrites all 'copy' bytes
before they are placed via sg_set_page at line 2740. Adding __GFP_ZERO
there does not fix the documented security concern.

Should the __GFP_ZERO also be added to bpf_msg_push_data's alloc_pages at
line 2835?

---
AI reviewed your patch. Please fix the bug, or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27318345998


end of thread, other threads:[~2026-06-11  2:11 UTC | newest]

Thread overview: 12+ messages
2026-06-11  1:35 [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
2026-06-11  1:35 ` [PATCH bpf 1/4] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() Jiayuan Chen
2026-06-11  1:49   ` sashiko-bot
2026-06-11  1:35 ` [PATCH bpf 2/4] bpf, sockmap: Fix wrong rsge offset " Jiayuan Chen
2026-06-11  1:56   ` sashiko-bot
2026-06-11  1:35 ` [PATCH bpf 3/4] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Jiayuan Chen
2026-06-11  1:45   ` sashiko-bot
2026-06-11  1:47     ` Jiayuan Chen
2026-06-11  2:11   ` bot+bpf-ci
2026-06-11  1:35 ` [PATCH bpf 4/4] bpf, sockmap: keep sk_msg copy state in sync Jiayuan Chen
2026-06-11  1:47   ` sashiko-bot
2026-06-11  1:40 ` [PATCH bpf 0/4] bpf, skmsg: some fixes for skmsg Jiayuan Chen
