[PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
@ 2026-04-23 15:58 Weiming Shi
  2026-04-24 15:59 ` sashiko-bot
  2026-04-25  8:25 ` Jiayuan Chen
  0 siblings, 2 replies; 4+ messages in thread
From: Weiming Shi @ 2026-04-23 15:58 UTC (permalink / raw)
  To: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: John Fastabend, Stanislav Fomichev, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Simon Horman, bpf,
	netdev, Xiang Mei, Weiming Shi

When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by `start` (absolute message
byte offset) instead of `start - offset` (byte position within the
element). This makes rsge.offset overshoot by `offset` bytes, pointing
to the wrong location within the page or beyond its boundary. Consumers
of the corrupted entry either silently read wrong data or trigger an
out-of-bounds access.

 BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
 Read of size 32752 at addr ffff8881042f0010 by task poc/130
 Call Trace:
  __asan_memcpy (mm/kasan/shadow.c:105)
  bpf_msg_pull_data (net/core/filter.c:2728)
  bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
  sk_psock_msg_verdict (net/core/skmsg.c:934)
  tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
  sock_sendmsg_nosec (net/socket.c:727)

Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3e56b567bd18..f12fbc49bc03 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2865,7 +2865,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,

 		psge->length = start - offset;
 		rsge.length -= psge->length;
-		rsge.offset += start;
+		rsge.offset += start - offset;

 		sk_msg_iter_var_next(i);
 		sg_unmark_end(psge);
--
2.43.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-04-23 15:58 [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() Weiming Shi
@ 2026-04-24 15:59 ` sashiko-bot
  2026-04-24 19:21   ` Weiming Shi
  2026-04-25  8:25 ` Jiayuan Chen
  1 sibling, 1 reply; 4+ messages in thread
From: sashiko-bot @ 2026-04-24 15:59 UTC (permalink / raw)
  To: Weiming Shi; +Cc: bpf

Sashiko AI review found 4 potential issue(s):
- [Critical] When appending data to a full or nearly full sk_msg, bpf_msg_push_data() reads the length of the wrong scatterlist element, leading to a buffer overflow or ring corruption.
- [Critical] Missing bounds check on the 'len' parameter causes an integer overflow and a massive Out-Of-Bounds write.
- [Critical] Scatterlist shifts fail to update the msg->sg.copy bitmap, allowing unauthorized modification of shared pages.
- [High] Uninitialized kernel memory allocation in bpf_msg_push_data() leads to an Information Leak.
--

commit 15a0d48ce1bdcc5941167a3e0c2a258bb5038277
Author: Weiming Shi <bestswngs@gmail.com>

bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()

When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by start instead of
start - offset. This overshoot points to the wrong location within
the page or beyond its boundary.

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5fa9189eb772b2..a7a9ea4cb2f9a4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2860,7 +2860,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  
>  		psge->length = start - offset;
>  		rsge.length -= psge->length;
> -		rsge.offset += start;
> +		rsge.offset += start - offset;
>  
>  		sk_msg_iter_var_next(i);
>  		sg_unmark_end(psge);

This isn't a bug introduced by this patch, but looking at the earlier part
of bpf_msg_push_data(), is the len parameter bounds-checked?

net/core/filter.c:bpf_msg_push_data() {
...
	if (!space || (space == 1 && start != offset))
		copy = msg->sg.data[i].length;

	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
			   get_order(copy + len));
...
		if (back) {
			from += front;
			to = raw + front + len;

			memcpy(to, from, back);
		}
...
}

If a large value is provided for len (e.g., 0xFFFFFFFF), could copy + len
overflow 32-bit arithmetic, resulting in a very small allocation?

Later, when calculating to = raw + front + len, does the zero-extended len
advance the destination pointer out-of-bounds, causing a massive memory
corruption during memcpy()?

Additionally, bpf_msg_push_data() uses alloc_pages() without the
__GFP_ZERO flag. After copying the split element's front and back parts,
do the middle len bytes remain uninitialized?

Could this allow uninitialized kernel heap data to be added to the sk_msg
and leaked over the network when the message is transmitted?

Also, when appending data (start == msg->sg.size), the search loop terminates
with i == msg->sg.end. If space <= 1, the code executes
copy = msg->sg.data[i].length.

Does this read from an unused slot?

If the slot has a garbage length from a previous pop or shift, could copy
take that value, causing alloc_pages() to allocate a smaller size and
leading to a kernel heap buffer overflow during the subsequent
memcpy(raw, from, front)?

And if the garbage length is 0, could it skip the if (copy) block and
increment msg->sg.end without space, corrupting the scatterlist ring array?

Finally, when scatterlist elements are shifted in bpf_msg_pop_data():

net/core/filter.c:sk_msg_shift_left() {
...
	do {
		prev = i;
		sk_msg_iter_var_next(i);
		msg->sg.data[prev] = msg->sg.data[i];
	} while (i != msg->sg.end);
...
}

Should the msg->sg.copy bitmap also be shifted?

If a shared page (where the copy bit is 1) is shifted into a slot where
the bit is 0, will it be falsely identified as a private page?

Could a subsequent bpf_msg_pull_data() then pass the
!test_bit(i, msg->sg.copy) check and return a direct writable pointer to
the shared page, enabling unauthorized modification of read-only memory?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260423155807.1245644-2-bestswngs@gmail.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-04-24 15:59 ` sashiko-bot
@ 2026-04-24 19:21   ` Weiming Shi
  0 siblings, 0 replies; 4+ messages in thread
From: Weiming Shi @ 2026-04-24 19:21 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On 26-04-24 15:59, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 4 potential issue(s):
> - [Critical] When appending data to a full or nearly full sk_msg, bpf_msg_push_data() reads the length of the wrong scatterlist element, leading to a buffer overflow or ring corruption.
> - [Critical] Missing bounds check on the 'len' parameter causes an integer overflow and a massive Out-Of-Bounds write.
> - [Critical] Scatterlist shifts fail to update the msg->sg.copy bitmap, allowing unauthorized modification of shared pages.
> - [High] Uninitialized kernel memory allocation in bpf_msg_push_data() leads to an Information Leak.
> --
> 
> commit 15a0d48ce1bdcc5941167a3e0c2a258bb5038277
> Author: Weiming Shi <bestswngs@gmail.com>
> 
> bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
> 
> When bpf_msg_push_data() splits a scatterlist element into head and
> tail, the tail's page offset is advanced by start instead of
> start - offset. This overshoot points to the wrong location within
> the page or beyond its boundary.
> 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 5fa9189eb772b2..a7a9ea4cb2f9a4 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2860,7 +2860,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
> >  
> >  		psge->length = start - offset;
> >  		rsge.length -= psge->length;
> > -		rsge.offset += start;
> > +		rsge.offset += start - offset;
> >  
> >  		sk_msg_iter_var_next(i);
> >  		sg_unmark_end(psge);
> 
> This isn't a bug introduced by this patch, but looking at the earlier part
> of bpf_msg_push_data(), is the len parameter bounds-checked?
> 
> net/core/filter.c:bpf_msg_push_data() {
> ...
> 	if (!space || (space == 1 && start != offset))
> 		copy = msg->sg.data[i].length;
> 
> 	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> 			   get_order(copy + len));
> ...
> 		if (back) {
> 			from += front;
> 			to = raw + front + len;
> 
> 			memcpy(to, from, back);
> 		}
> ...
> }
> 
> If a large value is provided for len (e.g., 0xFFFFFFFF), could copy + len
> overflow 32-bit arithmetic, resulting in a very small allocation?
> 
> Later, when calculating to = raw + front + len, does the zero-extended len
> advance the destination pointer out-of-bounds, causing a massive memory
> corruption during memcpy()?
> 
> Additionally, bpf_msg_push_data() uses alloc_pages() without the
> __GFP_ZERO flag. After copying the split element's front and back parts,
> do the middle len bytes remain uninitialized?
> 
> Could this allow uninitialized kernel heap data to be added to the sk_msg
> and leaked over the network when the message is transmitted?
> 
> Also, when appending data (start == msg->sg.size), the search loop terminates
> with i == msg->sg.end. If space <= 1, the code executes
> copy = msg->sg.data[i].length.
> 
> Does this read from an unused slot?
> 
> If the slot has a garbage length from a previous pop or shift, could copy
> take that value, causing alloc_pages() to allocate a smaller size and
> leading to a kernel heap buffer overflow during the subsequent
> memcpy(raw, from, front)?
> 
> And if the garbage length is 0, could it skip the if (copy) block and
> increment msg->sg.end without space, corrupting the scatterlist ring array?
> 
> Finally, when scatterlist elements are shifted in bpf_msg_pop_data():
> 
> net/core/filter.c:sk_msg_shift_left() {
> ...
> 	do {
> 		prev = i;
> 		sk_msg_iter_var_next(i);
> 		msg->sg.data[prev] = msg->sg.data[i];
> 	} while (i != msg->sg.end);
> ...
> }
> 
> Should the msg->sg.copy bitmap also be shifted?
> 
> If a shared page (where the copy bit is 1) is shifted into a slot where
> the bit is 0, will it be falsely identified as a private page?
> 
> Could a subsequent bpf_msg_pull_data() then pass the
> !test_bit(i, msg->sg.copy) check and return a direct writable pointer to
> the shared page, enabling unauthorized modification of read-only memory?
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260423155807.1245644-2-bestswngs@gmail.com?part=1

We verified the findings locally with QEMU. Two of the four issues are
confirmed with reproducers:

- copy + len integer overflow leading to heap buffer overflow [1]
- uninitialized memory leak via alloc_pages without __GFP_ZERO [2]

Patches sent for both.
[1] https://lore.kernel.org/all/20260424191602.1522411-3-bestswngs@gmail.com/
[2] https://lore.kernel.org/all/20260424190310.1520555-2-bestswngs@gmail.com/


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  2026-04-23 15:58 [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() Weiming Shi
  2026-04-24 15:59 ` sashiko-bot
@ 2026-04-25  8:25 ` Jiayuan Chen
  1 sibling, 0 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-04-25  8:25 UTC (permalink / raw)
  To: Weiming Shi, Martin KaFai Lau, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: John Fastabend, Stanislav Fomichev, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Simon Horman, bpf,
	netdev, Xiang Mei


On 4/23/26 11:58 PM, Weiming Shi wrote:
> When bpf_msg_push_data() splits a scatterlist element into head and
> tail, the tail's page offset is advanced by `start` (absolute message
> byte offset) instead of `start - offset` (byte position within the
> element). This makes rsge.offset overshoot by `offset` bytes, pointing
> to the wrong location within the page or beyond its boundary. Consumers
> of the corrupted entry either silently read wrong data or trigger an
> out-of-bounds access.
>
>   BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
>   Read of size 32752 at addr ffff8881042f0010 by task poc/130
>   Call Trace:
>    __asan_memcpy (mm/kasan/shadow.c:105)
>    bpf_msg_pull_data (net/core/filter.c:2728)
>    bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
>    sk_psock_msg_verdict (net/core/skmsg.c:934)
>    tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
>    sock_sendmsg_nosec (net/socket.c:727)
>
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>


Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>


Fix looks correct.
The bug only surfaces when the split branch is taken and start lands
inside a non-first SG element (offset > 0).

It took me 2 hours to reproduce it. Repos or self-tests are welcome next 
time you reported a bug...

> ---
>   net/core/filter.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3e56b567bd18..f12fbc49bc03 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2865,7 +2865,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>
>   		psge->length = start - offset;
>   		rsge.length -= psge->length;
> -		rsge.offset += start;
> +		rsge.offset += start - offset;
>
>   		sk_msg_iter_var_next(i);
>   		sg_unmark_end(psge);
> --
> 2.43.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-25  8:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-23 15:58 [PATCH bpf] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() Weiming Shi
2026-04-24 15:59 ` sashiko-bot
2026-04-24 19:21   ` Weiming Shi
2026-04-25  8:25 ` Jiayuan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox