* [PATCH bpf] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
@ 2026-04-24 19:03 Weiming Shi
2026-04-25 3:17 ` Jiayuan Chen
From: Weiming Shi @ 2026-04-24 19:03 UTC (permalink / raw)
To: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: John Fastabend, Stanislav Fomichev, Song Liu, Yonghong Song,
Jiri Olsa, Simon Horman, bpf, netdev, Xiang Mei, Weiming Shi,
Xinyu Ma
bpf_msg_push_data() allocates pages via alloc_pages() without
__GFP_ZERO. In the non-copy path, the entire page of uninitialized
heap content is added directly to the sk_msg scatterlist, which is
then transmitted over TCP to userspace via tcp_bpf_push(). In the
copy path, a gap of len bytes between the front and back memcpy
regions is similarly left uninitialized.
This leads to a kernel heap information leak: stale page content
including kernel pointers from the direct-map and vmemmap regions
is transmitted to userspace, which can be used to defeat KASLR.
Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
page is always zeroed before it enters the scatterlist.
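To illustrate the copy path described above, here is a minimal userspace sketch, not the kernel code itself. It assumes the helper copies `front` bytes, skips `len` bytes, then copies the remaining `back` bytes. The name model_push(), the 0xAA fill standing in for stale page content, and the `zeroed` parameter standing in for __GFP_ZERO are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model only (hypothetical helper, not kernel code).
 * malloc() plus a 0xAA fill stands in for a page that still holds
 * stale data; zeroed != 0 stands in for passing __GFP_ZERO.
 */
static uint8_t *model_push(const uint8_t *from, size_t elem_len,
			   size_t front, size_t len, int zeroed)
{
	size_t back;
	uint8_t *raw;

	assert(front <= elem_len);
	back = elem_len - front;

	raw = malloc(elem_len + len);
	if (!raw)
		return NULL;
	memset(raw, zeroed ? 0x00 : 0xAA, elem_len + len);

	/* Bytes before the insertion point. */
	memcpy(raw, from, front);
	/* Bytes after the insertion point, shifted right by len. */
	memcpy(raw + front + len, from + front, back);

	/* raw[front .. front + len) is never written by this function. */
	return raw;
}
```

With `zeroed` set to 0, the `len` bytes between the two copies keep whatever the buffer held before. With `zeroed` set to 1 they read back as zero, which is the behaviour the patch aims for.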
Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Tested-by: Xiang Mei <xmei5@asu.edu>
Tested-by: Xinyu Ma <mmmxny@gmail.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
net/core/filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index bc96c18df4e0..ea02239892fd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2820,7 +2820,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	if (!space || (space == 1 && start != offset))
 		copy = msg->sg.data[i].length;

-	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
+	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
 			   get_order(copy + len));
 	if (unlikely(!page))
 		return -ENOMEM;
--
2.43.0
* Re: [PATCH bpf] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
2026-04-24 19:03 [PATCH bpf] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data Weiming Shi
@ 2026-04-25 3:17 ` Jiayuan Chen
2026-04-25 17:59 ` Weiming Shi
From: Jiayuan Chen @ 2026-04-25 3:17 UTC (permalink / raw)
To: Weiming Shi, Martin KaFai Lau, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: John Fastabend, Stanislav Fomichev, Song Liu, Yonghong Song,
Jiri Olsa, Simon Horman, bpf, netdev, Xiang Mei, Xinyu Ma
On 4/25/26 3:03 AM, Weiming Shi wrote:
> bpf_msg_push_data() allocates pages via alloc_pages() without
> __GFP_ZERO. In the non-copy path, the entire page of uninitialized
> heap content is added directly to the sk_msg scatterlist, which is
> then transmitted over TCP to userspace via tcp_bpf_push(). In the
> copy path, a gap of len bytes between the front and back memcpy
> regions is similarly left uninitialized.
>
> This leads to a kernel heap information leak: stale page content
> including kernel pointers from the direct-map and vmemmap regions
> is transmitted to userspace, which can be used to defeat KASLR.
>
> Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
> page is always zeroed before it enters the scatterlist.
As the helper's own documentation says:
If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg it may
want to insert metadata or options into the msg. This can later be
read and used by any of the lower layer BPF hooks.
The inserted region is meant to be written by the BPF program — that's
the entire point of calling push.
If the program doesn't fill it, the push has no purpose to begin with.
Isn't the uninitialized content a bug in the BPF program rather than
something the kernel helper should paper over?
> Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Tested-by: Xiang Mei <xmei5@asu.edu>
> Tested-by: Xinyu Ma <mmmxny@gmail.com>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> net/core/filter.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bc96c18df4e0..ea02239892fd 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2820,7 +2820,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
> if (!space || (space == 1 && start != offset))
> copy = msg->sg.data[i].length;
>
> - page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> + page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
> get_order(copy + len));
> if (unlikely(!page))
> return -ENOMEM;
* Re: [PATCH bpf] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
2026-04-25 3:17 ` Jiayuan Chen
@ 2026-04-25 17:59 ` Weiming Shi
2026-04-26 6:31 ` Jiayuan Chen
From: Weiming Shi @ 2026-04-25 17:59 UTC (permalink / raw)
To: Jiayuan Chen
Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
John Fastabend, Stanislav Fomichev, Song Liu, Yonghong Song,
Jiri Olsa, Simon Horman, bpf, netdev, Xiang Mei, Xinyu Ma
On 26-04-25 11:17, Jiayuan Chen wrote:
>
> On 4/25/26 3:03 AM, Weiming Shi wrote:
> > bpf_msg_push_data() allocates pages via alloc_pages() without
> > __GFP_ZERO. In the non-copy path, the entire page of uninitialized
> > heap content is added directly to the sk_msg scatterlist, which is
> > then transmitted over TCP to userspace via tcp_bpf_push(). In the
> > copy path, a gap of len bytes between the front and back memcpy
> > regions is similarly left uninitialized.
> >
> > This leads to a kernel heap information leak: stale page content
> > including kernel pointers from the direct-map and vmemmap regions
> > is transmitted to userspace, which can be used to defeat KASLR.
> >
> > Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
> > page is always zeroed before it enters the scatterlist.
>
>
>
> As the helper's own documentation says:
>
> If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg it may
> want to insert metadata or options into the msg. This can later be
> read and used by any of the lower layer BPF hooks.
>
> The inserted region is meant to be written by the BPF program — that's the
> entire point of calling push.
>
> If the program doesn't fill it, the push has no purpose to begin with.
>
>
> Isn't the uninitialized content a bug in the BPF program rather than
> something the kernel helper should paper over?
>
Hi, Thanks for the review.
In my testing a process with only CAP_BPF + CAP_NET_ADMIN can receive
kernel heap and vmalloc pointers through recv() from the uninitialized
pushed region. The uninitialized memory contains critical kernel metadata
such as direct-map and vmalloc pointers, which breaks KASLR.
Kernels without CONFIG_INIT_ON_ALLOC_DEFAULT_ON (e.g. RHEL) are
directly affected: the leak is not masked by any mitigation.
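As a rough illustration of how a receiver could notice such values in a recv() buffer, here is a small userspace sketch. It is not the test program used above. It assumes x86-64 with 4-level paging, where kernel virtual addresses have their upper 17 bits set; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumption: x86-64 with 4-level paging, where kernel virtual
 * addresses have their upper 17 bits set. All-ones words are
 * skipped because they are more likely filler than a pointer.
 */
static int looks_like_kernel_ptr(uint64_t v)
{
	return (v >> 47) == 0x1ffffULL && v != UINT64_MAX;
}

/* Count the words at 8-byte offsets in buf that look like kernel pointers. */
static size_t count_kernel_ptrs(const uint8_t *buf, size_t len)
{
	size_t i, hits = 0;

	assert(buf != NULL || len == 0);
	for (i = 0; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t v;

		/* memcpy() avoids an unaligned load from the byte buffer. */
		memcpy(&v, buf + i, sizeof(v));
		if (looks_like_kernel_ptr(v))
			hits++;
	}
	return hits;
}
```

A nonzero count on data read from the pushed region would suggest that stale kernel memory reached userspace. A zeroed region yields a count of zero.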
Thanks,
Weiming Shi
>
> > Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
> > Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> > Tested-by: Xiang Mei <xmei5@asu.edu>
> > Tested-by: Xinyu Ma <mmmxny@gmail.com>
> > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > ---
> > net/core/filter.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index bc96c18df4e0..ea02239892fd 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2820,7 +2820,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
> > if (!space || (space == 1 && start != offset))
> > copy = msg->sg.data[i].length;
> > - page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> > + page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,
> > get_order(copy + len));
> > if (unlikely(!page))
> > return -ENOMEM;
* Re: [PATCH bpf] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
2026-04-25 17:59 ` Weiming Shi
@ 2026-04-26 6:31 ` Jiayuan Chen
From: Jiayuan Chen @ 2026-04-26 6:31 UTC (permalink / raw)
To: Weiming Shi, Jiayuan Chen
Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
John Fastabend, Stanislav Fomichev, Song Liu, Yonghong Song,
Jiri Olsa, Simon Horman, bpf, netdev, Xiang Mei, Xinyu Ma
On 4/26/26 1:59 AM, Weiming Shi wrote:
> On 26-04-25 11:17, Jiayuan Chen wrote:
>> On 4/25/26 3:03 AM, Weiming Shi wrote:
>>> bpf_msg_push_data() allocates pages via alloc_pages() without
>>> __GFP_ZERO. In the non-copy path, the entire page of uninitialized
>>> heap content is added directly to the sk_msg scatterlist, which is
>>> then transmitted over TCP to userspace via tcp_bpf_push(). In the
>>> copy path, a gap of len bytes between the front and back memcpy
>>> regions is similarly left uninitialized.
>>>
>>> This leads to a kernel heap information leak: stale page content
>>> including kernel pointers from the direct-map and vmemmap regions
>>> is transmitted to userspace, which can be used to defeat KASLR.
>>>
>>> Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
>>> page is always zeroed before it enters the scatterlist.
>>
>>
>> As the helper's own documentation says:
>>
>> If a program of type BPF_PROG_TYPE_SK_MSG is run on a msg it may
>> want to insert metadata or options into the msg. This can later be
>> read and used by any of the lower layer BPF hooks.
>>
>> The inserted region is meant to be written by the BPF program — that's the
>> entire point of calling push.
>>
>> If the program doesn't fill it, the push has no purpose to begin with.
>>
>>
>> Isn't the uninitialized content a bug in the BPF program rather than
>> something the kernel helper should paper over?
>>
> Hi, Thanks for the review.
>
> In my testing a process with only CAP_BPF + CAP_NET_ADMIN can receive
> kernel heap and vmalloc pointers through recv() from the uninitialized
> pushed region. The uninitialized memory contains critical kernel metadata
> such as direct-map and vmalloc pointers, which breaks KASLR.
>
> Kernels without CONFIG_INIT_ON_ALLOC_DEFAULT_ON (e.g. RHEL) are
> directly affected: the leak is not masked by any mitigation.
>
> Thanks,
> Weiming Shi
>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Previously I thought this was the same as bpf_xdp_adjust_head /
bpf_xdp_adjust_meta, but since the function itself allocates a page, I
believe the cost of the __GFP_ZERO flag is irrelevant.
One more thing: in the future, more and more AI systems will complain
about this kind of problem, so I believe the fix is worth it.