Linux kernel -stable discussions
 help / color / mirror / Atom feed
* [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
       [not found] <20260529031026.2716641-1-dawei.feng@seu.edu.cn>
@ 2026-05-29  3:10 ` Dawei Feng
  2026-05-29  4:45   ` Jiayuan Chen
  2026-05-29  3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
  1 sibling, 1 reply; 5+ messages in thread
From: Dawei Feng @ 2026-05-29  3:10 UTC (permalink / raw)
  To: martin.lau
  Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
	jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
	jianhao.xu, Dawei Feng, stable, Zilin Guan

proc_sys_call_handler() allocates its temporary sysctl buffer with
kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
kvzalloc() may fall back to vmalloc() for large allocations, freeing
that buffer with kfree() is wrong and can corrupt memory.

Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
allocations.

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc5.

Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
test tree also included the accompanying fix for the stale ret == 1
check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
failslab injections to the proc_sys_call_handler() range, uses
stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
/proc/sys/kernel/domainname from a task in the target cgroup. Under
that setup, fail-nth=1 triggered the fault:

  BUG: unable to handle page fault for address: ffffeb0200024d48
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000  SMP KASAN NOPTI
  CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4  PREEMPT(lazy)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:kfree+0x6e/0x510
  Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
  RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
  RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
  RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
  R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
  R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
  FS:  00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
   __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
   ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
   ? srso_return_thunk+0x5/0x5f
   ? __kvmalloc_node_noprof+0x345/0x870
   ? proc_sys_call_handler+0x250/0x480
   ? srso_return_thunk+0x5/0x5f
   proc_sys_call_handler+0x3a2/0x480
   ? __pfx_proc_sys_call_handler+0x10/0x10
   ? srso_return_thunk+0x5/0x5f
   ? selinux_file_permission+0x39f/0x500
   ? srso_return_thunk+0x5/0x5f
   ? lock_is_held_type+0x9e/0x120
   vfs_write+0x98e/0x1000
   ? srso_return_thunk+0x5/0x5f
   ? kmem_cache_free+0x308/0x550
   ? __pfx_vfs_write+0x10/0x10
   ? __pfx_do_sys_openat2+0x10/0x10
   ksys_write+0xf2/0x1d0
   ? __pfx_ksys_write+0x10/0x10
   ? srso_return_thunk+0x5/0x5f
   ? trace_irq_enable.constprop.0+0x110/0x140
   do_syscall_64+0x115/0x690
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   RIP: 0033:0x7f8988dd8907
   Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8  01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
   RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
   RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
   RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
   RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
   R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
   R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
   </TASK>
With this fix applied on top of the same test setup, rerunning the
reproducer with fail-nth=1 yields no corresponding Oops reports.

Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
Cc: stable@vger.kernel.org

Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
 kernel/bpf/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 876f6a81a9b6..faadcfb9b5e5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1936,7 +1936,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 	kfree(ctx.cur_val);
 
 	if (ret == 1 && ctx.new_updated) {
-		kfree(*buf);
+		kvfree(*buf);
 		*buf = ctx.new_val;
 		*pcount = ctx.new_len;
 	} else {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
       [not found] <20260529031026.2716641-1-dawei.feng@seu.edu.cn>
  2026-05-29  3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
@ 2026-05-29  3:10 ` Dawei Feng
  2026-05-29  3:56   ` bot+bpf-ci
  2026-05-29  4:51   ` Jiayuan Chen
  1 sibling, 2 replies; 5+ messages in thread
From: Dawei Feng @ 2026-05-29  3:10 UTC (permalink / raw)
  To: martin.lau
  Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
	jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
	jianhao.xu, Dawei Feng, stable, Zilin Guan

Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
helpers") changed the success return value to 0, but failed to update the
corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
condition is never satisfied. As a result, the modified value is ignored,
and bpf_sysctl_set_new_value() fails to replace the write buffer.

Fix this by checking for a return value of 0 instead, so cgroup/sysctl
programs can correctly replace the pending sysctl buffer.

This bug was discovered during a manual code review. Tested via a
cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
returned 8192 and the value remained "600". Post-fix, the BPF replacement
buffer properly propagates: the write returns 3 and the value updates to
"foo".

Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Cc: stable@vger.kernel.org

Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
 kernel/bpf/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0b5f8cd8b10..3f06e2270f5c 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 
 	kfree(ctx.cur_val);
 
-	if (ret == 1 && ctx.new_updated) {
+	if (!ret && ctx.new_updated) {
 		kvfree(*buf);
 		*buf = ctx.new_val;
 		*pcount = ctx.new_len;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
  2026-05-29  3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
@ 2026-05-29  3:56   ` bot+bpf-ci
  2026-05-29  4:51   ` Jiayuan Chen
  1 sibling, 0 replies; 5+ messages in thread
From: bot+bpf-ci @ 2026-05-29  3:56 UTC (permalink / raw)
  To: dawei.feng, martin.lau
  Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
	jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
	jianhao.xu, dawei.feng, stable, zilin, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]

> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index a0b5f8cd8b10..3f06e2270f5c 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>
>  	kfree(ctx.cur_val);
>
> -	if (ret == 1 && ctx.new_updated) {
> +	if (!ret && ctx.new_updated) {
>  		kvfree(*buf);
>  		*buf = ctx.new_val;
>  		*pcount = ctx.new_len;

With the success check now testing !ret, does the kerneldoc above
__cgroup_bpf_run_filter_sysctl() still describe the right convention?  It
currently reads:

 * This function will return %-EPERM if an attached program is found and
 * returned value != 1 during execution. In all other cases 0 is returned.

Since bpf_prog_run_array_cg() returns 0 on success rather than 1, should
that line say "returned value != 0" to match the new check?

This point was raised in v1 review but does not appear to have been
changed in v2.

Now for the Fixes: tag.  The commit message states:

> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")

Is this the commit that actually introduced the problem?  At 4e63acdff864
the runner was BPF_PROG_RUN_ARRAY, which returned 1 on success, so the
ret == 1 check was correct at that time.

There appear to be two candidate commits that changed the return
convention to 0-on-success and left the ret == 1 check stale:

  c4dcfdd406aa ("bpf: Move getsockopt retval to struct bpf_cg_run_ctx")
  f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean")

Should the Fixes: tag point at one of these instead of 4e63acdff864?

The f10d05966196 suggestion was raised by the bpf-ci review in v1 but the
tag was unchanged in v2.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26616337824

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
  2026-05-29  3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
@ 2026-05-29  4:45   ` Jiayuan Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Jiayuan Chen @ 2026-05-29  4:45 UTC (permalink / raw)
  To: Dawei Feng, martin.lau
  Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
	jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
	jianhao.xu, stable, Zilin Guan


On 5/29/26 11:10 AM, Dawei Feng wrote:
> proc_sys_call_handler() allocates its temporary sysctl buffer with
> kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
> kvzalloc() may fall back to vmalloc() for large allocations, freeing
> that buffer with kfree() is wrong and can corrupt memory.
>
> Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
> allocations.
>
> The bug was first flagged by an experimental analysis tool we are
> developing for kernel memory-management bugs while analyzing
> v6.13-rc1. The tool is still under development and is not yet publicly
> available. Manual inspection confirms that the bug is still
> present in v7.1-rc5.
>
> Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
> KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
> test tree also included the accompanying fix for the stale ret == 1
> check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
> failslab injections to the proc_sys_call_handler() range, uses
> stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
> /proc/sys/kernel/domainname from a task in the target cgroup. Under
> that setup, fail-nth=1 triggered the fault:
>
>    BUG: unable to handle page fault for address: ffffeb0200024d48
>    #PF: supervisor read access in kernel mode
>    #PF: error_code(0x0000) - not-present page
>    PGD 0 P4D 0
>    Oops: Oops: 0000  SMP KASAN NOPTI
>    CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4  PREEMPT(lazy)
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
>    RIP: 0010:kfree+0x6e/0x510
>    Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
>    RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
>    RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
>    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
>    RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
>    R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
>    R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
>    FS:  00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
>    Call Trace:
>     <TASK>
>     ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
>     __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
>     ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
>     ? srso_return_thunk+0x5/0x5f
>     ? __kvmalloc_node_noprof+0x345/0x870
>     ? proc_sys_call_handler+0x250/0x480
>     ? srso_return_thunk+0x5/0x5f
>     proc_sys_call_handler+0x3a2/0x480
>     ? __pfx_proc_sys_call_handler+0x10/0x10
>     ? srso_return_thunk+0x5/0x5f
>     ? selinux_file_permission+0x39f/0x500
>     ? srso_return_thunk+0x5/0x5f
>     ? lock_is_held_type+0x9e/0x120
>     vfs_write+0x98e/0x1000
>     ? srso_return_thunk+0x5/0x5f
>     ? kmem_cache_free+0x308/0x550
>     ? __pfx_vfs_write+0x10/0x10
>     ? __pfx_do_sys_openat2+0x10/0x10
>     ksys_write+0xf2/0x1d0
>     ? __pfx_ksys_write+0x10/0x10
>     ? srso_return_thunk+0x5/0x5f
>     ? trace_irq_enable.constprop.0+0x110/0x140
>     do_syscall_64+0x115/0x690
>     entry_SYSCALL_64_after_hwframe+0x77/0x7f
>     RIP: 0033:0x7f8988dd8907
>     Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8  01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
>     RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
>     RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
>     RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
>     R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
>     R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
>     </TASK>
> With this fix applied on top of the same test setup, rerunning the
> reproducer with fail-nth=1 yields no corresponding Oops reports.
>
> Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>


Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
  2026-05-29  3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
  2026-05-29  3:56   ` bot+bpf-ci
@ 2026-05-29  4:51   ` Jiayuan Chen
  1 sibling, 0 replies; 5+ messages in thread
From: Jiayuan Chen @ 2026-05-29  4:51 UTC (permalink / raw)
  To: Dawei Feng, martin.lau
  Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
	jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
	jianhao.xu, stable, Zilin Guan


On 5/29/26 11:10 AM, Dawei Feng wrote:
> Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
> helpers") changed the success return value to 0, but failed to update the
> corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
> bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
> condition is never satisfied. As a result, the modified value is ignored,
> and bpf_sysctl_set_new_value() fails to replace the write buffer.
>
> Fix this by checking for a return value of 0 instead, so cgroup/sysctl
> programs can correctly replace the pending sysctl buffer.
>
> This bug was discovered during a manual code review. Tested via a
> cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
> Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
> returned 8192 and the value remained "600". Post-fix, the BPF replacement
> buffer properly propagates: the write returns 3 and the value updates to
> "foo".
>
> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")


Change is fine but fixes tag is wrong like the bot said.

> Cc: stable@vger.kernel.org
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>

I saw a reviewed-by tag given by Emil in previous thread. Pls carry it 
when sending new verson.


> ---
>   kernel/bpf/cgroup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index a0b5f8cd8b10..3f06e2270f5c 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>   
>   	kfree(ctx.cur_val);
>   
> -	if (ret == 1 && ctx.new_updated) {
> +	if (!ret && ctx.new_updated) {
>   		kvfree(*buf);
>   		*buf = ctx.new_val;
>   		*pcount = ctx.new_len;

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-29  4:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260529031026.2716641-1-dawei.feng@seu.edu.cn>
2026-05-29  3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
2026-05-29  4:45   ` Jiayuan Chen
2026-05-29  3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
2026-05-29  3:56   ` bot+bpf-ci
2026-05-29  4:51   ` Jiayuan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox