* [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
2026-05-29 3:10 [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Dawei Feng
@ 2026-05-29 3:10 ` Dawei Feng
2026-05-29 4:45 ` Jiayuan Chen
2026-06-01 21:07 ` Yonghong Song
2026-05-29 3:10 ` [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value Dawei Feng
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: Dawei Feng @ 2026-05-29 3:10 UTC (permalink / raw)
To: martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, Dawei Feng, stable, Zilin Guan
proc_sys_call_handler() allocates its temporary sysctl buffer with
kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
kvzalloc() may fall back to vmalloc() for large allocations, freeing
that buffer with kfree() is wrong and can corrupt memory.
Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
allocations.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc5.
Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
test tree also included the accompanying fix for the stale ret == 1
check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
failslab injections to the proc_sys_call_handler() range, uses
stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
/proc/sys/kernel/domainname from a task in the target cgroup. Under
that setup, fail-nth=1 triggered the fault:
BUG: unable to handle page fault for address: ffffeb0200024d48
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 SMP KASAN NOPTI
CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:kfree+0x6e/0x510
Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
FS: 00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
Call Trace:
<TASK>
? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
__cgroup_bpf_run_filter_sysctl+0x74d/0xc30
? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
? srso_return_thunk+0x5/0x5f
? __kvmalloc_node_noprof+0x345/0x870
? proc_sys_call_handler+0x250/0x480
? srso_return_thunk+0x5/0x5f
proc_sys_call_handler+0x3a2/0x480
? __pfx_proc_sys_call_handler+0x10/0x10
? srso_return_thunk+0x5/0x5f
? selinux_file_permission+0x39f/0x500
? srso_return_thunk+0x5/0x5f
? lock_is_held_type+0x9e/0x120
vfs_write+0x98e/0x1000
? srso_return_thunk+0x5/0x5f
? kmem_cache_free+0x308/0x550
? __pfx_vfs_write+0x10/0x10
? __pfx_do_sys_openat2+0x10/0x10
ksys_write+0xf2/0x1d0
? __pfx_ksys_write+0x10/0x10
? srso_return_thunk+0x5/0x5f
? trace_irq_enable.constprop.0+0x110/0x140
do_syscall_64+0x115/0x690
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8988dd8907
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
</TASK>
With this fix applied on top of the same test setup, rerunning the
reproducer with fail-nth=1 yields no corresponding Oops reports.
Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
Cc: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
kernel/bpf/cgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 876f6a81a9b6..faadcfb9b5e5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1936,7 +1936,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
kfree(ctx.cur_val);
if (ret == 1 && ctx.new_updated) {
- kfree(*buf);
+ kvfree(*buf);
*buf = ctx.new_val;
*pcount = ctx.new_len;
} else {
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
2026-05-29 3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
@ 2026-05-29 4:45 ` Jiayuan Chen
2026-06-01 21:07 ` Yonghong Song
1 sibling, 0 replies; 18+ messages in thread
From: Jiayuan Chen @ 2026-05-29 4:45 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, stable, Zilin Guan
On 5/29/26 11:10 AM, Dawei Feng wrote:
> proc_sys_call_handler() allocates its temporary sysctl buffer with
> kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
> kvzalloc() may fall back to vmalloc() for large allocations, freeing
> that buffer with kfree() is wrong and can corrupt memory.
>
> Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
> allocations.
>
> The bug was first flagged by an experimental analysis tool we are
> developing for kernel memory-management bugs while analyzing
> v6.13-rc1. The tool is still under development and is not yet publicly
> available. Manual inspection confirms that the bug is still
> present in v7.1-rc5.
>
> Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
> KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
> test tree also included the accompanying fix for the stale ret == 1
> check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
> failslab injections to the proc_sys_call_handler() range, uses
> stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
> /proc/sys/kernel/domainname from a task in the target cgroup. Under
> that setup, fail-nth=1 triggered the fault:
>
> BUG: unable to handle page fault for address: ffffeb0200024d48
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 SMP KASAN NOPTI
> CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:kfree+0x6e/0x510
> Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
> RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
> RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
> RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
> R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
> R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
> FS: 00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
> __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
> ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? __kvmalloc_node_noprof+0x345/0x870
> ? proc_sys_call_handler+0x250/0x480
> ? srso_return_thunk+0x5/0x5f
> proc_sys_call_handler+0x3a2/0x480
> ? __pfx_proc_sys_call_handler+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? selinux_file_permission+0x39f/0x500
> ? srso_return_thunk+0x5/0x5f
> ? lock_is_held_type+0x9e/0x120
> vfs_write+0x98e/0x1000
> ? srso_return_thunk+0x5/0x5f
> ? kmem_cache_free+0x308/0x550
> ? __pfx_vfs_write+0x10/0x10
> ? __pfx_do_sys_openat2+0x10/0x10
> ksys_write+0xf2/0x1d0
> ? __pfx_ksys_write+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? trace_irq_enable.constprop.0+0x110/0x140
> do_syscall_64+0x115/0x690
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8988dd8907
> Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
> RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
> RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
> RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
> R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
> </TASK>
> With this fix applied on top of the same test setup, rerunning the
> reproducer with fail-nth=1 yields no corresponding Oops reports.
>
> Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
2026-05-29 3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
2026-05-29 4:45 ` Jiayuan Chen
@ 2026-06-01 21:07 ` Yonghong Song
1 sibling, 0 replies; 18+ messages in thread
From: Yonghong Song @ 2026-06-01 21:07 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, jolsa, kees,
joel.granados, bpf, linux-kernel, linux-fsdevel, jianhao.xu,
stable, Zilin Guan
On 5/28/26 8:10 PM, Dawei Feng wrote:
> proc_sys_call_handler() allocates its temporary sysctl buffer with
> kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
> kvzalloc() may fall back to vmalloc() for large allocations, freeing
> that buffer with kfree() is wrong and can corrupt memory.
>
> Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
> allocations.
>
> The bug was first flagged by an experimental analysis tool we are
> developing for kernel memory-management bugs while analyzing
> v6.13-rc1. The tool is still under development and is not yet publicly
> available. Manual inspection confirms that the bug is still
> present in v7.1-rc5.
>
> Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
> KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
> test tree also included the accompanying fix for the stale ret == 1
> check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
> failslab injections to the proc_sys_call_handler() range, uses
> stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
> /proc/sys/kernel/domainname from a task in the target cgroup. Under
> that setup, fail-nth=1 triggered the fault:
>
> BUG: unable to handle page fault for address: ffffeb0200024d48
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 SMP KASAN NOPTI
> CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:kfree+0x6e/0x510
> Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
> RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
> RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
> RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
> R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
> R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
> FS: 00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
> __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
> ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? __kvmalloc_node_noprof+0x345/0x870
> ? proc_sys_call_handler+0x250/0x480
> ? srso_return_thunk+0x5/0x5f
> proc_sys_call_handler+0x3a2/0x480
> ? __pfx_proc_sys_call_handler+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? selinux_file_permission+0x39f/0x500
> ? srso_return_thunk+0x5/0x5f
> ? lock_is_held_type+0x9e/0x120
> vfs_write+0x98e/0x1000
> ? srso_return_thunk+0x5/0x5f
> ? kmem_cache_free+0x308/0x550
> ? __pfx_vfs_write+0x10/0x10
> ? __pfx_do_sys_openat2+0x10/0x10
> ksys_write+0xf2/0x1d0
> ? __pfx_ksys_write+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? trace_irq_enable.constprop.0+0x110/0x140
> do_syscall_64+0x115/0x690
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8988dd8907
> Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
> RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
> RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
> RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
> R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
> </TASK>
I think the above log can be simplified like
BUG: unable to handle page fault for address: ffffeb0200024d48
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 SMP KASAN NOPTI
CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:kfree+0x6e/0x510
...
Call Trace:
<TASK>
? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
__cgroup_bpf_run_filter_sysctl+0x74d/0xc30
? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
? srso_return_thunk+0x5/0x5f
? __kvmalloc_node_noprof+0x345/0x870
? proc_sys_call_handler+0x250/0x480
? srso_return_thunk+0x5/0x5f
proc_sys_call_handler+0x3a2/0x480
? __pfx_proc_sys_call_handler+0x10/0x10
? srso_return_thunk+0x5/0x5f
? selinux_file_permission+0x39f/0x500
? srso_return_thunk+0x5/0x5f
? lock_is_held_type+0x9e/0x120
vfs_write+0x98e/0x1000
...
> With this fix applied on top of the same test setup, rerunning the
> reproducer with fail-nth=1 yields no corresponding Oops reports.
>
> Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-05-29 3:10 [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Dawei Feng
2026-05-29 3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
@ 2026-05-29 3:10 ` Dawei Feng
2026-05-29 3:56 ` bot+bpf-ci
2026-06-01 21:22 ` Yonghong Song
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
2026-05-29 4:44 ` [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Jiayuan Chen
3 siblings, 2 replies; 18+ messages in thread
From: Dawei Feng @ 2026-05-29 3:10 UTC (permalink / raw)
To: martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, Dawei Feng, Zilin Guan
When writing to sysctls, proc_sys_call_handler() guarantees that the
buffer passed to proc handlers is NUL-terminated. If
bpf_sysctl_set_new_value() replaces the pending sysctl value, it can
hand a replacement buffer directly to proc handlers. However, the
helper currently copies only buf_len
bytes into that buffer without appending a NUL terminator, leaving
downstream parsers vulnerable to out-of-bounds access.
Fix this by appending a '\0' after the replaced value to restore the
expected sysctl semantics. Since the helper already rejects buf_len
greater than PAGE_SIZE - 1, there is always room for the extra byte.
Reproduced in a QEMU x86_64 guest booted with KASAN while exercising
the sysctl replacement path with a cgroup/sysctl BPF program. The
reproducer targets `/proc/sys/net/core/flow_limit_cpu_bitmap`, fills
the original user write buffer with non-zero bytes, and overrides the
sysctl value so the replacement buffer lacks a terminating NUL. Under
that setup, the pre-fix kernel reported:
BUG: KASAN: slab-out-of-bounds in strnchrnul+0x72/0x90
Read of size 1 at addr ffff88800de57000 by task repro_patch3/66
CPU: 0 UID: 0 PID: 66 Comm: repro_patch3 Not tainted 7.1.0-rc3-00269-g8370ca1f87cc #6 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcb/0x5e0
? __virt_addr_valid+0x21d/0x3f0
? strnchrnul+0x72/0x90
? strnchrnul+0x72/0x90
kasan_report+0xca/0x100
? strnchrnul+0x72/0x90
strnchrnul+0x72/0x90
bitmap_parse+0x37/0x2e0
flow_limit_cpu_sysctl+0xc6/0x840
? __pfx_flow_limit_cpu_sysctl+0x10/0x10
? __kvmalloc_node_noprof+0x5ba/0x870
proc_sys_call_handler+0x31d/0x480
? __pfx_proc_sys_call_handler+0x10/0x10
? selinux_file_permission+0x39f/0x500
? lock_is_held_type+0x9e/0x120
vfs_write+0x98e/0x1000
? kmem_cache_free+0x308/0x550
? __pfx_vfs_write+0x10/0x10
? __pfx_do_sys_openat2+0x10/0x10
ksys_write+0xf2/0x1d0
? __pfx_ksys_write+0x10/0x10
? trace_irq_enable.constprop.0+0x110/0x140
do_syscall_64+0x115/0x690
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x447f37
Code: ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff01ade608 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000447f37
RDX: 0000000000001fff RSI: 00000000172b1780 RDI: 0000000000000005
RBP: 00000000172b1780 R08: 00000000004ca1b0 R09: 00000000172b1780
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001fff
R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000003
</TASK>
The buggy address is located 0 bytes to the right of
allocated 4096-byte region [ffff88800de56000, ffff88800de57000)
With this fix applied, rerunning the same sysctl-targeted path yields
no corresponding KASAN reports.
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
kernel/bpf/cgroup.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index faadcfb9b5e5..a0b5f8cd8b10 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2342,6 +2342,7 @@ BPF_CALL_3(bpf_sysctl_set_new_value, struct bpf_sysctl_kern *, ctx,
return -E2BIG;
memcpy(ctx->new_val, buf, buf_len);
+ ((char *)ctx->new_val)[buf_len] = '\0';
ctx->new_len = buf_len;
ctx->new_updated = 1;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-05-29 3:10 ` [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value Dawei Feng
@ 2026-05-29 3:56 ` bot+bpf-ci
2026-06-01 21:22 ` Yonghong Song
1 sibling, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-29 3:56 UTC (permalink / raw)
To: dawei.feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, dawei.feng, zilin, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]
> When writing to sysctls, proc_sys_call_handler() guarantees that the
> buffer passed to proc handlers is NUL-terminated. If
> bpf_sysctl_set_new_value() replaces the pending sysctl value, it can
> hand a replacement buffer directly to proc handlers. However, the
> helper currently copies only buf_len
> bytes into that buffer without appending a NUL terminator, leaving
> downstream parsers vulnerable to out-of-bounds access.
>
> Fix this by appending a '\0' after the replaced value to restore the
> expected sysctl semantics. Since the helper already rejects buf_len
> greater than PAGE_SIZE - 1, there is always room for the extra byte.
>
> Reproduced in a QEMU x86_64 guest booted with KASAN while exercising
> the sysctl replacement path with a cgroup/sysctl BPF program.
>
> [ ... KASAN report snipped ... ]
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
This fixes an out-of-bounds read in bpf_sysctl_set_new_value(). The
missing NUL terminator after the memcpy() of the replacement value looks
like it was introduced by the commit that added the helper. Should this
carry:
Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26616337824
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-05-29 3:10 ` [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value Dawei Feng
2026-05-29 3:56 ` bot+bpf-ci
@ 2026-06-01 21:22 ` Yonghong Song
2026-06-03 9:47 ` Dawei Feng
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Yonghong Song @ 2026-06-01 21:22 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, jolsa, kees,
joel.granados, bpf, linux-kernel, linux-fsdevel, jianhao.xu,
Zilin Guan
On 5/28/26 8:10 PM, Dawei Feng wrote:
> When writing to sysctls, proc_sys_call_handler() guarantees that the
> buffer passed to proc handlers is NUL-terminated. If
> bpf_sysctl_set_new_value() replaces the pending sysctl value, it can
> hand a replacement buffer directly to proc handlers. However, the
> helper currently copies only buf_len
> bytes into that buffer without appending a NUL terminator, leaving
> downstream parsers vulnerable to out-of-bounds access.
>
> Fix this by appending a '\0' after the replaced value to restore the
> expected sysctl semantics. Since the helper already rejects buf_len
> greater than PAGE_SIZE - 1, there is always room for the extra byte.
>
> Reproduced in a QEMU x86_64 guest booted with KASAN while exercising
> the sysctl replacement path with a cgroup/sysctl BPF program. The
> reproducer targets `/proc/sys/net/core/flow_limit_cpu_bitmap`, fills
> the original user write buffer with non-zero bytes, and overrides the
> sysctl value so the replacement buffer lacks a terminating NUL. Under
> that setup, the pre-fix kernel reported:
>
> BUG: KASAN: slab-out-of-bounds in strnchrnul+0x72/0x90
> Read of size 1 at addr ffff88800de57000 by task repro_patch3/66
> CPU: 0 UID: 0 PID: 66 Comm: repro_patch3 Not tainted 7.1.0-rc3-00269-g8370ca1f87cc #6 PREEMPT(lazy)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0xa0
> print_report+0xcb/0x5e0
> ? __virt_addr_valid+0x21d/0x3f0
> ? strnchrnul+0x72/0x90
> ? strnchrnul+0x72/0x90
> kasan_report+0xca/0x100
> ? strnchrnul+0x72/0x90
> strnchrnul+0x72/0x90
> bitmap_parse+0x37/0x2e0
> flow_limit_cpu_sysctl+0xc6/0x840
> ? __pfx_flow_limit_cpu_sysctl+0x10/0x10
> ? __kvmalloc_node_noprof+0x5ba/0x870
> proc_sys_call_handler+0x31d/0x480
> ? __pfx_proc_sys_call_handler+0x10/0x10
> ? selinux_file_permission+0x39f/0x500
> ? lock_is_held_type+0x9e/0x120
> vfs_write+0x98e/0x1000
> ? kmem_cache_free+0x308/0x550
> ? __pfx_vfs_write+0x10/0x10
> ? __pfx_do_sys_openat2+0x10/0x10
> ksys_write+0xf2/0x1d0
> ? __pfx_ksys_write+0x10/0x10
> ? trace_irq_enable.constprop.0+0x110/0x140
> do_syscall_64+0x115/0x690
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x447f37
> Code: ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007fff01ade608 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000447f37
> RDX: 0000000000001fff RSI: 00000000172b1780 RDI: 0000000000000005
> RBP: 00000000172b1780 R08: 00000000004ca1b0 R09: 00000000172b1780
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001fff
> R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000003
> </TASK>
> The buggy address is located 0 bytes to the right of
> allocated 4096-byte region [ffff88800de56000, ffff88800de57000)
The above log can be simplied.
>
> With this fix applied, rerunning the same sysctl-targeted path yields
> no corresponding KASAN reports.
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
> ---
> kernel/bpf/cgroup.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index faadcfb9b5e5..a0b5f8cd8b10 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -2342,6 +2342,7 @@ BPF_CALL_3(bpf_sysctl_set_new_value, struct bpf_sysctl_kern *, ctx,
> return -E2BIG;
>
> memcpy(ctx->new_val, buf, buf_len);
> + ((char *)ctx->new_val)[buf_len] = '\0';
Does memcpy(ctx->new_val, buf, buf_len + 1) work?
> ctx->new_len = buf_len;
> ctx->new_updated = 1;
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-06-01 21:22 ` Yonghong Song
@ 2026-06-03 9:47 ` Dawei Feng
2026-06-03 10:01 ` Dawei Feng
2026-06-03 10:33 ` Dawei Feng
2 siblings, 0 replies; 18+ messages in thread
From: Dawei Feng @ 2026-06-03 9:47 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, dawei.feng, eddyz87, emil, jianhao.xu,
joel.granados, jolsa, kees, linux-fsdevel, linux-kernel,
martin.lau, memxor, song, zilin
Hi Yonghong,
On Mon, 1 Jun 2026 14:22:07 -0700, Yonghong Song wrote:
>The above log can be simplied.
Thanks. I will simplify the KASAN log in v3.
>> memcpy(ctx->new_val, buf, buf_len);
>> + ((char *)ctx->new_val)[buf_len] = '\0';
>
>Does memcpy(ctx->new_val, buf, buf_len + 1) work?
I do not think memcpy(ctx->new_val, buf, buf_len + 1) would be a good
fit here.
The helper interface only guarantees that buf is valid for buf_len
bytes, so reading one more byte would go past the declared input range.
Even if some callers happen to have a trailing '\0' right after the
payload, that is not part of the contract.
Appending the terminator on the destination side keeps the source read
strictly within buf_len, while still restoring the NUL-terminated
buffer semantics expected by downstream proc handlers. It is also safe
for the destination, since the helper already rejects
buf_len > PAGE_SIZE - 1.
Therefore, I would prefer to keep it as:
memcpy(ctx->new_val, buf, buf_len);
((char *)ctx->new_val)[buf_len] = '\0';
Best regards,
Dawei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-06-01 21:22 ` Yonghong Song
2026-06-03 9:47 ` Dawei Feng
@ 2026-06-03 10:01 ` Dawei Feng
2026-06-03 10:33 ` Dawei Feng
2 siblings, 0 replies; 18+ messages in thread
From: Dawei Feng @ 2026-06-03 10:01 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, dawei.feng, eddyz87, emil, jianhao.xu,
joel.granados, jolsa, kees, linux-fsdevel, linux-kernel,
martin.lau, memxor, song, zilin
Hi Yonghong,
On Mon, 1 Jun 2026 14:22:07 -0700, Yonghong Song wrote:
>The above log can be simplied.
Thanks. I will simplify the KASAN log in v3.
>> memcpy(ctx->new_val, buf, buf_len);
>> + ((char *)ctx->new_val)[buf_len] = '\0';
>
>Does memcpy(ctx->new_val, buf, buf_len + 1) work?
I do not think memcpy(ctx->new_val, buf, buf_len + 1) would be a good
fit here.
The helper interface only guarantees that buf is valid for buf_len
bytes, so reading one more byte would go past the declared input range.
Even if some callers happen to have a trailing '\0' right after the
payload, that is not part of the contract.
Appending the terminator on the destination side keeps the source read
strictly within buf_len, while still restoring the NUL-terminated
buffer semantics expected by downstream proc handlers. It is also safe
for the destination, since the helper already rejects
buf_len > PAGE_SIZE - 1.
Therefore, I would prefer to keep it as:
memcpy(ctx->new_val, buf, buf_len);
((char *)ctx->new_val)[buf_len] = '\0';
Best regards,
Dawei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value
2026-06-01 21:22 ` Yonghong Song
2026-06-03 9:47 ` Dawei Feng
2026-06-03 10:01 ` Dawei Feng
@ 2026-06-03 10:33 ` Dawei Feng
2 siblings, 0 replies; 18+ messages in thread
From: Dawei Feng @ 2026-06-03 10:33 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, dawei.feng, eddyz87, emil, jianhao.xu,
joel.granados, jolsa, kees, linux-fsdevel, linux-kernel,
martin.lau, memxor, song, zilin
Hi Yonghong,
On Mon, 1 Jun 2026 14:22:07 -0700, Yonghong Song wrote:
>The above log can be simplied.
Thanks. I will simplify the KASAN log in v3.
>> memcpy(ctx->new_val, buf, buf_len);
>> + ((char *)ctx->new_val)[buf_len] = '\0';
>
>Does memcpy(ctx->new_val, buf, buf_len + 1) work?
I do not think memcpy(ctx->new_val, buf, buf_len + 1) would be a good
fit here.
The helper interface only guarantees that buf is valid for buf_len
bytes, so reading one more byte would go past the declared input range.
Even if some callers happen to have a trailing '\0' right after the
payload, that is not part of the contract.
Appending the terminator on the destination side keeps the source read
strictly within buf_len, while still restoring the NUL-terminated
buffer semantics expected by downstream proc handlers. It is also safe
for the destination, since the helper already rejects
buf_len > PAGE_SIZE - 1.
Therefore, I would prefer to keep it as:
memcpy(ctx->new_val, buf, buf_len);
((char *)ctx->new_val)[buf_len] = '\0';
Best regards,
Dawei
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
2026-05-29 3:10 [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Dawei Feng
2026-05-29 3:10 ` [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer Dawei Feng
2026-05-29 3:10 ` [PATCH v2 2/3] bpf: cgroup: NUL-terminate replaced sysctl value Dawei Feng
@ 2026-05-29 3:10 ` Dawei Feng
2026-05-29 3:56 ` bot+bpf-ci
` (3 more replies)
2026-05-29 4:44 ` [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Jiayuan Chen
3 siblings, 4 replies; 18+ messages in thread
From: Dawei Feng @ 2026-05-29 3:10 UTC (permalink / raw)
To: martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, Dawei Feng, stable, Zilin Guan
Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
helpers") changed the success return value to 0, but failed to update the
corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
condition is never satisfied. As a result, the modified value is ignored,
and bpf_sysctl_set_new_value() fails to replace the write buffer.
Fix this by checking for a return value of 0 instead, so cgroup/sysctl
programs can correctly replace the pending sysctl buffer.
This bug was discovered during a manual code review. Tested via a
cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
returned 8192 and the value remained "600". Post-fix, the BPF replacement
buffer properly propagates: the write returns 3 and the value updates to
"foo".
Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Cc: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
kernel/bpf/cgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0b5f8cd8b10..3f06e2270f5c 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
kfree(ctx.cur_val);
- if (ret == 1 && ctx.new_updated) {
+ if (!ret && ctx.new_updated) {
kvfree(*buf);
*buf = ctx.new_val;
*pcount = ctx.new_len;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
@ 2026-05-29 3:56 ` bot+bpf-ci
2026-05-29 4:51 ` Jiayuan Chen
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-29 3:56 UTC (permalink / raw)
To: dawei.feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, dawei.feng, stable, zilin, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index a0b5f8cd8b10..3f06e2270f5c 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>
> kfree(ctx.cur_val);
>
> - if (ret == 1 && ctx.new_updated) {
> + if (!ret && ctx.new_updated) {
> kvfree(*buf);
> *buf = ctx.new_val;
> *pcount = ctx.new_len;
With the success check now testing !ret, does the kerneldoc above
__cgroup_bpf_run_filter_sysctl() still describe the right convention? It
currently reads:
* This function will return %-EPERM if an attached program is found and
* returned value != 1 during execution. In all other cases 0 is returned.
Since bpf_prog_run_array_cg() returns 0 on success rather than 1, should
that line say "returned value != 0" to match the new check?
This point was raised in v1 review but does not appear to have been
changed in v2.
Now for the Fixes: tag. The commit message states:
> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Is this the commit that actually introduced the problem? At 4e63acdff864
the runner was BPF_PROG_RUN_ARRAY, which returned 1 on success, so the
ret == 1 check was correct at that time.
There appear to be two candidate commits that changed the return
convention to 0-on-success and left the ret == 1 check stale:
c4dcfdd406aa ("bpf: Move getsockopt retval to struct bpf_cg_run_ctx")
f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean")
Should the Fixes: tag point at one of these instead of 4e63acdff864?
The f10d05966196 suggestion was raised by the bpf-ci review in v1 but the
tag was unchanged in v2.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26616337824
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
2026-05-29 3:56 ` bot+bpf-ci
@ 2026-05-29 4:51 ` Jiayuan Chen
2026-05-29 6:34 ` sashiko-bot
2026-06-01 22:01 ` Yonghong Song
3 siblings, 0 replies; 18+ messages in thread
From: Jiayuan Chen @ 2026-05-29 4:51 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu, stable, Zilin Guan
On 5/29/26 11:10 AM, Dawei Feng wrote:
> Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
> helpers") changed the success return value to 0, but failed to update the
> corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
> bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
> condition is never satisfied. As a result, the modified value is ignored,
> and bpf_sysctl_set_new_value() fails to replace the write buffer.
>
> Fix this by checking for a return value of 0 instead, so cgroup/sysctl
> programs can correctly replace the pending sysctl buffer.
>
> This bug was discovered during a manual code review. Tested via a
> cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
> Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
> returned 8192 and the value remained "600". Post-fix, the BPF replacement
> buffer properly propagates: the write returns 3 and the value updates to
> "foo".
>
> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Change is fine but fixes tag is wrong like the bot said.
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
I saw a reviewed-by tag given by Emil in previous thread. Pls carry it
when sending new verson.
> ---
> kernel/bpf/cgroup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index a0b5f8cd8b10..3f06e2270f5c 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1935,7 +1935,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>
> kfree(ctx.cur_val);
>
> - if (ret == 1 && ctx.new_updated) {
> + if (!ret && ctx.new_updated) {
> kvfree(*buf);
> *buf = ctx.new_val;
> *pcount = ctx.new_len;
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
2026-05-29 3:56 ` bot+bpf-ci
2026-05-29 4:51 ` Jiayuan Chen
@ 2026-05-29 6:34 ` sashiko-bot
2026-06-01 22:01 ` Yonghong Song
3 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-29 6:34 UTC (permalink / raw)
To: Dawei Feng; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Inaccurate commit message and 'Fixes' tag attribution, which will cause regressions in stable backports.
--
commit 7f62fca2fcdc45d11be946984a6167b4460d8a59
Author: Dawei Feng <dawei.feng@seu.edu.cn>
bpf: cgroup: restore sysctl new-value replacement
This commit updates the return value check in __cgroup_bpf_run_filter_sysctl
to check for 0 instead of 1. It intends to fix an issue where the modified
sysctl value was ignored because the runner macro returns 0 on success.
> Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
> helpers") changed the success return value to 0, but failed to update the
> corresponding check in __cgroup_bpf_run_filter_sysctl().
[Severity: High]
Is this attribution accurate? Commit 4e63acdff864 introduced the sysctl hook
with the ret == 1 check, which was correct at the time because the legacy
BPF_PROG_RUN_ARRAY returned 1 on success.
It appears the return value was actually changed later by commit
f10d05966196 (bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow
boolean), which changed the macro to return 0 on success but missed
updating this condition.
> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
> Cc: stable@vger.kernel.org
[Severity: High]
Since this patch targets 4e63acdff864 for stable backports, could this cause
a regression if applied to older LTS kernels (like 5.4, 5.10, or 5.15)?
On those older kernels where f10d05966196 is not present, the runner still
returns 1 on success. Applying this fix would change the condition to !ret,
which evaluates to false on success and actively breaks the sysctl override
functionality.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260529031026.2716641-1-dawei.feng@seu.edu.cn?part=3
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
` (2 preceding siblings ...)
2026-05-29 6:34 ` sashiko-bot
@ 2026-06-01 22:01 ` Yonghong Song
3 siblings, 0 replies; 18+ messages in thread
From: Yonghong Song @ 2026-06-01 22:01 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, jolsa, kees,
joel.granados, bpf, linux-kernel, linux-fsdevel, jianhao.xu,
stable, Zilin Guan
On 5/28/26 8:10 PM, Dawei Feng wrote:
> Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
> helpers") changed the success return value to 0, but failed to update the
> corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
> bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
> condition is never satisfied. As a result, the modified value is ignored,
> and bpf_sysctl_set_new_value() fails to replace the write buffer.
>
> Fix this by checking for a return value of 0 instead, so cgroup/sysctl
> programs can correctly replace the pending sysctl buffer.
>
> This bug was discovered during a manual code review. Tested via a
> cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
> Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
> returned 8192 and the value remained "600". Post-fix, the BPF replacement
> buffer properly propagates: the write returns 3 and the value updates to
> "foo".
>
> Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Maybe to have the following commit as the fix tag:
f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean")
Also, for subject, please remove ': cgroup' (including the patch 1 and 2).
The 'Restore sysctl new-value replacement' is not really clear.
Maybe 'bpf: Restore sysctl new-value from 1 to 0'?
Other than the above,
Acked-by: Yonghong Song <yonghong.song@linux.dev>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl
2026-05-29 3:10 [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Dawei Feng
` (2 preceding siblings ...)
2026-05-29 3:10 ` [PATCH v2 3/3] bpf: cgroup: restore sysctl new-value replacement Dawei Feng
@ 2026-05-29 4:44 ` Jiayuan Chen
2026-05-29 11:37 ` Dawei Feng
3 siblings, 1 reply; 18+ messages in thread
From: Jiayuan Chen @ 2026-05-29 4:44 UTC (permalink / raw)
To: Dawei Feng, martin.lau
Cc: emil, ast, daniel, andrii, eddyz87, memxor, song, yonghong.song,
jolsa, kees, joel.granados, bpf, linux-kernel, linux-fsdevel,
jianhao.xu
On 5/29/26 11:10 AM, Dawei Feng wrote:
> This series fixes three bugs in the sysctl write-buffer replacement path
> of __cgroup_bpf_run_filter_sysctl(). It resolves a kvzalloc()/kfree()
> mismatch, adds a missing NUL terminator to the replacement string, and
> updates a stale return value check to safely restore the replacement
> functionality.
>
> Patch Summary:
> - patch 1 uses kvfree() for the replaced sysctl write buffer
> - patch 2 NUL-terminates the replaced sysctl value
> - patch 3 restores sysctl new-value replacement
>
> Changelog:
> v1 -> v2:
> - added patch 2 to fix an out-of-bounds access in
> bpf_sysctl_set_new_value() by properly NUL-terminating the replaced
> sysctl value buffer.
> - reordered patches 1 and 3.
I think patch 1 and patch 3 should be adjacent in the series.
Why is patch 2 placed between them?
>
> Dawei Feng (3):
> bpf: cgroup: use kvfree() for replaced sysctl write buffer
> bpf: cgroup: NUL-terminate replaced sysctl value
> bpf: cgroup: restore sysctl new-value replacement
>
> kernel/bpf/cgroup.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl
2026-05-29 4:44 ` [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl Jiayuan Chen
@ 2026-05-29 11:37 ` Dawei Feng
2026-05-29 16:45 ` Emil Tsalapatis
0 siblings, 1 reply; 18+ messages in thread
From: Dawei Feng @ 2026-05-29 11:37 UTC (permalink / raw)
To: jiayuan.chen
Cc: emil, martin.lau, ast, daniel, andrii, eddyz87, memxor, song,
yonghong.song, jolsa, kees, joel.granados, bpf, linux-kernel,
linux-fsdevel, jianhao.xu
On Fri, May 29, 2026 at 12:44 PM, Jiayuan Chen wrote:
>I think patch 1 and patch 3 should be adjacent in the series.
>
>Why is patch 2 placed between them?
Hi Jiayuan,
Thanks for the review.
Patch 1 and Patch 2 are actually parallel to each other. They both address
pre-existing issues that would be exposed and triggered once the changes
in Patch 3 are applied.
In a previous iteration, a test bot and another maintainer pointed out
that activating the code path without fixing the underlying issues first
could trigger a kernel panic.
I placed both prerequisite fixes (Patch 1 and Patch 2) before Patch 3.
This ensures that the fixes are already in place before the code path is
activated.
If you prefer a different order, please let me know and I will be happy to
adjust it.
> Change is fine but fixes tag is wrong like the bot said.
Noted. I will correct the Fixes tag in v3.
> I saw a reviewed-by tag given by Emil in previous thread. Pls carry it
> when sending new verson.
Thanks for catching this. I will make sure to collect and carry Emil's
Reviewed-by tag in the upcoming v3.
Best regards,
Dawei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] bpf: cgroup: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl
2026-05-29 11:37 ` Dawei Feng
@ 2026-05-29 16:45 ` Emil Tsalapatis
0 siblings, 0 replies; 18+ messages in thread
From: Emil Tsalapatis @ 2026-05-29 16:45 UTC (permalink / raw)
To: Dawei Feng, jiayuan.chen
Cc: emil, martin.lau, ast, daniel, andrii, eddyz87, memxor, song,
yonghong.song, jolsa, kees, joel.granados, bpf, linux-kernel,
linux-fsdevel, jianhao.xu
On Fri May 29, 2026 at 7:37 AM EDT, Dawei Feng wrote:
> On Fri, May 29, 2026 at 12:44 PM, Jiayuan Chen wrote:
>>I think patch 1 and patch 3 should be adjacent in the series.
>>
>>Why is patch 2 placed between them?
>
> Hi Jiayuan,
>
> Thanks for the review.
>
> Patch 1 and Patch 2 are actually parallel to each other. They both address
> pre-existing issues that would be exposed and triggered once the changes
> in Patch 3 are applied.
>
> In a previous iteration, a test bot and another maintainer pointed out
> that activating the code path without fixing the underlying issues first
> could trigger a kernel panic.
>
> I placed both prerequisite fixes (Patch 1 and Patch 2) before Patch 3.
> This ensures that the fixes are already in place before the code path is
> activated.
>
> If you prefer a different order, please let me know and I will be happy to
> adjust it.
>
>> Change is fine but fixes tag is wrong like the bot said.
>
> Noted. I will correct the Fixes tag in v3.
>
>> I saw a reviewed-by tag given by Emil in previous thread. Pls carry it
>> when sending new verson.
>
> Thanks for catching this. I will make sure to collect and carry Emil's
> Reviewed-by tag in the upcoming v3.
>
Seconding what Jiayuan said in terms of the Fixes: tag. Also feel free
to add my Reviewed-by: on patch 2/3 that IIRC was not present on v1.
> Best regards,
> Dawei
^ permalink raw reply [flat|nested] 18+ messages in thread