From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Dawei Feng <dawei.feng@seu.edu.cn>, martin.lau@linux.dev
Cc: emil@etsalapatis.com, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, eddyz87@gmail.com, memxor@gmail.com,
song@kernel.org, yonghong.song@linux.dev, jolsa@kernel.org,
kees@kernel.org, joel.granados@kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
jianhao.xu@seu.edu.cn, stable@vger.kernel.org,
Zilin Guan <zilin@seu.edu.cn>
Subject: Re: [PATCH v2 1/3] bpf: cgroup: use kvfree() for replaced sysctl write buffer
Date: Fri, 29 May 2026 12:45:37 +0800
Message-ID: <c8507975-a3da-4917-b2d6-8bc82391cd38@linux.dev>
In-Reply-To: <20260529031026.2716641-2-dawei.feng@seu.edu.cn>
On 5/29/26 11:10 AM, Dawei Feng wrote:
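> proc_sys_call_handler() allocates its temporary sysctl buffer with
> kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). When a
> BPF program replaces the new value, the old buffer is released with
> kfree(). Since kvzalloc() falls back to vmalloc() when the slab
> allocation fails (most likely for large requests), that buffer may be
> vmalloc-backed, and freeing it with kfree() is invalid and can corrupt
> memory or crash the kernel.
>
> Use kvfree() instead, which correctly frees both kmalloc()- and
> vmalloc()-backed buffers.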
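The mismatch can be illustrated with a small userspace model. This is a sketch under stated assumptions: model_kvzalloc(), model_kfree(), model_kvfree() and replace_buf() are hypothetical stand-ins, not kernel APIs. Each buffer carries a hidden header recording whether it "came from" the slab or from vmalloc, so a kfree()-style release of a vmalloc-backed buffer is detected and rejected instead of crashing.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Userspace model only: none of these functions are kernel APIs.
 * Each buffer is preceded by a hidden header recording its backend.
 */
enum backend { BACKEND_SLAB, BACKEND_VMALLOC };

struct hdr {
	enum backend backend;
};

/* Set to true to model an injected slab failure (as failslab does). */
static bool slab_fails;

/* Models kvzalloc(): slab first, vmalloc fallback if the slab attempt fails. */
static void *model_kvzalloc(size_t size)
{
	struct hdr *h = calloc(1, sizeof(*h) + size);

	if (!h)
		return NULL;
	h->backend = slab_fails ? BACKEND_VMALLOC : BACKEND_SLAB;
	return h + 1;
}

static enum backend backend_of(const void *p)
{
	return ((const struct hdr *)p - 1)->backend;
}

/* Models kfree(): only valid for slab memory; returns -1 (buffer kept) otherwise. */
static int model_kfree(void *p)
{
	if (!p)
		return 0;
	if (backend_of(p) != BACKEND_SLAB)
		return -1;
	free((struct hdr *)p - 1);
	return 0;
}

/* Models kvfree(): valid for both backends. */
static int model_kvfree(void *p)
{
	if (p)
		free((struct hdr *)p - 1);
	return 0;
}

/*
 * Roughly mirrors the replacement step: release the old write buffer
 * with the given release function, then install the new value.
 */
static int replace_buf(char **buf, char *new_val, int (*release)(void *))
{
	int err = release(*buf);

	if (err)
		return err;
	*buf = new_val;
	return 0;
}
```

Setting slab_fails = true before allocating the write buffer mirrors the failslab setup in the reproducer: the buffer becomes vmalloc-backed, replace_buf(&buf, new_val, model_kfree) is rejected, and replace_buf(&buf, new_val, model_kvfree) succeeds.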
>
> An experimental analysis tool we are developing for kernel
> memory-management bugs first flagged this bug in v6.13-rc1. The tool
> is still under development and is not yet publicly available. Manual
> inspection confirms that the bug is still present in v7.1-rc5.
>
> We reproduced the bug on v7.1-rc4 in a QEMU x86_64 guest with KASAN
> and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
> test tree also included the accompanying fix for the stale ret == 1
> check in __cgroup_bpf_run_filter_sysctl(). The reproducer restricts
> failslab injection to allocations with proc_sys_call_handler() on the
> call stack (stacktrace-depth=32), arms fail-nth=1, and then writes 8191
> bytes to /proc/sys/kernel/domainname from a task in the target cgroup.
> With fail-nth=1, the slab attempt inside kvzalloc() fails, so the
> buffer comes from vmalloc(), and the write triggered the following
> fault:
>
> BUG: unable to handle page fault for address: ffffeb0200024d48
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 SMP KASAN NOPTI
> CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:kfree+0x6e/0x510
> Code: 80 48 01 ef 0f 82 ae 04 00 00 48 c7 c0 00 00 00 80 48 2b 05 04 1b 23 04 48 01 c7 48 c1 ef 0c 48 c1 e7 06 48 03 3d e2 1a 23 04 <4c> 8b 57 08 4c 89 d0 83 e0 01 48 83 e8 01 49 09 c2 49 >
> RSP: 0018:ffff888108de7ab8 EFLAGS: 00010282
> RAX: 0000777f80000000 RBX: ffff88815af398c0 RCX: 0000000000000080
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffeb0200024d40
> RBP: ffffc90000935000 R08: 0000000000000001 R09: 0000000000000001
> R10: ffffffff86b4b297 R11: 0000000000000000 R12: ffffffff819b71fd
> R13: 0000000000000001 R14: ffff888108de7cc0 R15: 0000000000000000
> FS: 00007f8988cc2b80(0000) GS:ffff8881d3256000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffeb0200024d48 CR3: 0000000101d6b000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
> __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
> ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? __kvmalloc_node_noprof+0x345/0x870
> ? proc_sys_call_handler+0x250/0x480
> ? srso_return_thunk+0x5/0x5f
> proc_sys_call_handler+0x3a2/0x480
> ? __pfx_proc_sys_call_handler+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? selinux_file_permission+0x39f/0x500
> ? srso_return_thunk+0x5/0x5f
> ? lock_is_held_type+0x9e/0x120
> vfs_write+0x98e/0x1000
> ? srso_return_thunk+0x5/0x5f
> ? kmem_cache_free+0x308/0x550
> ? __pfx_vfs_write+0x10/0x10
> ? __pfx_do_sys_openat2+0x10/0x10
> ksys_write+0xf2/0x1d0
> ? __pfx_ksys_write+0x10/0x10
> ? srso_return_thunk+0x5/0x5f
> ? trace_irq_enable.constprop.0+0x110/0x140
> do_syscall_64+0x115/0x690
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8988dd8907
> Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 >
> RSP: 002b:00007fff4069b878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8988dd8907
> RDX: 0000000000001fff RSI: 0000564f97ef46b0 RDI: 0000000000000005
> RBP: 0000564f97ef46b0 R08: 0000000000000000 R09: 0000564f97ef46b0
> R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000001fff R14: 0000000000000005 R15: 0000000000000001
> </TASK>
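The faulting address in the trace can be derived from kfree() treating a vmalloc address (apparently the one in RBP, ffffc90000935000) as a direct-map address. The sketch below reproduces that arithmetic under assumed x86_64 defaults: 4-level paging, page_offset_base = 0xffff888000000000, vmemmap_base = 0xffffea0000000000, 4 KiB pages and a 64-byte struct page. The RAX value above is consistent with that page_offset_base, but the constants and model_virt_to_page() are illustrative assumptions, not kernel code.

```c
#include <stdint.h>

/*
 * Assumed x86_64 defaults (4-level paging, no KASLR randomization of
 * these bases). Illustrative assumptions consistent with the register
 * dump, not values read from a running kernel.
 */
#define PAGE_SHIFT        12
#define PAGE_OFFSET_BASE  0xffff888000000000ULL /* start of the direct map */
#define VMEMMAP_BASE      0xffffea0000000000ULL /* start of struct page array */
#define STRUCT_PAGE_SIZE  64ULL                 /* assumed sizeof(struct page) */

/*
 * Hypothetical model of virt_to_page() for a non-kernel-image address.
 * The translation is only valid for direct-map addresses; a vmalloc
 * address yields a bogus "pfn" and a struct page pointer nothing maps.
 */
static uint64_t model_virt_to_page(uint64_t vaddr)
{
	uint64_t pfn = (vaddr - PAGE_OFFSET_BASE) >> PAGE_SHIFT;

	return VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE;
}
```

With these constants, model_virt_to_page(0xffffc90000935000) gives ffffeb0200024d40, matching RDI, and the load from RDI+8 (the instruction marked <4c> in the Code line) gives ffffeb0200024d48, matching CR2. That part of the vmemmap is not populated (PGD 0), hence the not-present page fault.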
>
> With this fix applied on top of the same test setup, rerunning the
> reproducer with fail-nth=1 no longer triggers the Oops.
>
> Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>