Linux Trace Kernel
 help / color / mirror / Atom feed
* [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr()
@ 2026-06-10  1:36 Tengda Wu
  2026-06-10  1:50 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: Tengda Wu @ 2026-06-10  1:36 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra, Petr Mladek
  Cc: Steven Rostedt, Mathieu Desnoyers, Alexei Starovoitov,
	linux-trace-kernel, linux-kernel, live-patching, Tengda Wu

The current check in rethook_find_ret_addr() prevents obtaining a return
address when the target task is marked as running. However, this condition
is both insufficient for correctness and unnecessary for its intended
purpose.

The check is inherently racy: a task can begin running on another CPU
immediately after task_is_running() returns false, potentially leading to
concurrent modification of rethook data structures while the iteration is
in progress.

Rather than trying to fix this unreliable check deep in the unwinding
path, simply remove it. The iteration is already safe from crashes because
unwind_next_frame() holds RCU and rethook_node structures are RCU-freed;
even if the iteration goes off the rails and returns invalid information,
it will not crash. Callers that require consistency must provide a safe
context themselves.

Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
v4: Also update the function description in the comment.
v3: https://lore.kernel.org/all/20260609084953.901576-1-wutengda@huaweicloud.com/
v2: https://lore.kernel.org/all/20260609005728.458962-1-wutengda@huaweicloud.com/
v1: https://lore.kernel.org/all/20260525132253.1889726-1-wutengda@huaweicloud.com/

 kernel/trace/rethook.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index 5a8bdf88999a..1e7fdebe3cd5 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -233,9 +233,10 @@ NOKPROBE_SYMBOL(__rethook_find_ret_addr);
  *
  * Find the correct return address modified by a rethook on @tsk in unsigned
  * long type.
- * The @tsk must be 'current' or a task which is not running. @frame is a hint
- * to get the currect return address - which is compared with the
- * rethook::frame field. The @cur is a loop cursor for searching the
+ * @tsk can be any task (any state). If not 'current', the result may be
+ * unreliable. Callers requiring reliability must ensure a safe context.
+ * @frame is a hint to get the correct return address - which is compared with
+ * the rethook::frame field. The @cur is a loop cursor for searching the
  * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the
  * first call, but '@cur' itself must NOT NULL.
  *
@@ -250,9 +251,6 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
 	if (WARN_ON_ONCE(!cur))
 		return 0;
 
-	if (tsk != current && task_is_running(tsk))
-		return 0;
-
 	do {
 		ret = __rethook_find_ret_addr(tsk, cur);
 		if (!ret)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr()
  2026-06-10  1:36 [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr() Tengda Wu
@ 2026-06-10  1:50 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-10  1:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Petr Mladek, Peter Zijlstra
  Cc: Tengda Wu, Mathieu Desnoyers, Alexei Starovoitov, Steven Rostedt,
	linux-kernel, linux-trace-kernel, live-patching

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Removing `task_is_running()` from `rethook_find_ret_addr()` exposes stack unwinders to use-after-free, wild pointer dereferences, and list corruption when unwinding running tasks, because `rethook_node` recycling and unwinding lack adequate RCU synchronization.
--

commit 83e1fac9c64377191cc2b41ef9da781c46f1e95a
Author: Tengda Wu <wutengda@huaweicloud.com>

rethook: Remove the running task check in rethook_find_ret_addr()

This commit removes the task_is_running() check from rethook_find_ret_addr().
The commit message states the check is unnecessary because unwinders hold RCU
and rethook_node structures are RCU-freed. Callers are expected to provide
their own safe context.

> The iteration is already safe from crashes because
> unwind_next_frame() holds RCU and rethook_node structures are RCU-freed;
> even if the iteration goes off the rails and returns invalid information,
> it will not crash. Callers that require consistency must provide a safe
> context themselves.

[ ... ]

> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index 5a8bdf88999a9..1e7fdebe3cd5d 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
[ ... ]
> @@ -250,9 +251,6 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>  	if (WARN_ON_ONCE(!cur))
>  		return 0;
>  
> -	if (tsk != current && task_is_running(tsk))
> -		return 0;
> -
>  	do {
>  		ret = __rethook_find_ret_addr(tsk, cur);

[Severity: High]
This is a pre-existing issue, but does removing this check entirely expose
the stack unwinders to a use-after-free and list corruption?

The commit message mentions that rethook_node structures are RCU-freed, but
looking at rethook_recycle(), the nodes appear to be immediately pushed back
to the lockless objpool without an RCU grace period:

kernel/trace/rethook.c:rethook_recycle() {
    ...
    objpool_push(node, &node->rethook->pool);
}

Additionally, while unwind_next_frame() does hold RCU for a single frame
iteration, stack unwinders like x86 ORC drop the read lock between frames:

arch/x86/kernel/unwind_orc.c:unwind_next_frame() {
    ...
    guard(rcu)();
    ...
}

Because the unwinder retains the list cursor (*cur) across frame iterations
while RCU is dropped, could unwinding a running task allow the target task
to recycle the node?

If the node is recycled and instantly reallocated to another task, the next
call to __rethook_find_ret_addr() would execute:

kernel/trace/rethook.c:__rethook_find_ret_addr() {
    ...
    node = node->next;

on a stale pointer. Could removing this check turn a theoretical race into
an easily triggered use-after-free when reading /proc/<pid>/stack for tasks
executing kretprobes?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610013658.1837963-1-wutengda@huaweicloud.com?part=1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-10  1:50 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10  1:36 [PATCH v4] rethook: Remove the running task check in rethook_find_ret_addr() Tengda Wu
2026-06-10  1:50 ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox