From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Tengda Wu <wutengda@huaweicloud.com>,
Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Alexei Starovoitov <ast@kernel.org>,
linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
Date: Mon, 8 Jun 2026 11:56:46 +0900
Message-ID: <20260608115646.97d80d30aed182d468496449@kernel.org>
In-Reply-To: <679a1c8f-1e4d-4ae5-83e1-d0068e6de1a6@huaweicloud.com>
On Mon, 8 Jun 2026 09:52:37 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:
>
>
> On 2026/6/5 21:43, Masami Hiramatsu wrote:
> > On Thu, 4 Jun 2026 11:34:45 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote:
> >>
> >>> Peter, is it OK to drop @rq from task_on_cpu()?
> >>
> >> Sure.
> >>
> >>> Then we can use it from rethook.
> >>
> >> Well, it is in sched/sched.h, which is an internal header, and no you
> >> cannot use that header in rethook.
> >
> > Ah, OK. Hmm, then we should not use it. Maybe ->on_cpu is also internal
> > state?
> >
> >>
> >> But lets step back first, what is the actual problem here, why are we
> >> looking at ->on_cpu at all?
> >
> > Tengda, can you explain it?
> > I think you want to take a stack trace of a !current task, and
> > rethook_find_ret_addr() bails out if that task is in the running state.
> >
> > But if you can share the actual situation you are facing, it would
> > help us understand.
> >
> > Thank you,
> >
> >
>
>
> Sure.
>
> Background: We are verifying the support of live patches for functions that
> have a kretprobe. The specific verification method is as follows:
>
> We construct a function foo() that calls bar():
>
> void bar(void)
> {
> 	for (;;) {
> 		schedule();
> 	}
> }
>
> void foo(void)
> {
> 	bar();
> }
>
> A kretprobe is attached to bar():
>
> echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
>
> Then foo() is triggered. The expected behavior is that bar() will call
> schedule() and yield the CPU.
>
> After that, the live patch is activated to attempt replacing the implementation
> of foo(). The expectation is that this should succeed.
>
> However, in reality, because the task that called schedule() is still in the
> RUNNING state, the condition task_is_running(tsk) inside rethook_find_ret_addr()
> is not satisfied, causing the function to return early. This, in turn,
> prevents stack_trace_save_tsk_reliable() from determining the stack as
> reliable, leading to a failure in activating the live patch.
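The difference between the two checks can be sketched with a small
userspace model. This is not kernel code: the model_* names and the
struct are made up for illustration, and the only thing borrowed from
the kernel is that TASK_RUNNING means "runnable", not "currently
executing on a CPU".

```c
#include <stdbool.h>

/* Userspace model only; values are simplified stand-ins. */
#define MODEL_TASK_RUNNING		0x0000	/* runnable, possibly not on a CPU */
#define MODEL_TASK_INTERRUPTIBLE	0x0001	/* sleeping */

struct model_task {
	unsigned int state;	/* models tsk->__state */
	int on_cpu;		/* models tsk->on_cpu: nonzero while executing */
};

/* Models task_is_running(): true for any runnable task, executing or not. */
bool model_task_is_running(const struct model_task *t)
{
	return t->state == MODEL_TASK_RUNNING;
}

/* Models the existing check: refuse any non-current runnable task. */
bool model_rejected_by_state(const struct model_task *t, bool is_current)
{
	return !is_current && model_task_is_running(t);
}

/* Models the proposed check: refuse only while the task occupies a CPU. */
bool model_rejected_by_on_cpu(const struct model_task *t, bool is_current)
{
	return !is_current && t->on_cpu;
}
```

For a task that called schedule() without changing its state
(state == MODEL_TASK_RUNNING, on_cpu == 0), model_rejected_by_state()
returns true while model_rejected_by_on_cpu() returns false, which is
the case described in the report.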
Hmm, is bar() looping forever, or is it a bounded loop that just takes a
long time and yields in the meantime? Either way, this does not look like
a good design pattern. Is it possible to avoid the busy loop and use a
worker instead, or to wait inside the loop for something to complete or
for input?
>
> **Not sure if this is correct:**
>
> We believe that after a task voluntarily calls schedule(), when the stack
> is expected to be reliable, it is a safe time to activate a live patch.
In this case, I don't know how we could keep the loop inside bar() from
resuming. Even if !tsk->on_cpu holds, tsk can start running again right
after the flag is checked.
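A rough userspace sketch of that window (a model only, not kernel code;
race_task, window_fn and the other names here are hypothetical). The
hook stands in for whatever the scheduler does between the check and
the stack walk, so the interleaving is deterministic:

```c
#include <stdbool.h>

struct race_task {
	int on_cpu;		/* models tsk->on_cpu */
	unsigned long frame;	/* stands in for the saved stack contents */
};

/* Hypothetical hook: what happens between the check and the walk. */
typedef void (*window_fn)(struct race_task *t);

/* Nothing happens in the window; the task stays off the CPU. */
void window_idle(struct race_task *t)
{
	(void)t;
}

/* The task is scheduled back in and starts rewriting its own stack. */
void window_task_resumes(struct race_task *t)
{
	t->on_cpu = 1;
	t->frame = 0;
}

/*
 * Check-then-use. Returns true when the walk was attempted and stores
 * what it read in *seen. Nothing ties the check to the read, so the
 * value read may belong to a task that is already running again.
 */
bool race_walk_if_off_cpu(struct race_task *t, window_fn window,
			  unsigned long *seen)
{
	if (t->on_cpu)
		return false;	/* check */
	window(t);		/* race window */
	*seen = t->frame;	/* use */
	return true;
}
```

With window_idle() the walk reads the frame that was saved at switch-out.
With window_task_resumes() the check still passes, but the walk reads
whatever the resumed task has written since.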
> Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> Therefore, we propose changing the task_is_running(tsk) condition to
> tsk->on_cpu.
Yes, but the comment in its caller says there are other checks that
handle the race:
	/*
	 * Refuse to unwind the stack of a task while it's executing on another
	 * CPU. This check is racy, but that's ok: the unwinder has other
	 * checks to prevent it from going off the rails.
	 */
	if (task_on_another_cpu(task))
		goto err;
Josh, do you know how this avoids the race?
Thank you,
>
> Thanks,
> Tengda
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>