Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state

BPF List
 help / color / mirror / Atom feed

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
       [not found]               ` <20260608102326.GB3161497@noisy.programming.kicks-ass.net>
@ 2026-06-08 13:08                 ` Masami Hiramatsu
  2026-06-08 14:06                   ` Peter Zijlstra
  2026-06-09  0:59                   ` Tengda Wu
  0 siblings, 2 replies; 5+ messages in thread
From: Masami Hiramatsu @ 2026-06-08 13:08 UTC (permalink / raw)
  To: Peter Zijlstra, bpf
  Cc: Tengda Wu, Masami Hiramatsu, Steven Rostedt, Mathieu Desnoyers,
	Alexei Starovoitov, linux-trace-kernel, linux-kernel,
	Josh Poimboeuf, jikos, mbenes, pmladek

On Mon, 8 Jun 2026 12:23:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Jun 08, 2026 at 11:34:49AM +0200, Peter Zijlstra wrote:
> > 
> > +Live patching folks
> > 
> > On Mon, Jun 08, 2026 at 09:52:37AM +0800, Tengda Wu wrote:
> > 
> > > Background: We are verifying the support of live patches for functions that
> > > have a kretprobe. The specific verification method is as follows:
> > > 
> > > We construct a function foo() that calls bar():
> > > 
> > > void bar(void)
> > > {
> > >     for (;;) {
> > >         schedule();
> > >     }
> > > }
> > > 
> > > void foo(void)
> > > {
> > >     bar();
> > > }
> > > 
> > > A kretprobe is attached to bar():
> > > 
> > > echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> > > echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
> > > 
> > > Then foo() is triggered. The expected behavior is that bar() will call
> > > schedule() and yield the CPU.
> > > 
> > > After that, the live patch is activated to attempt replacing the implementation
> > > of foo(). The expectation is that this should succeed.
> > 
> > This wholly depends on how foo() calls bar(), if it is a normal call,
> > then no, it should not succeed, because foo() is still on the stack.
> > 
> > If it is a tail-call, then yes, because foo() is no longer relevant.
> > 
> > > However, in reality, because the task that called schedule() is still in the
> > > RUNNING state,
> > 
> > So calling schedule() without setting state is dodgy in the first place.
> > Who is doing this? All wait primitives will set this to
> > TASK_UNINTERRUPTIBLE or something along those lines.
> > 
> > > the condition task_is_running(tsk) inside rethook_find_ret_addr()
> > > is not satisfied, causing the function to return early. This, in turn,
> > > prevents stack_trace_save_tsk_reliable() from determining the stack as
> > > reliable, leading to a failure in activating the live patch.
> > > 
> > > **Not sure if this is correct:**
> > > 
> > > We believe that after a task voluntarily calls schedule(), when the stack
> > > is expected to be reliable, it is a safe time to activate a live patch.
> > 
> > Calling schedule() without setting state is a no-op and really shouldn't
> > count much at all.
> > 
> > > Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> > > kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> > > Therefore, we propose changing the task_is_running(tsk) condition to
> > > tsk->on_cpu.
> > 
> > Anyway, I'm wondering what the purpose of this check here is, there is
> > no real comment, and commit 5120d167e21c ("rethook: Remove warning
> > messages printed for finding return address of a frame.") is just pure
> > voodoo as well.
> 
> FWIW, you should have had this discussion then.

Indeed. The rethook is making a shadow stack by list, thus caller must
guarantee the target process is blocked at least during this function.

The commit messages suggest that when BPF takes a backtrace, it also
includes other running tasks. Is that safe?

> 
> > Also, note the comment that goes with the usage of
> > task_on_another_cpu(); that thing is racy as all heck.
> > 
> > So it really comes down to what the purpose of this check is.

This check has been introduced when it is copied from
kretprobe_find_ret_addr(). It has the comment:

 * The @tsk must be 'current' or a task which is not running. @fp is a hint

IIRC, I added this check to explicitly verify this condition.

> > 
> > I suspect the issue at hand is that tsk->rethook elements, such as
> > iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> > running task.
> > 
> > Notably while rethook_recycle() has some RCU thing on, that objpool
> > thing (and the recycle name itself) seems to strongly suggest iterating
> > these things is not sound (you could start with things from this task,
> > hit a recycled entry and continue iterating rethooks from another task).
> > 
> > Also note that the current check is also racy, nothing really prevents a
> > wakeup from happening right after you observe task_is_running() being
> > false. The task can then get scheduled in on another CPU and tear down
> > its rethooks concurrent with __rethook_find_ret_addr().

Yeah, but is there any way to ensure the task is blocked? Even if it is
blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
the rethook, it may not be possible to ensure it?

Of course, we could give up on checking within this function and leave
everything to the caller to guarantee - as kretprobe does.

BTW, the reason why we made it possible to pass tasks other than current
 is that the stack unwinding code itself supported unwinding tasks other
than current, so we had no choice but to create this interface.

However, it is a bad idea to check this in deep inside of unwinding.

> > Now, livepatch itself calls unwind from a proper context, but unwinds in
> > general are not. This rethook stuff doesn't seem to be sound in general.
> 
> I suspect just entirely removing the check is the sanest option at this
> point. Callers that do it right (livepatch) are guaranteed consistent
> data, and the rest gets whatever pieces.

Agreed.

> 
> Notably, unwind_next() holds rcu, so the iteration is protected from any
> of those rethook_node things getting freed. Its just that the iteration
> can go sideways and you might not get a sane answer.
> 
> The very worst possible option is getting stuck in an infinite loop when
> concurrent with agressive rethook re-use or something daft like that,
> but that seems extremely unlikely.


OK, thanks for your review!

Tengda, can you send a patch to just remove the check?

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
  2026-06-08 13:08                 ` [PATCH] rethook: Use tsk->on_cpu to check task execution state Masami Hiramatsu
@ 2026-06-08 14:06                   ` Peter Zijlstra
  2026-06-09  4:41                     ` Masami Hiramatsu
  2026-06-09  0:59                   ` Tengda Wu
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2026-06-08 14:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: bpf, Tengda Wu, Steven Rostedt, Mathieu Desnoyers,
	Alexei Starovoitov, linux-trace-kernel, linux-kernel,
	Josh Poimboeuf, jikos, mbenes, pmladek

On Mon, Jun 08, 2026 at 10:08:11PM +0900, Masami Hiramatsu wrote:

> > > Anyway, I'm wondering what the purpose of this check here is, there is
> > > no real comment, and commit 5120d167e21c ("rethook: Remove warning
> > > messages printed for finding return address of a frame.") is just pure
> > > voodoo as well.
> > 
> > FWIW, you should have had this discussion then.
> 
> Indeed. The rethook is making a shadow stack by list, thus caller must
> guarantee the target process is blocked at least during this function.
> 
> The commit messages suggest that when BPF takes a backtrace, it also
> includes other running tasks. Is that safe?

Well, you get to keep the pieces. At this point safe only pertains to
'doesn't-crash', all correctness is out the window.

I always forget the crazy BPF does ;-)

> > > Also, note the comment that goes with the usage of
> > > task_on_another_cpu(); that thing is racy as all heck.
> > > 
> > > So it really comes down to what the purpose of this check is.
> 
> This check has been introduced when it is copied from
> kretprobe_find_ret_addr(). It has the comment:
> 
>  * The @tsk must be 'current' or a task which is not running. @fp is a hint
> 
> IIRC, I added this check to explicitly verify this condition.

Right, but it is a prescriptive comment, not an explanatory one. That
is, it doesn't explain the condition.

> > > I suspect the issue at hand is that tsk->rethook elements, such as
> > > iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> > > running task.
> > > 
> > > Notably while rethook_recycle() has some RCU thing on, that objpool
> > > thing (and the recycle name itself) seems to strongly suggest iterating
> > > these things is not sound (you could start with things from this task,
> > > hit a recycled entry and continue iterating rethooks from another task).
> > > 
> > > Also note that the current check is also racy, nothing really prevents a
> > > wakeup from happening right after you observe task_is_running() being
> > > false. The task can then get scheduled in on another CPU and tear down
> > > its rethooks concurrent with __rethook_find_ret_addr().
> 
> Yeah, but is there any way to ensure the task is blocked? Even if it is
> blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
> the rethook, it may not be possible to ensure it?
> 
> Of course, we could give up on checking within this function and leave
> everything to the caller to guarantee - as kretprobe does.
> 
> BTW, the reason why we made it possible to pass tasks other than current
>  is that the stack unwinding code itself supported unwinding tasks other
> than current, so we had no choice but to create this interface.
> 
> However, it is a bad idea to check this in deep inside of unwinding.

This, you cannot take locks in unwinding. The only thing you can do is
try to do the best you can without crashing.

Typically unwind only happens on self -- this is natural, a task crashes
and unwinds itself, or a task does something (takes a lock, hits a
tracepoint, etc) and takes a snapshot of its own stack, and this is
safe.

Things like live-patch use task_call_func(), which ensures the callback
function is done while holding sufficient locks for the task to not
change state.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
  2026-06-08 14:06                   ` Peter Zijlstra
@ 2026-06-09  4:41                     ` Masami Hiramatsu
  2026-06-09  7:05                       ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Masami Hiramatsu @ 2026-06-09  4:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, Tengda Wu, Steven Rostedt, Mathieu Desnoyers,
	Alexei Starovoitov, linux-trace-kernel, linux-kernel,
	Josh Poimboeuf, jikos, mbenes, pmladek

On Mon, 8 Jun 2026 16:06:54 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Jun 08, 2026 at 10:08:11PM +0900, Masami Hiramatsu wrote:
> 
> > > > Anyway, I'm wondering what the purpose of this check here is, there is
> > > > no real comment, and commit 5120d167e21c ("rethook: Remove warning
> > > > messages printed for finding return address of a frame.") is just pure
> > > > voodoo as well.
> > > 
> > > FWIW, you should have had this discussion then.
> > 
> > Indeed. The rethook is making a shadow stack by list, thus caller must
> > guarantee the target process is blocked at least during this function.
> > 
> > The commit messages suggest that when BPF takes a backtrace, it also
> > includes other running tasks. Is that safe?
> 
> Well, you get to keep the pieces. At this point safe only pertains to
> 'doesn't-crash', all correctness is out the window.
> 
> I always forget the crazy BPF does ;-)
> 
> > > > Also, note the comment that goes with the usage of
> > > > task_on_another_cpu(); that thing is racy as all heck.
> > > > 
> > > > So it really comes down to what the purpose of this check is.
> > 
> > This check has been introduced when it is copied from
> > kretprobe_find_ret_addr(). It has the comment:
> > 
> >  * The @tsk must be 'current' or a task which is not running. @fp is a hint
> > 
> > IIRC, I added this check to explicitly verify this condition.
> 
> Right, but it is a prescriptive comment, not an explanatory one. That
> is, it doesn't explain the condition.
> 
> > > > I suspect the issue at hand is that tsk->rethook elements, such as
> > > > iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> > > > running task.
> > > > 
> > > > Notably while rethook_recycle() has some RCU thing on, that objpool
> > > > thing (and the recycle name itself) seems to strongly suggest iterating
> > > > these things is not sound (you could start with things from this task,
> > > > hit a recycled entry and continue iterating rethooks from another task).
> > > > 
> > > > Also note that the current check is also racy, nothing really prevents a
> > > > wakeup from happening right after you observe task_is_running() being
> > > > false. The task can then get scheduled in on another CPU and tear down
> > > > its rethooks concurrent with __rethook_find_ret_addr().
> > 
> > Yeah, but is there any way to ensure the task is blocked? Even if it is
> > blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
> > the rethook, it may not be possible to ensure it?
> > 
> > Of course, we could give up on checking within this function and leave
> > everything to the caller to guarantee - as kretprobe does.
> > 
> > BTW, the reason why we made it possible to pass tasks other than current
> >  is that the stack unwinding code itself supported unwinding tasks other
> > than current, so we had no choice but to create this interface.
> > 
> > However, it is a bad idea to check this in deep inside of unwinding.
> 
> This, you cannot take locks in unwinding. The only thing you can do is
> try to do the best you can without crashing.
> 
> Typically unwind only happens on self -- this is natural, a task crashes
> and unwinds itself, or a task does something (takes a lock, hits a
> tracepoint, etc) and takes a snapshot of its own stack, and this is
> safe.
> 
> Things like live-patch use task_call_func(), which ensures the callback
> function is done while holding sufficient locks for the task to not
> change state.

Hmm, is there any way to ensure the function is called from task_call_func()?
(Maybe checking p->pi_lock, but this is not sure the lock owner is this
context?) If not, I need to make this available only for current task
(anyway it just return kretprobe trampoline address, no critical issue)
or, introduce a spinlock.

Or, eventually it may be better to replace kretprobe/rethook with
fprobe return handler.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
  2026-06-09  4:41                     ` Masami Hiramatsu
@ 2026-06-09  7:05                       ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2026-06-09  7:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: bpf, Tengda Wu, Steven Rostedt, Mathieu Desnoyers,
	Alexei Starovoitov, linux-trace-kernel, linux-kernel,
	Josh Poimboeuf, jikos, mbenes, pmladek

On Tue, Jun 09, 2026 at 01:41:53PM +0900, Masami Hiramatsu wrote:

> > This, you cannot take locks in unwinding. The only thing you can do is
> > try to do the best you can without crashing.
> > 
> > Typically unwind only happens on self -- this is natural, a task crashes
> > and unwinds itself, or a task does something (takes a lock, hits a
> > tracepoint, etc) and takes a snapshot of its own stack, and this is
> > safe.
> > 
> > Things like live-patch use task_call_func(), which ensures the callback
> > function is done while holding sufficient locks for the task to not
> > change state.
> 
> Hmm, is there any way to ensure the function is called from task_call_func()?

Nope. And you shouldn't want to.

> (Maybe checking p->pi_lock, but this is not sure the lock owner is this
> context?) If not, I need to make this available only for current task
> (anyway it just return kretprobe trampoline address, no critical issue)
> or, introduce a spinlock.
> 
> Or, eventually it may be better to replace kretprobe/rethook with
> fprobe return handler.

I'm not sure where you're wanting to go. AFAICT the current rethook
stuff won't crash when called on an active task, it might just not give
the right results -- but that is true for the entire unwind, so who
cares?

Those who call unwind on active tasks get to keep the pieces, not our
problem etc.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
  2026-06-08 13:08                 ` [PATCH] rethook: Use tsk->on_cpu to check task execution state Masami Hiramatsu
  2026-06-08 14:06                   ` Peter Zijlstra
@ 2026-06-09  0:59                   ` Tengda Wu
  1 sibling, 0 replies; 5+ messages in thread
From: Tengda Wu @ 2026-06-09  0:59 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra, bpf
  Cc: Steven Rostedt, Mathieu Desnoyers, Alexei Starovoitov,
	linux-trace-kernel, linux-kernel, Josh Poimboeuf, jikos, mbenes,
	pmladek



On 2026/6/8 21:08, Masami Hiramatsu wrote:
> On Mon, 8 Jun 2026 12:23:26 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Mon, Jun 08, 2026 at 11:34:49AM +0200, Peter Zijlstra wrote:
>>>
>>> +Live patching folks
>>>
>>> On Mon, Jun 08, 2026 at 09:52:37AM +0800, Tengda Wu wrote:
>>>
>>>> Background: We are verifying the support of live patches for functions that
>>>> have a kretprobe. The specific verification method is as follows:
>>>>
>>>> We construct a function foo() that calls bar():
>>>>
>>>> void bar(void)
>>>> {
>>>>     for (;;) {
>>>>         schedule();
>>>>     }
>>>> }
>>>>
>>>> void foo(void)
>>>> {
>>>>     bar();
>>>> }
>>>>
>>>> A kretprobe is attached to bar():
>>>>
>>>> echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
>>>> echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
>>>>
>>>> Then foo() is triggered. The expected behavior is that bar() will call
>>>> schedule() and yield the CPU.
>>>>
>>>> After that, the live patch is activated to attempt replacing the implementation
>>>> of foo(). The expectation is that this should succeed.
>>>
>>> This wholly depends on how foo() calls bar(), if it is a normal call,
>>> then no, it should not succeed, because foo() is still on the stack.
>>>
>>> If it is a tail-call, then yes, because foo() is no longer relevant.
>>>
>>>> However, in reality, because the task that called schedule() is still in the
>>>> RUNNING state,
>>>
>>> So calling schedule() without setting state is dodgy in the first place.
>>> Who is doing this? All wait primitives will set this to
>>> TASK_UNINTERRUPTIBLE or something along those lines.
>>>
>>>> the condition task_is_running(tsk) inside rethook_find_ret_addr()
>>>> is not satisfied, causing the function to return early. This, in turn,
>>>> prevents stack_trace_save_tsk_reliable() from determining the stack as
>>>> reliable, leading to a failure in activating the live patch.
>>>>
>>>> **Not sure if this is correct:**
>>>>
>>>> We believe that after a task voluntarily calls schedule(), when the stack
>>>> is expected to be reliable, it is a safe time to activate a live patch.
>>>
>>> Calling schedule() without setting state is a no-op and really shouldn't
>>> count much at all.
>>>
>>>> Additionally, a similar tsk->on_cpu check can be found elsewhere in the
>>>> kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
>>>> Therefore, we propose changing the task_is_running(tsk) condition to
>>>> tsk->on_cpu.
>>>
>>> Anyway, I'm wondering what the purpose of this check here is, there is
>>> no real comment, and commit 5120d167e21c ("rethook: Remove warning
>>> messages printed for finding return address of a frame.") is just pure
>>> voodoo as well.
>>
>> FWIW, you should have had this discussion then.
> 
> Indeed. The rethook is making a shadow stack by list, thus caller must
> guarantee the target process is blocked at least during this function.
> 
> The commit messages suggest that when BPF takes a backtrace, it also
> includes other running tasks. Is that safe?
> 
>>
>>> Also, note the comment that goes with the usage of
>>> task_on_another_cpu(); that thing is racy as all heck.
>>>
>>> So it really comes down to what the purpose of this check is.
> 
> This check has been introduced when it is copied from
> kretprobe_find_ret_addr(). It has the comment:
> 
>  * The @tsk must be 'current' or a task which is not running. @fp is a hint
> 
> IIRC, I added this check to explicitly verify this condition.
> 
>>>
>>> I suspect the issue at hand is that tsk->rethook elements, such as
>>> iterated by __rethook_find_ret_addr() are not safe to be accessed for a
>>> running task.
>>>
>>> Notably while rethook_recycle() has some RCU thing on, that objpool
>>> thing (and the recycle name itself) seems to strongly suggest iterating
>>> these things is not sound (you could start with things from this task,
>>> hit a recycled entry and continue iterating rethooks from another task).
>>>
>>> Also note that the current check is also racy, nothing really prevents a
>>> wakeup from happening right after you observe task_is_running() being
>>> false. The task can then get scheduled in on another CPU and tear down
>>> its rethooks concurrent with __rethook_find_ret_addr().
> 
> Yeah, but is there any way to ensure the task is blocked? Even if it is
> blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
> the rethook, it may not be possible to ensure it?
> 
> Of course, we could give up on checking within this function and leave
> everything to the caller to guarantee - as kretprobe does.
> 
> BTW, the reason why we made it possible to pass tasks other than current
>  is that the stack unwinding code itself supported unwinding tasks other
> than current, so we had no choice but to create this interface.
> 
> However, it is a bad idea to check this in deep inside of unwinding.
> 
>>> Now, livepatch itself calls unwind from a proper context, but unwinds in
>>> general are not. This rethook stuff doesn't seem to be sound in general.
>>
>> I suspect just entirely removing the check is the sanest option at this
>> point. Callers that do it right (livepatch) are guaranteed consistent
>> data, and the rest gets whatever pieces.
> 
> Agreed.
> 
>>
>> Notably, unwind_next() holds rcu, so the iteration is protected from any
>> of those rethook_node things getting freed. Its just that the iteration
>> can go sideways and you might not get a sane answer.
>>
>> The very worst possible option is getting stuck in an infinite loop when
>> concurrent with agressive rethook re-use or something daft like that,
>> but that seems extremely unlikely.
> 
> 
> OK, thanks for your review!
> 
> Tengda, can you send a patch to just remove the check?
> 
> Thank you,
> 

Sure, the patch to remove the check has been sent.

-- Tengda


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-09  7:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260525132253.1889726-1-wutengda@huaweicloud.com>
     [not found] ` <20260526123719.482f07a3843e207e22d95378@kernel.org>
     [not found]   ` <94179dab-ffb7-4fab-af45-b20bfb686ab3@huaweicloud.com>
     [not found]     ` <20260601084001.9566b443746447ec2bb1a9fb@kernel.org>
     [not found]       ` <20260604093445.GF3126523@noisy.programming.kicks-ass.net>
     [not found]         ` <20260605224341.c926299d613b6102912c9a3f@kernel.org>
     [not found]           ` <679a1c8f-1e4d-4ae5-83e1-d0068e6de1a6@huaweicloud.com>
     [not found]             ` <20260608093449.GH4149641@noisy.programming.kicks-ass.net>
     [not found]               ` <20260608102326.GB3161497@noisy.programming.kicks-ass.net>
2026-06-08 13:08                 ` [PATCH] rethook: Use tsk->on_cpu to check task execution state Masami Hiramatsu
2026-06-08 14:06                   ` Peter Zijlstra
2026-06-09  4:41                     ` Masami Hiramatsu
2026-06-09  7:05                       ` Peter Zijlstra
2026-06-09  0:59                   ` Tengda Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox