[BUG] no ORC stacktrace from kretprobe.multi bpf program

linux-trace-kernel.vger.kernel.org archive mirror

* [BUG] no ORC stacktrace from kretprobe.multi bpf program
@ 2025-10-08 21:08 Jiri Olsa
  2025-10-12  4:09 ` Masami Hiramatsu
  2025-10-13 17:10 ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Jiri Olsa @ 2025-10-08 21:08 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-trace-kernel, x86,
	Yonghong Song

hi,
I'm getting no stacktrace from a bpf program attached to a kretprobe.multi probe
(which means on top of a return fprobe) on x86.

I think we need the same kind of treatment we do for rethook. AFAICS the ORC unwind
stops at return_to_handler, because the stack and the function itself are not
adjusted for the unwind_recover_ret_addr call.

If it's any help, I pushed a bpf selftest for that here:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=stacktrace_test

just execute:
  # test_progs -t stacktrace_map/kretprobe_multi

thanks,
jirka


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-08 21:08 [BUG] no ORC stacktrace from kretprobe.multi bpf program Jiri Olsa
@ 2025-10-12  4:09 ` Masami Hiramatsu
  2025-10-13 14:36   ` Jiri Olsa
  2025-10-13 17:10 ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Masami Hiramatsu @ 2025-10-12  4:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, Andrii Nakryiko,
	bpf, linux-trace-kernel, x86, Yonghong Song

On Wed, 8 Oct 2025 23:08:26 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> hi,
> I'm getting no stacktrace from bpf program attached on kretprobe.multi probe
> (which means on top of return fprobe) on x86.
> 
> I think we need some kind of treatment we do for rethook, AFAICS the ORC unwind
> stops on return_to_handler, because the stack and the function itself are not
> adjusted for unwind_recover_ret_addr call
> 
> If it's any help I pushed the bpf/selftest for that in here:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=stacktrace_test
> 
> just execute:
>   # test_progs -t stacktrace_map/kretprobe_multi

Hmm, curious. As long as we are using fgraph, the stacktrace should work.
Does this happen when the function-graph tracer is enabled too?

Thank you,

> 
> thanks,
> jirka


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-12  4:09 ` Masami Hiramatsu
@ 2025-10-13 14:36   ` Jiri Olsa
  0 siblings, 0 replies; 13+ messages in thread
From: Jiri Olsa @ 2025-10-13 14:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra,
	Andrii Nakryiko, bpf, linux-trace-kernel, x86, Yonghong Song

On Sun, Oct 12, 2025 at 01:09:31PM +0900, Masami Hiramatsu wrote:
> On Wed, 8 Oct 2025 23:08:26 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > hi,
> > I'm getting no stacktrace from bpf program attached on kretprobe.multi probe
> > (which means on top of return fprobe) on x86.
> > 
> > I think we need some kind of treatment we do for rethook, AFAICS the ORC unwind
> > stops on return_to_handler, because the stack and the function itself are not
> > adjusted for unwind_recover_ret_addr call
> > 
> > If it's any help I pushed the bpf/selftest for that in here:
> >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=stacktrace_test
> > 
> > just execute:
> >   # test_progs -t stacktrace_map/kretprobe_multi
> 
> Hmm, curious. as far as we are using fgraph, stacktrace should work.
> May this happen if function-graph tracer is enabled too?

that test is just a simple kretprobe, so there should be no function-graph
tracer in the way. I plan to check on this again later this week.

jirka


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-08 21:08 [BUG] no ORC stacktrace from kretprobe.multi bpf program Jiri Olsa
  2025-10-12  4:09 ` Masami Hiramatsu
@ 2025-10-13 17:10 ` Steven Rostedt
  2025-10-15 16:06   ` Josh Poimboeuf
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2025-10-13 17:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Masami Hiramatsu, Josh Poimboeuf, Peter Zijlstra, Andrii Nakryiko,
	bpf, linux-trace-kernel, x86, Yonghong Song

On Wed, 8 Oct 2025 23:08:26 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> hi,
> I'm getting no stacktrace from bpf program attached on kretprobe.multi probe
> (which means on top of return fprobe) on x86.
> 
> I think we need some kind of treatment we do for rethook, AFAICS the ORC unwind
> stops on return_to_handler, because the stack and the function itself are not
> adjusted for unwind_recover_ret_addr call

Hmm, we do have a way to retrieve the actual return caller from a location
for return_to_handler:

  See kernel/trace/fgraph.c: ftrace_graph_get_ret_stack()

Hmm, I think the x86 ORC unwinder needs to use this.

-- Steve


> 
> If it's any help I pushed the bpf/selftest for that in here:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=stacktrace_test
> 
> just execute:
>   # test_progs -t stacktrace_map/kretprobe_multi
> 
> thanks,
> jirka



* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-13 17:10 ` Steven Rostedt
@ 2025-10-15 16:06   ` Josh Poimboeuf
  2025-10-15 16:11     ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Josh Poimboeuf @ 2025-10-15 16:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Masami Hiramatsu, Peter Zijlstra, Andrii Nakryiko, bpf,
	linux-trace-kernel, x86, Yonghong Song

On Mon, Oct 13, 2025 at 01:10:55PM -0400, Steven Rostedt wrote:
> On Wed, 8 Oct 2025 23:08:26 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > hi,
> > I'm getting no stacktrace from bpf program attached on kretprobe.multi probe
> > (which means on top of return fprobe) on x86.
> > 
> > I think we need some kind of treatment we do for rethook, AFAICS the ORC unwind
> > stops on return_to_handler, because the stack and the function itself are not
> > adjusted for unwind_recover_ret_addr call
> 
> Hmm, we do have a way to retrieve the actual return caller from a location
> for return_to_handler:
> 
>   See kernel/trace/fgraph.c: ftrace_graph_get_ret_stack()
> 
> Hmm, I think the x86 ORC unwinder needs to use this.

I'm confused, is that not what ftrace_graph_ret_addr() already does?

-- 
Josh


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-15 16:06   ` Josh Poimboeuf
@ 2025-10-15 16:11     ` Steven Rostedt
  2025-10-22  9:04       ` Feng Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2025-10-15 16:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Olsa, Masami Hiramatsu, Peter Zijlstra, Andrii Nakryiko, bpf,
	linux-trace-kernel, x86, Yonghong Song

On Wed, 15 Oct 2025 09:06:12 -0700
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> > Hmm, we do have a way to retrieve the actual return caller from a location
> > for return_to_handler:
> > 
> >   See kernel/trace/fgraph.c: ftrace_graph_get_ret_stack()
> > 
> > Hmm, I think the x86 ORC unwinder needs to use this.  
> 
> I'm confused, is that not what ftrace_graph_ret_addr() already does?

Ah yeah, that does it too. I just searched for the first function that did
the look up ;-)

Now I guess the question is, why is this not working?

-- Steve


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-15 16:11     ` Steven Rostedt
@ 2025-10-22  9:04       ` Feng Yang
  2025-10-22 12:32         ` Jiri Olsa
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Yang @ 2025-10-22  9:04 UTC (permalink / raw)
  To: rostedt
  Cc: andrii, bpf, jpoimboe, linux-trace-kernel, mhiramat, olsajiri,
	peterz, x86, yhs

On Wed, 15 Oct 2025 12:11:38 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> > > Hmm, we do have a way to retrieve the actual return caller from a location
> > > for return_to_handler:
> > > 
> > >   See kernel/trace/fgraph.c: ftrace_graph_get_ret_stack()
> > > 
> > > Hmm, I think the x86 ORC unwinder needs to use this.  
> > 
> > I'm confused, is that not what ftrace_graph_ret_addr() already does?

> Ah yeah, that does it too. I just searched for the first function that did
> the look up ;-)

> Now I guess the question is, why is this not working?


I've also encountered this issue recently. It only outputs the stack trace of return_to_handler, for example:

# bpftrace -e 'kretprobe:vfs_rea* {@[kstack]=count()}'
Attaching 1 probe...
^C

@[
    ksys_read+192
    get_perf_callchain+211
    bpf_get_stackid+101
    cleanup_module+303100
    kprobe_multi_link_prog_run+175
    fprobe_return+265
    __ftrace_return_to_handler.isra.0+433
    return_to_handler+30
]: 1

The return stack trace when directly executing samples/fprobe/fprobe_example.c is similar:
[ 71.892353] return_to_handler: kernel_thread+0x71/0xa0
[ 71.892356] sample_exit_handler: Return from <kernel_clone+0x4/0x470> ip = 0x000000000e0e2004 to rip = 0x00000000127e6d58 (kernel_thread+0x71/0xa0)
[ 71.892361] __ftrace_return_to_handler.isra.0+0x1b1/0x280
[ 71.892363] return_to_handler+0x1e/0x50

I found no case where the ret passed to ftrace_graph_ret_addr() was equal to return_handler.

Additionally, I noticed that when x86 executes perf_callchain_kernel(), perf_hw_regs(regs) is false,
so it calls unwind_start(&state, current, NULL, (void *)regs->sp),
which proceeds to __unwind_start(), where the task == current check is performed.
ARM, however, executes kunwind_init_from_regs(&state, regs)
instead of taking the second branch with the task == current check.

I hope these phenomena can help you analyze the cause of this issue.
Thanks.



* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-22  9:04       ` Feng Yang
@ 2025-10-22 12:32         ` Jiri Olsa
  2025-10-22 14:28           ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Olsa @ 2025-10-22 12:32 UTC (permalink / raw)
  To: Feng Yang
  Cc: rostedt, andrii, bpf, jpoimboe, linux-trace-kernel, mhiramat,
	olsajiri, peterz, x86, yhs

On Wed, Oct 22, 2025 at 05:04:29PM +0800, Feng Yang wrote:
> On Wed, 15 Oct 2025 12:11:38 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > > Hmm, we do have a way to retrieve the actual return caller from a location
> > > > for return_to_handler:
> > > > 
> > > >   See kernel/trace/fgraph.c: ftrace_graph_get_ret_stack()
> > > > 
> > > > Hmm, I think the x86 ORC unwinder needs to use this.  
> > > 
> > > I'm confused, is that not what ftrace_graph_ret_addr() already does?
> 
> > Ah yeah, that does it too. I just searched for the first function that did
> > the look up ;-)
> 
> > Now I guess the question is, why is this not working?
> 
> 
> I've also encountered this issue recently. It only outputs the stack trace of return_to_handler, for example:
> 
> # bpftrace -e 'kretprobe:vfs_rea* {@[kstack]=count()}'
> Attaching 1 probe...
> ^C
> 
> @[
>     ksys_read+192
>     get_perf_callchain+211
>     bpf_get_stackid+101
>     cleanup_module+303100
>     kprobe_multi_link_prog_run+175
>     fprobe_return+265
>     __ftrace_return_to_handler.isra.0+433
>     return_to_handler+30
> ]: 1

that looks messed up

> 
> The return stack trace when directly executing samples/fprobe/fprobe_example.c is similar:
> [ 71.892353] return_to_handler: kernel_thread+0x71/0xa0
> [ 71.892356] sample_exit_handler: Return from <kernel_clone+0x4/0x470> ip = 0x000000000e0e2004 to rip = 0x00000000127e6d58 (kernel_thread+0x71/0xa0)
> [ 71.892361] __ftrace_return_to_handler.isra.0+0x1b1/0x280
> [ 71.892363] return_to_handler+0x1e/0x50
> 
> No cases were found where the ret of the ftrace_graph_ret_addr function is equal to return_handler.
> 
> Additionally, I noticed that when the x86 architecture executes perf_callchain_kernel, perf_hw_regs(regs) is false,
> and it calls unwind_start(&state, current, NULL, (void *)regs->sp);
> which then proceeds to __unwind_start where the check task == current is performed.
> However, the ARM architecture executes kunwind_init_from_regs(&state, regs);
> instead of taking the second branch with the task == current check.
> 
> I hope these phenomena can help you analyze the cause of this issue.
> Thanks.
> 

thanks for the report.. so above is from arm?

yes the x86_64 starts with:
  unwind_start(&state, current, NULL, (void *)regs->sp);

I seem to get reasonable stack traces on x86 with the change below,
which just initializes the fields in regs that are used later on and sets
up the stack so that the ftrace_graph_ret_addr code is triggered during the unwind.

But I'm not familiar with this code. Masami, Josh, any idea?

thanks,
jirka


---
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 367da3638167..2d2bb8c37b56 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -353,6 +353,8 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
 SYM_CODE_START(return_to_handler)
 	UNWIND_HINT_UNDEFINED
 	ANNOTATE_NOENDBR
+	push $return_to_handler
+	UNWIND_HINT_FUNC
 
 	/* Save ftrace_regs for function exit context  */
 	subq $(FRAME_SIZE), %rsp
@@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
 	movq %rax, RAX(%rsp)
 	movq %rdx, RDX(%rsp)
 	movq %rbp, RBP(%rsp)
+	movq %rsp, RSP(%rsp)
+	movq $0, EFLAGS(%rsp)
+	movq $__KERNEL_CS, CS(%rsp)
 	movq %rsp, %rdi
 
 	call ftrace_return_to_handler
@@ -368,7 +373,8 @@ SYM_CODE_START(return_to_handler)
 	movq RDX(%rsp), %rdx
 	movq RAX(%rsp), %rax
 
-	addq $(FRAME_SIZE), %rsp
+	addq $(FRAME_SIZE) + 8, %rsp
+
 	/*
 	 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
 	 * since IBT would demand that contain ENDBR, which simply isn't so for


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-22 12:32         ` Jiri Olsa
@ 2025-10-22 14:28           ` Steven Rostedt
  2025-10-22 20:41             ` Jiri Olsa
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2025-10-22 14:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Feng Yang, andrii, bpf, jpoimboe, linux-trace-kernel, mhiramat,
	peterz, x86, yhs

On Wed, 22 Oct 2025 14:32:19 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> thanks for the report.. so above is from arm?
> 
> yes the x86_64 starts with:
>   unwind_start(&state, current, NULL, (void *)regs->sp);
> 
> I seems to get reasonable stack traces on x86 with the change below,
> which just initializes fields in regs that are used later on and sets
> the stack so the ftrace_graph_ret_addr code is triggered during unwind
> 
> but I'm not familiar with this code, Masami, Josh, any idea?

Oh! This is an issue with a stack trace happening from a callback of the
exit handler?

OK, that makes much more sense, as I don't think the code handles that
properly.

> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
> index 367da3638167..2d2bb8c37b56 100644
> --- a/arch/x86/kernel/ftrace_64.S
> +++ b/arch/x86/kernel/ftrace_64.S
> @@ -353,6 +353,8 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
>  SYM_CODE_START(return_to_handler)
>  	UNWIND_HINT_UNDEFINED

I believe the above UNWIND_HINT_UNDEFINED means that if ORC were to hit
this, it should just give up.

This is because tracing the exit of the function really doesn't fit in the
normal execution paradigm.

The entry is easy. It's the same as if the callback was called by the
function being traced. The exit is more difficult because the function
being traced has already done its return. Now the callback is in this limbo
area of being called between a return and the caller.

>  	ANNOTATE_NOENDBR
> +	push $return_to_handler
> +	UNWIND_HINT_FUNC

OK, so what happened here is that you pushed return_to_handler onto the
stack and told ORC that this is a normal function, so that when the
unwinder reaches it, it does a lookup from the handler itself.

I wonder if we could just add a new UNWIND_HINT that tells ORC to do that?

>  
>  	/* Save ftrace_regs for function exit context  */
>  	subq $(FRAME_SIZE), %rsp
> @@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
>  	movq %rax, RAX(%rsp)
>  	movq %rdx, RDX(%rsp)
>  	movq %rbp, RBP(%rsp)
> +	movq %rsp, RSP(%rsp)
> +	movq $0, EFLAGS(%rsp)
> +	movq $__KERNEL_CS, CS(%rsp)

Is this simulating some kind of interrupt?

>  	movq %rsp, %rdi
>  
>  	call ftrace_return_to_handler

Now it gets tricky in the ftrace_return_to_handler as the first thing it
does is to pop the shadow stack, which makes the return_to_handler lookup
different, as it's no longer on the stack that the unwinder will use.

The return address will live in the "ret" variable of that function, which
the unwinder will not have access to. Yeah, this will not be easy to solve.

-- Steve

> @@ -368,7 +373,8 @@ SYM_CODE_START(return_to_handler)
>  	movq RDX(%rsp), %rdx
>  	movq RAX(%rsp), %rax
>  
> -	addq $(FRAME_SIZE), %rsp
> +	addq $(FRAME_SIZE) + 8, %rsp
> +
>  	/*
>  	 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
>  	 * since IBT would demand that contain ENDBR, which simply isn't so for


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-22 14:28           ` Steven Rostedt
@ 2025-10-22 20:41             ` Jiri Olsa
  2025-10-22 21:17               ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Olsa @ 2025-10-22 20:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Feng Yang, andrii, bpf, jpoimboe, linux-trace-kernel,
	mhiramat, peterz, x86, yhs

On Wed, Oct 22, 2025 at 10:28:19AM -0400, Steven Rostedt wrote:
> On Wed, 22 Oct 2025 14:32:19 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > thanks for the report.. so above is from arm?
> > 
> > yes the x86_64 starts with:
> >   unwind_start(&state, current, NULL, (void *)regs->sp);
> > 
> > I seems to get reasonable stack traces on x86 with the change below,
> > which just initializes fields in regs that are used later on and sets
> > the stack so the ftrace_graph_ret_addr code is triggered during unwind
> > 
> > but I'm not familiar with this code, Masami, Josh, any idea?
> 
> Oh! This is an issue with a stack trace happening from a callback of the
> exit handler?

yes, it's triggered via:

  return_to_handler
    ftrace_return_to_handler
      fprobe_return
        kprobe_multi_link_exit_handler
	  kprobe_multi_link_prog_run
	    bpf_prog_run
	      bpf_prog..
	        bpf_get_stackid
		  get_perf_callchain
		    perf_callchain_kernel
		      unwind_start

> 
> OK, that makes much more sense. As I don't think the code handles that
> properly.
> 
> > 
> > thanks,
> > jirka
> > 
> > 
> > ---
> > diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
> > index 367da3638167..2d2bb8c37b56 100644
> > --- a/arch/x86/kernel/ftrace_64.S
> > +++ b/arch/x86/kernel/ftrace_64.S
> > @@ -353,6 +353,8 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
> >  SYM_CODE_START(return_to_handler)
> >  	UNWIND_HINT_UNDEFINED
> 
> I believe the above UNWIND_HINT_UNDEFINED means that if ORC were to hit
> this, it should just give up.
> 
> This is because tracing the exit of the function really doesn't fit in the
> normal execution paradigm.
> 
> The entry is easy. It's the same as if the callback was called by the
> function being traced. The exit is more difficult because the function
> being traced has already did its return. Now the callback is in this limbo
> area of being called between a return and the caller.

I followed the rethook trampoline code (arch_rethook_trampoline), which does similar
stuff and gets similar treatment in unwind_recover_ret_addr as fgraph does

> 
> >  	ANNOTATE_NOENDBR
> > +	push $return_to_handler
> > +	UNWIND_HINT_FUNC
> 
> OK, so what happened here is that you put in the return_to_handle into the
> stack and told ORC that this is a normal function, and that when it
> triggers to do a lookup from the handler itself.

together with the "push $return_to_handler" it is supposed to instruct ftrace_graph_ret_addr
to go get the 'real' return address from the shadow stack

> 
> I wonder if we could just add a new UNWIND_HINT that tells ORC to do that?

if I remove the initial UNWIND_HINT_UNDEFINED I get an objtool warning
about an unreachable instruction

> 
> >  
> >  	/* Save ftrace_regs for function exit context  */
> >  	subq $(FRAME_SIZE), %rsp
> > @@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
> >  	movq %rax, RAX(%rsp)
> >  	movq %rdx, RDX(%rsp)
> >  	movq %rbp, RBP(%rsp)
> > +	movq %rsp, RSP(%rsp)
> > +	movq $0, EFLAGS(%rsp)
> > +	movq $__KERNEL_CS, CS(%rsp)
> 
> Is this simulating some kind of interrupt?

there are several checks on these pt_regs fields:

- in get_perf_callchain we check user_mode(regs) so CS has to be set
- in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS has to be set

> 
> >  	movq %rsp, %rdi
> >  
> >  	call ftrace_return_to_handler
> 
> Now it gets tricky in the ftrace_return_to_handler as the first thing it
> does is to pop the shadow stack, which makes the return_to_handler lookup
> different, as its no longer on the stack that the unwinder will use.

hmm, strange.. the resulting stack trace seems ok, I'll make it a
selftest and send it

ftrace_graph_ret_addr, which looks up the 'real' return address, seems
to have 2 ways of getting to it:

        i = *idx ? : task->curr_ret_stack;

I don't know how that previous pop affects this, but I'm sure it's
more complicated than this ;-)
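
To make the two starting points concrete, here is a heavily simplified
user-space model of that lookup. It is a sketch, not the kernel function:
struct model_task, struct model_ret_entry, model_graph_ret_addr() and the
MODEL_RETURN_TO_HANDLER address are all made up for illustration, and the
shadow stack is reduced to a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up address standing in for return_to_handler. */
#define MODEL_RETURN_TO_HANDLER 0xffffffff81000000ULL

/* One shadow-stack entry: the saved return address and its stack slot. */
struct model_ret_entry {
	uint64_t ret;		/* original return address */
	uint64_t *retp;		/* stack slot that was overwritten */
};

struct model_task {
	struct model_ret_entry stack[8];
	int curr_ret_stack;	/* number of live entries */
};

/*
 * If 'ret' is the trampoline address, search the shadow stack for the
 * entry belonging to stack slot 'retp' and return the original address.
 * '*idx' remembers where the previous lookup stopped; 0 means "start
 * from the top", which is what "*idx ? : task->curr_ret_stack" selects.
 */
static uint64_t model_graph_ret_addr(const struct model_task *task, int *idx,
				     uint64_t ret, uint64_t *retp)
{
	int i;

	assert(task != NULL);

	if (ret != MODEL_RETURN_TO_HANDLER)
		return ret;
	if (!idx)
		return ret;

	i = *idx ? *idx : task->curr_ret_stack;
	while (i > 0) {
		const struct model_ret_entry *e = &task->stack[--i];

		if (e->retp == retp && e->ret != MODEL_RETURN_TO_HANDLER) {
			*idx = i;
			return e->ret;
		}
	}
	return ret;
}
```

In this model an unwinder walking up the stack passes the same idx on every
step, so each lookup continues below the entry matched by the previous one.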

jirka


> 
> The return address will live in the "ret" variable of that function, which
> the unwinder will not have access to. Yeah, this will not be easy to solve.
> 
> -- Steve
> 
> 
> > @@ -368,7 +373,8 @@ SYM_CODE_START(return_to_handler)
> >  	movq RDX(%rsp), %rdx
> >  	movq RAX(%rsp), %rax
> >  
> > -	addq $(FRAME_SIZE), %rsp
> > +	addq $(FRAME_SIZE) + 8, %rsp
> > +
> >  	/*
> >  	 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
> >  	 * since IBT would demand that contain ENDBR, which simply isn't so for
> 


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-22 20:41             ` Jiri Olsa
@ 2025-10-22 21:17               ` Steven Rostedt
  2025-10-23 20:42                 ` Jiri Olsa
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2025-10-22 21:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Feng Yang, andrii, bpf, jpoimboe, linux-trace-kernel, mhiramat,
	peterz, x86, yhs

On Wed, 22 Oct 2025 22:41:20 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> >   
> > >  	ANNOTATE_NOENDBR
> > > +	push $return_to_handler
> > > +	UNWIND_HINT_FUNC  
> > 
> > OK, so what happened here is that you put in the return_to_handle into the
> > stack and told ORC that this is a normal function, and that when it
> > triggers to do a lookup from the handler itself.  
> 
> together with the "push $return_to_handler" it suppose to instruct ftrace_graph_ret_addr
> to go get the 'real' return address from shadow stack
> 
> > 
> > I wonder if we could just add a new UNWIND_HINT that tells ORC to do that?  
> 
> if I remove the initial UNWIND_HINT_UNDEFINED I get objtool warning
> about unreachable instruction

Right. I was thinking we add UNWIND_HINT_RETHOOK and an
UNWIND_HINT_TYPE_RETHOOK that lets objtool know that this function is a
"return_to_hook" function and the unwinder can do something special with it.

> 
> >   
> > >  
> > >  	/* Save ftrace_regs for function exit context  */
> > >  	subq $(FRAME_SIZE), %rsp
> > > @@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
> > >  	movq %rax, RAX(%rsp)
> > >  	movq %rdx, RDX(%rsp)
> > >  	movq %rbp, RBP(%rsp)
> > > +	movq %rsp, RSP(%rsp)
> > > +	movq $0, EFLAGS(%rsp)
> > > +	movq $__KERNEL_CS, CS(%rsp)  
> > 
> > Is this simulating some kind of interrupt?  
> 
> there are several checks in pt_regs on these fields 
> 
> - in get_perf_callchain we check user_mode(regs) so CS has to be set
> - in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS has to be set

So this is a different issue. I'd rather have this added in
kprobe_multi_link_prog_run, as it's the only user of it. Or have the
ftrace_regs conversion update it. This isn't something that should be done
at every call and slow everyone else down.

> 
> >   
> > >  	movq %rsp, %rdi
> > >  
> > >  	call ftrace_return_to_handler  
> > 
> > Now it gets tricky in the ftrace_return_to_handler as the first thing it
> > does is to pop the shadow stack, which makes the return_to_handler lookup
> > different, as its no longer on the stack that the unwinder will use.  
> 
> hum strange.. the resulting stack trace seems ok, I'll make it a
> selftest I send it
> 
> ftrace_graph_ret_addr that checks on the 'real return address seems
> to have 2 ways of getting to it:
> 
>         i = *idx ? : task->curr_ret_stack;
> 
> I dont know how that previous pop affects this, but I'm sure it's
> more complicated than this ;-)

Oh wait, it may be OK. I forgot I had to change the pop function to give
the data back, but it doesn't modify the task->curr_ret_stack until after
it calls all the callbacks. That's because the shadow stack still has the
data that is being passed from the entry callback. So it can't be updated
yet otherwise that data on the shadow stack will get corrupted.

Yeah, the return_to_handler should work up until the end of
ftrace_return_to_handler().

-- Steve



* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-22 21:17               ` Steven Rostedt
@ 2025-10-23 20:42                 ` Jiri Olsa
  2025-10-23 20:55                   ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Olsa @ 2025-10-23 20:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Feng Yang, andrii, bpf, jpoimboe, linux-trace-kernel,
	mhiramat, peterz, x86, yhs

On Wed, Oct 22, 2025 at 05:17:11PM -0400, Steven Rostedt wrote:
> On Wed, 22 Oct 2025 22:41:20 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > >   
> > > >  	ANNOTATE_NOENDBR
> > > > +	push $return_to_handler
> > > > +	UNWIND_HINT_FUNC  
> > > 
> > > OK, so what happened here is that you put in the return_to_handle into the
> > > stack and told ORC that this is a normal function, and that when it
> > > triggers to do a lookup from the handler itself.  
> > 
> > together with the "push $return_to_handler" it suppose to instruct ftrace_graph_ret_addr
> > to go get the 'real' return address from shadow stack
> > 
> > > 
> > > I wonder if we could just add a new UNWIND_HINT that tells ORC to do that?  
> > 
> > if I remove the initial UNWIND_HINT_UNDEFINED I get objtool warning
> > about unreachable instruction
> 
> Right. I was thinking we add UNWIND_HINT_RETHOOK and an
> UNWIND_HINT_TYPE_RETHOOK that lets objtool know that this function is a
> "return_to_hook" function and the unwinder can do something special with it.
> 
> > 
> > >   
> > > >  
> > > >  	/* Save ftrace_regs for function exit context  */
> > > >  	subq $(FRAME_SIZE), %rsp
> > > > @@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
> > > >  	movq %rax, RAX(%rsp)
> > > >  	movq %rdx, RDX(%rsp)
> > > >  	movq %rbp, RBP(%rsp)
> > > > +	movq %rsp, RSP(%rsp)
> > > > +	movq $0, EFLAGS(%rsp)
> > > > +	movq $__KERNEL_CS, CS(%rsp)  
> > > 
> > > Is this simulating some kind of interrupt?  
> > 
> > there are several checks in pt_regs on these fields 
> > 
> > - in get_perf_callchain we check user_mode(regs) so CS has to be set
> > - in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS has to be set
> 
> So this is a different issue. I rather have this added in
> kprobe_multi_link_prog_run as its the only user of it. Or have the

there's also the fprobe tracer, which probably needs it as well

> ftrace_regs conversion update it. This isn't something that should be done
> at every call and slow everyone else down.

I think it's ok, but I'm not sure where to get the rsp value at that point,
perhaps we could just use the pt_regs address

jirka

> 
> > 
> > >   
> > > >  	movq %rsp, %rdi
> > > >  
> > > >  	call ftrace_return_to_handler  

SNIP


* Re: [BUG] no ORC stacktrace from kretprobe.multi bpf program
  2025-10-23 20:42                 ` Jiri Olsa
@ 2025-10-23 20:55                   ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2025-10-23 20:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Feng Yang, andrii, bpf, jpoimboe, linux-trace-kernel, mhiramat,
	peterz, x86, yhs

On Thu, 23 Oct 2025 22:42:08 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> > > > > @@ -360,6 +362,9 @@ SYM_CODE_START(return_to_handler)
> > > > >  	movq %rax, RAX(%rsp)
> > > > >  	movq %rdx, RDX(%rsp)
> > > > >  	movq %rbp, RBP(%rsp)
> > > > > +	movq %rsp, RSP(%rsp)
> > > > > +	movq $0, EFLAGS(%rsp)
> > > > > +	movq $__KERNEL_CS, CS(%rsp)    
> > > > 
> > > > Is this simulating some kind of interrupt?    
> > > 
> > > there are several checks in pt_regs on these fields 
> > > 
> > > - in get_perf_callchain we check user_mode(regs) so CS has to be set
> > > - in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS has to be set  
> > 
> > So this is a different issue. I rather have this added in
> > kprobe_multi_link_prog_run as its the only user of it. Or have the  
> 
> there's also fprobe tracer that probably needs it as well
> 
> > ftrace_regs conversion update it. This isn't something that should be done
> > at every call and slow everyone else down.  
> 
> I think it's ok, but not sure where to get rsp value at that point,
> perhaps we could just use the pt_regs address

Oh, rsp is fine to add, as that's one of the items expected for
ftrace_regs. It's the flags and CS that aren't needed.

-- Steve


