linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
@ 2023-11-30 23:47 Justin Chen
  2023-12-01  9:12 ` Ard Biesheuvel
  2024-05-27  7:43 ` Thorsten Scherer
  0 siblings, 2 replies; 15+ messages in thread
From: Justin Chen @ 2023-11-30 23:47 UTC (permalink / raw)
  To: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, ardb
  Cc: Florian Fainelli, Doug Berger

[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]

Hello,

Ran into an odd bug that I am unsure what the solution is. Tested a few 
kernels versions and they all fail the same.

FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace: 
enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when 
UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you 
should see a failure similar to below.

[   63.817239] ------------[ cut here ]------------
[   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195 
ftrace_return_to_handler+0x228/0x374
[   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
[   63.831645]   from func packet_setsockopt return to c0b558f4
[   63.843801] Modules linked in: bdc udc_core
[   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted 
6.1.53-0.1pre-gf0bc552d12f8 #33
[   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
[   63.862227] Backtrace:
[   63.864761]  dump_backtrace from show_stack+0x20/0x24
[   63.869982]  r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
[   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
[   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
[   63.886111]  r5:000000c3 r4:c11bd92c
[   63.889764]  dump_stack from __warn+0x88/0x130
[   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
[   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
[   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
[   63.913294]  r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
[   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
[   63.926699]  r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
[   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
[   63.939415]  r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
[   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
[   63.952053] ---[ end trace 0000000000000000 ]---

Sure enough the top of the parent stack is off by 8. (Tested with 
gcc6.3/gcc8.3/gcc12.3)
00006dcc <packet_setsockopt>:
     6dcc:       e1a0c00d        mov     ip, sp
     6dd0:       e24dd008        sub     sp, sp, #8 <======
     6dd4:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, 
fp, ip, lr, pc}
     6dd8:       e24cb00c        sub     fp, ip, #12
     6ddc:       e24dd06c        sub     sp, sp, #108    @ 0x6c
     6de0:       e52de004        push    {lr}            @ (str lr, [sp, 
#-4]!)
     6de4:       ebfffffe        bl      0 <__gnu_mcount_nc>

I'm not quite sure why gcc is putting this extra 8 byte frame (maybe 
some optimization?), but it isn't being accounted for thus the 
FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do 
this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it 
looks the test is hitting false positives.

Thanks,
Justin

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
@ 2023-12-01  9:12 ` Ard Biesheuvel
  2023-12-01 17:25   ` Justin Chen
  2023-12-01 18:22   ` Russell King (Oracle)
  2024-05-27  7:43 ` Thorsten Scherer
  1 sibling, 2 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-01  9:12 UTC (permalink / raw)
  To: Justin Chen
  Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

On Fri, 1 Dec 2023 at 00:48, Justin Chen <justin.chen@broadcom.com> wrote:
>
> Hello,
>
> Ran into an odd bug that I am unsure what the solution is. Tested a few
> kernels versions and they all fail the same.
>
> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
> should see a failure similar to below.
>
> [   63.817239] ------------[ cut here ]------------
> [   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
> ftrace_return_to_handler+0x228/0x374
> [   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
> [   63.831645]   from func packet_setsockopt return to c0b558f4
> [   63.843801] Modules linked in: bdc udc_core
> [   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
> 6.1.53-0.1pre-gf0bc552d12f8 #33
> [   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
> [   63.862227] Backtrace:
> [   63.864761]  dump_backtrace from show_stack+0x20/0x24
> [   63.869982]  r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
> [   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
> [   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
> [   63.886111]  r5:000000c3 r4:c11bd92c
> [   63.889764]  dump_stack from __warn+0x88/0x130
> [   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
> [   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
> [   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
> [   63.913294]  r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
> [   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
> [   63.926699]  r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
> [   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
> [   63.939415]  r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
> [   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
> [   63.952053] ---[ end trace 0000000000000000 ]---
>
> Sure enough the top of the parent stack is off by 8. (Tested with
> gcc6.3/gcc8.3/gcc12.3)
> 00006dcc <packet_setsockopt>:
>      6dcc:       e1a0c00d        mov     ip, sp
>      6dd0:       e24dd008        sub     sp, sp, #8 <======
>      6dd4:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl,
> fp, ip, lr, pc}
>      6dd8:       e24cb00c        sub     fp, ip, #12
>      6ddc:       e24dd06c        sub     sp, sp, #108    @ 0x6c
>      6de0:       e52de004        push    {lr}            @ (str lr, [sp,
> #-4]!)
>      6de4:       ebfffffe        bl      0 <__gnu_mcount_nc>
>
> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
> some optimization?), but it isn't being accounted for thus the
> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
> looks the test is hitting false positives.
>

Thanks for the report.

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.

Could you try the following change please?

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
                return;

        if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
-               /* FP points one word below parent's top of stack */
-               frame_pointer += 4;
+               /*
+                * The top of stack of the parent is recorded in the stack
+                * frame at offset [fp, #-8].
+                */
+               get_kernel_nofault(frame_pointer,
+                                  (unsigned long *)(frame_pointer - 8));
        } else {
                struct stackframe frame = {
                        .fp = frame_pointer,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-01  9:12 ` Ard Biesheuvel
@ 2023-12-01 17:25   ` Justin Chen
  2023-12-01 18:07     ` Steven Rostedt
  2023-12-01 18:22   ` Russell King (Oracle)
  1 sibling, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-01 17:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

[-- Attachment #1: Type: text/plain, Size: 4439 bytes --]



On 12/1/2023 1:12 AM, Ard Biesheuvel wrote:
> On Fri, 1 Dec 2023 at 00:48, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>> Hello,
>>
>> Ran into an odd bug that I am unsure what the solution is. Tested a few
>> kernels versions and they all fail the same.
>>
>> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
>> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
>> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
>> should see a failure similar to below.
>>
>> [   63.817239] ------------[ cut here ]------------
>> [   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
>> ftrace_return_to_handler+0x228/0x374
>> [   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
>> [   63.831645]   from func packet_setsockopt return to c0b558f4
>> [   63.843801] Modules linked in: bdc udc_core
>> [   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
>> 6.1.53-0.1pre-gf0bc552d12f8 #33
>> [   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
>> [   63.862227] Backtrace:
>> [   63.864761]  dump_backtrace from show_stack+0x20/0x24
>> [   63.869982]  r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
>> [   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
>> [   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
>> [   63.886111]  r5:000000c3 r4:c11bd92c
>> [   63.889764]  dump_stack from __warn+0x88/0x130
>> [   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
>> [   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
>> [   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
>> [   63.913294]  r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
>> [   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
>> [   63.926699]  r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
>> [   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
>> [   63.939415]  r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
>> [   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
>> [   63.952053] ---[ end trace 0000000000000000 ]---
>>
>> Sure enough the top of the parent stack is off by 8. (Tested with
>> gcc6.3/gcc8.3/gcc12.3)
>> 00006dcc <packet_setsockopt>:
>>       6dcc:       e1a0c00d        mov     ip, sp
>>       6dd0:       e24dd008        sub     sp, sp, #8 <======
>>       6dd4:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl,
>> fp, ip, lr, pc}
>>       6dd8:       e24cb00c        sub     fp, ip, #12
>>       6ddc:       e24dd06c        sub     sp, sp, #108    @ 0x6c
>>       6de0:       e52de004        push    {lr}            @ (str lr, [sp,
>> #-4]!)
>>       6de4:       ebfffffe        bl      0 <__gnu_mcount_nc>
>>
>> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
>> some optimization?), but it isn't being accounted for thus the
>> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
>> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
>> looks the test is hitting false positives.
>>
> 
> Thanks for the report.
> 
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> that there are no gaps between the stack frames is invalid.

Thanks for the assistance. The gap between the stack frame depends on 
the function. Most do not have a gap. Some have 8 (as shown above), some 
have 12. A single assumption here is not going to work. I'm having a 
hard time finding out the reasoning for this gap. I tried disabling a 
bunch of gcc flags as well as -O2 and the gap still exists.

Thanks,
Justin

> 
> Could you try the following change please?
> 
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
>                  return;
> 
>          if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> -               /* FP points one word below parent's top of stack */
> -               frame_pointer += 4;
> +               /*
> +                * The top of stack of the parent is recorded in the stack
> +                * frame at offset [fp, #-8].
> +                */
> +               get_kernel_nofault(frame_pointer,
> +                                  (unsigned long *)(frame_pointer - 8));
>          } else {
>                  struct stackframe frame = {
>                          .fp = frame_pointer,

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-01 17:25   ` Justin Chen
@ 2023-12-01 18:07     ` Steven Rostedt
  2023-12-01 22:59       ` Justin Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2023-12-01 18:07 UTC (permalink / raw)
  To: Justin Chen
  Cc: Ard Biesheuvel, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen <justin.chen@broadcom.com> wrote:

> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So it is our assumption
> > that there are no gaps between the stack frames is invalid.  
> 
> Thanks for the assistance. The gap between the stack frame depends on 
> the function. Most do not have a gap. Some have 8 (as shown above), some 
> have 12. A single assumption here is not going to work. I'm having a 
> hard time finding out the reasoning for this gap. I tried disabling a 
> bunch of gcc flags as well as -O2 and the gap still exists.

That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

  71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-01  9:12 ` Ard Biesheuvel
  2023-12-01 17:25   ` Justin Chen
@ 2023-12-01 18:22   ` Russell King (Oracle)
  1 sibling, 0 replies; 15+ messages in thread
From: Russell King (Oracle) @ 2023-12-01 18:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Justin Chen, rostedt, mhiramat, mark.rutland, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

On Fri, Dec 01, 2023 at 10:12:48AM +0100, Ard Biesheuvel wrote:
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> that there are no gaps between the stack frames is invalid.
> 
> Could you try the following change please?
> 
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
>                 return;
> 
>         if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> -               /* FP points one word below parent's top of stack */
> -               frame_pointer += 4;
> +               /*
> +                * The top of stack of the parent is recorded in the stack
> +                * frame at offset [fp, #-8].
> +                */
> +               get_kernel_nofault(frame_pointer,
> +                                  (unsigned long *)(frame_pointer - 8));

Yes, this will get the value of the stack pointer when the function
was entered - which may be the bottom of the parent function's stack
_or_ the start of non-register arguments to this function. So your
replacement has always been more correct.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-01 18:07     ` Steven Rostedt
@ 2023-12-01 22:59       ` Justin Chen
  2023-12-02  6:53         ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-01 22:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ard Biesheuvel, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

[-- Attachment #1: Type: text/plain, Size: 1780 bytes --]



On 12/1/23 10:07 AM, Steven Rostedt wrote:
> On Fri, 1 Dec 2023 09:25:59 -0800
> Justin Chen <justin.chen@broadcom.com> wrote:
> 
>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>> that there are no gaps between the stack frames is invalid.
>>
>> Thanks for the assistance. The gap between the stack frame depends on
>> the function. Most do not have a gap. Some have 8 (as shown above), some
>> have 12. A single assumption here is not going to work. I'm having a
>> hard time finding out the reasoning for this gap. I tried disabling a
>> bunch of gcc flags as well as -O2 and the gap still exists.
> 
> That code was originally added because of some strange things that gcc did
> with mcount (for example, it made a copy of the stack frame that it passed
> to mcount, where the function graph tracer replaced the copy of the return
> stack making the shadow stack go out of sync and crash). This was very hard
> to debug and I added this code to detect it if it happened again.
> 
> Well it's been over a decade since that happened (2009).
> 
>    71e308a239c09 ("function-graph: add stack frame test")
> 
> I'm happy assuming that the compiler folks are aware of our tricks with
> hijacking return calls and I don't expect it to happen again. We can just
> rip out those checks. That is, if it's only causing false positives, I
> don't think it's worth keeping around.
> 
> Has it detected any real issues on the Arm platforms?
> 
> -- Steve

I am not familiar enough to make a call. But from my limited testing 
with ARM, I didn't see any issues. If you would like me to, I can submit 
a patch to remove the check entirely. Or maybe only disable it for ARM?

Thanks,
Justin

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-01 22:59       ` Justin Chen
@ 2023-12-02  6:53         ` Ard Biesheuvel
  2023-12-02  8:49           ` Justin Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-02  6:53 UTC (permalink / raw)
  To: Justin Chen
  Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>
>
>
> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> > On Fri, 1 Dec 2023 09:25:59 -0800
> > Justin Chen <justin.chen@broadcom.com> wrote:
> >
> >>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>> that there are no gaps between the stack frames is invalid.
> >>
> >> Thanks for the assistance. The gap between the stack frame depends on
> >> the function. Most do not have a gap. Some have 8 (as shown above), some
> >> have 12. A single assumption here is not going to work. I'm having a
> >> hard time finding out the reasoning for this gap. I tried disabling a
> >> bunch of gcc flags as well as -O2 and the gap still exists.
> >
> > That code was originally added because of some strange things that gcc did
> > with mcount (for example, it made a copy of the stack frame that it passed
> > to mcount, where the function graph tracer replaced the copy of the return
> > stack making the shadow stack go out of sync and crash). This was very hard
> > to debug and I added this code to detect it if it happened again.
> >
> > Well it's been over a decade since that happened (2009).
> >
> >    71e308a239c09 ("function-graph: add stack frame test")
> >
> > I'm happy assuming that the compiler folks are aware of our tricks with
> > hijacking return calls and I don't expect it to happen again. We can just
> > rip out those checks. That is, if it's only causing false positives, I
> > don't think it's worth keeping around.
> >
> > Has it detected any real issues on the Arm platforms?
> >
> > -- Steve
>
> I am not familiar enough to make a call. But from my limited testing
> with ARM, I didn't see any issues. If you would like me to, I can submit
> a patch to remove the check entirely. Or maybe only disable it for ARM?
>

Please try the fix I proposed first.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-02  6:53         ` Ard Biesheuvel
@ 2023-12-02  8:49           ` Justin Chen
  2023-12-02  9:26             ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-02  8:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

[-- Attachment #1: Type: text/plain, Size: 2893 bytes --]



On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>>
>>
>> On 12/1/23 10:07 AM, Steven Rostedt wrote:
>>> On Fri, 1 Dec 2023 09:25:59 -0800
>>> Justin Chen <justin.chen@broadcom.com> wrote:
>>>
>>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>>>> that there are no gaps between the stack frames is invalid.
>>>>
>>>> Thanks for the assistance. The gap between the stack frame depends on
>>>> the function. Most do not have a gap. Some have 8 (as shown above), some
>>>> have 12. A single assumption here is not going to work. I'm having a
>>>> hard time finding out the reasoning for this gap. I tried disabling a
>>>> bunch of gcc flags as well as -O2 and the gap still exists.
>>>
>>> That code was originally added because of some strange things that gcc did
>>> with mcount (for example, it made a copy of the stack frame that it passed
>>> to mcount, where the function graph tracer replaced the copy of the return
>>> stack making the shadow stack go out of sync and crash). This was very hard
>>> to debug and I added this code to detect it if it happened again.
>>>
>>> Well it's been over a decade since that happened (2009).
>>>
>>>     71e308a239c09 ("function-graph: add stack frame test")
>>>
>>> I'm happy assuming that the compiler folks are aware of our tricks with
>>> hijacking return calls and I don't expect it to happen again. We can just
>>> rip out those checks. That is, if it's only causing false positives, I
>>> don't think it's worth keeping around.
>>>
>>> Has it detected any real issues on the Arm platforms?
>>>
>>> -- Steve
>>
>> I am not familiar enough to make a call. But from my limited testing
>> with ARM, I didn't see any issues. If you would like me to, I can submit
>> a patch to remove the check entirely. Or maybe only disable it for ARM?
>>
> 
> Please try the fix I proposed first.

Just tested it. Seems to do the trick. Either solution works for me.

FWIW I also experimented with LLVM, looks like function_graph just 
crashes regardless of the issue being discussed. The disassemble of 
LLVM[1] does something completely different.

Thanks,
Justin

[1]
LLVM dump
c0c6faa0 <sk_getsockopt>:
c0c6faa0: f0 4f 2d e9   push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
c0c6faa4: 1c b0 8d e2   add     r11, sp, #28
c0c6faa8: ac d0 4d e2   sub     sp, sp, #172
c0c6faac: 00 70 a0 e1   mov     r7, r0
c0c6fab0: c8 0c 04 e3   movw    r0, #19656
c0c6fab4: 80 02 4c e3   movt    r0, #49792
c0c6fab8: 03 50 a0 e1   mov     r5, r3
c0c6fabc: 00 00 90 e5   ldr     r0, [r0]
c0c6fac0: 02 a0 a0 e1   mov     r10, r2
c0c6fac4: 20 00 0b e5   str     r0, [r11, #-32]
c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
c0c6facc: 4b 8b d6 eb   bl      0xc0212800 <__gnu_mcount_nc> @ imm = 
#-10867412

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-02  8:49           ` Justin Chen
@ 2023-12-02  9:26             ` Ard Biesheuvel
  2023-12-02 17:49               ` Justin Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-02  9:26 UTC (permalink / raw)
  To: Justin Chen
  Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.chen@broadcom.com> wrote:
>
>
>
> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> > On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
> >>
> >>
> >>
> >> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> >>> On Fri, 1 Dec 2023 09:25:59 -0800
> >>> Justin Chen <justin.chen@broadcom.com> wrote:
> >>>
> >>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>>>> that there are no gaps between the stack frames is invalid.
> >>>>
> >>>> Thanks for the assistance. The gap between the stack frame depends on
> >>>> the function. Most do not have a gap. Some have 8 (as shown above), some
> >>>> have 12. A single assumption here is not going to work. I'm having a
> >>>> hard time finding out the reasoning for this gap. I tried disabling a
> >>>> bunch of gcc flags as well as -O2 and the gap still exists.
> >>>
> >>> That code was originally added because of some strange things that gcc did
> >>> with mcount (for example, it made a copy of the stack frame that it passed
> >>> to mcount, where the function graph tracer replaced the copy of the return
> >>> stack making the shadow stack go out of sync and crash). This was very hard
> >>> to debug and I added this code to detect it if it happened again.
> >>>
> >>> Well it's been over a decade since that happened (2009).
> >>>
> >>>     71e308a239c09 ("function-graph: add stack frame test")
> >>>
> >>> I'm happy assuming that the compiler folks are aware of our tricks with
> >>> hijacking return calls and I don't expect it to happen again. We can just
> >>> rip out those checks. That is, if it's only causing false positives, I
> >>> don't think it's worth keeping around.
> >>>
> >>> Has it detected any real issues on the Arm platforms?
> >>>
> >>> -- Steve
> >>
> >> I am not familiar enough to make a call. But from my limited testing
> >> with ARM, I didn't see any issues. If you would like me to, I can submit
> >> a patch to remove the check entirely. Or maybe only disable it for ARM?
> >>
> >
> > Please try the fix I proposed first.
>
> Just tested it. Seems to do the trick.

Thanks

> Either solution works for me.
>

Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.

> FWIW I also experimented with LLVM, looks like function_graph just
> crashes regardless of the issue being discussed. The disassemble of
> LLVM[1] does something completely different.
>


LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.

In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.

Could you be more specific about the crash?


>
> [1]
> LLVM dump
> c0c6faa0 <sk_getsockopt>:
> c0c6faa0: f0 4f 2d e9   push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> c0c6faa4: 1c b0 8d e2   add     r11, sp, #28
> c0c6faa8: ac d0 4d e2   sub     sp, sp, #172
> c0c6faac: 00 70 a0 e1   mov     r7, r0
> c0c6fab0: c8 0c 04 e3   movw    r0, #19656
> c0c6fab4: 80 02 4c e3   movt    r0, #49792
> c0c6fab8: 03 50 a0 e1   mov     r5, r3
> c0c6fabc: 00 00 90 e5   ldr     r0, [r0]
> c0c6fac0: 02 a0 a0 e1   mov     r10, r2
> c0c6fac4: 20 00 0b e5   str     r0, [r11, #-32]
> c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
> c0c6facc: 4b 8b d6 eb   bl      0xc0212800 <__gnu_mcount_nc> @ imm =
> #-10867412

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-12-02  9:26             ` Ard Biesheuvel
@ 2023-12-02 17:49               ` Justin Chen
  0 siblings, 0 replies; 15+ messages in thread
From: Justin Chen @ 2023-12-02 17:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Florian Fainelli, Doug Berger

[-- Attachment #1: Type: text/plain, Size: 3921 bytes --]



On 12/2/2023 1:26 AM, Ard Biesheuvel wrote:
> On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>>
>>
>> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
>>> On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/1/23 10:07 AM, Steven Rostedt wrote:
>>>>> On Fri, 1 Dec 2023 09:25:59 -0800
>>>>> Justin Chen <justin.chen@broadcom.com> wrote:
>>>>>
>>>>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>>>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>>>>>> that there are no gaps between the stack frames is invalid.
>>>>>>
>>>>>> Thanks for the assistance. The gap between the stack frame depends on
>>>>>> the function. Most do not have a gap. Some have 8 (as shown above), some
>>>>>> have 12. A single assumption here is not going to work. I'm having a
>>>>>> hard time finding out the reasoning for this gap. I tried disabling a
>>>>>> bunch of gcc flags as well as -O2 and the gap still exists.
>>>>>
>>>>> That code was originally added because of some strange things that gcc did
>>>>> with mcount (for example, it made a copy of the stack frame that it passed
>>>>> to mcount, where the function graph tracer replaced the copy of the return
>>>>> stack making the shadow stack go out of sync and crash). This was very hard
>>>>> to debug and I added this code to detect it if it happened again.
>>>>>
>>>>> Well it's been over a decade since that happened (2009).
>>>>>
>>>>>      71e308a239c09 ("function-graph: add stack frame test")
>>>>>
>>>>> I'm happy assuming that the compiler folks are aware of our tricks with
>>>>> hijacking return calls and I don't expect it to happen again. We can just
>>>>> rip out those checks. That is, if it's only causing false positives, I
>>>>> don't think it's worth keeping around.
>>>>>
>>>>> Has it detected any real issues on the Arm platforms?
>>>>>
>>>>> -- Steve
>>>>
>>>> I am not familiar enough to make a call. But from my limited testing
>>>> with ARM, I didn't see any issues. If you would like me to, I can submit
>>>> a patch to remove the check entirely. Or maybe only disable it for ARM?
>>>>
>>>
>>> Please try the fix I proposed first.
>>
>> Just tested it. Seems to do the trick.
> 
> Thanks
> 
>> Either solution works for me.
>>
> 
> Given that this discussion is taking place in the context of the
> report of an issue identified by GRAPH_FP_TEST, I don't see how
> removing that would be a reasonable conclusion.
> 

Fair enough. I will submit a patch. Thanks for the help.

>> FWIW I also experimented with LLVM, looks like function_graph just
>> crashes regardless of the issue being discussed. The disassemble of
>> LLVM[1] does something completely different.
>>
> 
> 
> LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
> the prologue looks different is expected.
> 
> In the case below, the FP {r11} is correctly made to point to a {r11,
> lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
> rely on unwind information for ftrace in this case, not the frame
> pointer.
> 
> Could you be more specific about the crash?
>

It just hangs with no prints. I can instrument the hang and see what I 
can find.

Justin

> 
>>
>> [1]
>> LLVM dump
>> c0c6faa0 <sk_getsockopt>:
>> c0c6faa0: f0 4f 2d e9   push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
>> c0c6faa4: 1c b0 8d e2   add     r11, sp, #28
>> c0c6faa8: ac d0 4d e2   sub     sp, sp, #172
>> c0c6faac: 00 70 a0 e1   mov     r7, r0
>> c0c6fab0: c8 0c 04 e3   movw    r0, #19656
>> c0c6fab4: 80 02 4c e3   movt    r0, #49792
>> c0c6fab8: 03 50 a0 e1   mov     r5, r3
>> c0c6fabc: 00 00 90 e5   ldr     r0, [r0]
>> c0c6fac0: 02 a0 a0 e1   mov     r10, r2
>> c0c6fac4: 20 00 0b e5   str     r0, [r11, #-32]
>> c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
>> c0c6facc: 4b 8b d6 eb   bl      0xc0212800 <__gnu_mcount_nc> @ imm =
>> #-10867412

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
  2023-12-01  9:12 ` Ard Biesheuvel
@ 2024-05-27  7:43 ` Thorsten Scherer
  2024-05-27  7:56   ` Russell King (Oracle)
  1 sibling, 1 reply; 15+ messages in thread
From: Thorsten Scherer @ 2024-05-27  7:43 UTC (permalink / raw)
  To: ardb
  Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
	linux-arm-kernel, Justin Chen, Florian Fainelli, Doug Berger,
	Uwe Kleine-König, kernel, Thorsten Scherer

Hello,

in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
this thread.  With the proposed code change applied the procedure

    # set to some known good (randomly guessed) filter function and enable function_graph
    echo mtdblock_open > set_ftrace_filter
    echo function_graph > current_tracer

    # walk available filter funcs
    cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done

produces the following output

[  159.832173] Insufficient stack space to handle exception!
[  159.832241] Task stack:     [0xc8e44000..0xc8e46000]
[  159.842701] IRQ stack:      [0xc8800000..0xc8802000]
[  159.847712] Overflow stack: [0xc1934000..0xc1935000]
[  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
[  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
[  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
[  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
[  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
[  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
[  159.898376] pc : [<c010dd44>]    lr : [<c010d988>]    psr: 60000093
[  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
[  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
[  159.915227] r7 : 00000001  r6 : 00000004  r5 : c8e44064  r4 : c8e440ac
[  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
[  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  159.935652] Control: 0005317f  Table: 83074000  DAC: 00000051
[  159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.952253] Register r1 information: non-slab/vmalloc memory
[  159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.001187] Register r6 information: non-paged memory
[  160.006303] Register r7 information: non-paged memory
[  160.011415] Register r8 information: non-slab/vmalloc memory
[  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
[  160.025718] Register r10 information: non-slab/vmalloc memory
[  160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
[  160.058955] Stack: (0xc8e44018 to 0xc8e46000)

and points me to copy_from_kernel_nofault_allowed.

    Disassembly of section .text:

    c010eb8c <copy_from_kernel_nofault_allowed>:
    c010eb8c:       e1a0c00d        mov     ip, sp
    c010eb90:       e92dd800        push    {fp, ip, lr, pc}
    c010eb94:       e24cb004        sub     fp, ip, #4
    c010eb98:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
    c010eb9c:       ebfffb4f        bl      c010d8e0 <__gnu_mcount_nc>
    c010eba0:       e35004bf        cmp     r0, #-1090519040        ; 0xbf000000
    c010eba4:       3a000003        bcc     c010ebb8 <copy_from_kernel_nofault_allowed+0x2c>
    c010eba8:       e0900001        adds    r0, r0, r1
    c010ebac:       33a00001        movcc   r0, #1
    c010ebb0:       23a00000        movcs   r0, #0
    c010ebb4:       e89da800        ldm     sp, {fp, sp, pc}
    c010ebb8:       e3a00000        mov     r0, #0
    c010ebbc:       e89da800        ldm     sp, {fp, sp, pc}


In the time given I couldn't make sense of the issue, so I am passing this
hopefully useful information to you.

Best regards
Thorsten

#regzbot introduced: 953f534a7ed6b725d4f101d2949393acc9262880 ^

[1] https://lore.kernel.org/all/alp44tukzo6mvcwl4ke4ehhmojrqnv6xfcdeuliybxfjfvgd3e@gpjvwj33cc76/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2024-05-27  7:43 ` Thorsten Scherer
@ 2024-05-27  7:56   ` Russell King (Oracle)
  2024-05-27 12:28     ` Uwe Kleine-König
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2024-05-27  7:56 UTC (permalink / raw)
  To: Thorsten Scherer
  Cc: ardb, rostedt, mhiramat, mark.rutland, linux-trace-kernel,
	linux-arm-kernel, Justin Chen, Florian Fainelli, Doug Berger,
	Uwe Kleine-König, kernel

On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> Hello,
> 
> in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> this thread.  With the proposed code change applied the procedure
> 
>     # set to some known good (randomly guessed) filter function and enable function_graph
>     echo mtdblock_open > set_ftrace_filter
>     echo function_graph > current_tracer
> 
>     # walk available filter funcs
>     cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> 
> produces the following output
> 
> [  159.832173] Insufficient stack space to handle exception!
> [  159.832241] Task stack:     [0xc8e44000..0xc8e46000]
> [  159.842701] IRQ stack:      [0xc8800000..0xc8802000]
> [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> [  159.898376] pc : [<c010dd44>]    lr : [<c010d988>]    psr: 60000093
> [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> [  159.915227] r7 : 00000001  r6 : 00000004  r5 : c8e44064  r4 : c8e440ac
> [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [  159.935652] Control: 0005317f  Table: 83074000  DAC: 00000051
> [  159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.952253] Register r1 information: non-slab/vmalloc memory
> [  159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.001187] Register r6 information: non-paged memory
> [  160.006303] Register r7 information: non-paged memory
> [  160.011415] Register r8 information: non-slab/vmalloc memory
> [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> [  160.025718] Register r10 information: non-slab/vmalloc memory
> [  160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)

No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
a function involving the ftrace tracing infrastructure, which is why 8KB
of stack has been gobbled up. It could be
copy_from_kernel_nofault_allowed() but it would be useful to have at
least some extract of the backtrace showing the recursive cycle to
confirm, otherwise there is nothing in your report to confirm. As I'm
not a ftrace user myself, this isn't something I'd test for, so having
a full report would be useful.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2024-05-27  7:56   ` Russell King (Oracle)
@ 2024-05-27 12:28     ` Uwe Kleine-König
  2024-05-27 12:51       ` Russell King (Oracle)
  0 siblings, 1 reply; 15+ messages in thread
From: Uwe Kleine-König @ 2024-05-27 12:28 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Thorsten Scherer, mark.rutland, Florian Fainelli, rostedt,
	Justin Chen, Doug Berger, mhiramat, kernel, ardb,
	linux-arm-kernel, linux-trace-kernel

[-- Attachment #1: Type: text/plain, Size: 4374 bytes --]

On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > Hello,
> > 
> > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > this thread.  With the proposed code change applied the procedure
> > 
> >     # set to some known good (randomly guessed) filter function and enable function_graph
> >     echo mtdblock_open > set_ftrace_filter
> >     echo function_graph > current_tracer
> > 
> >     # walk available filter funcs
> >     cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> > 
> > produces the following output
> > 
> > [  159.832173] Insufficient stack space to handle exception!
> > [  159.832241] Task stack:     [0xc8e44000..0xc8e46000]
> > [  159.842701] IRQ stack:      [0xc8800000..0xc8802000]
> > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > [  159.898376] pc : [<c010dd44>]    lr : [<c010d988>]    psr: 60000093
> > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > [  159.915227] r7 : 00000001  r6 : 00000004  r5 : c8e44064  r4 : c8e440ac
> > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 00000051
> > [  159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > [  159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.001187] Register r6 information: non-paged memory
> > [  160.006303] Register r7 information: non-paged memory
> > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > [  160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> 
> No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> a function involving the ftrace tracing infrastructure, which is why 8KB
> of stack has been gobbled up. It could be
> copy_from_kernel_nofault_allowed() but it would be useful to have at
> least some extract of the backtrace showing the recursive cycle to
> confirm, otherwise there is nothing in your report to confirm. As I'm
> not a ftrace user myself, this isn't something I'd test for, so having
> a full report would be useful.

Is not having a backtrace related to ftrace_return_address() returning
NULL, as Arnd pointed out in
https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2024-05-27 12:28     ` Uwe Kleine-König
@ 2024-05-27 12:51       ` Russell King (Oracle)
  2024-05-28  4:52         ` Thorsten Scherer
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2024-05-27 12:51 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thorsten Scherer, mark.rutland, Florian Fainelli, rostedt,
	Justin Chen, Doug Berger, mhiramat, kernel, ardb,
	linux-arm-kernel, linux-trace-kernel

On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > Hello,
> > > 
> > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > > this thread.  With the proposed code change applied the procedure
> > > 
> > >     # set to some known good (randomly guessed) filter function and enable function_graph
> > >     echo mtdblock_open > set_ftrace_filter
> > >     echo function_graph > current_tracer
> > > 
> > >     # walk available filter funcs
> > >     cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> > > 
> > > produces the following output
> > > 
> > > [  159.832173] Insufficient stack space to handle exception!
> > > [  159.832241] Task stack:     [0xc8e44000..0xc8e46000]
> > > [  159.842701] IRQ stack:      [0xc8800000..0xc8802000]
> > > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > [  159.898376] pc : [<c010dd44>]    lr : [<c010d988>]    psr: 60000093
> > > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > > [  159.915227] r7 : 00000001  r6 : 00000004  r5 : c8e44064  r4 : c8e440ac
> > > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> > > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 00000051
> > > [  159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > > [  159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.001187] Register r6 information: non-paged memory
> > > [  160.006303] Register r7 information: non-paged memory
> > > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > > [  160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> > 
> > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > a function involving the ftrace tracing infrastructure, which is why 8KB
> > of stack has been gobbled up. It could be
> > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > least some extract of the backtrace showing the recursive cycle to
> > confirm, otherwise there is nothing in your report to confirm. As I'm
> > not a ftrace user myself, this isn't something I'd test for, so having
> > a full report would be useful.
> 
> Is not having a backtrace related to ftrace_return_address() returning
> NULL, as Arnd pointed out in
> https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
> ?

Unlikely - the stack and code lines are also missing. I think the
submitter truncated the oops which is highly likely given that it
would've dumped all 8kB of the stack in hex, and the trace and
code lines would be after that.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
  2024-05-27 12:51       ` Russell King (Oracle)
@ 2024-05-28  4:52         ` Thorsten Scherer
  0 siblings, 0 replies; 15+ messages in thread
From: Thorsten Scherer @ 2024-05-28  4:52 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Uwe Kleine-König, mark.rutland, Florian Fainelli, rostedt,
	Justin Chen, Doug Berger, mhiramat, kernel, ardb,
	linux-arm-kernel, linux-trace-kernel

Hello,

On Mon, May 27, 2024 at 01:51:49PM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> > On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > > Hello,
> > > > 
> > > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > > > this thread.  With the proposed code change applied the procedure
> > > > 
> > > >     # set to some known good (randomly guessed) filter function and enable function_graph
> > > >     echo mtdblock_open > set_ftrace_filter
> > > >     echo function_graph > current_tracer
> > > > 
> > > >     # walk available filter funcs
> > > >     cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> > > > 
> > > > produces the following output
> > > > 
> > > > [  159.832173] Insufficient stack space to handle exception!
> > > > [  159.832241] Task stack:     [0xc8e44000..0xc8e46000]
> > > > [  159.842701] IRQ stack:      [0xc8800000..0xc8802000]
> > > > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > > > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > > [  159.898376] pc : [<c010dd44>]    lr : [<c010d988>]    psr: 60000093
> > > > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > > > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > > > [  159.915227] r7 : 00000001  r6 : 00000004  r5 : c8e44064  r4 : c8e440ac
> > > > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> > > > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > > > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 00000051
> > > > [  159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > > > [  159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.001187] Register r6 information: non-paged memory
> > > > [  160.006303] Register r7 information: non-paged memory
> > > > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > > > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > > > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > > > [  160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> > > 
> > > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > > a function involving the ftrace tracing infrastructure, which is why 8KB
> > > of stack has been gobbled up. It could be
> > > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > > least some extract of the backtrace showing the recursive cycle to
> > > confirm, otherwise there is nothing in your report to confirm. As I'm
> > > not a ftrace user myself, this isn't something I'd test for, so having
> > > a full report would be useful.
> > 
> > Is not having a backtrace related to ftrace_return_address() returning
> > NULL, as Arnd pointed out in
> > https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
> > ?
> 
> Unlikely - the stack and code lines are also missing. I think the
> submitter truncated the oops which is highly likely given that it
> would've dumped all 8kB of the stack in hex, and the trace and
> code lines would be after that.

sorry for causing additional friction by my imprecise description.
Indeed, this was the whole oops before the machine restarted.  I double
checked that.

I see that this [1] thread contains a patch.  Will test it on my machine.
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Best regards
Thorsten

[1] https://lore.kernel.org/all/20240527161236.313448-2-ardb+git@google.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-28  4:52 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
2023-12-01  9:12 ` Ard Biesheuvel
2023-12-01 17:25   ` Justin Chen
2023-12-01 18:07     ` Steven Rostedt
2023-12-01 22:59       ` Justin Chen
2023-12-02  6:53         ` Ard Biesheuvel
2023-12-02  8:49           ` Justin Chen
2023-12-02  9:26             ` Ard Biesheuvel
2023-12-02 17:49               ` Justin Chen
2023-12-01 18:22   ` Russell King (Oracle)
2024-05-27  7:43 ` Thorsten Scherer
2024-05-27  7:56   ` Russell King (Oracle)
2024-05-27 12:28     ` Uwe Kleine-König
2024-05-27 12:51       ` Russell King (Oracle)
2024-05-28  4:52         ` Thorsten Scherer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).