* ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
@ 2023-11-30 23:47 Justin Chen
2023-12-01 9:12 ` Ard Biesheuvel
2024-05-27 7:43 ` Thorsten Scherer
0 siblings, 2 replies; 15+ messages in thread
From: Justin Chen @ 2023-11-30 23:47 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, ardb
Cc: Florian Fainelli, Doug Berger
[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]
Hello,
Ran into an odd bug that I am unsure what the solution is. Tested a few
kernels versions and they all fail the same.
FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
should see a failure similar to below.
[ 63.817239] ------------[ cut here ]------------
[ 63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
ftrace_return_to_handler+0x228/0x374
[ 63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
[ 63.831645] from func packet_setsockopt return to c0b558f4
[ 63.843801] Modules linked in: bdc udc_core
[ 63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
6.1.53-0.1pre-gf0bc552d12f8 #33
[ 63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
[ 63.862227] Backtrace:
[ 63.864761] dump_backtrace from show_stack+0x20/0x24
[ 63.869982] r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
[ 63.875736] show_stack from dump_stack_lvl+0x48/0x54
[ 63.880929] dump_stack_lvl from dump_stack+0x18/0x1c
[ 63.886111] r5:000000c3 r4:c11bd92c
[ 63.889764] dump_stack from __warn+0x88/0x130
[ 63.894339] __warn from warn_slowpath_fmt+0x140/0x198
[ 63.899631] r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
[ 63.906431] warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
[ 63.913294] r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
[ 63.920095] ftrace_return_to_handler from return_to_handler+0xc/0x18
[ 63.926699] r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
[ 63.933500] __sys_setsockopt from return_to_handler+0x0/0x18
[ 63.939415] r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
[ 63.946217] sys_setsockopt from return_to_handler+0x0/0x18
[ 63.952053] ---[ end trace 0000000000000000 ]---
Sure enough the top of the parent stack is off by 8. (Tested with
gcc6.3/gcc8.3/gcc12.3)
00006dcc <packet_setsockopt>:
6dcc: e1a0c00d mov ip, sp
6dd0: e24dd008 sub sp, sp, #8 <======
6dd4: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl,
fp, ip, lr, pc}
6dd8: e24cb00c sub fp, ip, #12
6ddc: e24dd06c sub sp, sp, #108 @ 0x6c
6de0: e52de004 push {lr} @ (str lr, [sp,
#-4]!)
6de4: ebfffffe bl 0 <__gnu_mcount_nc>
I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
some optimization?), but it isn't being accounted for thus the
FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
looks the test is hitting false positives.
Thanks,
Justin
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
@ 2023-12-01 9:12 ` Ard Biesheuvel
2023-12-01 17:25 ` Justin Chen
2023-12-01 18:22 ` Russell King (Oracle)
2024-05-27 7:43 ` Thorsten Scherer
1 sibling, 2 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-01 9:12 UTC (permalink / raw)
To: Justin Chen
Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
On Fri, 1 Dec 2023 at 00:48, Justin Chen <justin.chen@broadcom.com> wrote:
>
> Hello,
>
> Ran into an odd bug that I am unsure what the solution is. Tested a few
> kernels versions and they all fail the same.
>
> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
> should see a failure similar to below.
>
> [ 63.817239] ------------[ cut here ]------------
> [ 63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
> ftrace_return_to_handler+0x228/0x374
> [ 63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
> [ 63.831645] from func packet_setsockopt return to c0b558f4
> [ 63.843801] Modules linked in: bdc udc_core
> [ 63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
> 6.1.53-0.1pre-gf0bc552d12f8 #33
> [ 63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
> [ 63.862227] Backtrace:
> [ 63.864761] dump_backtrace from show_stack+0x20/0x24
> [ 63.869982] r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
> [ 63.875736] show_stack from dump_stack_lvl+0x48/0x54
> [ 63.880929] dump_stack_lvl from dump_stack+0x18/0x1c
> [ 63.886111] r5:000000c3 r4:c11bd92c
> [ 63.889764] dump_stack from __warn+0x88/0x130
> [ 63.894339] __warn from warn_slowpath_fmt+0x140/0x198
> [ 63.899631] r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
> [ 63.906431] warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
> [ 63.913294] r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
> [ 63.920095] ftrace_return_to_handler from return_to_handler+0xc/0x18
> [ 63.926699] r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
> [ 63.933500] __sys_setsockopt from return_to_handler+0x0/0x18
> [ 63.939415] r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
> [ 63.946217] sys_setsockopt from return_to_handler+0x0/0x18
> [ 63.952053] ---[ end trace 0000000000000000 ]---
>
> Sure enough the top of the parent stack is off by 8. (Tested with
> gcc6.3/gcc8.3/gcc12.3)
> 00006dcc <packet_setsockopt>:
> 6dcc: e1a0c00d mov ip, sp
> 6dd0: e24dd008 sub sp, sp, #8 <======
> 6dd4: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl,
> fp, ip, lr, pc}
> 6dd8: e24cb00c sub fp, ip, #12
> 6ddc: e24dd06c sub sp, sp, #108 @ 0x6c
> 6de0: e52de004 push {lr} @ (str lr, [sp,
> #-4]!)
> 6de4: ebfffffe bl 0 <__gnu_mcount_nc>
>
> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
> some optimization?), but it isn't being accounted for thus the
> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
> looks the test is hitting false positives.
>
Thanks for the report.
It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.
Could you try the following change please?
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
return;
if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
- /* FP points one word below parent's top of stack */
- frame_pointer += 4;
+ /*
+ * The top of stack of the parent is recorded in the stack
+ * frame at offset [fp, #-8].
+ */
+ get_kernel_nofault(frame_pointer,
+ (unsigned long *)(frame_pointer - 8));
} else {
struct stackframe frame = {
.fp = frame_pointer,
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-01 9:12 ` Ard Biesheuvel
@ 2023-12-01 17:25 ` Justin Chen
2023-12-01 18:07 ` Steven Rostedt
2023-12-01 18:22 ` Russell King (Oracle)
1 sibling, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-01 17:25 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
[-- Attachment #1: Type: text/plain, Size: 4439 bytes --]
On 12/1/2023 1:12 AM, Ard Biesheuvel wrote:
> On Fri, 1 Dec 2023 at 00:48, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>> Hello,
>>
>> Ran into an odd bug that I am unsure what the solution is. Tested a few
>> kernels versions and they all fail the same.
>>
>> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
>> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
>> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
>> should see a failure similar to below.
>>
>> [ 63.817239] ------------[ cut here ]------------
>> [ 63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
>> ftrace_return_to_handler+0x228/0x374
>> [ 63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
>> [ 63.831645] from func packet_setsockopt return to c0b558f4
>> [ 63.843801] Modules linked in: bdc udc_core
>> [ 63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
>> 6.1.53-0.1pre-gf0bc552d12f8 #33
>> [ 63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
>> [ 63.862227] Backtrace:
>> [ 63.864761] dump_backtrace from show_stack+0x20/0x24
>> [ 63.869982] r7:c031cd8c r6:00000009 r5:00000013 r4:c11c7fac
>> [ 63.875736] show_stack from dump_stack_lvl+0x48/0x54
>> [ 63.880929] dump_stack_lvl from dump_stack+0x18/0x1c
>> [ 63.886111] r5:000000c3 r4:c11bd92c
>> [ 63.889764] dump_stack from __warn+0x88/0x130
>> [ 63.894339] __warn from warn_slowpath_fmt+0x140/0x198
>> [ 63.899631] r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:00000000
>> [ 63.906431] warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
>> [ 63.913294] r8:c3a8d840 r7:00000002 r6:d1e0df48 r5:c2377a94 r4:c269a400
>> [ 63.920095] ftrace_return_to_handler from return_to_handler+0xc/0x18
>> [ 63.926699] r8:c0cd8ed0 r7:00000008 r6:c418c500 r5:00000004 r4:00000107
>> [ 63.933500] __sys_setsockopt from return_to_handler+0x0/0x18
>> [ 63.939415] r8:c02002bc r7:00000126 r6:00000003 r5:00000000 r4:00000004
>> [ 63.946217] sys_setsockopt from return_to_handler+0x0/0x18
>> [ 63.952053] ---[ end trace 0000000000000000 ]---
>>
>> Sure enough the top of the parent stack is off by 8. (Tested with
>> gcc6.3/gcc8.3/gcc12.3)
>> 00006dcc <packet_setsockopt>:
>> 6dcc: e1a0c00d mov ip, sp
>> 6dd0: e24dd008 sub sp, sp, #8 <======
>> 6dd4: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl,
>> fp, ip, lr, pc}
>> 6dd8: e24cb00c sub fp, ip, #12
>> 6ddc: e24dd06c sub sp, sp, #108 @ 0x6c
>> 6de0: e52de004 push {lr} @ (str lr, [sp,
>> #-4]!)
>> 6de4: ebfffffe bl 0 <__gnu_mcount_nc>
>>
>> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
>> some optimization?), but it isn't being accounted for thus the
>> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
>> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
>> looks the test is hitting false positives.
>>
>
> Thanks for the report.
>
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> that there are no gaps between the stack frames is invalid.
Thanks for the assistance. The gap between the stack frame depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.
Thanks,
Justin
>
> Could you try the following change please?
>
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
> return;
>
> if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> - /* FP points one word below parent's top of stack */
> - frame_pointer += 4;
> + /*
> + * The top of stack of the parent is recorded in the stack
> + * frame at offset [fp, #-8].
> + */
> + get_kernel_nofault(frame_pointer,
> + (unsigned long *)(frame_pointer - 8));
> } else {
> struct stackframe frame = {
> .fp = frame_pointer,
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-01 17:25 ` Justin Chen
@ 2023-12-01 18:07 ` Steven Rostedt
2023-12-01 22:59 ` Justin Chen
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2023-12-01 18:07 UTC (permalink / raw)
To: Justin Chen
Cc: Ard Biesheuvel, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen <justin.chen@broadcom.com> wrote:
> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So it is our assumption
> > that there are no gaps between the stack frames is invalid.
>
> Thanks for the assistance. The gap between the stack frame depends on
> the function. Most do not have a gap. Some have 8 (as shown above), some
> have 12. A single assumption here is not going to work. I'm having a
> hard time finding out the reasoning for this gap. I tried disabling a
> bunch of gcc flags as well as -O2 and the gap still exists.
That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.
Well it's been over a decade since that happened (2009).
71e308a239c09 ("function-graph: add stack frame test")
I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.
Has it detected any real issues on the Arm platforms?
-- Steve
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-01 9:12 ` Ard Biesheuvel
2023-12-01 17:25 ` Justin Chen
@ 2023-12-01 18:22 ` Russell King (Oracle)
1 sibling, 0 replies; 15+ messages in thread
From: Russell King (Oracle) @ 2023-12-01 18:22 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Justin Chen, rostedt, mhiramat, mark.rutland, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
On Fri, Dec 01, 2023 at 10:12:48AM +0100, Ard Biesheuvel wrote:
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> that there are no gaps between the stack frames is invalid.
>
> Could you try the following change please?
>
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
> return;
>
> if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> - /* FP points one word below parent's top of stack */
> - frame_pointer += 4;
> + /*
> + * The top of stack of the parent is recorded in the stack
> + * frame at offset [fp, #-8].
> + */
> + get_kernel_nofault(frame_pointer,
> + (unsigned long *)(frame_pointer - 8));
Yes, this will get the value of the stack pointer when the function
was entered - which may be the bottom of the parent function's stack
_or_ the start of non-register arguments to this function. So your
replacement has always been more correct.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-01 18:07 ` Steven Rostedt
@ 2023-12-01 22:59 ` Justin Chen
2023-12-02 6:53 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-01 22:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ard Biesheuvel, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
[-- Attachment #1: Type: text/plain, Size: 1780 bytes --]
On 12/1/23 10:07 AM, Steven Rostedt wrote:
> On Fri, 1 Dec 2023 09:25:59 -0800
> Justin Chen <justin.chen@broadcom.com> wrote:
>
>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>> that there are no gaps between the stack frames is invalid.
>>
>> Thanks for the assistance. The gap between the stack frame depends on
>> the function. Most do not have a gap. Some have 8 (as shown above), some
>> have 12. A single assumption here is not going to work. I'm having a
>> hard time finding out the reasoning for this gap. I tried disabling a
>> bunch of gcc flags as well as -O2 and the gap still exists.
>
> That code was originally added because of some strange things that gcc did
> with mcount (for example, it made a copy of the stack frame that it passed
> to mcount, where the function graph tracer replaced the copy of the return
> stack making the shadow stack go out of sync and crash). This was very hard
> to debug and I added this code to detect it if it happened again.
>
> Well it's been over a decade since that happened (2009).
>
> 71e308a239c09 ("function-graph: add stack frame test")
>
> I'm happy assuming that the compiler folks are aware of our tricks with
> hijacking return calls and I don't expect it to happen again. We can just
> rip out those checks. That is, if it's only causing false positives, I
> don't think it's worth keeping around.
>
> Has it detected any real issues on the Arm platforms?
>
> -- Steve
I am not familiar enough to make a call. But from my limited testing
with ARM, I didn't see any issues. If you would like me to, I can submit
a patch to remove the check entirely. Or maybe only disable it for ARM?
Thanks,
Justin
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-01 22:59 ` Justin Chen
@ 2023-12-02 6:53 ` Ard Biesheuvel
2023-12-02 8:49 ` Justin Chen
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-02 6:53 UTC (permalink / raw)
To: Justin Chen
Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>
>
>
> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> > On Fri, 1 Dec 2023 09:25:59 -0800
> > Justin Chen <justin.chen@broadcom.com> wrote:
> >
> >>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>> that there are no gaps between the stack frames is invalid.
> >>
> >> Thanks for the assistance. The gap between the stack frame depends on
> >> the function. Most do not have a gap. Some have 8 (as shown above), some
> >> have 12. A single assumption here is not going to work. I'm having a
> >> hard time finding out the reasoning for this gap. I tried disabling a
> >> bunch of gcc flags as well as -O2 and the gap still exists.
> >
> > That code was originally added because of some strange things that gcc did
> > with mcount (for example, it made a copy of the stack frame that it passed
> > to mcount, where the function graph tracer replaced the copy of the return
> > stack making the shadow stack go out of sync and crash). This was very hard
> > to debug and I added this code to detect it if it happened again.
> >
> > Well it's been over a decade since that happened (2009).
> >
> > 71e308a239c09 ("function-graph: add stack frame test")
> >
> > I'm happy assuming that the compiler folks are aware of our tricks with
> > hijacking return calls and I don't expect it to happen again. We can just
> > rip out those checks. That is, if it's only causing false positives, I
> > don't think it's worth keeping around.
> >
> > Has it detected any real issues on the Arm platforms?
> >
> > -- Steve
>
> I am not familiar enough to make a call. But from my limited testing
> with ARM, I didn't see any issues. If you would like me to, I can submit
> a patch to remove the check entirely. Or maybe only disable it for ARM?
>
Please try the fix I proposed first.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-02 6:53 ` Ard Biesheuvel
@ 2023-12-02 8:49 ` Justin Chen
2023-12-02 9:26 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Justin Chen @ 2023-12-02 8:49 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
[-- Attachment #1: Type: text/plain, Size: 2893 bytes --]
On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>>
>>
>> On 12/1/23 10:07 AM, Steven Rostedt wrote:
>>> On Fri, 1 Dec 2023 09:25:59 -0800
>>> Justin Chen <justin.chen@broadcom.com> wrote:
>>>
>>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>>>> that there are no gaps between the stack frames is invalid.
>>>>
>>>> Thanks for the assistance. The gap between the stack frame depends on
>>>> the function. Most do not have a gap. Some have 8 (as shown above), some
>>>> have 12. A single assumption here is not going to work. I'm having a
>>>> hard time finding out the reasoning for this gap. I tried disabling a
>>>> bunch of gcc flags as well as -O2 and the gap still exists.
>>>
>>> That code was originally added because of some strange things that gcc did
>>> with mcount (for example, it made a copy of the stack frame that it passed
>>> to mcount, where the function graph tracer replaced the copy of the return
>>> stack making the shadow stack go out of sync and crash). This was very hard
>>> to debug and I added this code to detect it if it happened again.
>>>
>>> Well it's been over a decade since that happened (2009).
>>>
>>> 71e308a239c09 ("function-graph: add stack frame test")
>>>
>>> I'm happy assuming that the compiler folks are aware of our tricks with
>>> hijacking return calls and I don't expect it to happen again. We can just
>>> rip out those checks. That is, if it's only causing false positives, I
>>> don't think it's worth keeping around.
>>>
>>> Has it detected any real issues on the Arm platforms?
>>>
>>> -- Steve
>>
>> I am not familiar enough to make a call. But from my limited testing
>> with ARM, I didn't see any issues. If you would like me to, I can submit
>> a patch to remove the check entirely. Or maybe only disable it for ARM?
>>
>
> Please try the fix I proposed first.
Just tested it. Seems to do the trick. Either solution works for me.
FWIW I also experimented with LLVM, looks like function_graph just
crashes regardless of the issue being discussed. The disassemble of
LLVM[1] does something completely different.
Thanks,
Justin
[1]
LLVM dump
c0c6faa0 <sk_getsockopt>:
c0c6faa0: f0 4f 2d e9 push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
c0c6faa4: 1c b0 8d e2 add r11, sp, #28
c0c6faa8: ac d0 4d e2 sub sp, sp, #172
c0c6faac: 00 70 a0 e1 mov r7, r0
c0c6fab0: c8 0c 04 e3 movw r0, #19656
c0c6fab4: 80 02 4c e3 movt r0, #49792
c0c6fab8: 03 50 a0 e1 mov r5, r3
c0c6fabc: 00 00 90 e5 ldr r0, [r0]
c0c6fac0: 02 a0 a0 e1 mov r10, r2
c0c6fac4: 20 00 0b e5 str r0, [r11, #-32]
c0c6fac8: 00 40 2d e9 stmdb sp!, {lr}
c0c6facc: 4b 8b d6 eb bl 0xc0212800 <__gnu_mcount_nc> @ imm =
#-10867412
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-02 8:49 ` Justin Chen
@ 2023-12-02 9:26 ` Ard Biesheuvel
2023-12-02 17:49 ` Justin Chen
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2023-12-02 9:26 UTC (permalink / raw)
To: Justin Chen
Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.chen@broadcom.com> wrote:
>
>
>
> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> > On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
> >>
> >>
> >>
> >> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> >>> On Fri, 1 Dec 2023 09:25:59 -0800
> >>> Justin Chen <justin.chen@broadcom.com> wrote:
> >>>
> >>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>>>> that there are no gaps between the stack frames is invalid.
> >>>>
> >>>> Thanks for the assistance. The gap between the stack frame depends on
> >>>> the function. Most do not have a gap. Some have 8 (as shown above), some
> >>>> have 12. A single assumption here is not going to work. I'm having a
> >>>> hard time finding out the reasoning for this gap. I tried disabling a
> >>>> bunch of gcc flags as well as -O2 and the gap still exists.
> >>>
> >>> That code was originally added because of some strange things that gcc did
> >>> with mcount (for example, it made a copy of the stack frame that it passed
> >>> to mcount, where the function graph tracer replaced the copy of the return
> >>> stack making the shadow stack go out of sync and crash). This was very hard
> >>> to debug and I added this code to detect it if it happened again.
> >>>
> >>> Well it's been over a decade since that happened (2009).
> >>>
> >>> 71e308a239c09 ("function-graph: add stack frame test")
> >>>
> >>> I'm happy assuming that the compiler folks are aware of our tricks with
> >>> hijacking return calls and I don't expect it to happen again. We can just
> >>> rip out those checks. That is, if it's only causing false positives, I
> >>> don't think it's worth keeping around.
> >>>
> >>> Has it detected any real issues on the Arm platforms?
> >>>
> >>> -- Steve
> >>
> >> I am not familiar enough to make a call. But from my limited testing
> >> with ARM, I didn't see any issues. If you would like me to, I can submit
> >> a patch to remove the check entirely. Or maybe only disable it for ARM?
> >>
> >
> > Please try the fix I proposed first.
>
> Just tested it. Seems to do the trick.
Thanks
> Either solution works for me.
>
Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.
> FWIW I also experimented with LLVM, looks like function_graph just
> crashes regardless of the issue being discussed. The disassemble of
> LLVM[1] does something completely different.
>
LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.
In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.
Could you be more specific about the crash?
>
> [1]
> LLVM dump
> c0c6faa0 <sk_getsockopt>:
> c0c6faa0: f0 4f 2d e9 push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> c0c6faa4: 1c b0 8d e2 add r11, sp, #28
> c0c6faa8: ac d0 4d e2 sub sp, sp, #172
> c0c6faac: 00 70 a0 e1 mov r7, r0
> c0c6fab0: c8 0c 04 e3 movw r0, #19656
> c0c6fab4: 80 02 4c e3 movt r0, #49792
> c0c6fab8: 03 50 a0 e1 mov r5, r3
> c0c6fabc: 00 00 90 e5 ldr r0, [r0]
> c0c6fac0: 02 a0 a0 e1 mov r10, r2
> c0c6fac4: 20 00 0b e5 str r0, [r11, #-32]
> c0c6fac8: 00 40 2d e9 stmdb sp!, {lr}
> c0c6facc: 4b 8b d6 eb bl 0xc0212800 <__gnu_mcount_nc> @ imm =
> #-10867412
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-12-02 9:26 ` Ard Biesheuvel
@ 2023-12-02 17:49 ` Justin Chen
0 siblings, 0 replies; 15+ messages in thread
From: Justin Chen @ 2023-12-02 17:49 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Steven Rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Florian Fainelli, Doug Berger
[-- Attachment #1: Type: text/plain, Size: 3921 bytes --]
On 12/2/2023 1:26 AM, Ard Biesheuvel wrote:
> On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.chen@broadcom.com> wrote:
>>
>>
>>
>> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
>>> On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen@broadcom.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/1/23 10:07 AM, Steven Rostedt wrote:
>>>>> On Fri, 1 Dec 2023 09:25:59 -0800
>>>>> Justin Chen <justin.chen@broadcom.com> wrote:
>>>>>
>>>>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
>>>>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
>>>>>>> that there are no gaps between the stack frames is invalid.
>>>>>>
>>>>>> Thanks for the assistance. The gap between the stack frame depends on
>>>>>> the function. Most do not have a gap. Some have 8 (as shown above), some
>>>>>> have 12. A single assumption here is not going to work. I'm having a
>>>>>> hard time finding out the reasoning for this gap. I tried disabling a
>>>>>> bunch of gcc flags as well as -O2 and the gap still exists.
>>>>>
>>>>> That code was originally added because of some strange things that gcc did
>>>>> with mcount (for example, it made a copy of the stack frame that it passed
>>>>> to mcount, where the function graph tracer replaced the copy of the return
>>>>> stack making the shadow stack go out of sync and crash). This was very hard
>>>>> to debug and I added this code to detect it if it happened again.
>>>>>
>>>>> Well it's been over a decade since that happened (2009).
>>>>>
>>>>> 71e308a239c09 ("function-graph: add stack frame test")
>>>>>
>>>>> I'm happy assuming that the compiler folks are aware of our tricks with
>>>>> hijacking return calls and I don't expect it to happen again. We can just
>>>>> rip out those checks. That is, if it's only causing false positives, I
>>>>> don't think it's worth keeping around.
>>>>>
>>>>> Has it detected any real issues on the Arm platforms?
>>>>>
>>>>> -- Steve
>>>>
>>>> I am not familiar enough to make a call. But from my limited testing
>>>> with ARM, I didn't see any issues. If you would like me to, I can submit
>>>> a patch to remove the check entirely. Or maybe only disable it for ARM?
>>>>
>>>
>>> Please try the fix I proposed first.
>>
>> Just tested it. Seems to do the trick.
>
> Thanks
>
>> Either solution works for me.
>>
>
> Given that this discussion is taking place in the context of the
> report of an issue identified by GRAPH_FP_TEST, I don't see how
> removing that would be a reasonable conclusion.
>
Fair enough. I will submit a patch. Thanks for the help.
>> FWIW I also experimented with LLVM, looks like function_graph just
>> crashes regardless of the issue being discussed. The disassemble of
>> LLVM[1] does something completely different.
>>
>
>
> LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
> the prologue looks different is expected.
>
> In the case below, the FP {r11} is correctly made to point to a {r11,
> lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
> rely on unwind information for ftrace in this case, not the frame
> pointer.
>
> Could you be more specific about the crash?
>
It just hangs with no prints. I can instrument the hang and see what I
can find.
Justin
>
>>
>> [1]
>> LLVM dump
>> c0c6faa0 <sk_getsockopt>:
>> c0c6faa0: f0 4f 2d e9 push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
>> c0c6faa4: 1c b0 8d e2 add r11, sp, #28
>> c0c6faa8: ac d0 4d e2 sub sp, sp, #172
>> c0c6faac: 00 70 a0 e1 mov r7, r0
>> c0c6fab0: c8 0c 04 e3 movw r0, #19656
>> c0c6fab4: 80 02 4c e3 movt r0, #49792
>> c0c6fab8: 03 50 a0 e1 mov r5, r3
>> c0c6fabc: 00 00 90 e5 ldr r0, [r0]
>> c0c6fac0: 02 a0 a0 e1 mov r10, r2
>> c0c6fac4: 20 00 0b e5 str r0, [r11, #-32]
>> c0c6fac8: 00 40 2d e9 stmdb sp!, {lr}
>> c0c6facc: 4b 8b d6 eb bl 0xc0212800 <__gnu_mcount_nc> @ imm =
>> #-10867412
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4206 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
2023-12-01 9:12 ` Ard Biesheuvel
@ 2024-05-27 7:43 ` Thorsten Scherer
2024-05-27 7:56 ` Russell King (Oracle)
1 sibling, 1 reply; 15+ messages in thread
From: Thorsten Scherer @ 2024-05-27 7:43 UTC (permalink / raw)
To: ardb
Cc: rostedt, mhiramat, mark.rutland, linux, linux-trace-kernel,
linux-arm-kernel, Justin Chen, Florian Fainelli, Doug Berger,
Uwe Kleine-König, kernel, Thorsten Scherer
Hello,
in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
this thread. With the proposed code change applied the procedure
# set to some known good (randomly guessed) filter function and enable function_graph
echo mtdblock_open > set_ftrace_filter
echo function_graph > current_tracer
# walk available filter funcs
cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
produces the following output
[ 159.832173] Insufficient stack space to handle exception!
[ 159.832241] Task stack: [0xc8e44000..0xc8e46000]
[ 159.842701] IRQ stack: [0xc8800000..0xc8802000]
[ 159.847712] Overflow stack: [0xc1934000..0xc1935000]
[ 159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
[ 159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
[ 159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
[ 159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
[ 159.888547] PC is at prepare_ftrace_return+0x4/0x7c
[ 159.893520] LR is at ftrace_graph_caller+0x1c/0x28
[ 159.898376] pc : [<c010dd44>] lr : [<c010d988>] psr: 60000093
[ 159.904690] sp : c8e44018 ip : c8e44018 fp : c8e4403c
[ 159.909959] r10: c0c09e78 r9 : c35e9bc0 r8 : c010d9bc
[ 159.915227] r7 : 00000001 r6 : 00000004 r5 : c8e44064 r4 : c8e440ac
[ 159.921800] r3 : c8e44030 r2 : c8e4403c r1 : c010eb9c r0 : c8e44038
[ 159.928376] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 159.935652] Control: 0005317f Table: 83074000 DAC: 00000051
[ 159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 159.952253] Register r1 information: non-slab/vmalloc memory
[ 159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 160.001187] Register r6 information: non-paged memory
[ 160.006303] Register r7 information: non-paged memory
[ 160.011415] Register r8 information: non-slab/vmalloc memory
[ 160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
[ 160.025718] Register r10 information: non-slab/vmalloc memory
[ 160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
[ 160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
[ 160.058955] Stack: (0xc8e44018 to 0xc8e46000)
and points me to copy_from_kernel_nofault_allowed.
Disassembly of section .text:
c010eb8c <copy_from_kernel_nofault_allowed>:
c010eb8c: e1a0c00d mov ip, sp
c010eb90: e92dd800 push {fp, ip, lr, pc}
c010eb94: e24cb004 sub fp, ip, #4
c010eb98: e52de004 push {lr} ; (str lr, [sp, #-4]!)
c010eb9c: ebfffb4f bl c010d8e0 <__gnu_mcount_nc>
c010eba0: e35004bf cmp r0, #-1090519040 ; 0xbf000000
c010eba4: 3a000003 bcc c010ebb8 <copy_from_kernel_nofault_allowed+0x2c>
c010eba8: e0900001 adds r0, r0, r1
c010ebac: 33a00001 movcc r0, #1
c010ebb0: 23a00000 movcs r0, #0
c010ebb4: e89da800 ldm sp, {fp, sp, pc}
c010ebb8: e3a00000 mov r0, #0
c010ebbc: e89da800 ldm sp, {fp, sp, pc}
In the time given I couldn't make sense of the issue, so I am passing this
hopefully useful information to you.
Best regards
Thorsten
#regzbot introduced: 953f534a7ed6b725d4f101d2949393acc9262880 ^
[1] https://lore.kernel.org/all/alp44tukzo6mvcwl4ke4ehhmojrqnv6xfcdeuliybxfjfvgd3e@gpjvwj33cc76/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2024-05-27 7:43 ` Thorsten Scherer
@ 2024-05-27 7:56 ` Russell King (Oracle)
2024-05-27 12:28 ` Uwe Kleine-König
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2024-05-27 7:56 UTC (permalink / raw)
To: Thorsten Scherer
Cc: ardb, rostedt, mhiramat, mark.rutland, linux-trace-kernel,
linux-arm-kernel, Justin Chen, Florian Fainelli, Doug Berger,
Uwe Kleine-König, kernel
On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> Hello,
>
> in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> this thread. With the proposed code change applied the procedure
>
> # set to some known good (randomly guessed) filter function and enable function_graph
> echo mtdblock_open > set_ftrace_filter
> echo function_graph > current_tracer
>
> # walk available filter funcs
> cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
>
> produces the following output
>
> [ 159.832173] Insufficient stack space to handle exception!
> [ 159.832241] Task stack: [0xc8e44000..0xc8e46000]
> [ 159.842701] IRQ stack: [0xc8800000..0xc8802000]
> [ 159.847712] Overflow stack: [0xc1934000..0xc1935000]
> [ 159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> [ 159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> [ 159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> [ 159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> [ 159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> [ 159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> [ 159.898376] pc : [<c010dd44>] lr : [<c010d988>] psr: 60000093
> [ 159.904690] sp : c8e44018 ip : c8e44018 fp : c8e4403c
> [ 159.909959] r10: c0c09e78 r9 : c35e9bc0 r8 : c010d9bc
> [ 159.915227] r7 : 00000001 r6 : 00000004 r5 : c8e44064 r4 : c8e440ac
> [ 159.921800] r3 : c8e44030 r2 : c8e4403c r1 : c010eb9c r0 : c8e44038
> [ 159.928376] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> [ 159.935652] Control: 0005317f Table: 83074000 DAC: 00000051
> [ 159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 159.952253] Register r1 information: non-slab/vmalloc memory
> [ 159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 160.001187] Register r6 information: non-paged memory
> [ 160.006303] Register r7 information: non-paged memory
> [ 160.011415] Register r8 information: non-slab/vmalloc memory
> [ 160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> [ 160.025718] Register r10 information: non-slab/vmalloc memory
> [ 160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [ 160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> [ 160.058955] Stack: (0xc8e44018 to 0xc8e46000)
No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
a function involving the ftrace tracing infrastructure, which is why 8KB
of stack has been gobbled up. It could be
copy_from_kernel_nofault_allowed() but it would be useful to have at
least some extract of the backtrace showing the recursive cycle to
confirm, otherwise there is nothing in your report to confirm. As I'm
not a ftrace user myself, this isn't something I'd test for, so having
a full report would be useful.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2024-05-27 7:56 ` Russell King (Oracle)
@ 2024-05-27 12:28 ` Uwe Kleine-König
2024-05-27 12:51 ` Russell King (Oracle)
0 siblings, 1 reply; 15+ messages in thread
From: Uwe Kleine-König @ 2024-05-27 12:28 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Thorsten Scherer, mark.rutland, Florian Fainelli, rostedt,
Justin Chen, Doug Berger, mhiramat, kernel, ardb,
linux-arm-kernel, linux-trace-kernel
[-- Attachment #1: Type: text/plain, Size: 4374 bytes --]
On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > Hello,
> >
> > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > this thread. With the proposed code change applied the procedure
> >
> > # set to some known good (randomly guessed) filter function and enable function_graph
> > echo mtdblock_open > set_ftrace_filter
> > echo function_graph > current_tracer
> >
> > # walk available filter funcs
> > cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> >
> > produces the following output
> >
> > [ 159.832173] Insufficient stack space to handle exception!
> > [ 159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > [ 159.842701] IRQ stack: [0xc8800000..0xc8802000]
> > [ 159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > [ 159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > [ 159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > [ 159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > [ 159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > [ 159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > [ 159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > [ 159.898376] pc : [<c010dd44>] lr : [<c010d988>] psr: 60000093
> > [ 159.904690] sp : c8e44018 ip : c8e44018 fp : c8e4403c
> > [ 159.909959] r10: c0c09e78 r9 : c35e9bc0 r8 : c010d9bc
> > [ 159.915227] r7 : 00000001 r6 : 00000004 r5 : c8e44064 r4 : c8e440ac
> > [ 159.921800] r3 : c8e44030 r2 : c8e4403c r1 : c010eb9c r0 : c8e44038
> > [ 159.928376] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 159.935652] Control: 0005317f Table: 83074000 DAC: 00000051
> > [ 159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 159.952253] Register r1 information: non-slab/vmalloc memory
> > [ 159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 160.001187] Register r6 information: non-paged memory
> > [ 160.006303] Register r7 information: non-paged memory
> > [ 160.011415] Register r8 information: non-slab/vmalloc memory
> > [ 160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > [ 160.025718] Register r10 information: non-slab/vmalloc memory
> > [ 160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [ 160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > [ 160.058955] Stack: (0xc8e44018 to 0xc8e46000)
>
> No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> a function involving the ftrace tracing infrastructure, which is why 8KB
> of stack has been gobbled up. It could be
> copy_from_kernel_nofault_allowed() but it would be useful to have at
> least some extract of the backtrace showing the recursive cycle to
> confirm, otherwise there is nothing in your report to confirm. As I'm
> not a ftrace user myself, this isn't something I'd test for, so having
> a full report would be useful.
Is not having a backtrace related to ftrace_return_address() returning
NULL, as Arnd pointed out in
https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
?
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | https://www.pengutronix.de/ |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2024-05-27 12:28 ` Uwe Kleine-König
@ 2024-05-27 12:51 ` Russell King (Oracle)
2024-05-28 4:52 ` Thorsten Scherer
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2024-05-27 12:51 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: Thorsten Scherer, mark.rutland, Florian Fainelli, rostedt,
Justin Chen, Doug Berger, mhiramat, kernel, ardb,
linux-arm-kernel, linux-trace-kernel
On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > Hello,
> > >
> > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > > this thread. With the proposed code change applied the procedure
> > >
> > > # set to some known good (randomly guessed) filter function and enable function_graph
> > > echo mtdblock_open > set_ftrace_filter
> > > echo function_graph > current_tracer
> > >
> > > # walk available filter funcs
> > > cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> > >
> > > produces the following output
> > >
> > > [ 159.832173] Insufficient stack space to handle exception!
> > > [ 159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > > [ 159.842701] IRQ stack: [0xc8800000..0xc8802000]
> > > [ 159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > [ 159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > [ 159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > > [ 159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > [ 159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > [ 159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > [ 159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > [ 159.898376] pc : [<c010dd44>] lr : [<c010d988>] psr: 60000093
> > > [ 159.904690] sp : c8e44018 ip : c8e44018 fp : c8e4403c
> > > [ 159.909959] r10: c0c09e78 r9 : c35e9bc0 r8 : c010d9bc
> > > [ 159.915227] r7 : 00000001 r6 : 00000004 r5 : c8e44064 r4 : c8e440ac
> > > [ 159.921800] r3 : c8e44030 r2 : c8e4403c r1 : c010eb9c r0 : c8e44038
> > > [ 159.928376] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> > > [ 159.935652] Control: 0005317f Table: 83074000 DAC: 00000051
> > > [ 159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 159.952253] Register r1 information: non-slab/vmalloc memory
> > > [ 159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 160.001187] Register r6 information: non-paged memory
> > > [ 160.006303] Register r7 information: non-paged memory
> > > [ 160.011415] Register r8 information: non-slab/vmalloc memory
> > > [ 160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > > [ 160.025718] Register r10 information: non-slab/vmalloc memory
> > > [ 160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [ 160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > [ 160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> >
> > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > a function involving the ftrace tracing infrastructure, which is why 8KB
> > of stack has been gobbled up. It could be
> > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > least some extract of the backtrace showing the recursive cycle to
> > confirm, otherwise there is nothing in your report to confirm. As I'm
> > not a ftrace user myself, this isn't something I'd test for, so having
> > a full report would be useful.
>
> Is not having a backtrace related to ftrace_return_address() returning
> NULL, as Arnd pointed out in
> https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
> ?
Unlikely - the stack and code lines are also missing. I think the
submitter truncated the oops which is highly likely given that it
would've dumped all 8kB of the stack in hex, and the trace and
code lines would be after that.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
2024-05-27 12:51 ` Russell King (Oracle)
@ 2024-05-28 4:52 ` Thorsten Scherer
0 siblings, 0 replies; 15+ messages in thread
From: Thorsten Scherer @ 2024-05-28 4:52 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Uwe Kleine-König, mark.rutland, Florian Fainelli, rostedt,
Justin Chen, Doug Berger, mhiramat, kernel, ardb,
linux-arm-kernel, linux-trace-kernel
Hello,
On Mon, May 27, 2024 at 01:51:49PM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> > On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > > Hello,
> > > >
> > > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> > > > this thread. With the proposed code change applied the procedure
> > > >
> > > > # set to some known good (randomly guessed) filter function and enable function_graph
> > > > echo mtdblock_open > set_ftrace_filter
> > > > echo function_graph > current_tracer
> > > >
> > > > # walk available filter funcs
> > > > cat available_filter_functions | while read f; do echo $f | tee -a set_ftrace_filter; sleep 1; done
> > > >
> > > > produces the following output
> > > >
> > > > [ 159.832173] Insufficient stack space to handle exception!
> > > > [ 159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > > > [ 159.842701] IRQ stack: [0xc8800000..0xc8802000]
> > > > [ 159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > > [ 159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > > [ 159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > > > [ 159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > > [ 159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > > [ 159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > > [ 159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > > [ 159.898376] pc : [<c010dd44>] lr : [<c010d988>] psr: 60000093
> > > > [ 159.904690] sp : c8e44018 ip : c8e44018 fp : c8e4403c
> > > > [ 159.909959] r10: c0c09e78 r9 : c35e9bc0 r8 : c010d9bc
> > > > [ 159.915227] r7 : 00000001 r6 : 00000004 r5 : c8e44064 r4 : c8e440ac
> > > > [ 159.921800] r3 : c8e44030 r2 : c8e4403c r1 : c010eb9c r0 : c8e44038
> > > > [ 159.928376] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> > > > [ 159.935652] Control: 0005317f Table: 83074000 DAC: 00000051
> > > > [ 159.941436] Register r0 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 159.952253] Register r1 information: non-slab/vmalloc memory
> > > > [ 159.957988] Register r2 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 159.968791] Register r3 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 159.979592] Register r4 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 159.990391] Register r5 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 160.001187] Register r6 information: non-paged memory
> > > > [ 160.006303] Register r7 information: non-paged memory
> > > > [ 160.011415] Register r8 information: non-slab/vmalloc memory
> > > > [ 160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer offset 0 size 32
> > > > [ 160.025718] Register r10 information: non-slab/vmalloc memory
> > > > [ 160.031530] Register r11 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 160.042422] Register r12 information: 2-page vmalloc region starting at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [ 160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > > [ 160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> > >
> > > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > > a function involving the ftrace tracing infrastructure, which is why 8KB
> > > of stack has been gobbled up. It could be
> > > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > > least some extract of the backtrace showing the recursive cycle to
> > > confirm, otherwise there is nothing in your report to confirm. As I'm
> > > not a ftrace user myself, this isn't something I'd test for, so having
> > > a full report would be useful.
> >
> > Is not having a backtrace related to ftrace_return_address() returning
> > NULL, as Arnd pointed out in
> > https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-71495406019d@app.fastmail.com/
> > ?
>
> Unlikely - the stack and code lines are also missing. I think the
> submitter truncated the oops which is highly likely given that it
> would've dumped all 8kB of the stack in hex, and the trace and
> code lines would be after that.
sorry for causing additional friction by my imprecise description.
Indeed, this was the whole oops before the machine restarted. I double
checked that.
I see that this [1] thread contains a patch. Will test it on my machine.
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Best regards
Thorsten
[1] https://lore.kernel.org/all/20240527161236.313448-2-ardb+git@google.com/
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-05-28 4:52 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-30 23:47 ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER Justin Chen
2023-12-01 9:12 ` Ard Biesheuvel
2023-12-01 17:25 ` Justin Chen
2023-12-01 18:07 ` Steven Rostedt
2023-12-01 22:59 ` Justin Chen
2023-12-02 6:53 ` Ard Biesheuvel
2023-12-02 8:49 ` Justin Chen
2023-12-02 9:26 ` Ard Biesheuvel
2023-12-02 17:49 ` Justin Chen
2023-12-01 18:22 ` Russell King (Oracle)
2024-05-27 7:43 ` Thorsten Scherer
2024-05-27 7:56 ` Russell King (Oracle)
2024-05-27 12:28 ` Uwe Kleine-König
2024-05-27 12:51 ` Russell King (Oracle)
2024-05-28 4:52 ` Thorsten Scherer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).