From: Joel Fernandes <joel@joelfernandes.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiping Ma <jiping.ma2@windriver.com>,
catalin.marinas@arm.com, will.deacon@arm.com,
linux-kernel@vger.kernel.org, mingo@redhat.com,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64
Date: Tue, 6 Aug 2019 13:25:19 -0400
Message-ID: <20190806172519.GD39951@google.com>
In-Reply-To: <20190806123455.487ac02b@gandalf.local.home>
On Tue, Aug 06, 2019 at 12:34:55PM -0400, Steven Rostedt wrote:
> On Tue, 6 Aug 2019 11:48:11 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
>
> > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> > > index 5ab5200b2bdc..13a4832cfb00 100644
> > > --- a/arch/arm64/include/asm/ftrace.h
> > > +++ b/arch/arm64/include/asm/ftrace.h
> > > @@ -13,6 +13,7 @@
> > > #define HAVE_FUNCTION_GRAPH_FP_TEST
> > > #define MCOUNT_ADDR ((unsigned long)_mcount)
> > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1
> > >
> > > #ifndef __ASSEMBLY__
> > > #include <linux/compat.h>
> > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> > > index 5d16f73898db..050c6bd9beac 100644
> > > --- a/kernel/trace/trace_stack.c
> > > +++ b/kernel/trace/trace_stack.c
> > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack)
> > > i++;
> > > }
> > >
> > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS
> > > +	/*
> > > +	 * Most archs store the return address before storing the
> > > +	 * function's local variables. But some archs do this backwards.
> > > +	 */
> > > +	if (x > 1) {
> > > +		memmove(&stack_trace_index[0], &stack_trace_index[1],
> > > +			sizeof(stack_trace_index[0]) * (x - 1));
> > > +		x--;
> > > +	}
> > > +#endif
> > > +
> > > stack_trace_nr_entries = x;
> > >
> > > if (task_stack_end_corrupted(current)) {
> >
> >
> > I am not fully understanding the fix :(. If the positions of the data and
> > FP/LR are swapped, then there should be a loop of some sort where the FP/LR
> > are copied repeatedly to undo the mess we are discussing. But in this patch
> > > I see only one copy happening. Maybe I just don't understand this code well
> > enough. Are there any more clues for helping understand the fix?
>
> Here's the best way to explain this. The code is using the stack trace
> to figure out which function is the stack hog. Or perhaps a series of
> stack hogs. On x86, a call stores the return address as it calls the
> next function. Then that function allocates its stack frame for its
> local variables and saving of registers.
This makes perfect sense. (It probably also makes sense to put this whole
explanation into either the changelog or the kernel documentation.)
Thanks a lot, Steve!
- Joel
> on x86:
>
> [ top of stack ]
> 0: sys call entry frame
> 10: return addr to entry code
> 11: start of sys_foo frame
> 20: return addr to sys_foo
> 21: start of kernel_func_bar frame
> 30: return addr to kernel_func_bar
> 31: [ do trace stack here ]
>
>
> Then we do a save_stack_trace which returns the addresses of the
> functions it finds. Which would be (from the bottom of the stack to the
> top)
>
> return addr to kernel_func_bar
> return addr to sys_foo
> return addr to entry code
>
> What we do here is try to figure out how much stack each of these
> functions uses. So we loop through the stack looking for the addresses
> returned by save_stack_trace, and see where on the stack each one is.
> This gives us:
>
> return addr to kernel_func_bar [ 30 ]
> return addr to sys_foo [ 20 ]
> return addr to entry code [ 10 ]
>
> From this, we can conclude (on x86) that the size of the stack used for
> kernel_func_bar is 30 - 20 = 10. Because on the stack we have:
>
> 20: return addr to sys_foo
> 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame
> 30: return addr to kernel_func_bar
>
>
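To make the subtraction above concrete, here is a minimal sketch in C. It is
not the kernel's code: frame_size() is a hypothetical helper, and the
positions { 30, 20, 10 } are the made-up ones from the x86 diagram, listed
deepest first. It assumes the last entry is measured from position 0.

```c
#include <stddef.h>

/*
 * frame_size() is a hypothetical helper, not a kernel function.
 * index[] holds the stack positions at which the return addresses were
 * found, deepest call first (the order save_stack_trace reports them).
 */
static unsigned long frame_size(const unsigned long *index, size_t nr, size_t i)
{
	/* Assumption: the outermost entry is measured from position 0. */
	if (i + 1 == nr)
		return index[i];

	/* Otherwise: distance to the next return address up the stack. */
	return index[i] - index[i + 1];
}
```

With index[] = { 30, 20, 10 }, entry 0 (kernel_func_bar) gets 30 - 20 = 10,
which matches the frame that sits between positions 20 and 30 in the diagram.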
> Now, what Jiping reported is that arm64 saves the link register (the
> return address) only when it needs to, which is after the stack frame
> for the current function has been set up. That means we have something
> that looks like this:
>
> on arm64:
>
> [ top of stack ]
> 0: sys call entry frame
> 10: start of sys_foo frame
> 19: return addr to entry code << lr saved before calling kernel_func_bar
> 20: start of kernel_func_bar frame
> 29: return addr to sys_foo << lr saved before calling next function
> 30: [ do trace stack here ]
>
> Now, I have a question. To call the mcount code (ftrace and the stack
> tracing), you need to save the return address of kernel_func_bar
> somewhere, otherwise the call to mcount will overwrite the lr. But
> let's say it does and then forgets it, so we have:
>
> 30: return addr to kernel_func_bar
> 31: [ do trace stack here ]
>
> Now save_stack_trace gives us the same result:
>
> return addr to kernel_func_bar
> return addr to sys_foo
> return addr to entry code
>
> But we get a different result when we look up their locations on the
> stack:
>
> return addr to kernel_func_bar [ 30 ]
> return addr to sys_foo [ 29 ]
> return addr to entry code [ 19 ]
>
> The simple subtractions will be off:
>
> kernel_func_bar stack size = 30 - 29 = 1
> Or even, sys_foo 29 - 19 = 10, but if we look at the stack:
>
> 10: start of sys_foo frame
> 19: return addr to entry code
> 20: start of kernel_func_bar frame
> 29: return addr to sys_foo
>
> We are measuring the kernel_func_bar frame for sys_foo!
>
> We are off by one here.
>
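The mismatch can be seen by asking which stretch of stack the simple
subtraction hands to each entry. A rough sketch, again with hypothetical
names (struct span and measured_span() are not in the kernel) and the
positions from the arm64 diagram:

```c
#include <stddef.h>

/* A stretch of stack between two saved return addresses (hypothetical type). */
struct span {
	unsigned long lo;	/* position of the next return address up the stack */
	unsigned long hi;	/* position of this entry's own return address */
};

/*
 * measured_span() is a hypothetical helper: it returns the stretch of stack
 * that index[i] - index[i + 1] attributes to entry i. The outermost entry
 * is assumed to start at position 0.
 */
static struct span measured_span(const unsigned long *index, size_t nr, size_t i)
{
	struct span s;

	s.hi = index[i];
	s.lo = (i + 1 == nr) ? 0 : index[i + 1];
	return s;
}
```

With index[] = { 30, 29, 19 }, entry 1 (sys_foo) is handed the stretch from
19 to 29, which in the diagram is the kernel_func_bar frame, while entry 0
(kernel_func_bar) is handed only 29 to 30.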
> stack_trace_index[] is an array of the offsets mapping to the function
> return addresses found. If we shift it by one, then we sync the
> functions found with their frames:
>
> stack_trace_index[0] = 30
> stack_trace_index[1] = 29
> stack_trace_index[2] = 19
>
> memmove(&stack_trace_index[0], &stack_trace_index[1],
> sizeof(stack_trace_index[0]) * (x - 1));
>
> Makes that:
>
> stack_trace_index[0] = 29
> stack_trace_index[1] = 19
>
> And we do x-- to lose the last frame.
>
> With the stack_dump_trace being:
>
> stack_dump_trace[0] = return addr kernel_func_bar
> stack_dump_trace[1] = return addr sys_foo
>
> we then match which frame size belongs to which function better.
>
>
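For reference, the shift described above can be tried outside the kernel
with a small stand-alone sketch. shift_index() is a hypothetical wrapper
around the same memmove() and x-- as in the patch hunk; the array and its
contents are the illustrative values from the example, not real offsets.

```c
#include <stddef.h>
#include <string.h>

/*
 * shift_index() is a hypothetical stand-alone version of the patch hunk:
 * drop entry 0 and slide the rest down, so that each remaining offset is
 * paired with the function one slot earlier in the trace.
 * Returns the new number of entries.
 */
static size_t shift_index(unsigned long *index, size_t x)
{
	if (x > 1) {
		/* Source and destination overlap, hence memmove(), not memcpy(). */
		memmove(&index[0], &index[1], sizeof(index[0]) * (x - 1));
		x--;
	}
	return x;
}
```

Starting from { 30, 29, 19 } with x = 3, this leaves { 29, 19 } with x = 2,
so entry 0 (kernel_func_bar) is now measured as 29 - 19 = 10, which is the
size of its own frame in the diagram.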
> >
> > Also, this stack trace loop (original code) is a bit hairy :) It appears
> > there is a call to stack_trace_save() followed by another loop that goes
> > through the returned entries from there and tries to generate a set of
> > indexes. Isn't the real issue that the entries returned by stack_trace_save()
> > are out of whack? I am curious also if other users of stack_trace_save()
> > will suffer from the same issue.
>
> No, the order is fine. The issue is that we are using the location of
> the return address in the stack to find out what function has the
> biggest stack usage, and our assumption for arm64 is incorrect in that
> location.
>
> -- Steve