All of lore.kernel.org
 help / color / mirror / Atom feed
From: takahiro.akashi@linaro.org (AKASHI Takahiro)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v7 5/6] arm64: ftrace: add arch-specific stack tracer
Date: Tue, 22 Dec 2015 15:41:03 +0900	[thread overview]
Message-ID: <5678F07F.8030308@linaro.org> (raw)
In-Reply-To: <20151221120438.GI23092@arm.com>

On 12/21/2015 09:04 PM, Will Deacon wrote:
> Hi Akashi,
>
> On Tue, Dec 15, 2015 at 05:33:43PM +0900, AKASHI Takahiro wrote:
>> Background and issues on generic check_stack():
>> 1) slurping stack
>>
>>      Assume that a given function A was invoked, and it was invoked again in
>>      another context, then it called another function B which allocated
>>      a large size of local variables on the stack, but it has not modified
>>      the variable(s) yet.
>>      When stack tracer, check_stack(), examines the stack looking for B,
>>      then A, we may have a chance to accidentally find a stale, not current,
>>      stack frame for A because the old frame might reside on the memory for
>>      the variable which has not been overwritten.
>>
>>      (issue) The stack_trace output may have stale entries.
>>
>> 2) differences between x86 and arm64
>>
>>      On x86, "call" instruction automatically pushes a return address on
>>      the top of the stack and decrements a stack pointer. Then child
>>      function allocates its local variables on the stack.
>>
>>      On arm64, a child function is responsible for allocating memory for
>>      local variables as well as a stack frame, and explicitly pushes
>>      a return address (LR) and old frame pointer in its function prologue
>>      *after* decreasing a stack pointer.
>>
>>      Generic check_stack() recogizes 'idxB,' which is the next address of
>>      the location where 'fpB' is found, in the picture below as an estimated
>>      stack pointer. This seems to fine with x86, but on arm64, 'idxB' is
>>      not appropriate just because it contains child function's "local
>>      variables."
>>      We should instead use spB, if possible, for better interpretation of
>>      func_B's stack usage.
>>
>> LOW      |  ...   |
>> fpA      +--------+   func_A (pcA, fpA, spA)
>>           |  fpB   |
>>      idxB + - - - -+
>>           |  pcB   |
>>           |  ... <----------- static local variables in func_A
>>           |  ...   |             and extra function args to func_A
>> spB      + - - - -+
>>           |  ... <----------- dynamically allocated variables in func_B
>> fpB      +--------+   func_B (pcB, fpB, spB)
>>           |  fpC   |
>>      idxC + - - - -+
>>           |  pcC   |
>>           |  ... <----------- static local variables in func_B
>>           |  ...   |             and extra function args to func_B
>> spC      + - - - -+
>>           |  ...   |
>> fpC      +--------+   func_C (pcC, fpC, spC)
>> HIGH     |        |
>>
>>      (issue) Stack size for a function in stack_trace output is inaccurate,
>>              or rather wrong.  It looks as if <Size> field is one-line
>> 	    offset against <Location>.
>>
>>                  Depth    Size   Location    (49 entries)
>>                  -----    ----   --------
>>           40)     1416      64   path_openat+0x128/0xe00       -> 176
>>           41)     1352     176   do_filp_open+0x74/0xf0        -> 256
>>           42)     1176     256   do_open_execat+0x74/0x1c8     -> 80
>>           43)      920      80   open_exec+0x3c/0x70           -> 32
>>           44)      840      32   load_elf_binary+0x294/0x10c8
>>
>> Implementation on arm64:
>> So we want to have our own stack tracer, check_stack().
>> Our approach is uniqeue in the following points:
>> * analyze a function prologue of a traced function to estimate a more
>>    accurate stack pointer value, replacing naive '<child's fp> + 0x10.'
>> * use walk_stackframe(), instead of slurping stack contents as orignal
>>    check_stack() does, to identify a stack frame and a stack index (height)
>>    for every callsite.
>>
>> Regarding a function prologue analyzer, there is no guarantee that we can
>> handle all the possible patterns of function prologue as gcc does not use
>> any fixed templates to generate them. 'Instruction scheduling' is another
>> issue here.
>
> Have you run this past any of the GCC folks?  It would be good to at least
> make them aware of the heuristics you're using and the types of prologue
> that we can handle. They even have suggestions to improve on your approach
> (e.g. using -fstack-usage).

Yeah, I can, but do you mind my including you in CC?
'cause I don't know what kind of comments you are expecting.

>> +static void __save_stack_trace_tsk(struct task_struct *tsk,
>> +		struct stack_trace *trace, unsigned long *stack_dump_sp)
>>   {
>>   	struct stack_trace_data data;
>>   	struct stackframe frame;
>>
>>   	data.trace = trace;
>>   	data.skip = trace->skip;
>> +#ifdef CONFIG_STACK_TRACER
>> +	data.sp = stack_dump_sp;
>> +#endif
>>
>>   	if (tsk != current) {
>>   		data.no_sched_functions = 1;
>> @@ -149,7 +319,8 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>>   		data.no_sched_functions = 0;
>>   		frame.fp = (unsigned long)__builtin_frame_address(0);
>>   		frame.sp = current_stack_pointer;
>> -		frame.pc = (unsigned long)save_stack_trace_tsk;
>> +		asm("1:");
>> +		asm("ldr %0, =1b" : "=r" (frame.pc));
>
> This looks extremely fragile. Does the original code not work?

My function prologue analyzer will fail because frame.pc points
to the first instruction of a function.
Otherwise, everything is fine.

Thanks,
-Takahiro AKASHI


> Will
>

  reply	other threads:[~2015-12-22  6:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-15  8:33 [PATCH v7 0/6] arm64: ftrace: fix incorrect output from stack tracer AKASHI Takahiro
2015-12-15  8:33 ` [PATCH v7 1/6] arm64: ftrace: modify a stack frame in a safe way AKASHI Takahiro
2015-12-15  8:33 ` [PATCH v7 2/6] arm64: pass a task parameter to unwind_frame() AKASHI Takahiro
2015-12-15  8:33 ` [PATCH v7 3/6] arm64: ftrace: fix a stack tracer's output under function graph tracer AKASHI Takahiro
2015-12-15  8:33 ` [PATCH v7 4/6] arm64: insn: add instruction decoders for ldp/stp and add/sub AKASHI Takahiro
2015-12-15  8:33 ` [PATCH v7 5/6] arm64: ftrace: add arch-specific stack tracer AKASHI Takahiro
2015-12-21 12:04   ` Will Deacon
2015-12-22  6:41     ` AKASHI Takahiro [this message]
2015-12-22  9:48       ` Will Deacon
2015-12-15  8:33 ` [PATCH v7 6/6] arm64: ftrace: add a test of function prologue analyzer AKASHI Takahiro
2015-12-21 12:05 ` [PATCH v7 0/6] arm64: ftrace: fix incorrect output from stack tracer Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5678F07F.8030308@linaro.org \
    --to=takahiro.akashi@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.