Re: [PATCH 2/2 v2] tracing: Document the stack trace algorithm in the comments

From: Joel Fernandes <joel@joelfernandes.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiping Ma <jiping.ma2@windriver.com>,
	catalin.marinas@arm.com, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/2 v2] tracing: Document the stack trace algorithm in the comments
Date: Thu, 8 Aug 2019 16:17:02 -0400	[thread overview]
Message-ID: <20190808201702.GF261256@google.com> (raw)
In-Reply-To: <20190807172907.310138647@goodmis.org>

On Wed, Aug 07, 2019 at 01:28:28PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> As the max stack tracer algorithm is not that easy to understand from the
> code, add comments that explain the algorithm and mentions how
> ARCH_RET_ADDR_AFTER_LOCAL_VARS affects it.
> 
> Link: http://lkml.kernel.org/r/20190806123455.487ac02b@gandalf.local.home
> 

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks!!

- Joel

> Suggested-by: Joel Fernandes <joel@joelfernandes.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_stack.c | 98 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 40e4a88eea8f..f94a2fc567de 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -53,6 +53,104 @@ static void print_max_stack(void)
>  	}
>  }
>  
> +/*
> + * The stack tracer looks for a maximum stack at each call from a function. It
> + * registers a callback from ftrace, and in that callback it examines the stack
> + * size. It determines the stack size from the variable passed in, which is the
> + * address of a local variable in the stack_trace_call() callback function.
> + * The stack size is calculated by the address of the local variable to the top
> + * of the current stack. If that size is smaller than the currently saved max
> + * stack size, nothing more is done.
> + *
> + * If the size of the stack is greater than the maximum recorded size, then the
> + * following algorithm takes place.
> + *
> + * For architectures (like x86) that store the function's return address before
> + * saving the function's local variables, the stack will look something like
> + * this:
> + *
> + *   [ top of stack ]
> + *    0: sys call entry frame
> + *   10: return addr to entry code
> + *   11: start of sys_foo frame
> + *   20: return addr to sys_foo
> + *   21: start of kernel_func_bar frame
> + *   30: return addr to kernel_func_bar
> + *   31: [ do trace stack here ]
> + *
> + * The save_stack_trace() is called returning all the functions it finds in the
> + * current stack. Which would be (from the bottom of the stack to the top):
> + *
> + *   return addr to kernel_func_bar
> + *   return addr to sys_foo
> + *   return addr to entry code
> + *
> + * Now to figure out how much each of these functions' local variable size is,
> + * a search of the stack is made to find these values. When a match is made, it
> + * is added to the stack_dump_trace[] array. The offset into the stack is saved
> + * in the stack_trace_index[] array. The above example would show:
> + *
> + *        stack_dump_trace[]        |   stack_trace_index[]
> + *        ------------------        +   -------------------
> + *  return addr to kernel_func_bar  |          30
> + *  return addr to sys_foo          |          20
> + *  return addr to entry            |          10
> + *
> + * The print_max_stack() function above, uses these values to print the size of
> + * each function's portion of the stack.
> + *
> + *  for (i = 0; i < nr_entries; i++) {
> + *     size = i == nr_entries - 1 ? stack_trace_index[i] :
> + *                    stack_trace_index[i] - stack_trace_index[i+1]
> + *     print "%d %d %d %s\n", i, stack_trace_index[i], size, stack_dump_trace[i]);
> + *  }
> + *
> + * The above shows
> + *
> + *     depth size  location
> + *     ----- ----  --------
> + *  0    30   10   kernel_func_bar
> + *  1    20   10   sys_foo
> + *  2    10   10   entry code
> + *
> + * Now for architectures that might save the return address after the functions
> + * local variables (saving the link register before calling nested functions),
> + * this will cause the stack to look a little different:
> + *
> + * [ top of stack ]
> + *  0: sys call entry frame
> + * 10: start of sys_foo_frame
> + * 19: return addr to entry code << lr saved before calling kernel_func_bar
> + * 20: start of kernel_func_bar frame
> + * 29: return addr to sys_foo_frame << lr saved before calling next function
> + * 30: [ do trace stack here ]
> + *
> + * Although the functions returned by save_stack_trace() may be the same, the
> + * placement in the stack will be different. Using the same algorithm as above
> + * would yield:
> + *
> + *        stack_dump_trace[]        |   stack_trace_index[]
> + *        ------------------        +   -------------------
> + *  return addr to kernel_func_bar  |          30
> + *  return addr to sys_foo          |          29
> + *  return addr to entry            |          19
> + *
> + * Where the mapping is off by one:
> + *
> + *   kernel_func_bar stack frame size is 29 - 19 not 30 - 29!
> + *
> + * To fix this, if the architecture sets ARCH_RET_ADDR_AFTER_LOCAL_VARS the
> + * values in stack_trace_index[] are shifted by one to and the number of
> + * stack trace entries is decremented by one.
> + *
> + *        stack_dump_trace[]        |   stack_trace_index[]
> + *        ------------------        +   -------------------
> + *  return addr to kernel_func_bar  |          29
> + *  return addr to sys_foo          |          19
> + *
> + * Although the entry function is not displayed, the first function (sys_foo)
> + * will still include the stack size of it.
> + */
>  static void check_stack(unsigned long ip, unsigned long *stack)
>  {
>  	unsigned long this_size, flags; unsigned long *p, *top, *start;
> -- 
> 2.20.1
> 
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel