From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932920AbaEGQkX (ORCPT ); Wed, 7 May 2014 12:40:23 -0400
Received: from mail-ee0-f53.google.com ([74.125.83.53]:43474 "EHLO
	mail-ee0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752918AbaEGQkV (ORCPT ); Wed, 7 May 2014 12:40:21 -0400
Date: Wed, 7 May 2014 18:40:14 +0200
From: Ingo Molnar
To: Frederic Weisbecker
Cc: Richard Yao, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	x86@kernel.org, Andrew Morton, Tejun Heo, Vineet Gupta,
	Jesper Nilsson, Jiri Slaby, linux-kernel@vger.kernel.org,
	kernel@gentoo.org, Brian Behlendorf, Linus Torvalds,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers
Message-ID: <20140507164014.GB16034@gmail.com>
References: <1398535818-14217-1-git-send-email-ryao@gentoo.org>
	<20140427120820.GC22116@gmail.com>
	<20140430215606.GD17745@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140430215606.GD17745@localhost.localdomain>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Frederic Weisbecker wrote:

> On Sun, Apr 27, 2014 at 02:08:20PM +0200, Ingo Molnar wrote:
> >
> > * Richard Yao wrote:
> >
> > > Stack traces are generated by scanning the stack and interpreting
> > > anything that looks like it could be a pointer to something. We do
> > > not need to do this when we have frame pointers, but we do it
> > > anyway, with the distinction that we use the return pointers to mark
> > > actual frames by the absence of a question mark.
> > >
> > > The additional verbosity of stack scanning tends to bombard us with
> > > walls of text for no gain in practice, so let's switch to printing
> > > only stack frames when frame pointers are available.
> > > That way we can spend less time reading stack traces and more time
> > > looking at code.
> > >
> > > Signed-off-by: Richard Yao
> > > ---
> > >  arch/x86/kernel/dumpstack.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > > index d9c12d3..94ffe06 100644
> > > --- a/arch/x86/kernel/dumpstack.c
> > > +++ b/arch/x86/kernel/dumpstack.c
> > > @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
> > >  static const struct stacktrace_ops print_trace_ops = {
> > >      .stack = print_trace_stack,
> > >      .address = print_trace_address,
> > > +#ifdef CONFIG_FRAME_POINTER
> > > +    .walk_stack = print_context_stack_bp,
> > > +#else
> > >      .walk_stack = print_context_stack,
> > > +#endif
> > >  };
>
> Besides the complementary information brought by the full stack
> walk, another big argument toward keeping the full stack walk is that
> if your frame pointer is screwed for whatever reason, you still have
> a useful stack trace.
>
> I have seen and fixed several broken frame links in x86-64 in the
> past. Those are very subtle and often hardly visible issues because,
> while they are easily spotted on common frame scenarios like: task ->
> irq, they are much harder to find on trickier, rarer frame scenarios
> such as: task -> softirq -> irq -> nmi -> debug exception ->....
>
> For example before a2bbe75089d5eb9a3a46d50dd5c215e213790288 ("x86:
> Don't use frame pointer to save old stack on irq entry"), we were
> missing entire stack frames on nesting irqs (hardirq on softirqs)
> while using pure frame pointer based unwinding.
>
> Who knows if we have other remaining issues like this? Especially
> given the high possible number of frame combinations between task,
> irq, softirq, nmi and exceptions. Multiply the possible contexts
> by the number of possible archs out there and their stack switch
> implementations.
>
> Also, beyond frame link breakages, we have many other possibilities
> to end up with misleading frame pointers. Relying on that source
> alone definitely reduces the reliability of our stacktraces.
>
> So this goes way beyond just missing complementary information.
> Debugging robustness itself is actually very much at stake here if we
> remove the full stack walk.

Agreed, that's a very good point.

Also, consider the following holistic argument: what is easier to
achieve, when looking at an oops and not seeing the bug:

  - if only I had more information
  - if only I had less information

We cannot put back information that we cut out, but it's not
particularly hard to skip overly verbose information in most cases.

Yes, there's a line to be drawn with verbosity: scroll-off is a
concern when the oops does not make it to a log file. So I don't
really know.

Thanks,

	Ingo