public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Richard Yao <ryao@gentoo.org>
Cc: "Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, "Andrew Morton" <akpm@linux-foundation.org>,
	"Tejun Heo" <tj@kernel.org>, "Vineet Gupta" <vgupta@synopsys.com>,
	"Jesper Nilsson" <jesper.nilsson@axis.com>,
	"Jiri Slaby" <jslaby@suse.cz>,
	linux-kernel@vger.kernel.org, kernel@gentoo.org,
	"Brian Behlendorf" <behlendorf1@llnl.gov>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Arnaldo Carvalho de Melo" <acme@infradead.org>,
	"Jiri Olsa" <jolsa@redhat.com>
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers
Date: Sun, 27 Apr 2014 14:08:20 +0200	[thread overview]
Message-ID: <20140427120820.GC22116@gmail.com> (raw)
In-Reply-To: <1398535818-14217-1-git-send-email-ryao@gentoo.org>


* Richard Yao <ryao@gentoo.org> wrote:

> Stack traces are generated by scanning the stack and interpeting 
> anything that looks like it could be a pointer to something. We do 
> not need to do this when we have frame pointers, but we do it 
> anyway, with the distinction that we use the return pointers to mark 
> actual frames by the absence of a question mark.
> 
> The additional verbosity of stack scanning tends to bombard us with 
> walls of text for no gain in practice, so lets switch to printing 
> only stack frames when frame pointers are available. That we can 
> spend less time reading stack traces and more time looking at code.
> 
> Signed-off-by: Richard Yao <ryao@gentoo.org>
> ---
>  arch/x86/kernel/dumpstack.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index d9c12d3..94ffe06 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
>  static const struct stacktrace_ops print_trace_ops = {
>  	.stack			= print_trace_stack,
>  	.address		= print_trace_address,
> +#ifdef CONFIG_FRAME_POINTER
> +	.walk_stack		= print_context_stack_bp,
> +#else
>  	.walk_stack		= print_context_stack,
> +#endif
>  };

I don't really like this patch, and this question comes up often, so 
let me try to explain it more verbosely.

Lets take a real-life example of a BP-only trace about a lockup:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<792b9d04>] syscall_call+0x7/0xb

What does this call trace tell us? That we locked up somewhere in the 
ext3 code. That's all that we know.

Now lets take a look at a 'verbose' entry of the same lockup, that 
also outputs other entries that 'look like' kernel text addresses:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>] ? prepare_to_wait+0x18/0x57
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<7914a320>] ? generic_make_request+0x80/0xb5
  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<7914a427>] ? submit_bio+0xd2/0xfb
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
  [<79039086>] ? autoremove_wake_function+0x31/0x31
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<7903f844>] ? lg_local_unlock+0x31/0x47
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<790c3c47>] ? __fput+0x187/0x1a0
  [<792b9d37>] ? restore_all+0xf/0xf
  [<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
  [<792b9d04>] syscall_call+0x7/0xb

Firstly, it's still easy to tell the true backtrace: ignore all the 
'?' entries.

But the '?' entries also tell us something more: that recently this 
CPU submitted IO. That might or might not be suspected from the 
original trace, but from the 'extended' trace it's really obvious that 
IO activity happened before the lockup as well.

There were many other examples in the past where '?' entries gave us 
clues about the nature of a bug: they represent a poor man's trace of 
what happened recently on that CPU.

So it's useful information for hairy bugs and it would be sad to 
remove them.

Having said that, your complaint that '?' entries can make reading of 
back traces more difficult is valid as well - so maybe we can do 
something about that.

For example if we used more horizontal separation in the output 
format:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>]                                   # ? prepare_to_wait+0x18/0x57
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<7914a320>]                                   # ? generic_make_request+0x80/0xb5
  [<790e4f30>]                                   # ? unmap_underlying_metadata+0x2e/0x2e
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<7914a427>]                                   # ? submit_bio+0xd2/0xfb
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4f30>]                                   # ? unmap_underlying_metadata+0x2e/0x2e
  [<79039086>]                                   # ? autoremove_wake_function+0x31/0x31
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<7903f844>]                                   # ? lg_local_unlock+0x31/0x47
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<790c3c47>]                                   # ? __fput+0x187/0x1a0
  [<792b9d37>]                                   # ? restore_all+0xf/0xf
  [<79165bb4>]                                   # ? trace_hardirqs_on_thunk+0xc/0x10
  [<792b9d04>] syscall_call+0x7/0xb

then the information would still be there, it would still be in the 
right place, but it would also be much easier to ignore the right side 
column it visually. (The '# ?' prefix would also make it clear that 
this information is not so important as the left hand side column.)

Or something like that. Assuming this doesn't confuse tools and such.

Thanks,

	Ingo


  reply	other threads:[~2014-04-27 12:08 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-26 18:10 [PATCH] x86/dumpstack: Walk frames when built with frame pointers Richard Yao
2014-04-27 12:08 ` Ingo Molnar [this message]
2014-04-27 19:42   ` Peter Zijlstra
2014-04-27 19:51   ` Richard Yao
2014-04-27 20:08   ` Linus Torvalds
2014-04-27 20:36     ` Richard Yao
2014-05-07 17:18     ` Ingo Molnar
2014-05-07 17:37       ` Peter Zijlstra
2014-04-30 21:56   ` Frederic Weisbecker
2014-05-07 16:40     ` Ingo Molnar
2014-06-06  8:17       ` Peter Zijlstra
2014-06-06  8:24         ` Borislav Petkov
2014-06-06  9:35           ` Peter Zijlstra
2014-06-07  3:08             ` H. Peter Anvin
2014-06-07  3:09           ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140427120820.GC22116@gmail.com \
    --to=mingo@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=behlendorf1@llnl.gov \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jesper.nilsson@axis.com \
    --cc=jolsa@redhat.com \
    --cc=jslaby@suse.cz \
    --cc=kernel@gentoo.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=ryao@gentoo.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vgupta@synopsys.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox