From: Frederic Weisbecker <fweisbec@gmail.com>
To: Byungchul Park <byungchul.park@lge.com>
Cc: peterz@infradead.org, mingo@kernel.org,
linux-kernel@vger.kernel.org, walken@google.com
Subject: Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
Date: Fri, 8 Jul 2016 16:44:04 +0200 [thread overview]
Message-ID: <20160708144403.GG30200@lerouge> (raw)
In-Reply-To: <1467628075-7289-1-git-send-email-byungchul.park@lge.com>
On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> I want to proceed saperately since it's somewhat independent from each
> other. Frankly speaking, I want this patchset to be accepted at first so
> that the crossfeature can use this optimized save_stack_trace_norm()
> which makes crossrelease work smoothly.
>
> ----->8-----
> From 1ceb4cee520cfc562d5d63471f6db4e9a8d9ff42 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Mon, 4 Jul 2016 18:31:09 +0900
> Subject: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
>
> Currently, x86 implementation of save_stack_trace() is walking all stack
> region word by word regardless of what the trace->max_entries is.
> However, it's unnecessary to walk after already fulfilling caller's
> requirement, say, if trace->nr_entries >= trace->max_entries is true.
>
> I measured its overhead and printed its difference of sched_clock() with
> my QEMU x86 machine. The latency was improved over 70% when
> trace->max_entries = 5.
>
> Before this patch:
>
> [ 2.326940] save_stack_trace() takes 83931 ns
> [ 2.326389] save_stack_trace() takes 62576 ns
> [ 2.327575] save_stack_trace() takes 58826 ns
> [ 2.327000] save_stack_trace() takes 88980 ns
> [ 2.327424] save_stack_trace() takes 59831 ns
> [ 2.327575] save_stack_trace() takes 58482 ns
> [ 2.327597] save_stack_trace() takes 87114 ns
> [ 2.327931] save_stack_trace() takes 121140 ns
> [ 2.327434] save_stack_trace() takes 64321 ns
> [ 2.328632] save_stack_trace() takes 84997 ns
> [ 2.328000] save_stack_trace() takes 115037 ns
> [ 2.328460] save_stack_trace() takes 72292 ns
> [ 2.328632] save_stack_trace() takes 61236 ns
> [ 2.328567] save_stack_trace() takes 76666 ns
> [ 2.328867] save_stack_trace() takes 79525 ns
> [ 2.328460] save_stack_trace() takes 64902 ns
> [ 2.329585] save_stack_trace() takes 58760 ns
> [ 2.329000] save_stack_trace() takes 91349 ns
> [ 2.329414] save_stack_trace() takes 60069 ns
> [ 2.329585] save_stack_trace() takes 61012 ns
> [ 2.329573] save_stack_trace() takes 76820 ns
> [ 2.329863] save_stack_trace() takes 62131 ns
> [ 2.330000] save_stack_trace() takes 99476 ns
> [ 2.329846] save_stack_trace() takes 62419 ns
> [ 2.330000] save_stack_trace() takes 88918 ns
> [ 2.330253] save_stack_trace() takes 73669 ns
> [ 2.330520] save_stack_trace() takes 67876 ns
> [ 2.330671] save_stack_trace() takes 75963 ns
> [ 2.330983] save_stack_trace() takes 95079 ns
> [ 2.330451] save_stack_trace() takes 62352 ns
>
> After this patch:
>
> [ 2.780735] save_stack_trace() takes 19902 ns
> [ 2.780718] save_stack_trace() takes 20240 ns
> [ 2.781692] save_stack_trace() takes 45215 ns
> [ 2.781477] save_stack_trace() takes 20191 ns
> [ 2.781694] save_stack_trace() takes 20044 ns
> [ 2.782589] save_stack_trace() takes 20292 ns
> [ 2.782706] save_stack_trace() takes 20024 ns
> [ 2.782706] save_stack_trace() takes 19881 ns
> [ 2.782881] save_stack_trace() takes 24577 ns
> [ 2.782706] save_stack_trace() takes 19901 ns
> [ 2.783621] save_stack_trace() takes 24381 ns
> [ 2.783621] save_stack_trace() takes 20205 ns
> [ 2.783760] save_stack_trace() takes 19956 ns
> [ 2.783718] save_stack_trace() takes 20280 ns
> [ 2.784179] save_stack_trace() takes 20099 ns
> [ 2.784835] save_stack_trace() takes 20055 ns
> [ 2.785922] save_stack_trace() takes 20157 ns
> [ 2.785922] save_stack_trace() takes 20140 ns
> [ 2.786178] save_stack_trace() takes 20040 ns
> [ 2.786877] save_stack_trace() takes 20102 ns
> [ 2.795000] save_stack_trace() takes 21147 ns
> [ 2.795397] save_stack_trace() takes 20230 ns
> [ 2.795397] save_stack_trace() takes 31274 ns
> [ 2.795739] save_stack_trace() takes 19706 ns
> [ 2.796484] save_stack_trace() takes 20266 ns
> [ 2.796484] save_stack_trace() takes 20902 ns
> [ 2.797000] save_stack_trace() takes 38110 ns
> [ 2.797510] save_stack_trace() takes 20224 ns
> [ 2.798181] save_stack_trace() takes 20172 ns
> [ 2.798837] save_stack_trace() takes 20824 ns
>
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
> arch/x86/include/asm/stacktrace.h | 1 +
> arch/x86/kernel/dumpstack.c | 4 ++++
> arch/x86/kernel/dumpstack_32.c | 2 ++
> arch/x86/kernel/stacktrace.c | 7 +++++++
> 4 files changed, 14 insertions(+)
>
> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index 70bbe39..fc572e7 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -41,6 +41,7 @@ struct stacktrace_ops {
> /* On negative return stop dumping */
> int (*stack)(void *data, char *name);
> walk_stack_t walk_stack;
> + int (*end_walk)(void *data);
Nice improvement but how about doing that with the return value of
stacktrace_ops::address() instead?
print_context_stack_bp() uses that for example. This behaviour could
be extended.
next prev parent reply other threads:[~2016-07-08 14:44 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
2016-07-04 10:27 ` [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm() Byungchul Park
2016-07-07 10:17 ` [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
2016-07-08 10:08 ` Ingo Molnar
2016-07-08 14:29 ` Josh Poimboeuf
2016-07-08 14:48 ` Ingo Molnar
2016-07-08 15:02 ` Frederic Weisbecker
2016-07-08 15:22 ` Josh Poimboeuf
2016-07-18 3:14 ` Byungchul Park
2016-07-18 13:09 ` Josh Poimboeuf
2016-07-19 0:08 ` Byungchul Park
2016-07-18 2:42 ` Byungchul Park
2016-07-08 15:07 ` Frederic Weisbecker
2016-07-18 2:37 ` Byungchul Park
2016-07-08 14:08 ` Josh Poimboeuf
2016-07-08 14:44 ` Frederic Weisbecker [this message]
2016-07-18 3:25 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160708144403.GG30200@lerouge \
--to=fweisbec@gmail.com \
--cc=byungchul.park@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.