From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 00/14] perf, x86: Haswell LBR call stack support
Date: Wed, 22 Jan 2014 09:35:03 +0800
Message-ID: <52DF2047.70303@intel.com>
In-Reply-To: <CABPqkBTEoVnZ8AujD5Rvyen2bTqGS65_adF=GOps=rYS5f9=4A@mail.gmail.com>
On 01/21/2014 09:17 PM, Stephane Eranian wrote:
> Hi,
>
> Is there a git tree from which I could pull those 14 patches?
https://github.com/ukernel/linux.git perf-lbr-callstack
Regards
Yan, Zheng
>
> On Fri, Jan 3, 2014 at 6:47 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
>> For many profiling tasks we need the callgraph. For example we often
>> need to see the caller of a lock or the caller of a memcpy or other
>> library function to actually tune the program. Frame pointer unwinding
>> is efficient and works well. But frame pointers are off by default on
>> 64bit code (and on modern 32bit gccs), so there are many binaries around
>> that do not use frame pointers. Profiling unchanged production code is
>> very useful in practice. On some CPUs, maintaining a frame pointer also
>> has a high cost. DWARF2 unwinding does not always work either and is
>> extremely slow (up to 20% overhead).
>>
>> Haswell has a new feature that utilizes the existing Last Branch Record
>> facility to record call chains. When the feature is enabled, function
>> calls are collected as normal, but as return instructions are executed
>> the last captured branch record is popped from the on-chip LBR
>> registers. The LBR call stack facility provides an alternative way to
>> get the callgraph. It has some limitations too, but should work in most
>> cases and is significantly faster than DWARF unwinding. Frame pointer
>> unwinding is still the best default, but LBR call stack is a good
>> alternative when nothing else works.
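To make the push-on-call, pop-on-return behaviour concrete, here is a toy model in Python. It is only a sketch, not the hardware or the kernel code: the LbrCallStack class and its method names are made up for illustration, and the depth of 16 entries is an assumption about the number of LBR registers.

```python
from collections import deque


class LbrCallStack:
    """Toy model of LBR call-stack mode: calls push a record, returns pop one."""

    def __init__(self, depth=16):
        # deque(maxlen=...) drops the oldest record when full, which mimics
        # a fixed number of LBR registers (depth is an assumed value).
        self.entries = deque(maxlen=depth)

    def call(self, caller, callee):
        # A call instruction records a (from, to) branch entry.
        self.entries.append((caller, callee))

    def ret(self):
        # A return instruction pops the last captured branch record.
        if self.entries:
            self.entries.pop()

    def callchain(self):
        # Innermost function first, the same order perf report prints.
        if not self.entries:
            return []
        chain = [callee for _, callee in reversed(self.entries)]
        chain.append(self.entries[0][0])  # caller of the oldest record
        return chain


lbr = LbrCallStack()
lbr.call("main", "yyparse")
lbr.call("yyparse", "run_code")
lbr.call("run_code", "execute")
lbr.call("execute", "bc_add")
lbr.ret()  # bc_add returns, so its record is popped
lbr.call("execute", "bc_divide")
print(lbr.callchain())
```

Because the record for bc_add is popped when it returns, a sample taken inside bc_divide sees only the frames that are still live.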
>>
>> This patch series adds LBR call stack support. Users can enable/disable
>> this through a sysfs attribute file in the CPU PMU directory:
>> echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack
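For scripted setups, the same toggle could be wrapped in a small helper. This is a minimal sketch: the set_lbr_callstack function is hypothetical, the attribute path is the one quoted above, and the file only exists on a kernel with this patch series applied, so the helper reports a missing attribute instead of failing.

```python
from pathlib import Path

# Attribute path taken from the cover letter; present only with these patches.
LBR_CALLSTACK_ATTR = Path("/sys/bus/event_source/devices/cpu/lbr_callstack")


def set_lbr_callstack(enabled, attr=LBR_CALLSTACK_ATTR):
    """Write 1 or 0 to the attribute; return False if it does not exist."""
    attr = Path(attr)
    if not attr.exists():
        return False
    # Writing the real sysfs file normally requires root privileges.
    attr.write_text("1\n" if enabled else "0\n")
    return True
```

Calling set_lbr_callstack(True) before perf record and set_lbr_callstack(False) afterwards would mirror the echo command above.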
>>
>> When profiling bc(1) on Fedora 19:
>> echo 'scale=2000; 4*a(1)' > cmd; perf record -g fp bc -l < cmd
>>
>> If this feature is enabled, perf report output looks like:
>> 50.36% bc bc [.] bc_divide
>> |
>> --- bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> 33.66% bc bc [.] _one_mult
>> |
>> --- _one_mult
>> bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> 7.62% bc bc [.] _bc_do_add
>> |
>> --- _bc_do_add
>> |
>> |--99.89%-- 0x2000186a8
>> --0.11%-- [...]
>>
>> 6.83% bc bc [.] _bc_do_sub
>> |
>> --- _bc_do_sub
>> |
>> |--99.94%-- bc_add
>> | execute
>> | run_code
>> | yyparse
>> | main
>> | __libc_start_main
>> | _start
>> --0.06%-- [...]
>>
>> 0.46% bc libc-2.17.so [.] __memset_sse2
>> |
>> --- __memset_sse2
>> |
>> |--54.13%-- bc_new_num
>> | |
>> | |--51.00%-- bc_divide
>> | | execute
>> | | run_code
>> | | yyparse
>> | | main
>> | | __libc_start_main
>> | | _start
>> | |
>> | |--30.46%-- _bc_do_sub
>> | | bc_add
>> | | execute
>> | | run_code
>> | | yyparse
>> | | main
>> | | __libc_start_main
>> | | _start
>> | |
>> | --18.55%-- _bc_do_add
>> | bc_add
>> | execute
>> | run_code
>> | yyparse
>> | main
>> | __libc_start_main
>> | _start
>> |
>> --45.87%-- bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> If this feature is disabled, perf report output looks like:
>> 50.49% bc bc [.] bc_divide
>> |
>> --- bc_divide
>>
>> 33.57% bc bc [.] _one_mult
>> |
>> --- _one_mult
>>
>> 7.61% bc bc [.] _bc_do_add
>> |
>> --- _bc_do_add
>> 0x2000186a8
>>
>> 6.88% bc bc [.] _bc_do_sub
>> |
>> --- _bc_do_sub
>>
>> 0.42% bc libc-2.17.so [.] __memcpy_ssse3_back
>> |
>> --- __memcpy_ssse3_back
>>
>> The LBR call stack has the following known limitations:
>> - Zero length calls are not filtered out by hardware
>> - Exception handling such as setjmp/longjmp will leave calls and returns
>> unmatched
>> - Pushing a different return address onto the stack will leave calls and
>> returns unmatched
>> - If the call stack is deeper than the LBR, only the last entries are
>> captured
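Two of these limitations are easy to see in a toy replay of call/return events. The sketch below is illustrative only: lbr_chain is a made-up helper, the event tuples are an assumed encoding, and the small depth of 4 is chosen just to keep the example short.

```python
from collections import deque


def lbr_chain(events, depth=16):
    """Replay ('call', caller, callee) and ('ret',) events on a toy LBR.

    Returns the resulting callchain, innermost function first.
    """
    entries = deque(maxlen=depth)  # oldest record is dropped when full
    for ev in events:
        if ev[0] == "call":
            entries.append((ev[1], ev[2]))
        elif ev[0] == "ret" and entries:
            entries.pop()
    if not entries:
        return []
    return [callee for _, callee in reversed(entries)] + [entries[0][0]]


# Call stack deeper than the LBR: f0 and f1 fall off the end.
deep = [("call", f"f{i}", f"f{i + 1}") for i in range(6)]
print(lbr_chain(deep, depth=4))

# longjmp: b jumps straight back to main, so no return instruction pops
# the records for a and b, and they show up as stale frames under c.
jump = [
    ("call", "main", "a"),
    ("call", "a", "b"),
    ("call", "main", "c"),
]
print(lbr_chain(jump))
```

In the second case the true chain is just c called from main, but the unmatched records for a and b remain in the toy LBR.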
>>
>> Changes since previous version:
>> - split changes into more patches
>> - introduce a context switch callback and use it to flush the LBR
>> - use the context switch callback to save/restore the LBR
>> - dynamically allocate the memory area for storing the LBR stack, and
>> always switch the memory area during context switch
>> - disable this feature by default
>> - more description in change logs
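The save/restore items above can be pictured with a toy scheduler. This is a sketch of the idea only, not the kernel implementation: the Cpu class, the saved dictionary standing in for the per-task memory area, and the task names are all hypothetical.

```python
class Cpu:
    """Toy CPU whose LBR contents are treated as per-task state."""

    def __init__(self):
        self.lbr = []        # stands in for the live LBR registers
        self.current = None  # task currently running on this CPU

    def context_switch(self, saved, next_task):
        # Save the outgoing task's LBR stack into its own storage area.
        if self.current is not None:
            saved[self.current] = self.lbr
        # Restore the incoming task's stack, or start empty (flushed)
        # if nothing was saved for it.
        self.lbr = saved.pop(next_task, [])
        self.current = next_task


saved = {}  # hypothetical per-task storage for LBR stacks
cpu = Cpu()
cpu.context_switch(saved, "A")
cpu.lbr.append(("main", "execute"))
cpu.context_switch(saved, "B")
cpu.lbr.append(("main", "other"))
cpu.context_switch(saved, "A")
print(cpu.lbr)
```

Without the save/restore step, task A would resume with task B's records in the LBR, and its returns would pop entries that belong to another task.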
>>
Thread overview: 37+ messages
2014-01-03 5:47 [PATCH 00/14] perf, x86: Haswell LBR call stack support Yan, Zheng
2014-01-03 5:47 ` [PATCH 01/14] perf, x86: Reduce lbr_sel_map size Yan, Zheng
2014-02-05 15:15 ` Stephane Eranian
2014-01-03 5:47 ` [PATCH 02/14] perf, core: introduce pmu context switch callback Yan, Zheng
2014-02-05 16:01 ` Stephane Eranian
2014-02-06 1:38 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 03/14] perf, x86: use context switch callback to flush LBR stack Yan, Zheng
2014-02-05 16:34 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 04/14] perf, x86: Basic Haswell LBR call stack support Yan, Zheng
2014-02-05 15:40 ` Stephane Eranian
2014-02-06 1:52 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 05/14] perf, core: allow pmu specific data for perf task context Yan, Zheng
2014-02-05 16:57 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 06/14] perf, core: always switch pmu specific data during context switch Yan, Zheng
2014-02-05 17:19 ` Stephane Eranian
2014-02-05 17:55 ` Peter Zijlstra
2014-02-05 18:35 ` Stephane Eranian
2014-02-06 2:08 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 07/14] perf: track number of events that use LBR callstack Yan, Zheng
2014-02-06 14:55 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 08/14] perf, x86: allocate space for storing LBR stack Yan, Zheng
2014-02-05 17:26 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 09/14] perf, x86: Save/resotre LBR stack during context switch Yan, Zheng
2014-02-05 17:45 ` Stephane Eranian
2014-02-06 15:09 ` Stephane Eranian
2014-02-10 8:45 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 10/14] perf, core: simplify need branch stack check Yan, Zheng
2014-02-06 15:35 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 11/14] perf, core: Pass perf_sample_data to perf_callchain() Yan, Zheng
2014-01-03 5:48 ` [PATCH 12/14] perf, x86: use LBR call stack to get user callchain Yan, Zheng
2014-02-06 15:46 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 13/14] perf, x86: enable LBR callstack when recording callchain Yan, Zheng
2014-02-06 15:50 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 14/14] perf, x86: Discard zero length call entries in LBR call stack Yan, Zheng
2014-02-06 15:57 ` Stephane Eranian
2014-01-21 13:17 ` [PATCH 00/14] perf, x86: Haswell LBR call stack support Stephane Eranian
2014-01-22 1:35 ` Yan, Zheng [this message]