From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 00/14] perf, x86: Haswell LBR call stack support
Date: Wed, 22 Jan 2014 09:35:03 +0800 [thread overview]
Message-ID: <52DF2047.70303@intel.com> (raw)
In-Reply-To: <CABPqkBTEoVnZ8AujD5Rvyen2bTqGS65_adF=GOps=rYS5f9=4A@mail.gmail.com>
On 01/21/2014 09:17 PM, Stephane Eranian wrote:
> Hi,
>
> Is there a git tree from which I could could pull those 14 patches from?
https://github.com/ukernel/linux.git perf-lbr-callstack
Regards
Yan, Zheng
>
> On Fri, Jan 3, 2014 at 6:47 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
>> For many profiling tasks we need the callgraph. For example we often
>> need to see the caller of a lock or the caller of a memcpy or other
>> library function to actually tune the program. Frame pointer unwinding
>> is efficient and works well. But frame pointers are off by default on
>> 64bit code (and on modern 32bit gccs), so there are many binaries around
>> that do not use frame pointers. Profiling unchanged production code is
>> very useful in practice. On some CPUs frame pointer also has a high
>> cost. Dwarf2 unwinding also does not always work and is extremely slow
>> (upto 20% overhead).
>>
>> Haswell has a new feature that utilizes the existing Last Branch Record
>> facility to record call chains. When the feature is enabled, function
>> call will be collected as normal, but as return instructions are
>> executed the last captured branch record is popped from the on-chip LBR
>> registers. The LBR call stack facility provides an alternative to get
>> callgraph. It has some limitations too, but should work in most cases
>> and is significantly faster than dwarf. Frame pointer unwinding is still
>> the best default, but LBR call stack is a good alternative when nothing
>> else works.
>>
>> This patch series adds LBR call stack support. User can enabled/disable
>> this through an sysfs attribute file in the CPU PMU directory:
>> echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack
>>
>> When profiling bc(1) on Fedora 19:
>> echo 'scale=2000; 4*a(1)' > cmd; perf record -g fp bc -l < cmd
>>
>> If this feature is enabled, perf report output looks like:
>> 50.36% bc bc [.] bc_divide
>> |
>> --- bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> 33.66% bc bc [.] _one_mult
>> |
>> --- _one_mult
>> bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> 7.62% bc bc [.] _bc_do_add
>> |
>> --- _bc_do_add
>> |
>> |--99.89%-- 0x2000186a8
>> --0.11%-- [...]
>>
>> 6.83% bc bc [.] _bc_do_sub
>> |
>> --- _bc_do_sub
>> |
>> |--99.94%-- bc_add
>> | execute
>> | run_code
>> | yyparse
>> | main
>> | __libc_start_main
>> | _start
>> --0.06%-- [...]
>>
>> 0.46% bc libc-2.17.so [.] __memset_sse2
>> |
>> --- __memset_sse2
>> |
>> |--54.13%-- bc_new_num
>> | |
>> | |--51.00%-- bc_divide
>> | | execute
>> | | run_code
>> | | yyparse
>> | | main
>> | | __libc_start_main
>> | | _start
>> | |
>> | |--30.46%-- _bc_do_sub
>> | | bc_add
>> | | execute
>> | | run_code
>> | | yyparse
>> | | main
>> | | __libc_start_main
>> | | _start
>> | |
>> | --18.55%-- _bc_do_add
>> | bc_add
>> | execute
>> | run_code
>> | yyparse
>> | main
>> | __libc_start_main
>> | _start
>> |
>> --45.87%-- bc_divide
>> execute
>> run_code
>> yyparse
>> main
>> __libc_start_main
>> _start
>>
>> If this feature is disabled, perf report output looks like:
>> 50.49% bc bc [.] bc_divide
>> |
>> --- bc_divide
>>
>> 33.57% bc bc [.] _one_mult
>> |
>> --- _one_mult
>>
>> 7.61% bc bc [.] _bc_do_add
>> |
>> --- _bc_do_add
>> 0x2000186a8
>>
>> 6.88% bc bc [.] _bc_do_sub
>> |
>> --- _bc_do_sub
>>
>> 0.42% bc libc-2.17.so [.] __memcpy_ssse3_back
>> |
>> --- __memcpy_ssse3_back
>>
>> The LBR call stack has following known limitations
>> - Zero length calls are not filtered out by hardware
>> - Exception handing such as setjmp/longjmp will have calls/returns not
>> match
>> - Pushing different return address onto the stack will have calls/returns
>> not match
>> - If callstack is deeper than the LBR, only the last entries are captured
>>
>> Change since previous version
>> - split change into more patches
>> - introduce context switch callback and use it to flush LBR
>> - use the context switch callback to save/restore LBR
>> - dynamic allocate memory area for storing LBR stack, always switch the
>> memory area during context switch
>> - disable this feature by default
>> - more description in change logs
>>
prev parent reply other threads:[~2014-01-22 1:35 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-03 5:47 [PATCH 00/14] perf, x86: Haswell LBR call stack support Yan, Zheng
2014-01-03 5:47 ` [PATCH 01/14] perf, x86: Reduce lbr_sel_map size Yan, Zheng
2014-02-05 15:15 ` Stephane Eranian
2014-01-03 5:47 ` [PATCH 02/14] perf, core: introduce pmu context switch callback Yan, Zheng
2014-02-05 16:01 ` Stephane Eranian
2014-02-06 1:38 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 03/14] perf, x86: use context switch callback to flush LBR stack Yan, Zheng
2014-02-05 16:34 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 04/14] perf, x86: Basic Haswell LBR call stack support Yan, Zheng
2014-02-05 15:40 ` Stephane Eranian
2014-02-06 1:52 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 05/14] perf, core: allow pmu specific data for perf task context Yan, Zheng
2014-02-05 16:57 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 06/14] perf, core: always switch pmu specific data during context switch Yan, Zheng
2014-02-05 17:19 ` Stephane Eranian
2014-02-05 17:55 ` Peter Zijlstra
2014-02-05 18:35 ` Stephane Eranian
2014-02-06 2:08 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 07/14] perf: track number of events that use LBR callstack Yan, Zheng
2014-02-06 14:55 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 08/14] perf, x86: allocate space for storing LBR stack Yan, Zheng
2014-02-05 17:26 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 09/14] perf, x86: Save/resotre LBR stack during context switch Yan, Zheng
2014-02-05 17:45 ` Stephane Eranian
2014-02-06 15:09 ` Stephane Eranian
2014-02-10 8:45 ` Yan, Zheng
2014-01-03 5:48 ` [PATCH 10/14] perf, core: simplify need branch stack check Yan, Zheng
2014-02-06 15:35 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 11/14] perf, core: Pass perf_sample_data to perf_callchain() Yan, Zheng
2014-01-03 5:48 ` [PATCH 12/14] perf, x86: use LBR call stack to get user callchain Yan, Zheng
2014-02-06 15:46 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 13/14] perf, x86: enable LBR callstack when recording callchain Yan, Zheng
2014-02-06 15:50 ` Stephane Eranian
2014-01-03 5:48 ` [PATCH 14/14] perf, x86: Discard zero length call entries in LBR call stack Yan, Zheng
2014-02-06 15:57 ` Stephane Eranian
2014-01-21 13:17 ` [PATCH 00/14] perf, x86: Haswell LBR call stack support Stephane Eranian
2014-01-22 1:35 ` Yan, Zheng [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52DF2047.70303@intel.com \
--to=zheng.z.yan@intel.com \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@infradead.org \
--cc=andi@firstfloor.org \
--cc=eranian@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).