public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, acme@infradead.org,
	robert.richter@amd.com, ming.m.lin@intel.com,
	andi@firstfloor.org, asharma@fb.com, ravitillo@lbl.gov,
	vweaver1@eecs.utk.edu
Subject: Re: [PATCH 00/13] perf_events: add support for sampling taken branches (v3)
Date: Fri, 27 Jan 2012 13:09:01 +0100	[thread overview]
Message-ID: <1327666141.2446.168.camel@twins> (raw)
In-Reply-To: <1326127761-2723-1-git-send-email-eranian@google.com>

Arnaldo,

On Mon, 2012-01-09 at 17:49 +0100, Stephane Eranian wrote:
> I would like to thank Roberto Vitillo @ LBL for his work on the perf
> tool for this.
> 
> Enough talking, let's take a simple example. Our trivial test program
> goes like this:
> 
> void f2(void)
> {}
> void f3(void)
> {}
> void f1(unsigned long n)
> {
>   if (n & 1UL)
>     f2();
>   else
>     f3();
> }
> int main(void)
> {
>   unsigned long i;
> 
>   for (i=0; i < N; i++)
>    f1(i);
>   return 0;
> }
> 
> $ perf record -b any branchy
> $ perf report -b
> # Events: 23K cycles
> #
> # Overhead  Source Symbol     Target Symbol
> # ........  ................  ................
> 
>     18.13%  [.] f1            [.] main                          
>     18.10%  [.] main          [.] main                          
>     18.01%  [.] main          [.] f1                            
>     15.69%  [.] f1            [.] f1                            
>      9.11%  [.] f3            [.] f1                            
>      6.78%  [.] f1            [.] f3                            
>      6.74%  [.] f1            [.] f2                            
>      6.71%  [.] f2            [.] f1                            
> 
> Of the total number of branches captured, 18.13% were from f1() -> main().
> 
> Let's make this clearer by filtering the user call branches only:
> 
> $ perf record -b any_call -e cycles:u branchy
> $ perf report -b
> # Events: 19K cycles
> #
> # Overhead  Source Symbol              Target Symbol
> # ........  .........................  .........................
> #
>     52.50%  [.] main                   [.] f1                   
>     23.99%  [.] f1                     [.] f3                   
>     23.48%  [.] f1                     [.] f2                   
>      0.03%  [.] _IO_default_xsputn     [.] _IO_new_file_overflow
>      0.01%  [k] _start                 [k] __libc_start_main    
> 
> Now it is more obvious. %52 of all the captured branches where calls from main() -> f1().
> The rest is split 50/50 between f1() -> f2() and f1() -> f3() which is expected given
> that f1() dispatches based on odd vs. even values of n which is constantly increasing.
> 
> 
> Here is a kernel example, where we want to sample indirect calls:
> $ perf record -a -C 1 -b ind_call -e r1c4:k sleep 10 
> $ perf report -b
> #
> # Overhead  Source Symbol               Target Symbol
> # ........  ..........................  ..........................
> #
>     36.36%  [k] __delay                 [k] delay_tsc             
>      9.09%  [k] ktime_get               [k] read_tsc              
>      9.09%  [k] getnstimeofday          [k] read_tsc              
>      9.09%  [k] notifier_call_chain     [k] tick_notify           
>      4.55%  [k] cpuidle_idle_call       [k] intel_idle            
>      4.55%  [k] cpuidle_idle_call       [k] menu_reflect          
>      2.27%  [k] handle_irq              [k] handle_edge_irq       
>      2.27%  [k] ack_apic_edge           [k] native_apic_mem_write 
>      2.27%  [k] hpet_interrupt_handler  [k] hrtimer_interrupt     
>      2.27%  [k] __run_hrtimer           [k] watchdog_timer_fn     
>      2.27%  [k] enqueue_task            [k] enqueue_task_rt       
>      2.27%  [k] try_to_wake_up          [k] select_task_rq_rt     
>      2.27%  [k] do_timer                [k] read_tsc              
> 
> Due to HW limitations, branch filtering may be approximate on
> Core, Atom processors. It is more accurate on Nehalem, Westmere
> and best on Sandy Bridge. 

Can I have you ACK on this userspace stuff (patches 11-13)?

  parent reply	other threads:[~2012-01-27 12:09 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-09 16:49 [PATCH 00/13] perf_events: add support for sampling taken branches (v3) Stephane Eranian
2012-01-09 16:49 ` [PATCH 01/13] perf_events: add generic taken branch sampling support (v3) Stephane Eranian
2012-01-27  4:46   ` Anshuman Khandual
2012-01-27  9:57     ` Stephane Eranian
2012-01-09 16:49 ` [PATCH 02/13] perf_events: add Intel LBR MSR definitions (v3) Stephane Eranian
2012-01-27  5:03   ` Anshuman Khandual
2012-01-09 16:49 ` [PATCH 03/13] perf_events: add Intel X86 LBR sharing logic (v3) Stephane Eranian
2012-01-09 16:49 ` [PATCH 04/13] perf_events: sync branch stack sampling with X86 precise_sampling (v3) Stephane Eranian
2012-01-27  5:26   ` Anshuman Khandual
2012-01-09 16:49 ` [PATCH 05/13] perf_events: add LBR mappings for PERF_SAMPLE_BRANCH filters (v3) Stephane Eranian
2012-01-27  5:41   ` Anshuman Khandual
2012-01-09 16:49 ` [PATCH 06/13] perf_events: disable LBR support for older Intel Atom processors (v3) Stephane Eranian
2012-01-27  5:43   ` Anshuman Khandual
2012-01-09 16:49 ` [PATCH 07/13] perf_events: implement PERF_SAMPLE_BRANCH for Intel X86 (v3) Stephane Eranian
2012-01-27  6:14   ` Anshuman Khandual
2012-01-09 16:49 ` [PATCH 08/13] perf_events: add LBR software filter support " Stephane Eranian
2012-01-09 16:49 ` [PATCH 09/13] perf_events: disable PERF_SAMPLE_BRANCH_* when not supported (v3) Stephane Eranian
2012-01-27  7:15   ` Anshuman Khandual
2012-01-27  9:56     ` Stephane Eranian
2012-01-09 16:49 ` [PATCH 10/13] perf_events: add hook to flush branch_stack on context switch (v3) Stephane Eranian
2012-01-09 16:49 ` [PATCH 11/13] perf: add code to support PERF_SAMPLE_BRANCH_STACK (v3) Stephane Eranian
2012-01-10  1:25   ` Arun Sharma
2012-01-10 15:43     ` Stephane Eranian
2012-01-09 16:49 ` [PATCH 12/13] perf: add support for sampling taken branch to perf record (v3) Stephane Eranian
2012-01-09 16:49 ` [PATCH 13/13] perf: add support for taken branch sampling to perf report (v3) Stephane Eranian
2012-01-23 10:14 ` [PATCH 00/13] perf_events: add support for sampling taken branches (v3) Stephane Eranian
2012-01-23 12:25   ` Peter Zijlstra
2012-01-23 15:07     ` Stephane Eranian
2012-01-23 15:47       ` Andi Kleen
2012-01-23 17:14     ` Stephane Eranian
2012-01-24 15:39       ` Stephane Eranian
2012-01-24 16:08         ` David Ahern
2012-01-24 17:42           ` Stephane Eranian
2012-01-26 16:21           ` Stephane Eranian
2012-01-27 12:09 ` Peter Zijlstra [this message]
2012-01-27 18:20   ` Arun Sharma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1327666141.2446.168.camel@twins \
    --to=peterz@infradead.org \
    --cc=acme@infradead.org \
    --cc=andi@firstfloor.org \
    --cc=asharma@fb.com \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.m.lin@intel.com \
    --cc=mingo@elte.hu \
    --cc=ravitillo@lbl.gov \
    --cc=robert.richter@amd.com \
    --cc=vweaver1@eecs.utk.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox