All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung.kim@lge.com>
To: HUMMEL Michel <michel.hummel@thalesgroup.com>
Cc: "linux-perf-users@vger.kernel.org" <linux-perf-users@vger.kernel.org>
Subject: Re: Perf 2.6.32-220.el6.x86_64 problem when recording 2 counters
Date: Thu, 24 May 2012 09:44:55 +0900	[thread overview]
Message-ID: <87bole1k6w.fsf@sejong.aot.lge.com> (raw)
In-Reply-To: <14480_1337789222_4FBD0B26_14480_3035_2_932AC94CE5A51243A6F091BAFB3EE3E0028AB9506D@THSONEA01CMS01P.one.grp> (HUMMEL Michel's message of "Wed, 23 May 2012 18:06:59 +0200")

Hi,

On Wed, 23 May 2012 18:06:59 +0200, HUMMEL Michel wrote:
> Hello,
> I am testing the perf command of my new centos 6.2 server, and I don't understand something.
> My Conf :
> ################################################################
> $ grep name /proc/cpuinfo
> model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
> $ uname -a
> Linux xxxxxxxx 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
> $ perf --version
> perf version 2.6.32-220.el6.x86_64.debug
> $ perf test
>  1: vmlinux symtab matches kallsyms: FAILED!
>
>  2: detect open syscall event: Ok
>  3: detect open syscall event on all cpus: Ok
>  4: read samples using the mmap interface: Ok
> ################################################################
>
> I wrote a simple test case which is calling two functions.
> The first one "good_perfo"  does the same as the second "bad_perfo" but more efficiently (lesser cache miss ...).
>
> Gprof profile confirms this :
> ################################################################
> $ gprof -p ./test_perfo
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  76.14     38.75    38.75        1    38.75    38.75  bad_perfo
>  23.70     50.81    12.06        1    12.06    12.06  good_perfo
> ################################################################
>
> but perf don't want to give me the good answer when I asking for two events : cycles and instructions  :
> ################################################################
> $ perf record -e  cycles,instructions  -o perf.data.cycles.instructions ./test_perfo
> $ perf report -i perf.data.cycles.instructions --stdio
> # Events: 5K cycles
> #

One thing I noticed is the number of events in this run is much smaller
than others (5K to 51K). Probably it can be a reason of the inaccuracy,
but I'm not sure. Maybe increasing sampling freq (-F) or adding the 'u'
modifier on events can help.

Thanks,
Namhyung


> # Overhead  Command      Shared Object                      Symbol
> # ........  .......  .................  ..........................
> #
>     99.35%  test_perfo  test_perfo         [.] good_perfo
>      0.14%  test_perfo  [kernel.kallsyms]  [k] clear_page_c
>      0.08%  test_perfo  [kernel.kallsyms]  [k] page_fault
>      0.05%  test_perfo  [kernel.kallsyms]  [k] hrtimer_interrupt
>
>
> # Events: 5K instructions
> #
> # Overhead  Command      Shared Object                     Symbol
> # ........  .......  .................  .........................
> #
>     99.93%  test_perfo  test_perfo         [.] good_perfo
>      0.02%  test_perfo  [kernel.kallsyms]  [k] __run_hrtimer
>      0.02%  test_perfo  [kernel.kallsyms]  [k] inode_has_perm
>      0.02%  test_perfo  [kernel.kallsyms]  [k] update_wall_time
>
> ################################################################
>
> The bad_perfo function is not appearing in the profile (and I am sure that it should !) 
> Now if I do 2 separate records, one for the cycles and one for the instructions, the profile seems to be good :
>
> ################################################################
> $ perf record -e  cycles  -o perf.data.cycles ./test_perfo
> $ perf report -i perf.data.cycles --stdio
> # Events: 51K cycles
> #
> # Overhead  Command      Shared Object                      Symbol
> # ........  .......  .................  ..........................
> #
>     76.20%  test_perfo  test_perfo         [.] bad_perfo
>     23.36%  test_perfo  test_perfo         [.] good_perfo
>      0.03%  test_perfo  [kernel.kallsyms]  [k] apic_timer_interrupt
>      0.02%  test_perfo  [kernel.kallsyms]  [k] do_softirq
>      0.02%  test_perfo  [kernel.kallsyms]  [k] avc_has_perm_noaudit
>
> $ perf record -e  instructions  -o perf.data.instructions ./test_perfo
> $ perf report -i perf.data.instructions --stdio
> # Events: 45K instructions
> #
> # Overhead  Command      Shared Object                    Symbol
> # ........  .......  .................  ........................
> #
>     49.95%  test_perfo  test_perfo         [.] bad_perfo
>     49.94%  test_perfo  test_perfo         [.] good_perfo
>      0.01%  test_perfo  [kernel.kallsyms]  [k] raise_softirq
>      0.01%  test_perfo  [kernel.kallsyms]  [k] hrtimer_interrupt
>      0.01%  test_perfo  [kernel.kallsyms]  [k] kmem_cache_alloc_notrace
> ################################################################
>
> Is I missed something with the perf command or this is a bug ?
>
>
> I found a Quick&Dirty solution :
> Using the -D ( Collect data without buffering ) option give me the good profile :
> ################################################################
> $ perf record -D -e  cycles,instructions  -o perf.data.cycles.instructions ./test_perfo
> $ perf report -i perf.data.cycles.instructions --stdio
> # Events: 51K cycles
> #
> # Overhead  Command      Shared Object                      Symbol
> # ........  .......  .................  ..........................
> #
>     76.20%  test_perfo  test_perfo         [.] bad_perfo
>     22.99%  test_perfo  test_perfo         [.] good_perfo
>      0.10%  test_perfo  [kernel.kallsyms]  [k] _spin_lock
>      0.05%  test_perfo  [kernel.kallsyms]  [k] hrtimer_interrupt
>   
> :
> # Events: 51K cycles
> #
> # Overhead  Command      Shared Object                      Symbol
> # ........  .......  .................  ..........................
> #
>     76.20%  test_perfo  test_perfo         [.] bad_perfo
>     22.99%  test_perfo  test_perfo         [.] good_perfo
>      0.10%  test_perfo  [kernel.kallsyms]  [k] _spin_lock
>      0.05%  test_perfo  [kernel.kallsyms]  [k] hrtimer_interrupt 
> ################################################################
>
> Thank for any help,
> Michel Hummel

  parent reply	other threads:[~2012-05-24  0:46 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-23 16:06 Perf 2.6.32-220.el6.x86_64 problem when recording 2 counters HUMMEL Michel
2012-05-24  0:33 ` Michael Ellerman
2012-05-24  7:14   ` HUMMEL Michel
2012-05-24 14:57     ` David Ahern
2012-05-24 16:19       ` HUMMEL Michel
2012-05-24 16:27         ` David Ahern
2012-05-24  0:44 ` Namhyung Kim [this message]
2012-05-24  7:23   ` HUMMEL Michel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bole1k6w.fsf@sejong.aot.lge.com \
    --to=namhyung.kim@lge.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=michel.hummel@thalesgroup.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.