From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <ak@linux.intel.com>
Cc: arun@sharma-home.net, Stephane Eranian <eranian@google.com>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	eranian@gmail.com, Arun Sharma <asharma@fb.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
Date: Sat, 23 Apr 2011 10:02:58 +0200
Message-ID: <20110423080258.GA14952@elte.hu>
In-Reply-To: <20110423000347.GC9328@tassilo.jf.intel.com>


* Andi Kleen <ak@linux.intel.com> wrote:

> > > Yes, and note that with instructions events we even have skid-less PEBS 
> > > profiling so seeing the precise .
> >                                   - location of instructions is possible.
> 
> It was better when it was eaten. PEBS does not actually eliminate
> skid unfortunately. The interrupt still occurs later, so the
> instruction location is off.
> 
> PEBS merely gives you more information.

Have you actually tried perf's PEBS support feature? Try:

  perf record -e instructions:pp ./myapp

(the ':pp' postfix stands for 'precise' and activates PEBS+LBR tricks.)

Look at the perf report --tui annotated assembly output (or check 'perf 
annotate' directly) and see how precise and skid-less the hits are. Works 
pretty well on Nehalem.

Here's a cache-bound loop with skid (profiled with '-e instructions'):

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   12.08 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
   87.92 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
    0.00 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

Those 'ADD' instruction hits are bogus: 99% of the cost in this function is in 
the INCB, but the PMU NMI often skids to the next (few) instructions.

Profiled with "-e instructions:pp" we get:

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   85.33 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
    0.00 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
   14.67 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

The INCB has the most hits as expected - but we also learn that there's 
something about the CMP.

Thanks,

	Ingo
