linux-kernel.vger.kernel.org archive mirror
From: Stephane Eranian <eranian@google.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>,
	linux-kernel@vger.kernel.org, Andi Kleen <ak@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	eranian@gmail.com, Arun Sharma <asharma@fb.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
Date: Sat, 23 Apr 2011 14:13:13 +0200	[thread overview]
Message-ID: <BANLkTi=YWStNmamjhWjh_yeitqGhtKJssw@mail.gmail.com> (raw)
In-Reply-To: <20110422204740.GA21364@elte.hu>

On Fri, Apr 22, 2011 at 10:47 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Stephane Eranian <eranian@google.com> wrote:
>> >
>> > > > Say i'm a developer and i have an app with such code:
>> > > >
>> > > > #define THOUSAND 1000
>> > > >
>> > > > static char array[THOUSAND][THOUSAND];
>> > > >
>> > > > int init_array(void)
>> > > > {
>> > > >        int i, j;
>> > > >
>> > > >        for (i = 0; i < THOUSAND; i++) {
>> > > >                for (j = 0; j < THOUSAND; j++) {
>> > > >                        array[j][i]++;
>> > > >                }
>> > > >        }
>> > > >
>> > > >        return 0;
>> > > > }
>> > > >
>> > > > Pretty common stuff, right?
>> > > >
>> > > > Using the generalized cache events i can run:
>> > > >
>> > > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
>> > > >
>> > > >  Performance counter stats for './array' (10 runs):
>> > > >
>> > > >         6,719,130 cycles:u                   ( +-   0.662% )
>> > > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>> > > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>> > > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
>> > > >
>> > > >        0.003802098  seconds time elapsed   ( +-  13.395% )
>> > > >
>> > > > I consider that this is 'bad', because for almost every dcache-load there's a
>> > > > dcache-miss - a 99% L1 cache miss rate!
>> > > >
>> > > > Then i think a bit, notice something, apply this performance optimization:
>> > >
>> > > I don't think this example is really representative of the kind of problems
>> > > people face, it is just too small and obvious. [...]
>> >
>> > Well, the overwhelming majority of performance problems are 'small and obvious'
>>
>> Problems are not simple. Most serious applications these days are huge,
>> hundreds of MB of text, if not GB.
>>
>> In your artificial example, you knew the answer before you started the
>> measurement.
>>
>> Most of the time, applications are assembled out of hundreds of libraries, so
>> no single developers knows all the code. Thus, the performance analyst is
>> faced with a black box most of the time.
>
> I isolated out an example and assumed that you'd agree that identifying hot
> spots is trivial with generic cache events.
>
> My assumption was wrong so let me show you how trivial it really is.
>
> Here's an example with *two* problematic functions (but it could have hundreds,
> it does not matter):
>
> -------------------------------->
> #define THOUSAND 1000
>
> static char array1[THOUSAND][THOUSAND];
>
> static char array2[THOUSAND][THOUSAND];
>
> void func1(void)
> {
>        int i, j;
>
>        for (i = 0; i < THOUSAND; i++)
>                for (j = 0; j < THOUSAND; j++)
>                        array1[i][j]++;
> }
>
> void func2(void)
> {
>        int i, j;
>
>        for (i = 0; i < THOUSAND; i++)
>                for (j = 0; j < THOUSAND; j++)
>                        array2[j][i]++;
> }
>
> int main(void)
> {
>        for (;;) {
>                func1();
>                func2();
>        }
>
>        return 0;
> }
> <--------------------------------
>
> We do not know which one has the cache-misses problem, func1() or func2(), it's
> all a black box, right?
>
> Using generic cache events you simply type this:
>
>  $ perf top -e l1-dcache-load-misses -e l1-dcache-loads
>
> And you get such output:
>
>   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses:u/l1-dcache-loads:u],  (all, 16 CPUs)
> -------------------------------------------------------------------------------------------------------
>
>   weight    samples  pcnt funct DSO
>   ______    _______ _____ _____ ______________________
>
>      1.9       6184 98.8% func2 /home/mingo/opt/array2
>      0.0         69  1.1% func1 /home/mingo/opt/array2
>
> It has pinpointed the problem in func2 *very* precisely.
>
> Obviously this can be used to analyze larger apps as well, with thousands of
> functions, to pinpoint cachemiss problems in specific functions.
>
No, it does not.

As I said before, your example is just too trivial to be representative. You
keep assuming that what you see in the profile pinpoints exactly the instruction,
or even the function, where the problem occurs. That is not always
the case. There is skid, and it can be very large; the IP you get may not even
be in the same function where the load was issued.

You cannot generalize based on this example.
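To make the skid point concrete, here is a sketch (not from the original thread) of how one would ask perf for precise sampling where the hardware supports it, e.g. PEBS on Intel or IBS on AMD. Whether the :p/:pp modifiers are accepted depends on the CPU, the PMU, and the specific event:

```shell
# Default sampling: the recorded IP can trail the faulting instruction
# by many instructions (skid), possibly landing in a different function.
perf top -e l1-dcache-load-misses

# Precise sampling (where supported): the :p/:pp modifiers request
# increasing precision, attributing samples at or near the instruction
# that actually caused the event. Support varies by CPU and event.
perf top -e l1-dcache-load-misses:pp
```

These commands are illustrative only; on hardware or events without precise-sampling support, perf will reject the modifier.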


Thread overview: 46+ messages
2011-04-22  8:47 [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Stephane Eranian
2011-04-22  9:23 ` Ingo Molnar
2011-04-22  9:41   ` Stephane Eranian
2011-04-22 10:52     ` [generalized cache events] " Ingo Molnar
2011-04-22 12:04       ` Stephane Eranian
2011-04-22 13:18         ` Ingo Molnar
2011-04-22 20:31           ` Stephane Eranian
2011-04-22 20:47             ` Ingo Molnar
2011-04-23 12:13               ` Stephane Eranian [this message]
2011-04-23 12:49                 ` Ingo Molnar
2011-04-22 21:03             ` Ingo Molnar
2011-04-23 12:27               ` Stephane Eranian
2011-04-22 16:51         ` Andi Kleen
2011-04-22 19:57           ` Ingo Molnar
2011-04-26  9:25           ` Peter Zijlstra
2011-04-22 16:50       ` arun
2011-04-22 17:00         ` Andi Kleen
2011-04-22 20:30         ` Ingo Molnar
2011-04-22 20:32           ` Ingo Molnar
2011-04-23  0:03             ` Andi Kleen
2011-04-23  7:50               ` Peter Zijlstra
2011-04-23 12:06                 ` Stephane Eranian
2011-04-23 12:36                   ` Ingo Molnar
2011-04-23 13:16                   ` Peter Zijlstra
2011-04-25 18:48                     ` Stephane Eranian
2011-04-25 19:40                     ` Andi Kleen
2011-04-25 19:55                       ` Ingo Molnar
2011-04-24  2:15                   ` Andi Kleen
2011-04-24  2:19                 ` Andi Kleen
2011-04-25 17:41                   ` Ingo Molnar
2011-04-25 18:00                     ` Dehao Chen
     [not found]                     ` <BANLkTiks31-pMJe4zCKrppsrA1d6KanJFA@mail.gmail.com>
2011-04-25 18:05                       ` Ingo Molnar
2011-04-25 18:39                         ` Stephane Eranian
2011-04-25 19:45                           ` Ingo Molnar
2011-04-23  8:02               ` Ingo Molnar
2011-04-23 20:14           ` [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES Ingo Molnar
2011-04-24  6:16             ` Arun Sharma
2011-04-25 17:37               ` Ingo Molnar
2011-04-26  9:25               ` Peter Zijlstra
2011-04-26 14:00               ` Ingo Molnar
2011-04-27 11:11               ` Ingo Molnar
2011-04-27 14:47                 ` Arun Sharma
2011-04-27 15:48                   ` Ingo Molnar
2011-04-27 16:27                     ` Ingo Molnar
2011-04-27 19:05                       ` Arun Sharma
2011-04-27 19:03                     ` Arun Sharma
