From: Ingo Molnar <mingo@elte.hu>
To: arun@sharma-home.net
Cc: Stephane Eranian <eranian@google.com>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
linux-kernel@vger.kernel.org, Andi Kleen <ak@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Lin Ming <ming.m.lin@intel.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
eranian@gmail.com, Arun Sharma <asharma@fb.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
Date: Fri, 22 Apr 2011 22:30:22 +0200
Message-ID: <20110422203022.GA20573@elte.hu>
In-Reply-To: <20110422165007.GA18401@vps.sharma-home.net>
* arun@sharma-home.net <arun@sharma-home.net> wrote:
> On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
> >
> > Using the generalized cache events i can run:
> >
> > $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> >
> >  Performance counter stats for './array' (10 runs):
> >
> >      6,719,130  cycles:u                               ( +-  0.662% )
> >      5,084,792  instructions:u          #  0.757 IPC   ( +-  0.000% )
> >      1,037,032  l1-dcache-loads:u                      ( +-  0.009% )
> >      1,003,604  l1-dcache-load-misses:u                ( +-  0.003% )
> >
> >    0.003802098  seconds time elapsed                   ( +- 13.395% )
> >
> > I consider that this is 'bad', because for almost every dcache-load there's a
> > dcache-miss - a 99% L1 cache miss rate!
>
> One could argue that all you need is cycles and instructions. [...]
Yes, and note that with instruction events we even have skid-less PEBS
profiling, so seeing the precise instruction is possible.
> [...] If there is an expensive load, you'll see that the load instruction
> takes many cycles and you can infer that it's a cache miss.
>
> Questions app developers typically ask me:
>
> * If I fix all my top 5 L3 misses how much faster will my app go?
This has come up: we could add a 'stalled/idle-cycles' generic event - i.e.
cycles spent without performing useful work in the pipelines. (Resource-stall
events on Intel CPUs.)
Then you would profile L3 misses (there's a generic event for that), plus
stalls, and the answer to your question would be the percentage of hits you get
in the stalled-cycles profile, multiplied by the stalled-cycles/cycles ratio.
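
As a minimal sketch of that arithmetic (the function name and the sample
numbers are illustrative assumptions, not measurements, and the result is
only a rough upper bound on the gain):

```python
def estimated_gain(miss_share_of_stalls, stalled_cycles, total_cycles):
    """Fraction of all cycles attributable to the profiled L3 misses.

    miss_share_of_stalls: fraction (0..1) of stalled-cycles profile hits
                          that land on the top L3-miss locations.
    stalled_cycles, total_cycles: raw counter values from the same run.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= stalled_cycles <= total_cycles:
        raise ValueError("stalled_cycles must lie within 0..total_cycles")
    if not 0.0 <= miss_share_of_stalls <= 1.0:
        raise ValueError("miss_share_of_stalls must be a fraction in 0..1")
    # share of stall hits, scaled by the stalled-cycles/cycles ratio
    return miss_share_of_stalls * stalled_cycles / total_cycles


# Hypothetical run: 40% of stall hits sit on the top 5 L3 misses and
# 3M of 10M cycles are stalled.
gain = estimated_gain(0.40, 3_000_000, 10_000_000)
print(f"fixing those misses could recover at most {gain:.1%} of cycles")
```
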
> * Am I bottlenecked on memory bandwidth?
This would be a variant of the measurement above: take the top 90% of L3 misses,
profile-correlate them with stalled-cycles, and relate that to total-cycles. If
you get '90% of L3 misses cause a 1% wall-time slowdown' then you are not memory
bottlenecked. If the answer is '35% slowdown' then you are memory bottlenecked.
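
One way this correlation could be sketched, assuming both profiles have
already been reduced to per-symbol sample counts (the symbol names, the
counts and both helper functions are hypothetical, and no threshold
between 'bottlenecked' and 'not bottlenecked' is implied):

```python
def top_symbols_covering(samples, coverage=0.90):
    """Return the hottest symbols that together reach `coverage` of all samples."""
    total = sum(samples.values())
    picked, running = [], 0
    for sym, count in sorted(samples.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        picked.append(sym)
        running += count
    return picked


def memory_slowdown_estimate(l3_miss_samples, stall_samples,
                             stalled_cycles, total_cycles, coverage=0.90):
    """Estimate the fraction of total cycles stalled at the top L3-miss sites."""
    stall_total = sum(stall_samples.values())
    if stall_total == 0 or total_cycles <= 0:
        raise ValueError("need a non-empty stall profile and positive total_cycles")
    hot = top_symbols_covering(l3_miss_samples, coverage)
    hot_stalls = sum(stall_samples.get(sym, 0) for sym in hot)
    # stall share of the hot symbols, scaled by stalled-cycles/total-cycles
    return (hot_stalls * stalled_cycles) / (stall_total * total_cycles)


# Hypothetical per-symbol sample counts from two profiles of the same workload.
l3_misses = {"copy_rows": 700, "hash_lookup": 250, "parse": 50}
stalls = {"copy_rows": 500, "hash_lookup": 200, "parse": 100, "render": 200}
est = memory_slowdown_estimate(l3_misses, stalls, 5_000_000, 10_000_000)
print(f"top L3-miss sites account for about {est:.0%} of all cycles")
```
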
> * I have 4 L3 misses every 1000 instructions and 15 branch mispredicts per
> 1000 instructions. Which one should I focus on?
AFAICS this would be another variant of stalled-cycles measurements: you create
a stalled-cycles profile and check whether the top hits are branches or memory
loads.
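
A small sketch of that last check, assuming each top hit of the
stalled-cycles profile has already been labelled by hand or by a
disassembler as 'branch', 'load' or 'other' (the labels, the sample
counts and the function name are hypothetical):

```python
from collections import Counter


def dominant_stall_cause(top_hits):
    """Sum stalled-cycles samples per instruction kind.

    top_hits: iterable of (kind, samples) pairs.
    Returns (dominant_kind, {kind: share_of_samples}).
    """
    totals = Counter()
    for kind, samples in top_hits:
        totals[kind] += samples
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no samples in the stalled-cycles profile")
    shares = {kind: count / grand_total for kind, count in totals.items()}
    return max(shares, key=shares.get), shares


# Hypothetical top hits of a stalled-cycles profile.
hits = [("load", 420), ("branch", 180), ("load", 250), ("other", 150)]
kind, shares = dominant_stall_cause(hits)
print(f"focus on: {kind} ({shares[kind]:.0%} of stalled-cycles samples)")
```
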
> It's hard to answer some of these without access to all events.
I'm curious, how would you measure these properties - do you have some
different events in mind?
> While your approach of having generic events for commonly used counters might
> be useful for some use cases, I don't see why exposing all vendor defined
> events is harmful.
>
> A clear statement on the last point would be helpful.
Well, the thing is, i think users are helped most if we add useful, high-level
PMU features and not just an opaque raw event pass-through engine. The
problem with lowlevel raw ABIs is that the tool space fragments into a zillion
small hacks and there's no good concentration of know-how. I'd like the art of
performance measurement to be generalized out, as well as it can be.
We had this discussion in the big perf-counters flamewars 2+ years ago, where
one side wanted raw events, while we wanted intelligent kernel-side
abstractions and generalizations. I think the abstraction and generalization
angle worked out very well in practice - but we are having this discussion
again and again :-)
As i stated in my prior mails, i'm not against raw events as a rare
exception channel - that increases utility. I'm against what was attempted
here: an extension to raw events as the *primary* channel for DRAM measurement
features. That is just sloppy and *reduces* utility.
I'm very simple-minded: when i see reduced utility i become sad :)
Thanks,
Ingo