From: Peter Zijlstra <peterz@infradead.org>
To: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
Andi Kleen <andi@firstfloor.org>,
Paul Mackerras <paulus@samba.org>,
Stephane Eranian <eranian@googlemail.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>,
Dan Terpstra <terpstra@eecs.utk.edu>,
Philip Mucci <mucci@eecs.utk.edu>,
Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <cel@us.ibm.com>
Subject: Re: [RFC] perf_events: support for uncore a.k.a. nest units
Date: Wed, 20 Jan 2010 14:34:08 +0100
Message-ID: <1263994448.4283.1052.camel@laptop>
In-Reply-To: <4B560ACD.4040206@linux.vnet.ibm.com>
On Tue, 2010-01-19 at 11:41 -0800, Corey Ashford wrote:
> ----
> 3. Why does a CPU need to be assigned to manage a particular uncore unit's events?
> ----
>
> * The control registers of the uncore unit's PMU need to be read and written,
> and that may be possible only from a subset of processors in the system.
> * A processor is needed to rotate the event list on the uncore unit on every
> tick for the purposes of event scheduling.
> * Because of access latency issues, we may want the CPU to be close in locality
> to the PMU.
>
> It seems like a good idea to let the kernel decide which CPU to use to monitor a
> particular uncore event, based on the location of the uncore unit, and possibly
> current system load balance. The user will not want to have to figure out this
> detailed information.
Well, to some extent the user will have to participate. For example,
which uncore PMU gets selected depends on the CPU you attach the event
to, via the cpu-to-node map.
Furthermore, the Intel uncore PMU has curious interrupt routing
capabilities, which could be tied into this mapping.
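A minimal sketch of that selection, assuming one uncore PMU per node and a
fixed 8-CPU, 2-node layout. The table contents, struct uncore_pmu and
uncore_pmu_for_cpu() are made up for illustration and are not existing
kernel interfaces:

```c
#include <stddef.h>

/* Hypothetical topology: 8 CPUs spread over 2 nodes, one uncore PMU per node. */
#define NR_CPUS  8
#define NR_NODES 2

/* Illustrative cpu-to-node map: CPUs 0-3 sit on node 0, CPUs 4-7 on node 1. */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Placeholder for per-node uncore PMU state. */
struct uncore_pmu {
	int node;
};

static struct uncore_pmu uncore_pmus[NR_NODES] = { { 0 }, { 1 } };

/*
 * Pick the uncore PMU that belongs to the node of the CPU the event is
 * attached to. Returns NULL for an out-of-range CPU number.
 */
struct uncore_pmu *uncore_pmu_for_cpu(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS)
		return NULL;
	return &uncore_pmus[cpu_to_node_map[cpu]];
}
```

With this layout, attaching an event to CPU 5 selects the node 1 PMU, so the
user picks the PMU indirectly by picking the CPU.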
> ----
> 4. How do you encode uncore events?
> ----
> Uncore events will need to be encoded in the config field of the perf_event_attr
> struct using the existing PERF_TYPE_RAW encoding. 64 bits are available in the
> config field, and that may be sufficient to support events on most systems.
> However, due to the proliferation and added complexity of PMUs we envision, we
> might want to add another 64-bit config (perhaps call it config_extra or
> config2) field to encode any extra attributes that might be needed. The exact
> encoding used, just as for the current encoding for core events, will be on a
> per-arch and possibly per-system basis.
Let's cross that bridge when we get there.
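For reference, the existing raw encoding mentioned in the quoted text looks
roughly like this from user space. It is only a sketch: init_raw_event_attr()
is a made-up helper, and any raw_config value passed to it is a placeholder,
since the real bit layout is defined per arch:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill a perf_event_attr for a raw event. The meaning of raw_config is
 * arch-specific; this helper just stores it in the 64-bit config field.
 */
void init_raw_event_attr(struct perf_event_attr *attr, uint64_t raw_config)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);     /* lets the kernel detect the ABI version */
	attr->type = PERF_TYPE_RAW;     /* config is an arch-defined raw encoding */
	attr->config = raw_config;
}
```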
> ----
> 5. How do you address a particular uncore PMU?
> ----
>
> This one is going to be very system- and arch-dependent, but it seems fairly
> clear that we need some sort of addressing scheme that can be
> system/arch-defined by the kernel.
>
> From a hierarchical perspective, here's an example of possible uncore PMU
> locations in a large system:
>
> 1) Per-core - units that are shared between all hardware threads in a core
> 2) Per-node - units that are shared between all cores in a node
> 3) Per-chip - units that are shared between all nodes in a chip
> 4) Per-blade - units that are shared between all chips on a blade
> 5) Per-rack - units that are shared between all blades in a rack
So how about something like PERF_TYPE_{CORE,NODE,SOCKET}?
> ----
> 6. Event rotation issues with uncore PMUs
> ----
>
> Currently, the perf_events code rotates the set of events assigned to a CPU or
> task on every system tick, so that event scheduling collisions on a PMU are
> mitigated. This turns out to cause problems for uncore units for two reasons -
> inefficiency and CPU load.
Well, if you give these things a cpumask and put them all onto the
context of the first CPU in that mask, things seem to collect nicely.
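A minimal sketch of the "first CPU of the mask" choice, using a plain 64-bit
integer as a stand-in for a cpumask. first_cpu_of_mask() is a hypothetical
helper, not the kernel's cpumask API:

```c
#include <stdint.h>

/*
 * Return the lowest CPU number set in mask, or -1 if the mask is empty.
 * All events of an uncore PMU would then be placed on that CPU's context.
 */
int first_cpu_of_mask(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

Since every event of a given uncore PMU lands on the same CPU context, only
that one CPU has to touch the PMU when the event list is rotated.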
> b) Access to some PMU uncore units may be quite slow due to the interconnect
> that is used. This can place a burden on the CPU if it is done every system tick.
>
> This can be addressed by keeping a counter, on a per-PMU context basis that
> reduces the rate of event rotations. Setting the rotation period to three, for
> example, would cause event rotations in that context to happen on every third
> tick, instead of every tick. We think that the kernel could measure the amount
> of time it is taking to do a rotate, and then dynamically decrease the rotation
> rate if it's taking too long; "rotation rate throttling" in other words.
The better solution is to generalize the whole round-robin-on-tick scheme
(which has already been discussed).
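For comparison, the per-context rotation counter from the quoted proposal
could be sketched as below. All names, the doubling policy and the period cap
are assumptions made for illustration; none of this is existing perf_events
code:

```c
/* Hypothetical upper bound on the rotation period, in ticks. */
#define ROTATION_PERIOD_MAX 64u

struct rotation_ctx {
	unsigned int period;      /* rotate on every period-th tick */
	unsigned int ticks_left;  /* ticks remaining until the next rotation */
};

/* Called once per tick; returns 1 when the event list should be rotated. */
int rotation_tick(struct rotation_ctx *ctx)
{
	if (ctx->ticks_left > 1) {
		ctx->ticks_left--;
		return 0;
	}
	ctx->ticks_left = ctx->period;
	return 1;
}

/*
 * "Rotation rate throttling": if the last rotate cost more than the budget,
 * double the period (capped) so rotations happen less often.
 */
void rotation_throttle(struct rotation_ctx *ctx,
		       unsigned long cost_ns, unsigned long budget_ns)
{
	if (cost_ns > budget_ns) {
		unsigned int next = ctx->period * 2;

		ctx->period = next > ROTATION_PERIOD_MAX ? ROTATION_PERIOD_MAX : next;
	}
}
```

With period set to three, rotation_tick() returns 1 on every third call,
matching the example in the quoted text.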