From: Peter Zijlstra <peterz@infradead.org>
To: Greg KH <greg@kroah.com>
Cc: Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
Corey Ashford <cjashfor@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Paul Mundt <lethal@linux-sh.org>,
"eranian@gmail.com" <eranian@gmail.com>,
"Gary.Mohr@Bull.com" <Gary.Mohr@bull.com>,
"arjan@linux.intel.com" <arjan@linux.intel.com>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
Paul Mackerras <paulus@samba.org>,
"David S. Miller" <davem@davemloft.net>,
Russell King <rmk+kernel@arm.linux.org.uk>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Will Deacon <will.deacon@arm.com>,
Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <carll@us.ibm.com>,
Kay Sievers <kay.sievers@vrfy.org>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH v2 06/11] perf: core, export pmus via sysfs
Date: Wed, 19 May 2010 09:14:36 +0200
Message-ID: <1274253276.5605.10124.camel@twins>
In-Reply-To: <20100519024823.GA25229@kroah.com>
On Tue, 2010-05-18 at 19:48 -0700, Greg KH wrote:
> Again, why do you need/want anything in sysfs in the first place?
> What problem is it going to solve? Who is going to benifit? Why do
> they care? What is this whole thing about?
OK, so all of this is about perf_event. The story starts with CPUs
adding a PMU (Performance Monitor Unit) which allows the user to
count/sample cpu state.
The whole perf_counter subsystem was created to abstract this piece of
hardware and provide a kernel interface to it.
Then we realized that a generalization of the PMU exists in pretty much
everything that generates 'events' of interest and so we started adding
software PMUs that allowed us to do the same for tracepoints etc.
So we ended up with perf_events. A subsystem dedicated to counting
events and event based sampling.
Now the problem this patch set tries to solve: more hardware than the
CPU has such capabilities. There are memory controllers, bus controllers
and devices with similar capabilities.
So we need a way to identify and locate these things, and since sysfs
has the full machine topology in it, the idea was to represent these
things in sysfs as an event_source class.
Since the CPU and memory controllers are (assumed) symmetric on the
system, we get to add things like:
/sys/devices/system/cpu/cpu_event_source/
/sys/devices/system/node/node_event_source/
Devices like GPUs can do:
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/radeon_event_source/
Hooking them into sysfs at the proper device/machine topology location
allows us to quickly locate and identify these 'event_sources'.
Since all hardware wants to keep life interesting, every device works
differently, and PMUs are no exception: they count different things,
are programmed in different ways, etc. But for each class there is a
useful subset of things that is pretty uniform.
CPU-based PMUs can all count things like clock cycles and instructions;
memory controllers can count things like local/remote memory accesses,
etc.
So each class has a number of actual events that are worthy of
abstracting. The idea was to place these events in the event_source,
like:
/sys/devices/system/cpu/cpu_event_source/cycles/
/sys/devices/system/cpu/cpu_event_source/instructions/
And then there are the software event_sources that expose kernel events
(through tracepoints). Currently tracepoints live
in /debug/tracing/events/ (or /sys/kernel/debug/tracing/events/ for
those so inclined), but the above abstraction would suggest we expose
them similarly.
I'm not sure where we'd want them to live; we could add them to:
/sys/kernel/tracepoint_event_source/
and have them live there, but I'm open to alternatives :-)
[ With event_sources being a sysfs class, we also get a nice flat
collection in /sys/class/event_source/ helping those who get lost
in the device topology, me :-) ]
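As a rough illustration of what that flat collection buys userspace, here is a minimal sketch that walks a class directory and prints its entries. The function name list_event_sources() is made up for this example, and /sys/class/event_source/ is only the location proposed in this mail, so the directory is passed in as a parameter rather than hard-coded.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every entry in class_dir and return how many were found,
 * or -1 if the directory cannot be opened.
 * ASSUMPTION: each entry stands for one event_source; the layout is
 * the one proposed in the mail, not a confirmed kernel ABI.
 */
int list_event_sources(const char *class_dir)
{
	struct dirent *de;
	int count = 0;
	DIR *dir = opendir(class_dir);

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip the "." and ".." pseudo entries. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		printf("%s/%s\n", class_dir, de->d_name);
		count++;
	}

	closedir(dir);
	return count;
}
```

A tool would call list_event_sources("/sys/class/event_source") and treat a return value of -1 as "this kernel does not expose the class".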
The next issue seems to be the interface between this sysfs
representation and the perf_event syscall: how do we go about creating
an actual perf_event object from this rich sysfs event_source class
object?
The sys_perf_event_open() call takes a struct perf_event_attr pointer
which describes the event and its properties. The current event
classification goes through:
struct perf_event_attr {
__u32 type;
__u64 config;
...
};
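For reference, this is roughly how userspace fills in that classification today for the generic "instructions" event. It is a minimal sketch against <linux/perf_event.h>; the helper names init_instructions_attr() and sys_perf_event_open() are made up for this example (glibc provides no wrapper for the syscall, so it goes through syscall(2)).

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe the generic hardware "instructions" event in attr. */
void init_instructions_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;           /* which event class */
	attr->config = PERF_COUNT_HW_INSTRUCTIONS; /* which event in that class */
	attr->disabled = 1;                        /* created stopped */
}

/* Thin wrapper around the raw syscall; returns an event fd or -1. */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			 int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

Calling sys_perf_event_open(&attr, 0, -1, -1, 0) would then count the calling thread on whichever CPU it runs; whether the call succeeds depends on the machine and its perf permissions.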
So my initial idea was to let each event_source have a type_id and let
each of its events have a config field and read those and insert them in
your structure.
So we'd get:
/sys/devices/system/cpu/cpu_event_source/type_id
/sys/devices/system/cpu/cpu_event_source/instructions/config
cat those to get: .type = 0, .config = 1
(PERF_TYPE_HARDWARE:PERF_COUNT_HW_INSTRUCTIONS).
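To make the "cat those" step concrete, here is a minimal sketch of how a tool might read the two files and fill the attr. The helper names read_u64_file() and attr_from_sysfs() are made up for this example, and it assumes each file holds a single integer as text (decimal or 0x-prefixed hex); the mail does not specify the file format, and the paths are the ones proposed above, so they are passed in as parameters.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one unsigned integer from a text file; 0 on success, -1 on error. */
int read_u64_file(const char *path, unsigned long long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*out = strtoull(buf, &end, 0); /* base 0: accepts decimal and 0x... */
	if (errno || end == buf)
		return -1;
	return 0;
}

/*
 * Build attr from an event_source's type_id file and an event's config
 * file. ASSUMPTION: both files follow the one-integer format above.
 */
int attr_from_sysfs(const char *type_id_path, const char *config_path,
		    struct perf_event_attr *attr)
{
	unsigned long long type, config;

	if (read_u64_file(type_id_path, &type) ||
	    read_u64_file(config_path, &config))
		return -1;

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = (__u32)type;
	attr->config = config;
	return 0;
}
```

That is two open/read/close sequences per event before perf_event_open() is even called.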
Then Ingo objected and said that if we need to open and read those files, we
might as well just open one file and pass the fd along, which saves some
syscalls.
So you'd end up doing:
fd = open("/sys/devices/system/cpu/cpu_event_source/instructions/config", O_RDONLY);
attr->type = fd | PERF_TYPE_FD;
event_fd = perf_event_open(attr, ... );
close(fd);
From that one fd we can find to which 'event_source' it belongs and what
particular config we need to use.
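One possible reading of "fd | PERF_TYPE_FD" is a flag bit in the type field that tells the kernel the remaining bits are a file descriptor. The sketch below only illustrates that encoding. Everything in it is hypothetical: the mail gives no value for PERF_TYPE_FD, so the flag here is arbitrarily the top bit of the 32-bit type, and the sketch_* names are made up.

```c
#include <stdint.h>

/* HYPOTHETICAL value: the mail names PERF_TYPE_FD but does not define it. */
#define SKETCH_PERF_TYPE_FD (UINT32_C(1) << 31)

/* Userspace side: tag a (non-negative) fd so it can travel in attr->type. */
uint32_t sketch_type_from_fd(int fd)
{
	return (uint32_t)fd | SKETCH_PERF_TYPE_FD;
}

/* Receiving side: does this type carry an fd rather than a type id? */
int sketch_type_is_fd(uint32_t type)
{
	return (type & SKETCH_PERF_TYPE_FD) != 0;
}

/* Receiving side: strip the flag to recover the fd. */
int sketch_fd_from_type(uint32_t type)
{
	return (int)(type & ~SKETCH_PERF_TYPE_FD);
}
```

With a top-bit flag, ordinary type ids such as PERF_TYPE_HARDWARE (0) never collide with an encoded fd, which is the property any real encoding would need.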
Plenty of opinions to be had on that I guess.
Anyway, this was the what, why and how of it.