From: Greg KH <greg@kroah.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
	Corey Ashford <cjashfor@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Paul Mundt <lethal@linux-sh.org>,
	"eranian@gmail.com" <eranian@gmail.com>,
	"Gary.Mohr@Bull.com" <Gary.Mohr@bull.com>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
	Paul Mackerras <paulus@samba.org>,
	"David S. Miller" <davem@davemloft.net>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Will Deacon <will.deacon@arm.com>,
	Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <carll@us.ibm.com>,
	Kay Sievers <kay.sievers@vrfy.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH v2 06/11] perf: core, export pmus via sysfs
Date: Thu, 20 May 2010 11:42:13 -0700
Message-ID: <20100520184213.GB21030@kroah.com>
In-Reply-To: <1274253276.5605.10124.camel@twins>

On Wed, May 19, 2010 at 09:14:36AM +0200, Peter Zijlstra wrote:
> On Tue, 2010-05-18 at 19:48 -0700, Greg KH wrote:
> > Again, why do you need/want anything in sysfs in the first place?
> > What problem is it going to solve?  Who is going to benefit?  Why do
> > they care?  What is this whole thing about? 
> 
> 
> OK, so all of this is about perf_event. The story starts with CPUs
> adding a PMU (Performance Monitor Unit) which allows the user to
> count/sample cpu state.
> 
> The whole perf_counter subsystem was created to abstract this piece of
> hardware and provide a kernel interface to it.
> 
> Then we realized that a generalization of the PMU exists in pretty much
> everything that generates 'events' of interest and so we started adding
> software PMUs that allowed us to do the same for tracepoints etc.
> 
> So we ended up with perf_events. A subsystem dedicated to counting
> events and event based sampling.
> 
> Now the problem this patch set tries to solve: more hardware than the
> CPU has such capabilities. There are memory controllers, bus controllers
> and devices with similar capabilities.
> 
> So we need a way to identify and locate these things, and since sysfs
> has the full machine topology in it, the idea was to represent these
> things in sysfs as an event_source class.
> 
> Since the CPU and memory controllers are (assumed) symmetric on the
> system, we get to add things like:
> 
> 
>   /sys/devices/system/cpu/cpu_event_source/

Wouldn't that really be:
	/sys/devices/system/cpu/cpu0/cpu_event_source/
?

/sys/devices/system/cpu is a "type" of device in the system here, and
isn't an event source specific to the device itself?

Or is it for all cpus together?

>   /sys/devices/system/node/node_event_source/
> 
> Devices like GPUs can do:
> 
>   /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/radeon_event_source/
> 
> Hooking them into sysfs at the proper device/machine topology location
> allows us to quickly locate and identify these 'event_sources'.

Ok, this all makes a lot more sense now, thanks.

> Since all hardware wants to keep life interesting, they all work
> differently, and programming PMUs is no different: they count different
> things, have different ways to program them, etc. But for each class
> there is a useful subset of things that is pretty uniform.
> 
> CPU based PMUs all can count things like clock-cycles and instructions,
> Memory controllers can count things like local/remote memory accesses
> etc.
> 
> So each class has a number of actual events that are worthy of
> abstracting. The idea was to place these events in the event_source,
> like:
> 
>   /sys/devices/system/cpu/cpu_event_source/cycles/
>   /sys/devices/system/cpu/cpu_event_source/instructions/
> 
> 
> 
> And then there are the software event_sources that expose kernel events
> (through tracepoints), currently tracepoints live
> in /debug/tracing/events/ (or /sys/kernel/debug/tracing/events/ for
> those so inclined). But the above abstraction would suggest we expose
> them similarly.
> 
> I'm not sure where we'd want them to live, we could add them to:
> 
>   /sys/kernel/tracepoint_event_source/
> 
> and have them live there, but I'm open to alternatives :-)

Once you go outside of /sys/devices/ you aren't playing with devices
properly, so you might just want to stick to a "class" and have
/sys/class/tracepoint_event_source/ where all of the devices would end
up symlinking to.

> [ With event_source's being a sysfs-class, we also get a nice flat
>   collection in /sys/class/event_source/ helping those who get lost
>   in the device topology, me :-) ]

Yes, but doesn't the fact that you can have different types of
event sources lend itself to different classes of event sources?

> The next issue seems to be the interface between this sysfs
> representation and the perf_event syscall, how do we go about creating
> an actual perf_event object from this rich sysfs event_source class
> object.
> 
> The sys_perf_event_open() call takes a struct perf_event_attr pointer
> which describes the event and its properties. The current event
> classification goes through:
> 
> struct perf_event_attr {
> 	__u32 type;
> 	__u64 config;
> 
> 	...
> };
> 
> So my initial idea was to let each event_source have a type_id and let
> each of its events have a config field and read those and insert them in
> your structure.
> 
> So we'd get:
> 
>   /sys/devices/system/cpu/cpu_event_source/type_id
>   /sys/devices/system/cpu/cpu_event_source/instructions/config
> 
> cat those to get: .type = 0, .config = 1
> (PERF_TYPE_HARDWARE:PERF_COUNT_HW_INSTRUCTIONS).
> 
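
For reference, the two-file lookup described above could look roughly like
this from userspace.  This is only a sketch under assumptions: struct
event_id stands in for the type/config pair of perf_event_attr, the helper
names are made up for illustration, and each file is assumed to hold a
single decimal value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the two classification fields of struct perf_event_attr. */
struct event_id {
	uint32_t type;
	uint64_t config;
};

/* Read one unsigned decimal value from a single-value file; 0 on success. */
int read_u64_file(const char *path, uint64_t *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Fill in type and config from an event source's type_id file and one
 * event's config file (hypothetical helper, not an existing API).
 */
int event_id_from_files(const char *type_path, const char *config_path,
			struct event_id *id)
{
	uint64_t type, config;

	if (read_u64_file(type_path, &type) ||
	    read_u64_file(config_path, &config))
		return -1;
	if (type > UINT32_MAX)	/* type is only 32 bits wide */
		return -1;

	id->type = (uint32_t)type;
	id->config = config;
	return 0;
}
```

Pointed at the type_id and instructions/config files above, this would
yield type = 0, config = 1, at the cost of two open/read/close rounds per
event.
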
> Then Ingo objected and said, if we need to open and read those files, you
> might as well just open one file and pass the fd along, saves some
> syscalls.
> 
> So you'd end up doing:
> 
>  fd = open("/sys/devices/system/cpu/cpu_event_source/instructions/config", O_RDONLY);
>  attr->type = fd | PERF_TYPE_FD;
>  event_fd = perf_event_open(attr, ... );
>  close(fd);
> 
> From that one fd we can find to which 'event_source' it belongs and what
> particular config we need to use.
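
To make the "fd | PERF_TYPE_FD" encoding concrete, here is a rough sketch of
how the type field could be packed and unpacked.  The thread only names
PERF_TYPE_FD; the bit value below and the helper names are assumptions for
illustration, not anything that exists in the kernel.

```c
#include <stdint.h>

/* Assumed value: the thread names PERF_TYPE_FD but does not define it. */
#define PERF_TYPE_FD (UINT32_C(1) << 31)

/* Pack a non-negative fd plus the flag into the 32-bit type field. */
uint32_t type_from_fd(int fd)
{
	return (uint32_t)fd | PERF_TYPE_FD;
}

/* Does this type value carry an fd rather than a plain type id? */
int type_is_fd(uint32_t type)
{
	return (type & PERF_TYPE_FD) != 0;
}

/* Recover the fd by masking the flag back out. */
int fd_from_type(uint32_t type)
{
	return (int)(type & ~PERF_TYPE_FD);
}
```

With a flag bit like this, plain type ids such as 0 (PERF_TYPE_HARDWARE)
remain distinguishable from fd-carrying values, so the receiving side can
tell which lookup to perform.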

Ah, pass the fd of a sysfs file to sysfs to get the kobject.  Ick,
that's just, well, something that I never even considered someone would
need/want to do...

sysfs exports single values just fine.  If you are starting to do more
complex things, like you currently are, maybe you shouldn't be in
sysfs...

I can always knock up an eventfs for you to mount at /sys/kernel/events/
or something if you want :)

thanks,

greg k-h
