From: Ingo Molnar <mingo@elte.hu>
To: Peter Zijlstra <peterz@infradead.org>, Greg KH <greg@kroah.com>
Cc: Lin Ming <ming.m.lin@intel.com>,
	Corey Ashford <cjashfor@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Paul Mundt <lethal@linux-sh.org>,
	"eranian@gmail.com" <eranian@gmail.com>,
	"Gary.Mohr@Bull.com" <Gary.Mohr@bull.com>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
	Paul Mackerras <paulus@samba.org>,
	"David S. Miller" <davem@davemloft.net>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Will Deacon <will.deacon@arm.com>,
	Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <carll@us.ibm.com>,
	Kay Sievers <kay.sievers@vrfy.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [rfc] Describe events in a structured way via sysfs
Date: Fri, 21 May 2010 11:40:53 +0200
Message-ID: <20100521094053.GA4658@elte.hu>
In-Reply-To: <1274429038.1674.1684.camel@laptop>


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2010-05-20 at 16:12 -0700, Greg KH wrote:

> > How deep in the device tree are you really going to be 
> > caring about?  It sounds like the large majority of 
> > events are only going to be coming from the "system" 
> > type objects (cpu, nodes, memory, etc.) and very few 
> > would be from things that we consider a 'struct 
> > device' today (like a pci, usb, scsi, or input, etc.)
> 
> The general noise I hear from the hardware people is 
> that we'll see more and more device-level stuff - bus 
> bridges/controller and actual devices (GPUs, NICs etc.) 
> will be wanting to export performance metrics.

There's (much) more:

 - laptops want to provide power level/usage metrics,

 - we could express a lot of special, lower level 
   (transport specific) disk IO stats via events as well - 
   without having to push those stats to a higher level 
   (where it might not make sense). Currently such kinds
   of stats/metrics are provided in a very device/subsystem
   specific way, if they are provided at all.

Also, we already have quite a few per device tracepoints 
upstream. Here are a few examples:

 - GPU tracepoints (trace_i915_gem_request_submit(), etc.)
 - WIFI tracepoints (trace_iwlwifi_dev_ioread32(), etc.)
 - block tracepoints (trace_block_bio_complete())

So these would be attached to:

  # GEM events of drm/card0:
  /sys/devices/pci0000:00/0000:00:02.0/drm/card0/events/i915_gem_request_submit/

  # Wifi-ioread events of wlan0:
  /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/events/iwlwifi_dev_ioread32/

  # whole sdb disk events:
  /sys/block/sdb/events/block_bio_complete/

  # sdb1 partition events:
  /sys/block/sdb/sdb1/events/block_bio_complete/

And we also have 'software nodes' in /sys that have events 
upstream here and today. For example for SLAB we already 
have kmalloc/kfree tracepoints (trace_kmalloc() and 
trace_kfree()):

  # all kmalloc events:
  /sys/kernel/slab/events/

  # kmalloc events for sighand_cache:
  /sys/kernel/slab/sighand_cache/events/kmalloc/

  # kfree events for sighand_cache:
  /sys/kernel/slab/sighand_cache/events/kfree/

In general the set of events we have upstream is growing 
along an exponential curve (there's over a hundred now, 
via tracepoints).

They are either logically attached to the hardware 
topology of the system (as in the first set of examples 
above), or are attached to the software/subsystem object 
topology of the kernel (some examples of which are 
described in the second set of examples above).

Sometimes there are aliasing/filtering relationships 
between events, which are expressed very well via the 
hierarchy and granularity of /sys.

New events would go into that topology there in a natural 
way.
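
To illustrate how tooling could discover events laid out this way, here is a minimal sketch in Python. It assumes the proposed events/ directories from the examples above (they are a proposal in this mail, not an existing interface), so it runs against a mock tree built in a temporary directory rather than against the real /sys. The names find_events() and build_mock_tree() are hypothetical helpers invented for this sketch.

```python
import os
import tempfile


def find_events(root):
    """Map each node that has an 'events' directory to its sorted event names."""
    found = {}
    for dirpath, dirnames, _ in os.walk(root):
        if os.path.basename(dirpath) == "events":
            node = os.path.relpath(os.path.dirname(dirpath), root)
            found[node] = sorted(dirnames)
            dirnames[:] = []  # do not descend into the event directories
    return found


def build_mock_tree(root):
    """Create a stand-in for the proposed layout (hypothetical, not real sysfs)."""
    for path in (
        "block/sdb/events/block_bio_complete",
        "block/sdb/sdb1/events/block_bio_complete",
        "kernel/slab/sighand_cache/events/kmalloc",
        "kernel/slab/sighand_cache/events/kfree",
    ):
        os.makedirs(os.path.join(root, path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_mock_tree(root)
        for node, events in sorted(find_events(root).items()):
            print(node, events)
```

In this sketch the whole-disk node (block/sdb) and the partition node (block/sdb/sdb1) each show up with their own block_bio_complete entry, which is the kind of filtering relationship the hierarchy expresses.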

For example general hugepage tracepoints (should we 
introduce any) would go into the existing hugepage node:

  /sys/kernel/mm/hugepages/events/...

All in all, these existing and future events, both of 
hardware and software type, are literally begging to be 
attached to nodes in /sys :-)

If we created a separate eventfs for it we'd have to start 
with duplicating all the topology/hierarchy/structure that 
is present in sysfs already (and diluting /sys's 
utility).

That would be a bad thing, so it would be nice if we found 
a workable solution here. We could split up the record 
format some more:

 /sys/kernel/sched/events/sched_wakeup/format/
 /sys/kernel/sched/events/sched_wakeup/format/common_type/
 /sys/kernel/sched/events/sched_wakeup/format/common_flags/
 /sys/kernel/sched/events/sched_wakeup/format/common_preempt_count/
 /sys/kernel/sched/events/sched_wakeup/format/common_pid/
 /sys/kernel/sched/events/sched_wakeup/format/common_lock_depth/
 /sys/kernel/sched/events/sched_wakeup/format/comm/
 /sys/kernel/sched/events/sched_wakeup/format/pid/
 /sys/kernel/sched/events/sched_wakeup/format/prio/
 /sys/kernel/sched/events/sched_wakeup/format/success/
 /sys/kernel/sched/events/sched_wakeup/format/target_cpu/

into single-value files. But this would add significant 
parsing overhead (plus significant allocation overhead), 
for no tangible benefit.
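
A rough way to see the overhead argument, sketched in Python under stated assumptions: both layouts are mocked in a temporary directory, the file contents are placeholders (just the field names listed above, not real kernel output), and the helper names are hypothetical. The sketch only counts how many files userspace must open to recover the full field list in each layout.

```python
import os
import tempfile

# Field names taken from the sched_wakeup listing above.
FIELDS = ["common_type", "common_flags", "common_preempt_count", "common_pid",
          "common_lock_depth", "comm", "pid", "prio", "success", "target_cpu"]


def write_layouts(root):
    """Create both layouts; contents are placeholders, not real kernel output."""
    single = os.path.join(root, "format_single")
    with open(single, "w") as f:
        f.writelines(name + "\n" for name in FIELDS)
    split = os.path.join(root, "format_split")
    os.makedirs(split)
    for name in FIELDS:
        with open(os.path.join(split, name), "w") as f:
            f.write(name + "\n")
    return single, split


def read_single(path):
    """One open/read yields every field."""
    with open(path) as f:
        return [line.strip() for line in f], 1


def read_split(dirpath):
    """One directory listing plus one open/read per field."""
    opens = 0
    fields = []
    for name in sorted(os.listdir(dirpath)):
        with open(os.path.join(dirpath, name)) as f:
            fields.append(f.read().strip())
        opens += 1
    return fields, opens
```

With the ten fields above, the single-file layout needs one open while the split layout needs ten plus a directory listing, and that cost repeats for every event a tool inspects.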

The problem with /proc was always the lack of standard 
structure and the lack of performance - while the format 
file is about _more_ structure.

Increasing structure parsing overhead does not look like 
the right answer to that problem.

Hm?

	Ingo
