Re: statistics infrastructure (in -mm tree) review

From: Martin Peschke <mp3@de.ibm.com>
To: "Randy.Dunlap" <rdunlap@xenotime.net>, Greg KH <greg@kroah.com>,
	akpm@osdl.org, Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: statistics infrastructure (in -mm tree) review
Date: Thu, 15 Jun 2006 00:48:29 +0200
Message-ID: <4490923D.10904@de.ibm.com>
In-Reply-To: <20060613171827.73cd0688.rdunlap@xenotime.net>

Randy.Dunlap wrote:
> On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:
> 
>> First cut at reviewing this code.
>>
>> Initial impression is, "damn, that's a complex interface".  I'd really
>> like to see some other, real-world usages of this.  Like perhaps the
>> io-scheduler statistics?  Some other /proc stats that have nothing to do
>> with processes?
> 
> Agreed with complexity.

Well, roughly 1500 lines of code is sort of complex, even after
having been reviewed and cleaned up several times.

Please, let's try to break it down into the design details that each
add their measure of complexity. A flat "too complex" doesn't help us
move on.

Could you ACK / NACK the following assumptions, so that we can
figure out how far our (dis)agreement goes and how to continue?

1) There are various kernel components that gather statistical data.
(kernel/profile.c, network stack, genhd, memory management, taskstats,
S390 DASD driver, zfcp driver, ...). Requirements for other statistics
aren't unusual.

2) Basically, they all implement similar things: SMP-safe and efficient
data structures for data gathering, algorithms for on-the-fly data
preprocessing, delivery of data through some user interface, and
sometimes a switch for turning statistics on/off.

3) They all introduce their own macros / functions, resulting in code
duplication (bad), while they usually have their unique way to show
data to users (bad, too).

4) Possible ways to aggregate statistics data include plain counters,
histograms, a utilisation indicator (min, max, average etc.), and
potentially other algorithms people might come up with.

5) Statistics counters should be maintained in the kernel. That's
cheapest: no bursts of zillions of incremental updates relayed to
user space. (Please see also my comment on relayfs further down in
this message.)

6) Some library routines would suffice to take over data gathering
and preprocessing. Avoids further code duplication, avoids bugs,
speeds up development and test.

7) With regard to the delivery of statistic data to user land,
a library maintaining statistic counters, histograms or whatever
on behalf of exploiters doesn't need any help from the exploiter.
We can avoid the usual callbacks and code bloat in exploiters
this way.

8) If some library functions are responsible for showing data, and the
exploiter is not, we can achieve a common format for statistics data.
For example, a histogram about block I/O has the same format as
a histogram about network I/O.
This provides ease of use and minimises the effort of writing
scripts that could do further processing (e.g. formatting as
spreadsheets or bar charts, comparison and summarisation of
statistics, ...).

9) For performance reasons, per-cpu data and minimal locking
(local_irq_save/restore) should be used.
Adds to complexity, though.

10) If data is per-cpu, we want to be very careful with regard to
memory footprint. That is why memory is only allocated for online
cpus (requires cpu hot(un)plug handling, which adds to complexity).

11) At least for data processing modes more expensive than plain
counters, like histograms, an on/off state makes sense.

12) In order to minimise the memory footprint, a released/allocated
state makes sense.

13) Unconfigured/released/off/on states should be handled by a tiny
state machine and a single check on statistic updates.

14) Kernel code delivering statistics data through library routines
can, at best, guess whether a user wants incremental updates to be
aggregated in a single counter, a set of counters (histograms), or
in the form of other results. Users might want to change how much
detail is retained in aggregated statistic results.
Adds to complexity.

15) Nonetheless, exploiters are kindly requested to provide some
default settings that are a good starting point for general
purpose use.

16) Aggregated statistic results, in many cases, don't need to be
pushed to user space through a high-speed, high-volume interface.
Debugfs, for example, is fine for this purpose.

17) If the requirement for pushing data comes up anyway, we could,
for example, add relay-entries in debugfs anytime.
(For example, we could implement forwarding of incremental
updates to user space. Just another conceivable data processing
mode that fits into the current design.)

18) The programming interface of a statistics library can be roughly as
simple as statistic_create(), statistics_remove(), statistic_add().

19) Statistic_add() should come in different flavours:
statistic_add/inc() (just for convenience), and
statistic_*_nolock() (more efficient locking for a bundle of updates)

20) Statistic_add() takes a (X, Y) pair, with X being the main
characteristics of the statistics (e.g. a request size) and with
Y quantifying the update reported for a particular X (e.g. number
of observed requests of a particular request size).

21) Processing of (X, Y) according to abstract rules imposed by
counters, histograms etc. doesn't require any knowledge about the
semantics of X or Y.

22) There might be statistic counters that exploiters want to use and
maintain on their own, and which users still may want to have a look at
along with other statistics. Statistic_set() fits in here nicely.
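
Points 13, 20 and 21 can be illustrated with a minimal userspace
sketch. It is not the code from the patch under review: every name in
it (struct sketch_stat, sketch_add(), the state and mode enums, the
fixed bucket layout) is an assumption made for illustration, and it
leaves out per-cpu data and locking entirely. It only shows that one
state check guards the update path and that (X, Y) can be processed
without knowing what X or Y mean.

```c
#include <stdint.h>

/* Hypothetical names throughout; not the API of the reviewed patch. */
enum sketch_state { SKETCH_UNCONFIGURED, SKETCH_RELEASED, SKETCH_OFF, SKETCH_ON };
enum sketch_mode { SKETCH_COUNTER, SKETCH_HISTOGRAM };

#define SKETCH_BUCKETS 4

struct sketch_stat {
	enum sketch_state state;
	enum sketch_mode mode;
	uint64_t total;                   /* counter mode: sum of all Y */
	int64_t base;                     /* histogram: lower bound of bucket 0 */
	int64_t width;                    /* histogram: bucket width, must be > 0 */
	uint64_t bucket[SKETCH_BUCKETS];  /* histogram: last bucket catches overflow */
};

/* Report that Y occurrences of value X were observed. */
static void sketch_add(struct sketch_stat *s, int64_t x, uint64_t y)
{
	int64_t idx;

	if (s->state != SKETCH_ON)        /* the single check on the update path */
		return;

	if (s->mode == SKETCH_COUNTER) {
		s->total += y;            /* X is ignored, Y is accumulated */
		return;
	}

	/* Linear histogram: X selects the bucket, Y is added to it. */
	idx = x < s->base ? 0 : (x - s->base) / s->width;
	if (idx >= SKETCH_BUCKETS)
		idx = SKETCH_BUCKETS - 1;
	s->bucket[idx] += y;
}
```

An exploiter would call sketch_add(&s, request_size, 1) for every
request. Whether that ends up in one counter or in a histogram is
decided by the mode field alone, which is what allows a user to change
the level of detail without touching the exploiter.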

>> And what does this mean for relayfs?  Those developers tuned that code
>> to the nth degree to get speed and other goodness, and here you go just
>> ignoring that stuff and add yet another way to get stats out of the
>> kernel.  Why should I use this instead of my own code with relayfs?
>
> Good questions.

Relayfs is a nice feature, but not appropriate here.

For example, during a performance measurement I have seen
SCSI I/O related statistics being updated millions of times while
I was just having a short lunch break. Some of these updates just
increased a counter, which is pretty fast if done immediately in the
kernel. If all these updates had to be relayed to user space
just to increase a counter maintained in user space... urgh, surely
more expensive and not the way to go.

And what if user space isn't interested at all? Would we keep
pumping zillions of unused updates into buffers instead of
discarding them right away?

Profile.c, taskstats, genhd and all the other statistics listed
above... they all maintain their counters in the kernel and
show aggregated statistics to users.

>> And is the need for the in-kernel parser really necessary?  I know it
>> makes the userspace tools simpler (cat and echo), but should we be
>> telling the kernel how to filter and adjust the data?  Shouldn't we just
>> dump it all to userspace and use tools there to manipulate it?
> 
> I agree again.

Assuming we can agree on in-kernel counters, histograms etc.,
allowing for attributes being adjusted by users makes sense.

The parser stuff required for these attributes is implemented
using match_token() & friends, which should be acceptable.
But, I think that the standard way of using match_token() and
strsep() needs improvement (strsep is destructive to strings
parsed, which is painful).
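
The destructive behaviour of strsep() is easy to show in userspace,
where the libc function has the same semantics: it overwrites each
delimiter with a NUL byte. The sketch below is only an illustration.
count_tokens() is a hypothetical helper, and working on a private copy
is one possible workaround, not necessarily what the patch does.

```c
#define _DEFAULT_SOURCE   /* for strsep() and strdup() on glibc */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the non-empty comma-separated tokens in
 * options without modifying the caller's string. Returns -1 if the
 * copy cannot be allocated.
 */
static int count_tokens(const char *options)
{
	char *copy = strdup(options);   /* strsep() writes NULs into its input */
	char *cursor = copy;
	char *tok;
	int n = 0;

	if (!copy)
		return -1;

	while ((tok = strsep(&cursor, ",")) != NULL) {
		if (*tok)               /* skip empty tokens such as ",," */
			n++;
	}

	free(copy);
	return n;
}
```

Calling strsep() directly on a writable buffer holding "a,b" leaves
the buffer reading as just "a" afterwards, so the original string can
no longer be shown back to the user or parsed a second time. The copy
avoids that at the cost of an allocation per parse.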

Thanks, Martin
