From: Martin Peschke <mp3@de.ibm.com>
To: ak@suse.de
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 5/6] statistics infrastructure
Date: Fri, 16 Dec 2005 13:27:36 +0100
Message-ID: <43A2B2B8.6090008@de.ibm.com>
..oops, should have made sure that my mailer does line breaks
appropriately. Getting it right this time, sorry ...
Andi Kleen wrote:
> Locks and indirect function calls?
> It seems very wrong to me to make such heavy weight statistic
> functions. Most likely you will disturb the performance whatever is
> being counted badly.
Well, that's a tradeoff between flexibility/function and performance.
I don't have any reliable numbers ready at hand. At least, doing I/O to
my SCSI devices with statistics enabled didn't feel bad.
Here is the rationale for both the indirect function call and the lock:
If a statistic is disabled (the default), neither locking nor an
indirect function call takes place. So far so good, I would say.
If a statistic is enabled, the lock for this entity is grabbed and one
indirect function call is done. Anything else is inlined. I use granular
per-interface (per-entity, for example per-LUN or per-HBA) locking.
The indirect function call allows customization of the ways data
processing is done for particular statistics.
For example, one could deflate a histogram of latencies into a counter
providing the total of latency measurements, that is the total of
requests observed; or inflate a statistic the other way around if
required. Another example: one can make a statistic gather data for
recurring periods of time, like megabytes per second instead of just
the total amount of bytes transferred, or like queue utilization per
whatever-unit-of-time instead of just an overall utilization.
A statistic that feeds on request sizes can be set up to provide the
following "views":
- number of requests observed (counter)
- number of requests per unit of time (history based on a counter)
- number of bytes transferred (counter)
- number of bytes transferred per unit of time = transfer rate (history
based on a counter)
- traffic pattern (histogram for discrete request sizes or for ranges of
request sizes)
- raw measurement data gathered
- etc.
As a device driver programmer I might pick a "view" a user is interested
in. My pick might miss by a mile. I simply don't know for sure beforehand.
The indirect function call could be replaced by a switch statement.
I am not sure whether this is less critical and more acceptable than
indirect function calls. It might be architecture dependent.
We can get rid of the indirect function call (or an alternative switch
statement) if the vote is against this level of flexibility.
Then it would be solely up to the exploiter to define once and for all
whether a particular sort of data is shown as a simple counter, a
histogram, a fill level indicator, this history-type statistic thing in
a ringbuffer etc. This might be fine for a considerable number of cases.
The lock is there to avoid trouble with concurrent updates to a
statistic. If per-CPU data were used, concurrent updates would be fine
as long as they are done on different CPUs. Precautions against
concurrent updates to the same per-CPU data are still needed, though.
The current interface allows using the lock this way:
lock(stat_x->interface);
statistics_inc_nolock(stat_x, y);
statistics_inc_nolock(stat_m, n);
statistics_add_nolock(stat_a, e, f);
unlock(stat_x->interface);
Because we hold the same lock when creating output for users, coherency
of several statistics of a single entity can be achieved if statistic
updates are done within one critical section as shown above.
The lock is also used to make sure that updates to a statistic don't
happen while the setup of a statistic is changed by users. If we get rid
of the indirect function call, some of these setup changes go away,
anyway. Other cases, like statistic resets or inflating a 5-counter
histogram to a 25-counter histogram, don't go away. If I can figure out
how to reallocate, say, an array of counters for a histogram without
holding a lock while updates happen... Maybe I could temporarily turn off
a statistic.
> Take a look at many other subsystems - they do per CPU counters etc.
> to make this all fast.
I am looking into per CPU data.
But, is this really required for _all_ statistics? I see that it makes
sense to have per CPU optimizations for very critical components, like
parts of VM. But there are still a lot of do-it-yourself type statistics
around that use an atomic_t, for example, without implementing it per CPU.
Then, I am not sure yet whether per CPU data is feasible for histograms
and other more complex statistics. I have got to find out.
I tried to write the code in a way that allows other statistic types,
like counters, histograms and so on, to be added with moderate effort.
Maybe I can use the internal interface to plug in some discipline based
on per-CPU counters...
> But it's still unclear why it would need such an heavyweight
> infrastructure. Normally it's not that bad to reimplemented on the
> fly. Maybe some common code can be refactored out of that, but
> probably not too much.
>
> [... lots of other code snipped ... ]
>
> Looks all very very overdesigned to me. How about you just start
> with a minimum specification and describe what you want to do?
As a device driver programmer I don't want to reinvent the wheel when
coding statistics. I would prefer to use a few easy-to-use library
functions. I don't want to worry about getting my personal wheel
functional. I'd rather use my time to worry about which kind of data is
really needed and which is not.
I'd like to provide a tool that can be customized at run time to a
certain degree because it might not be acceptable for customers to
install private kernels in order to get tuned statistics.
As a device driver programmer I can make an educated guess, at best,
about certain parameters that impact the processing of statistic data.
Users might know better whether they need to focus on latencies from 2
ms up to 64 ms or from 100 ms up to 500 ms, because this kind of
decision depends on the environment to be measured, e.g. the devices
attached.
In a device driver, I don't want to spend much thought on the
statistic's user interface. Particularly not if the statistic is a
little bit more complex than a simple counter. It would be really nice to
have a user interface that looks the same for all exploiters, i.e. to
have common output formats for counters, histograms, fill level
indicators etc.
Martin