* Statistical methods for latency profiling
From: Lee Revell @ 2004-07-31 5:22 UTC
To: jackit-devel; +Cc: Ingo Molnar, linux-kernel
Hey,
Recently Ingo Molnar asked in one of the voluntary-preempt threads for
the minimum and average scheduling delay reported by jackd. JACK does
not currently maintain these statistics.
I realized that the maximum latency reported on each process cycle
looks fairly normally distributed. If this data follows a normal
distribution, it becomes much easier to make generalizations about
it - the standard deviation becomes meaningful, we can identify
whether a change is statistically significant, etc. It would also
indicate the effectiveness of the Linux scheduler by exposing any
skew - the distribution should in fact be normal, as the only factor
that should influence this value is the variance in the lengths of
whatever non-preemptible sections the kernel was in when we became
runnable.
The first step in being able to meaningfully analyze these results is
to determine the degree to which the distribution is in fact normal,
and the easiest way to start is just to look at a histogram of the
data.
In any jackd engine, the allowable window for scheduler latency is:
(0, period_usecs/2)
A latency in the range:
[period_usecs/2, period_usecs]
will cause jackd to restart, and of course anything over period_usecs is
an XRUN.
Thus we can build a histogram of the observed latencies by creating an
array with period_usecs/2 elements and incrementing a "bin" whenever we
observe the corresponding latency. Using 32 frames at 48 kHz, we have
333 "bins".
The overhead is not too bad, because we only look at the highest
latency in each period (max_usecs), so we only have to update the
histogram once per period - if the underlying distribution is normal
then the distribution of the maximums should be as well.
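A minimal sketch of the bookkeeping (hypothetical names, not the
actual jackd engine code):

#define NBINS 333  /* period_usecs/2; 32 frames at 48 kHz -> ~666 usec period */

static unsigned int hist[NBINS];

/* called once per process cycle with the worst delay observed, in usecs */
static void hist_update(unsigned int max_usecs)
{
        if (max_usecs < NBINS)
                hist[max_usecs]++;      /* 1-usec-wide bins */
}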
The ability to use statistical methods to generalize about this data
will become increasingly important, because they are the *only* way we
can provide hard numbers - confidence intervals, p-values, standard
deviations, etc. - rather than "it feels faster" or "since the $FOO
change the system is sluggish". Statistical methods would allow us to
say "the $FOO change increases the mean latency from 150 to 200 usecs,
but the standard deviation is now 20 rather than 75, so it's actually
an improvement".
Any change in the kernel that makes a difference in the "feel" of the
system is now open to analysis. Many developers apparently do not
consider a user report of "it feels sluggish" to be as much of a
concern as "X used to take Y amount of time and now takes Z".
Statistical methods can close this gap.
Anyone care to comment?
Lee
PS: I did not do very well in my statistics classes, and it's possible
that many kernel hackers are completely unfamiliar with statistical
methods. Fortunately the subset that is immediately useful to us is
pretty easy to understand, and could be covered in one or two lectures
or a HOWTO. Depending on the level of interest, I can create one.
* Re: Statistical methods for latency profiling
From: Matt Mackall @ 2004-08-01 2:55 UTC
To: Lee Revell; +Cc: jackit-devel, Ingo Molnar, linux-kernel
On Sat, Jul 31, 2004 at 01:22:37AM -0400, Lee Revell wrote:
> Hey,
>
> Recently Ingo Molnar asked in one of the voluntary-preempt threads for
> the minimum and average scheduling delay reported by jackd. JACK does
> not currently maintain these statistics.
>
> I realized that the maximum latency reported on each process cycle
> looks fairly normally distributed.
This is not at all what I would expect. Instead, I'd expect to see
something like a gamma distribution, where we have everything
clustered down close to zero, but with a very long tail in the
positive direction falling off exponentially and, obviously, a hard
limit on the other side.
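For reference, the gamma density with shape k > 0 and scale theta > 0
is

    f(x) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^k}, \quad x > 0

which has exactly that shape: mass piled up near zero, an exponentially
decaying right tail, and zero density below the hard limit at x = 0.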
--
Mathematics is the supreme nostalgia of our time.
* Re: [Jackit-devel] Re: Statistical methods for latency profiling
From: Lee Revell @ 2004-08-01 3:24 UTC
To: Matt Mackall; +Cc: jackit-devel, Ingo Molnar, linux-kernel
On Sat, 2004-07-31 at 22:55, Matt Mackall wrote:
> On Sat, Jul 31, 2004 at 01:22:37AM -0400, Lee Revell wrote:
> > Hey,
> >
> > Recently Ingo Molnar asked in one of the voluntary-preempt threads for
> > the minimum and average scheduling delay reported by jackd. JACK does
> > not currently maintain these statistics.
> >
> > I realized that the maximum latency reported on each process cycle
> > looks fairly normally distributed.
>
> This is not at all what I would expect. Instead, I'd expect to see
> something like a gamma distribution, where we have everything
> clustered down close to zero, but with a very long tail in the
> positive direction falling off exponentially and, obviously, a hard
> limit on the other side.
Right, it is a lot closer to a gamma distribution. It's been years
since I have used any of this, and I took stat for psych majors, vs stat
for engineers. I was a lot more interested in playing Doom at the
time...
These look interesting:
http://www.itl.nist.gov/div898/handbook/eda/section3/ppccplot.htm
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4291.htm
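As a cheap first check before a full PPCC plot, the sample skewness
falls straight out of the histogram - a clearly positive skew already
rules out a normal fit. A sketch, reusing the hypothetical hist[]
layout from my first mail:

#include <math.h>

/* third standardized moment; > 0 means a right-hand tail */
static double hist_skewness(const unsigned int *hist, int nbins)
{
        double m = 0.0, m2 = 0.0, m3 = 0.0;
        unsigned long n = 0;
        int i;

        for (i = 0; i < nbins; i++)
                n += hist[i];
        if (!n)
                return 0.0;
        for (i = 0; i < nbins; i++)
                m += (double)i * hist[i] / n;
        for (i = 0; i < nbins; i++) {
                double d = i - m;
                m2 += d * d * hist[i] / n;
                m3 += d * d * d * hist[i] / n;
        }
        return m3 / pow(m2, 1.5);
}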
I will have some numbers soon.
Lee
* Re: [Jackit-devel] Re: Statistical methods for latency profiling
From: Lee Revell @ 2004-08-01 5:59 UTC
To: Matt Mackall; +Cc: jackit-devel, Ingo Molnar, linux-kernel, Peter Williams
On Sat, 2004-07-31 at 23:24, Lee Revell wrote:
> I will have some numbers soon.
Here is the histogram from running 500000 cycles (~5 minutes) with
max_sectors_kb set to 64, under normal desktop usage. The kernel is
2.6.8-rc2-M5, set to 1:3, with all IRQs threaded except the sound
card's.
Delay (usecs)  Count
-------------  -----
6 2257
7 3210
8 797
9 717
10 283
11 219
12 235
13 250
14 263
15 320
16 334
17 287
18 268
19 260
20 223
21 155
22 109
23 90
24 41
25 19
26 3
27 4
28 1
29 3
30 1
31 2
32 5
33 3
34 1
35 7
36 1
37 3
38 4
39 1
40 5
41 8
42 2
43 1
44 1
47 1
48 1
52 2
62 1
79 1
100 1
The next three tests were done with iozone -a running.
Delay (usecs)  Count
-------------  -----
6 21
7 203
8 595
9 1756
10 1643
11 957
12 618
13 507
14 532
15 540
16 523
17 423
18 400
19 424
20 418
21 293
22 202
23 119
24 71
25 38
26 20
27 8
28 7
29 2
30 1
31 1
32 4
33 4
34 7
35 6
36 1
37 2
38 5
39 7
40 3
41 4
42 4
43 1
44 2
47 1
48 1
50 1
53 1
54 1
57 1
59 1
63 1
69 1
70 2
79 1
91 1
92 1
93 1
100 1
101 1
107 1
109 1
113 1
118 1
123 1
131 1
143 1
145 1
146 1
157 1
161 1
So stressing the filesystem moves the center to the right a bit, from
6-7 to 9-10, and *drastically* lengthens the 'tail'.
These numbers suggest to me that a lot of the latencies from 47 usecs
and up are caused by one code path, because they are so uniformly
distributed over the upper part of the histogram. The prime suspect,
of course, is the IDE I/O completions. I tested this theory by
lowering max_sectors_kb from 64 to 32:
Delay (usecs)  Count
-------------  -----
6 3
7 427
8 742
9 1583
10 3429
11 2010
12 614
13 203
14 200
15 275
16 267
17 177
18 104
19 50
20 55
21 36
22 22
23 21
24 22
25 29
26 26
27 9
28 2
29 3
30 1
32 2
33 2
34 2
35 1
36 1
37 2
41 1
42 1
43 1
48 1
65 1
70 1
71 4
72 3
73 6
74 3
75 1
76 1
77 3
78 2
79 4
80 2
81 3
82 3
83 3
84 3
85 2
86 2
87 4
88 4
90 1
91 1
92 3
94 3
96 2
101 2
102 3
103 1
106 1
111 1
112 1
122 1
132 1
Finally, with max_sectors_kb at 16:
Delay (usecs)  Count
-------------  -----
6 420
7 1463
8 652
9 1444
10 2365
11 1111
12 537
13 265
14 232
15 241
16 253
17 218
18 173
19 181
20 233
21 188
22 125
23 76
24 55
25 23
26 9
27 15
28 6
29 2
30 2
31 6
32 5
33 9
34 8
35 4
36 2
37 5
38 2
39 1
40 2
41 3
42 3
43 2
45 1
47 1
49 1
51 3
52 1
53 1
54 1
55 1
56 1
57 1
59 2
60 2
61 2
62 2
63 1
64 1
66 2
67 2
68 3
69 1
71 1
72 3
74 2
75 1
76 2
77 1
78 2
79 1
80 1
81 2
82 1
84 1
85 3
86 1
87 1
89 1
90 2
92 1
100 1
101 1
These numbers all point to the ide sg completion code as the only
thing on the system generating latencies over ~42 usecs.
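For anyone reproducing this: the knob lives in sysfs and just takes a
number of KB. A sketch, assuming the disk is hda (the device name will
vary, and writing requires root); an echo into the same file from a
root shell does the same thing:

#include <stdio.h>

int main(void)
{
        /* cap request size at 32KB to shorten IDE completion work */
        FILE *f = fopen("/sys/block/hda/queue/max_sectors_kb", "w");

        if (!f) {
                perror("max_sectors_kb");
                return 1;
        }
        fprintf(f, "32\n");
        return fclose(f) ? 1 : 0;
}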
I have sent the patch separately to jackit-devel so as not to clutter
LKML with unrelated patches.
Lee
* Re: [Jackit-devel] Re: Statistical methods for latency profiling
From: Ingo Molnar @ 2004-08-01 11:21 UTC
To: Lee Revell
Cc: Matt Mackall, jackit-devel, Ingo Molnar, linux-kernel,
Peter Williams
On Sun, 1 Aug 2004, Lee Revell wrote:
> So stressing the filesystem moves the center to the right a bit, from
> 6-7 to 9-10, and *drastically* lengthens the 'tail'.
basically each codepath has a typical latency distribution, and when a
workload uses multiple codepaths then the latencies get intermixed almost
linearly.
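in other words, if codepath i has latency density f_i(x) and accounts
for a fraction w_i of the wakeups, the observed histogram should
approximate the mixture

    f(x) = \sum_i w_i \, f_i(x)

so a histogram under load looks roughly like the idle one with extra
components stacked on top.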
> These numbers suggest to me that a lot of the latencies from 47 usecs
> and up are caused by one code path, because they are so uniformly
> distributed over the upper part of the histogram. The prime suspect,
> of course, is the IDE I/O completions. I tested this theory by
> lowering max_sectors_kb from 64 to 32:
> These numbers all point to the ide sg completion code as the only thing
> on the system generating latencies over ~42 usecs.
yep, that's a fair assumption. Once the IO-APIC irq-redirection problems
are solved i'll try to further thread the IDE completion IRQ to remove
that ~100 usecs latency.
Ingo
* Re: [Jackit-devel] Re: Statistical methods for latency profiling
From: Lee Revell @ 2004-08-01 11:53 UTC
To: Ingo Molnar
Cc: Matt Mackall, jackit-devel, Ingo Molnar, linux-kernel,
Peter Williams
On Sun, 2004-08-01 at 07:21, Ingo Molnar wrote:
> On Sun, 1 Aug 2004, Lee Revell wrote:
>
> > So stressing the filesystem moves the center to the right a bit, from
> > 6-7 to 9-10, and *drastically* lengthens the 'tail'.
>
> basically each codepath has a typical latency distribution, and when a
> workload uses multiple codepaths then the latencies get intermixed almost
> linearly.
>
I noticed several distinct spikes with 1M samples that blend into a
smooth Erlang/gamma type distribution at 5M. I posted some more
results to jackit-devel. It seems like each of these spikes would
represent a common code path out of a non-preemptible region. I
suspect the spike at 70-80 usecs is a bug in my code, from updating
the histogram every 1024 cycles. I will start posting results on the
web soon, as the data set is getting big.
> > These numbers suggest to me that a lot of the latencies from 47 usecs
> > and up are caused by one code path, because they are so uniformly
> > distributed over the upper part of the histogram. The prime suspect,
> > of course, is the IDE I/O completions. I tested this theory by
> > lowering max_sectors_kb from 64 to 32:
>
> > These numbers all point to the ide sg completion code as the only thing
> > on the system generating latencies over ~42 usecs.
>
> yep, that's a fair assumption. Once the IO-APIC irq-redirection problems
> are solved i'll try to further thread the IDE completion IRQ to remove
> that ~100 usecs latency.
>
It would be interesting to identify the code paths corresponding to the
other peaks. It occurred to me that if you suspect a peak in the
histogram is related to a certain code path, you could stick a udelay in
there, and see if the spike moves up by the same amount.
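Something like this, kernel side (the placement is purely
hypothetical - it goes wherever the suspected section is):

#include <linux/delay.h>

/*
 * Marker delay: drop this into the suspected non-preemptible section.
 * If the histogram peak shifts right by ~50 usecs, that code path is
 * the one producing it.
 */
static inline void latency_marker(void)
{
        udelay(50);     /* busy-wait for 50 usecs */
}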
Lee