* PREEMPT_RT and I-PIPE: the numbers, part 4
@ 2005-07-08 23:01 Kristian Benoit
2005-07-09 1:28 ` Karim Yaghmour
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Kristian Benoit @ 2005-07-08 23:01 UTC (permalink / raw)
To: linux-kernel
Cc: paulmck, bhuey, andrea, tglx, karim, mingo, pmarques, bruce,
nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm, kbenoit
This is the 4th run of our tests.
Here are the changes since last time:
- For some reason we cannot yet explain, highmem was enabled throughout
all our previous runs, despite our using a .config provided verbatim
by Ingo. Somehow it got re-enabled through the cycles of "make
oldconfig". Nevertheless, as we suspected, disabling highmem did not,
by itself, fix all of the performance issues with PREEMPT_RT. Instead,
as the numbers below show, key changes have been made to PREEMPT_RT
that, regardless of highmem, have made it much better. Attached is a
file showing the differences between enabling and disabling highmem
for two different PREEMPT_RT kernels.
- The software versions being used were:
2.6.12 - final
RT-0.7.51-02
I-pipe v0.7
System Load:
------------
The configuration is the same as before: 5 LMbench runs for each
setup. Again, LMbench running times provide but a general idea
of system performance. The actual results collected by LMbench are
more trustworthy.
LMbench running times:
+--------------------+-------+-------+-------+-------+-------+
| Kernel | plain | IRQ | ping | IRQ & | IRQ & |
| | | test | flood | ping | hd |
+====================+=======+=======+=======+=======+=======+
| Vanilla-2.6.12 | 152 s | 150 s | 188 s | 185 s | 239 s |
+====================+=======+=======+=======+=======+=======+
| with RT-V0.7.51-02 | 152 s | 153 s | 203 s | 201 s | 239 s |
+--------------------+-------+-------+-------+-------+-------+
| % | ~ | 2.0 | 8.0 | 8.6 | ~ |
+====================+=======+=======+=======+=======+=======+
| with Ipipe-0.7 | 149 s | 150 s | 193 s | 192 s | 236 s |
+--------------------+-------+-------+-------+-------+-------+
| % | -2.0 | ~ | 2.7 | 3.8 | -1.3 |
+--------------------+-------+-------+-------+-------+-------+
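The "%" rows in the table above express each kernel's running time as overhead relative to vanilla, rounded to one decimal. A quick sketch of that computation, using the "ping flood" column as input:

```python
def overhead_pct(kernel_s: float, vanilla_s: float) -> float:
    """Percent running-time overhead relative to the vanilla kernel."""
    return round((kernel_s - vanilla_s) / vanilla_s * 100, 1)

vanilla = 188  # vanilla "ping flood" running time, in seconds
print(overhead_pct(203, vanilla))  # PREEMPT_RT: 8.0
print(overhead_pct(193, vanilla))  # I-pipe: 2.7
```

Negative entries in the "%" rows (e.g. -2.0 for the I-pipe "plain" run) simply mean that kernel ran faster than vanilla.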
"plain" run:
Measurements | Vanilla | preempt_rt | ipipe
---------------+-------------+----------------+-------------
fork | 97us | 91us (-6%) | 101us (+4%)
open/close | 2.8us | 2.9us (+3%) | 2.8us (~)
execve | 348us | 347us (~) | 356us (+2%)
select 500fd | 13.9us | 17.1us (+23%) | 13.9us (~)
mmap | 776us | 629us (-19%) | 794us (+2%)
pipe | 5.1us | 5.1us (~) | 5.4us (+6%)
"IRQ test" run:
Measurements | Vanilla | preempt_rt | ipipe
---------------+-------------+----------------+-------------
fork | 98us | 91us (-7%) | 100us (+2%)
open/close | 2.8us | 2.8us (~) | 2.8us (~)
execve | 349us | 349us (~) | 359us (+3%)
select 500fd | 13.9us | 17.2us (+24%) | 13.9us (~)
mmap | 774us | 630us (-19%) | 792us (+2%)
pipe | 5.0us | 5.0us (~) | 5.5us (+10%)
"ping flood" run:
Measurements | Vanilla | preempt_rt | ipipe
---------------+-------------+----------------+-------------
fork | 152us | 171us (+13%) | 165us (+9%)
open/close | 4.5us | 4.8us (+7%) | 4.8us (+7%)
execve | 550us | 663us (+21%) | 601us (+9%)
select 500fd | 20.9us | 29.4us (+41%) | 21.9us (+5%)
mmap | 1140us | 1122us (-2%) | 1257us (+10%)
pipe | 8.3us | 9.4us (+13%) | 10.2us (+23%)
"IRQ & ping" run:
Measurements | Vanilla | preempt_rt | ipipe
---------------+-------------+----------------+-------------
fork | 150us | 170us (+13%) | 160us (+7%)
open/close | 4.6us | 5.3us (+15%) | 4.8us (+4%)
execve | 512us | 629us (+23%) | 610us (+19%)
select 500fd | 20.9us | 30.6us (+46%) | 24.3us (+16%)
mmap | 1128us | 1083us (-4%) | 1264us (+12%)
pipe | 9.0us | 9.6us (+7%) | 9.6us (+7%)
"IRQ & hd" run:
Measurements | Vanilla | preempt_rt | ipipe
---------------+-------------+----------------+-------------
fork | 101us | 94us (-7%) | 103us (+2%)
open/close | 2.9us | 2.9us (~) | 3.0us (+3%)
execve | 366us | 370us (+1%) | 372us (+2%)
select 500fd | 14.3us | 18.1us (+27%) | 14.5us (+1%)
mmap | 794us | 654us (+18%) | 822us (+4%)
pipe | 6.3us | 6.5us (+3%) | 7.3us (+16%)
Let's get the easy one out of the way: the numbers for I-pipe have
remained fairly similar to our last run.
The numbers for PREEMPT_RT, however, have dramatically improved. All
the 50%+ overhead we saw earlier has now gone away completely. The
improvement is in fact nothing short of amazing. We were actually
so surprised that we went around looking for any mistakes we may
have made in our testing. We haven't found any though. So unless
someone comes out with another set of numbers showing differently,
we think that a warm round of applause should go to the PREEMPT_RT
folks. If nothing else, it gives us satisfaction to know that these
test rounds have helped make things better.
Interrupt response time:
------------------------
These numbers were collected very much the same way as before:
1,000,000 samples.
+--------------------+------------+------+-------+------+--------+
| Kernel | sys load | Aver | Max | Min | StdDev |
+====================+============+======+=======+======+========+
| | None | 5.8 | 51.9 | 5.6 | 0.3 |
| | Ping | 5.8 | 49.1 | 5.6 | 0.8 |
| Vanilla-2.6.12 | lm. + ping | 6.1 | 53.3 | 5.6 | 1.1 |
| | lmbench | 6.1 | 77.9 | 5.6 | 0.8 |
| | lm. + hd | 6.5 | 128.4 | 5.6 | 3.4 |
| | DoHell | 6.8 | 555.6 | 5.6 | 7.2 |
+--------------------+------------+------+-------+------+--------+
| | None | 5.7 | 48.9 | 5.6 | 0.2 |
| | Ping | 7.0 | 62.0 | 5.6 | 1.5 |
| with RT-V0.7.51-02 | lm. + ping | 7.9 | 56.2 | 5.6 | 1.9 |
| | lmbench | 7.3 | 56.1 | 5.6 | 1.4 |
| | lm. + hd | 7.3 | 70.5 | 5.6 | 1.8 |
| | DoHell | 7.4 | 54.6 | 5.6 | 1.4 |
+--------------------+------------+------+-------+------+--------+
| | None | 7.2 | 47.6 | 5.7 | 1.9 |
| | Ping | 7.3 | 48.9 | 5.7 | 0.4 |
| with Ipipe-0.7 | lm.+ ping | 7.6 | 50.5 | 5.7 | 0.8 |
| | lmbench | 7.5 | 50.5 | 5.7 | 0.9 |
| | lm. + hd | 7.5 | 50.5 | 5.7 | 1.1 |
| | DoHell | 7.6 | 50.5 | 5.7 | 0.7 |
+--------------------+------------+------+-------+------+--------+
Legend:
None = nothing special
ping = on host: "sudo ping -f $TARGET_IP_ADDR"
lm. + ping = previous test and "make rerun" in lmbench-2.0.4/src/ on
target
lmbench = "make rerun" in lmbench-2.0.4/src/ on target
lm. + hd = previous test with the following being done on the target:
"while [ true ]
do dd if=/dev/zero of=/tmp/dummy count=512 bs=1m
done"
DoHell = See:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111947618802722&w=2
The results above match those found earlier.
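For reference, the Aver/Max/Min/StdDev columns are the usual summary statistics over the collected response times. A minimal sketch of how such a summary is computed; the sample values here are made up, while the real runs used 1,000,000 samples per table row:

```python
import statistics

# Hypothetical interrupt response times in microseconds; the real data
# sets contained 1,000,000 samples per row of the table above.
samples = [5.6, 5.8, 5.7, 6.1, 51.9, 5.6, 5.9, 5.6]

summary = {
    "Aver": round(statistics.mean(samples), 1),
    "Max": max(samples),
    "Min": min(samples),
    "StdDev": round(statistics.stdev(samples), 1),
}
print(summary)
```

Note how a single outlier (the 51.9 above) barely moves the average but completely determines Max, which is why the Max and StdDev columns matter most when judging determinism.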
Overall analysis:
-----------------
We certainly had no intention of doing a 4th round, but having
discovered the highmem problem, we decided to do one more. Fortunately,
Ingo had fixed a few things in the meantime, and the new results are
that much better for PREEMPT_RT.
We have yet to fully understand, however, exactly what the problem
with PREEMPT_RT was before .50-36. We know highmem wasn't all of it, as
the attached results show, so we went digging in the two releases just
following .50-35, as those were the ones Ingo mentioned in reply to our
3rd posting as having had significant performance improvements.
Between 50-35 and 50-36, we see a bunch of TLB fixes. Does this mean
that the TLB was getting overly thrashed in PREEMPT_RT previously?
Between 50-36 and 50-37, the changes are less straightforward. We
noticed that a few things were added, like add/sub_preempt_count_ti(),
inc/dec_preempt_count_ti() and a couple of other *_ti functions, but we
couldn't figure out what "ti" means. Also, some macros are now used in
place of variables. For example, instances of "eip" have been replaced
with "__EIP__".
Any explanation as to how these changes modified the results so
significantly, and the underlying problems that were fixed, would be
great.
Also, in order to evaluate things as even-handedly as possible, it
would be interesting to know of any general improvements, if any, that
were introduced in the PREEMPT_RT patches that would also be applicable
to vanilla Linux.
While it has improved greatly, PREEMPT_RT remains generally more
sensitive to interrupt load than the I-pipe. As in previous runs, the
I-pipe's response times tend to remain more stable and generally lower
than PREEMPT_RT's. It also remains that, as the many problems we've
encountered show, PREEMPT_RT is highly sensitive to the kernel
configuration.
Again, we are happy that these test runs have motivated the fixing of
some performance problems in PREEMPT_RT, and we encourage interested
parties to continue their efforts in making Linux more suitable for
real-time applications. One area of interest, as was
mentioned by Ingo in reply to our publication of our 3rd test results,
is scheduling latency. We have not studied this area in our tests,
but it is certainly relevant and we hope others will dig further in
this direction.
It remains that no test result can by itself be definitive, and at
this stage we strongly believe that others need to be involved in
continuous testing of real-time approaches. This is the only way any
significant advancement will be made. And as these past test runs
have shown, proponents of one approach or another can be very
sensitive to the publication of numbers showing their projects in
a bad light. Yet, whether good or bad, performance numbers are an
important part of any process that strives to achieve determinism.
Kristian Benoit
Karim Yaghmour
--
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || 1-866-677-4546
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-08 23:01 PREEMPT_RT and I-PIPE: the numbers, part 4 Kristian Benoit
@ 2005-07-09 1:28 ` Karim Yaghmour
2005-07-09 7:19 ` Ingo Molnar
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-09 1:28 UTC (permalink / raw)
To: Kristian Benoit
Cc: linux-kernel, paulmck, bhuey, andrea, tglx, mingo, pmarques,
bruce, nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
[-- Attachment #1: Type: text/plain, Size: 212 bytes --]
Missing attachment herein included.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
[-- Attachment #2: highmem.sum --]
[-- Type: text/plain, Size: 11385 bytes --]
L M B E N C H 2 . 0 S U M M A R Y
------------------------------------
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
null null open signal signal fork execve /bin/sh
kernel call I/O stat fstat close install handle process process process
----------------------------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
HIGHMEM-RT-V0.7.50-35 0.18 0.2947 3.02 0.42 3.62 0.59 1.98 156 448 1481
NOHIGHMEM-RT-V0.7.50-35 0.18 0.28635 2.91 0.42 3.70 0.58 2.02 111 383 1372
HIGHMEM-RT-V0.7.51-02 0.18 0.27045 2.47 0.39 3.02 0.56 1.75 103 372 1352
NOHIGHMEM-RT-V0.7.51-02 0.18 0.2673 2.36 0.39 2.77 0.56 1.72 90 351 1328
File select - times in microseconds - smaller is better
-------------------------------------------------------
select select select select select select select select
kernel 10 fd 100 fd 250 fd 500 fd 10 tcp 100 tcp 250 tcp 500 tcp
----------------------------- ------- ------- ------- ------- ------- ------- ------- -------
HIGHMEM-RT-V0.7.50-35 1.29 5.70 13.21 25.76 1.49 7.8809 18.6905 na
NOHIGHMEM-RT-V0.7.50-35 1.26 5.69 13.25 25.84 1.47 na na na
HIGHMEM-RT-V0.7.51-02 1.01 3.88 8.82 17.08 1.24 na 14.1979 27.8158
NOHIGHMEM-RT-V0.7.51-02 1.02 3.90 8.84 17.12 1.30 6.0573 na na
Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/0k 4proc/0k 8proc/0k 16proc/0k 32proc/0k 64proc/0k 96proc/0k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 4.87 5.55 5.01 4.47 4.00 4.45 5.13
NOHIGHMEM-RT-V0.7.50-35 3.25 3.92 3.53 3.10 2.96 3.46 4.09
HIGHMEM-RT-V0.7.51-02 2.70 3.48 3.51 3.50 3.36 3.93 4.82
NOHIGHMEM-RT-V0.7.51-02 1.86 2.23 2.41 2.41 2.41 3.02 3.92
Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/4k 4proc/4k 8proc/4k 16proc/4k 32proc/4k 64proc/4k 96proc/4k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 5.48 4.75 4.47 4.76 4.68 5.90 7.24
NOHIGHMEM-RT-V0.7.50-35 3.88 4.54 4.02 3.91 4.04 4.93 5.85
HIGHMEM-RT-V0.7.51-02 3.25 3.59 3.85 3.89 4.18 5.41 6.75
NOHIGHMEM-RT-V0.7.51-02 2.70 3.01 2.99 3.04 3.31 4.56 6.16
Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/8k 4proc/8k 8proc/8k 16proc/8k 32proc/8k 64proc/8k 96proc/8k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 6.09 5.31 5.22 5.09 5.68 7.82 8.87
NOHIGHMEM-RT-V0.7.50-35 4.51 5.08 4.54 4.36 4.44 6.49 7.75
HIGHMEM-RT-V0.7.51-02 3.85 4.01 4.20 4.31 5.27 7.38 8.51
NOHIGHMEM-RT-V0.7.51-02 3.05 3.49 3.53 3.60 3.99 6.37 7.56
Context switching with 16K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/16k 4proc/16k 8proc/16k 16prc/16k 32prc/16k 64prc/16k 96prc/16k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 6.29 5.47 5.55 5.27 7.19 10.87 11.29
NOHIGHMEM-RT-V0.7.50-35 4.72 4.28 4.01 4.37 6.16 9.32 10.01
HIGHMEM-RT-V0.7.51-02 4.20 4.74 4.57 4.78 6.92 10.67 11.16
NOHIGHMEM-RT-V0.7.51-02 3.17 3.45 3.51 3.76 5.65 9.84 10.31
Context switching with 32K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/32k 4proc/32k 8proc/32k 16prc/32k 32prc/32k 64prc/32k 96prc/32k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 6.620 5.750 5.990 8.820 12.700 14.610 14.630
NOHIGHMEM-RT-V0.7.50-35 5.460 5.070 5.080 6.270 12.310 14.080 13.970
HIGHMEM-RT-V0.7.51-02 4.800 5.510 5.550 7.050 13.350 14.970 14.940
NOHIGHMEM-RT-V0.7.51-02 3.940 4.250 4.320 6.020 12.010 14.160 14.150
Context switching with 64K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/64k 4proc/64k 8proc/64k 16prc/64k 32prc/64k 64prc/64k 96prc/64k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
----------------------------- --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35 8.82 9.54 10.67 19.21 23.16 22.94 22.90
NOHIGHMEM-RT-V0.7.50-35 7.20 7.96 8.59 17.75 21.73 21.81 21.79
HIGHMEM-RT-V0.7.51-02 6.68 7.44 9.08 18.70 22.43 22.69 22.66
NOHIGHMEM-RT-V0.7.51-02 5.99 7.35 9.04 17.28 21.81 22.24 22.08
File create/delete and VM system latencies in microseconds - smaller is better
----------------------------------------------------------------------------
0K 0K 1K 1K 4K 4K 10K 10K Mmap Prot Page
kernel Create Delete Create Delete Create Delete Create Delete Latency Fault Fault
------------------------------ ------- ------- ------- ------- ------- ------- ------- ------- ------- ------ ------
HIGHMEM-RT-V0.7.50-35 14.6 7.7 27.4 16.6 28.8 16.6 43.6 20.1 2862 0.86 2.0
NOHIGHMEM-RT-V0.7.50-35 14.1 7.3 27.3 16.2 27.9 16.0 44.7 21.6 1180 0.86 2.0
HIGHMEM-RT-V0.7.51-02 13.3 6.0 24.9 13.8 25.7 13.8 40.4 16.2 642 1.01 2.0
NOHIGHMEM-RT-V0.7.51-02 12.0 5.9 24.0 13.1 24.5 13.1 40.6 16.1 630 1.55 2.0
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix UDP RPC/UDP TCP RPC/TCP TCPconn
----------------------------- ------- ------- ------- ------- ------- ------- -------
HIGHMEM-RT-V0.7.50-35 11.52 17.36 23.303 34.6122 25.5294 39.8675 84.95
NOHIGHMEM-RT-V0.7.50-35 8.30 11.89 18.7254 29.2952 19.3239 31.9442 70.27
HIGHMEM-RT-V0.7.51-02 7.10 13.68 21.6852 31.8112 23.3988 37.5052 79.29
NOHIGHMEM-RT-V0.7.51-02 9.86 16.24 19.4907 36.7468 30.8401 45.931 300463.00
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
File Mmap Bcopy Bcopy Memory Memory
kernel Pipe AF/Unix TCP reread reread (libc) (hand) read write
----------------------------- ------- ------- ------- ------- ------- ------- ------- ------- -------
HIGHMEM-RT-V0.7.50-35 1742.8 2223.2 1119.7 2154.0 3178.3 866.6 881.5 3183.4 1421.0
NOHIGHMEM-RT-V0.7.50-35 1918.1 2382.9 1317.9 2156.4 3170.4 935.9 882.1 3164.5 1410.3
HIGHMEM-RT-V0.7.51-02 2047.8 2400.0 1199.6 2208.6 3181.6 867.2 880.1 3134.9 1417.2
NOHIGHMEM-RT-V0.7.51-02 1367.2 1701.3 1165.6 2067.5 3162.4 887.5 918.2 3156.6 1446.6
*Local* More Communication bandwidths in MB/s - bigger is better
----------------------------------------------------------------
File Mmap Aligned Partial Partial Partial Partial
OS open open Bcopy Bcopy Mmap Mmap Mmap Bzero
close close (libc) (hand) read write rd/wrt copy HTTP
----------------------------- ------- ------- ------- ------- ------- ------- ------- ------- -------
HIGHMEM-RT-V0.7.50-35 2178.2 1350.1 896.7 886.4 3504.0 1455.1 1432.3 1275.7 16.15
NOHIGHMEM-RT-V0.7.50-35 2176.7 1414.6 892.5 878.0 3500.9 1440.6 1425.9 1281.8 18.11
HIGHMEM-RT-V0.7.51-02 2206.5 1453.7 901.3 890.5 3509.9 1453.2 1433.9 1279.3 17.20
NOHIGHMEM-RT-V0.7.51-02 2065.1 1397.8 895.6 931.9 3489.8 1472.5 1450.9 1247.3 16.75
Memory latencies in nanoseconds - smaller is better
---------------------------------------------------
kernel Mhz L1 $ L2 $ Main mem
----------------------------- ----- ------- ------- ---------
HIGHMEM-RT-V0.7.50-35 2779 1.45 10.32 44.9
NOHIGHMEM-RT-V0.7.50-35 2779 1.45 10.41 44.8
HIGHMEM-RT-V0.7.51-02 2779 1.44 10.40 44.8
NOHIGHMEM-RT-V0.7.51-02 2779 1.45 10.23 45.1
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-08 23:01 PREEMPT_RT and I-PIPE: the numbers, part 4 Kristian Benoit
2005-07-09 1:28 ` Karim Yaghmour
@ 2005-07-09 7:19 ` Ingo Molnar
2005-07-09 15:39 ` Karim Yaghmour
2005-07-09 17:22 ` Daniel Walker
2005-07-09 9:01 ` Paul Rolland
2005-07-11 5:24 ` Ingo Molnar
3 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2005-07-09 7:19 UTC (permalink / raw)
To: Kristian Benoit
Cc: linux-kernel, paulmck, bhuey, andrea, tglx, karim, pmarques,
bruce, nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
* Kristian Benoit <kbenoit@opersys.com> wrote:
> The numbers for PREEMPT_RT, however, have dramatically improved. All
> the 50%+ overhead we saw earlier has now gone away completely. The
> improvement is in fact nothing short of amazing. We were actually so
> surprised that we went around looking for any mistakes we may have
> made in our testing. We haven't found any though. So unless someone
> comes out with another set of numbers showing differently, we think
> that a warm round of applause should go to the PREEMPT_RT folks. If
> nothing else, it gives us satisfaction to know that these test rounds
> have helped make things better.
yeah, they definitely have helped, and thanks for this round of testing
too! I'll explain the recent changes to PREEMPT_RT that resulted in
these speedups in another mail.
Looking at your numbers i realized that the area where PREEMPT_RT is
still somewhat behind (the flood ping +~10% overhead), you might be
using an invalid test methodology:
> ping = on host: "sudo ping -f $TARGET_IP_ADDR"
i've done a couple of ping -f flood tests between various testboxes
myself, and one thing i found was that it's close to impossible to
create a stable, comparable packets per second workload! The pps rate
heavily fluctuated even within the same testrun. Another phenomenon i
noticed is that the PREEMPT_RT kernel has a tendency to handle _more_
ping packets per second, while the vanilla (and thus i suspect the
i-pipe) kernel throws away more packets.
Thus lmbench under PREEMPT_RT may have performed 'slower', but in fact
the test was simply unbalanced and thus unfair. Once i created a stable
packet rate, PREEMPT_RT's IRQ overhead became acceptable.
(if your goal was to check how heavily external interrupts can influence
a PREEMPT_RT box, you should chrt the network IRQ thread to SCHED_OTHER
and renice it and softirq-net-rx and softirq-net-tx to nice +19.)
this phenomenon could be a peculiarity of my network setup, but still,
could you please verify the comparability of the ping -f workloads on
the vanilla and the PREEMPT_RT kernels? In particular, the interrupt
rate should be constant and comparable - but it might be better to look
at both the received and transmitted packets per second. (Since things
like iptraf are quite expensive when flood pinging is going on, the best
way i found to measure the packet rate was to process netstat -s output
via a simple script.)
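A minimal sketch of the kind of script Ingo describes: snapshot `netstat -s` twice, a known interval apart, and diff the cumulative counters. The counter names ("total packets received", "requests sent out") are assumptions from typical Linux `netstat -s` output and may differ across versions:

```python
import re

def packet_counts(netstat_s: str) -> tuple:
    """Pull cumulative rx/tx packet counters out of `netstat -s` output."""
    rx = int(re.search(r"(\d+) total packets received", netstat_s).group(1))
    tx = int(re.search(r"(\d+) requests sent out", netstat_s).group(1))
    return rx, tx

# Two hypothetical snapshots taken 10 seconds apart:
before = "    1000 total packets received\n     800 requests sent out\n"
after  = "  161000 total packets received\n  120800 requests sent out\n"
interval = 10  # seconds between the two `netstat -s` snapshots

rx_pps = (packet_counts(after)[0] - packet_counts(before)[0]) / interval
tx_pps = (packet_counts(after)[1] - packet_counts(before)[1]) / interval
print(rx_pps, tx_pps)  # 16000.0 12000.0 packets per second
```

Comparing these per-second rates between the vanilla and PREEMPT_RT runs would show whether the two kernels actually saw comparable interrupt loads.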
Ingo
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-08 23:01 PREEMPT_RT and I-PIPE: the numbers, part 4 Kristian Benoit
2005-07-09 1:28 ` Karim Yaghmour
2005-07-09 7:19 ` Ingo Molnar
@ 2005-07-09 9:01 ` Paul Rolland
2005-07-09 14:47 ` Karim Yaghmour
2005-07-09 15:22 ` Ingo Molnar
2005-07-11 5:24 ` Ingo Molnar
3 siblings, 2 replies; 14+ messages in thread
From: Paul Rolland @ 2005-07-09 9:01 UTC (permalink / raw)
To: 'Kristian Benoit', linux-kernel
Cc: paulmck, bhuey, andrea, tglx, karim, mingo, pmarques, bruce,
nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
Hello,
> "IRQ & hd" run:
> Measurements | Vanilla | preempt_rt | ipipe
> ---------------+-------------+----------------+-------------
> fork | 101us | 94us (-7%) | 103us (+2%)
> open/close | 2.9us | 2.9us (~) | 3.0us (+3%)
> execve | 366us | 370us (+1%) | 372us (+2%)
> select 500fd | 14.3us | 18.1us (+27%) | 14.5us (+1%)
> mmap | 794us | 654us (+18%) | 822us (+4%)
^^^^^^^^^^^^
You mean -18%, not +18% I think.
Just having a quick look at the numbers, it seems that now the "weak"
part in PREEMPT_RT is the select 500fd test.
Ingo, any idea about this one?
Regards,
Paul
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 9:01 ` Paul Rolland
@ 2005-07-09 14:47 ` Karim Yaghmour
2005-07-09 15:22 ` Ingo Molnar
1 sibling, 0 replies; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-09 14:47 UTC (permalink / raw)
To: rol
Cc: 'Kristian Benoit', linux-kernel, paulmck, bhuey, andrea,
tglx, mingo, pmarques, bruce, nickpiggin, ak, sdietrich, dwalker,
hch, akpm, rpm
Paul Rolland wrote:
>>mmap | 794us | 654us (+18%) | 822us (+4%)
> ^^^^^^^^^^^^
> You mean -18%, not +18% I think.
Doh ... too many numbers flying around ... yes, -18% :)
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 9:01 ` Paul Rolland
2005-07-09 14:47 ` Karim Yaghmour
@ 2005-07-09 15:22 ` Ingo Molnar
1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2005-07-09 15:22 UTC (permalink / raw)
To: Paul Rolland
Cc: 'Kristian Benoit', linux-kernel, paulmck, bhuey, andrea,
tglx, karim, pmarques, bruce, nickpiggin, ak, sdietrich, dwalker,
hch, akpm, rpm
* Paul Rolland <rol@witbe.net> wrote:
> > "IRQ & hd" run:
> > Measurements | Vanilla | preempt_rt | ipipe
> > ---------------+-------------+----------------+-------------
> > fork | 101us | 94us (-7%) | 103us (+2%)
> > open/close | 2.9us | 2.9us (~) | 3.0us (+3%)
> > execve | 366us | 370us (+1%) | 372us (+2%)
> > select 500fd | 14.3us | 18.1us (+27%) | 14.5us (+1%)
> > mmap | 794us | 654us (+18%) | 822us (+4%)
>
> ^^^^^^^^^^^^
> You mean -18%, not +18% I think.
>
> Just having a quick look at the numbers, it seems that now the "weak"
> part in PREEMPT_RT is the select 500fd test.
>
> Ingo, any idea about this one ?
yeah. In the '500 fds select' benchmark workload do_select() does an
extremely tight loop over a 500-entry table that does an fget(). fget()
acquires/releases current->files->file_lock. So we get 1000 lock and
unlock operations in this workload. It cannot be for free. In fact, look
at how the various vanilla kernels compare:
AVG v2.6.12 v2.6.12-PREEMPT v2.6.12-SMP
------------------------------------------------------------------
select: 11.48 12.35 ( 7%) 26.40 (129%)
(tested on one of my single-processor testsystems.)
I.e. SMP locking is already 129% overhead, and CONFIG_PREEMPT (which
just bumps the preempt count twice(!)) has 7% overhead. In that sense,
the 27% select-500-fds overhead measured for PREEMPT_RT is more than
acceptable.
anyway, these days apps that do select() over 500 fds are expected to
perform badly no matter what locking method is used. [To fix this
particular overhead we could take the current->file_lock outside of the
loop and do a get_file() within do_select(). This would improve SMP too.
But i doubt anyone cares.]
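The locking arithmetic Ingo describes can be modeled outside the kernel. A loose sketch in Python (a stand-in for illustration, not kernel code) counting lock operations for the per-fd fget() pattern versus the hoisted-lock variant he suggests:

```python
import threading

class CountingLock:
    """A lock that counts its acquire/release operations."""
    def __init__(self):
        self._lock = threading.Lock()
        self.ops = 0
    def acquire(self):
        self._lock.acquire()
        self.ops += 1
    def release(self):
        self._lock.release()
        self.ops += 1

fds = range(500)  # the 500-entry table of the select benchmark

# Current pattern: fget() takes and drops the file_lock once per fd.
per_fd = CountingLock()
for _ in fds:
    per_fd.acquire()  # fget() grabs current->files->file_lock
    per_fd.release()  # and releases it again
print(per_fd.ops)     # 1000 lock/unlock operations, as Ingo counts

# Suggested variant: take the lock once around the whole loop.
hoisted = CountingLock()
hoisted.acquire()
for _ in fds:
    pass              # examine each fd table entry under the one lock
hoisted.release()
print(hoisted.ops)    # 2
```

The 1000-vs-2 ratio is why the per-fd pattern magnifies whatever a single lock/unlock costs under each locking model.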
Ingo
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 7:19 ` Ingo Molnar
@ 2005-07-09 15:39 ` Karim Yaghmour
2005-07-09 15:53 ` Karim Yaghmour
2005-07-11 7:05 ` Ingo Molnar
2005-07-09 17:22 ` Daniel Walker
1 sibling, 2 replies; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-09 15:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: Kristian Benoit, linux-kernel, paulmck, bhuey, andrea, tglx,
pmarques, bruce, nickpiggin, ak, sdietrich, dwalker, hch, akpm,
rpm
Ingo Molnar wrote:
> yeah, they definitely have helped, and thanks for this round of testing
> too! I'll explain the recent changes to PREEMPT_RT that resulted in
> these speedups in another mail.
Great, I'm very much looking forward to it.
> Looking at your numbers i realized that the area where PREEMPT_RT is
> still somewhat behind (the flood ping +~10% overhead), you might be
> using an invalid test methodology:
I've got to smile reading this :) If one thing has become clear from
these threads, it is that no matter how careful we are with our
testing, there is always something that can be criticized about them.
Take the highmem thing, for example: I never really bought the
argument that highmem was the root of all evil ;) and the last
comparison we did between 50-35 and 51-02, with and without highmem,
clearly showed that while highmem is a factor, there are inherent
problems elsewhere that the disabling of highmem doesn't erase. Also,
both vanilla and I-pipe were run with highmem, and if they don't
suffer from it, then the problem is/was with PREEMPT_RT.
With ping floods, as with other things, there is room for
improvement, but keep in mind that these are standard tests used
as-is by others to make measurements, that each run is made 5
times, and that the values in those tables represent the average
of 5 runs. So while they may not be as exact as could be, I don't
see why they couldn't be interpreted as giving us a "good idea" of
what's happening.
For one thing, the heavy fluctuation in ping packets may actually
induce a state in the monitored kernel which is more akin to the
one we want to measure than if we had a steady flow of packets.
I would ordinarily very much like to entertain this further, but we've
really exhausted all the time slots I had allocated to this work. So
at this point, we really think others should start publishing results.
After all, our results are no more authoritative than those
published by others.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 15:39 ` Karim Yaghmour
@ 2005-07-09 15:53 ` Karim Yaghmour
2005-07-09 15:53 ` Karim Yaghmour
2005-07-11 7:05 ` Ingo Molnar
1 sibling, 1 reply; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-09 15:53 UTC (permalink / raw)
To: karim
Cc: Ingo Molnar, Kristian Benoit, linux-kernel, paulmck, bhuey,
andrea, tglx, pmarques, bruce, nickpiggin, ak, sdietrich, dwalker,
hch, akpm, rpm
Karim Yaghmour wrote:
> I would usually like very much to entertain this further, but we've
> really busted all the time slots I had allocated to this work. So at
> this time, we really think others should start publishing results.
> After all, our results are no more authoritative than those
> published by others.
BTW, we've also released the latest very of the LRTBF we used to
publish these latest results, so others can give it a try too :)
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 15:53 ` Karim Yaghmour
@ 2005-07-09 15:53 ` Karim Yaghmour
0 siblings, 0 replies; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-09 15:53 UTC (permalink / raw)
To: karim
Cc: Ingo Molnar, Kristian Benoit, linux-kernel, paulmck, bhuey,
andrea, tglx, pmarques, bruce, nickpiggin, ak, sdietrich, dwalker,
hch, akpm, rpm
Can't type right anymore ...
Karim Yaghmour wrote:
> BTW, we've also released the latest very of the LRTBF we used to
^^^^ version
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 7:19 ` Ingo Molnar
2005-07-09 15:39 ` Karim Yaghmour
@ 2005-07-09 17:22 ` Daniel Walker
2005-07-09 23:37 ` Bill Huey
1 sibling, 1 reply; 14+ messages in thread
From: Daniel Walker @ 2005-07-09 17:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Kristian Benoit, linux-kernel, paulmck, bhuey, andrea, tglx,
karim, pmarques, bruce, nickpiggin, ak, sdietrich, hch, akpm, rpm
On Sat, 2005-07-09 at 09:19 +0200, Ingo Molnar wrote:
> (if your goal was to check how heavily external interrupts can influence
> a PREEMPT_RT box, you should chrt the network IRQ thread to SCHED_OTHER
> and renice it and softirq-net-rx and softirq-net-tx to nice +19.)
>
This is interesting. I wonder how much tuning like this, just changing
thread priorities, would affect the results of these tests.
PREEMPT_RT is not pre-tuned for every situation, but the best
performance is achieved when the system is tuned. If any of these
tests rely on a low priority thread, then we just raise the priority
and you have better performance.
Other systems, like vanilla 2.6.x and I-pipe, aren't massively
tunable like PREEMPT_RT.
Daniel
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 17:22 ` Daniel Walker
@ 2005-07-09 23:37 ` Bill Huey
0 siblings, 0 replies; 14+ messages in thread
From: Bill Huey @ 2005-07-09 23:37 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Kristian Benoit, linux-kernel, paulmck, bhuey,
andrea, tglx, karim, pmarques, bruce, nickpiggin, ak, sdietrich,
hch, akpm, rpm
On Sat, Jul 09, 2005 at 10:22:07AM -0700, Daniel Walker wrote:
> PREEMPT_RT is not pre-tuned for every situation, but the best
> performance is achieved when the system is tuned. If any of these tests
> rely on a low priority thread, then we just raise the priority and you
> have better performance.
Just think about it. Throttling those threads via the scheduler throttles
the system in super controllable ways. This is very cool stuff. :)
bill
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-08 23:01 PREEMPT_RT and I-PIPE: the numbers, part 4 Kristian Benoit
` (2 preceding siblings ...)
2005-07-09 9:01 ` Paul Rolland
@ 2005-07-11 5:24 ` Ingo Molnar
3 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2005-07-11 5:24 UTC (permalink / raw)
To: Kristian Benoit
Cc: linux-kernel, paulmck, bhuey, tglx, karim, pmarques, bruce,
nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
* Kristian Benoit <kbenoit@opersys.com> wrote:
[...]
> "plain" run:
>
> Measurements | Vanilla | preempt_rt | ipipe
> ---------------+-------------+----------------+-------------
> fork | 97us | 91us (-6%) | 101us (+4%)
> mmap | 776us | 629us (-19%) | 794us (+2%)
some of you have wondered how it's possible that the PREEMPT_RT kernel
is _faster_ than the vanilla kernel in these two metrics.
I've done some more profiling, and one reason is kmap_atomic(). As i
pointed out in an earlier mail, in your tests you not only had HIGHMEM64
enabled, but also HIGHPTE, which is a heavy kmap_atomic() user. [and
which is an option meant for systems with 8GB or more RAM, not the
typical embedded target.]
kmap_atomic() is a pretty preemption-unfriendly per-CPU construct, which
under PREEMPT_RT had to be changed and was mapped into kmap(). The
performance advantage comes from the caching built into kmap() and not
having to do per-page invlpg calls (which can be pretty slow,
especially on highmem64). The 'mapping kmap_atomic into kmap' technique
is perfectly fine under PREEMPT_RT because all kernel code is
preemptible, but it's not really possible in the vanilla kernel due to
the fundamental non-preemptability of interrupts, the preempt-off-ness
of the mmu_gather mechanism, the atomicity of the ->page_table_lock
spinlock, etc.
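As an illustration only (an analogy, not kernel code): the win Ingo describes is the classic map-per-access versus cached-mapping tradeoff. The toy model below counts how many "flush" operations each strategy needs for a stream of page accesses; the cache size and access pattern are made up.

```python
# Toy analogy for kmap_atomic() vs kmap() (NOT kernel code): count how many
# TLB-flush-like operations each mapping strategy performs.

def atomic_style(accesses):
    # kmap_atomic()-like: map on entry, unmap + flush on exit, every access
    return len(accesses)

def cached_style(accesses, cache_size=4):
    # kmap()-like: keep recent mappings alive; flush only on eviction (LRU)
    cache, flushes = [], 0
    for page in accesses:
        if page in cache:
            cache.remove(page)        # refresh LRU position, no flush
        else:
            if len(cache) == cache_size:
                cache.pop(0)          # evict oldest mapping -> one flush
                flushes += 1
        cache.append(page)
    return flushes

accesses = [1, 2, 1, 2, 3, 1, 2, 3]
print(atomic_style(accesses), cached_style(accesses))  # 8 0
```

With a working set that fits the cache, the cached strategy flushes nothing at all, which is the effect Ingo attributes to mapping kmap_atomic() onto kmap() under PREEMPT_RT.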
so this is a case of 'fully preemptible beats non-preemptible due to
flexibility', but it should be more of an exception than the rule,
because generally the fully preemptible kernel tries to be 1:1 identical
to the vanilla kernel. But it's an interesting phenomenon from a
conceptual angle nevertheless.
Ingo
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-09 15:39 ` Karim Yaghmour
2005-07-09 15:53 ` Karim Yaghmour
@ 2005-07-11 7:05 ` Ingo Molnar
2005-07-11 11:25 ` Karim Yaghmour
1 sibling, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2005-07-11 7:05 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Kristian Benoit, linux-kernel, paulmck, bhuey, tglx, pmarques,
bruce, nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
* Karim Yaghmour <karim@opersys.com> wrote:
> With ping floods, as with other things, there is room for improvement,
> but keep in mind that these are standard tests [...]
the problem is that ping -f isnt what it used to be. If you are using a
recent distribution with an updated ping utility, these days the
equivalent of 'ping -f' is something like:
ping -q -l 500 -A -s 10 <target>
and even this variant (and the old variant) needs to be carefully
validated for the actual workload generated. Note that this is true for
workloads against vanilla kernels too. (Also note that i did not claim
that the flood ping workload you used is invalid - you have not
published packet rates or interrupt rates that could help us judge how
constant the workload was. I only said that according to my measurements
it's quite unstable, and that you should double-check it. Just running
it and ACK-ing that the packet rates are stable and identical amongst
all of these kernels would be enough to put this concern to rest.)
to see why i think there might be something wrong with the measurement,
just look at the raw numbers:
LMbench running times:
+--------------------+-------+-------+-------+-------+-------+
| Kernel | plain | IRQ | ping | IRQ & | IRQ & |
| | | test | flood | ping | hd |
+====================+=======+=======+=======+=======+=======+
| Vanilla-2.6.12 | 152 s | 150 s | 188 s | 185 s | 239 s |
+====================+=======+=======+=======+=======+=======+
| with RT-V0.7.51-02 | 152 s | 153 s | 203 s | 201 s | 239 s |
+====================+=======+=======+=======+=======+=======+
note that both the 'IRQ' and 'IRQ & hd' test involves interrupts, and
PREEMPT_RT shows overhead within statistical error, but only the 'flood
ping' workload created a ~8% slowdown.
my own testing (whatever it's worth) shows that during flood-pings, the
maximum overhead PREEMPT_RT caused was 4%. I.e. PREEMPT_RT used 4% more
system-time than the vanilla UP kernel when the CPU was 99% dedicated to
handling ping replies. But in your tests not the full CPU was dedicated
to flood ping replies (of course). Your above numbers suggest that under
the vanilla kernel 23% of CPU time was used up by flood pinging.
(188/152 == +23.6%)
Under PREEMPT_RT, my tentative guesstimate would be that it should go
from 23.6% to 24.8% - i.e. 1.2% less CPU time for lmbench - which
turns into roughly +1 second of lmbench wall-clock slowdown. Not
15 seconds, like your test suggests. So there's more than an order of
magnitude difference in the numbers, which i felt was worth sharing :)
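Ingo's back-of-the-envelope figures can be checked mechanically. This is one way to read his arithmetic (the 4% figure is his measured worst-case ping-handling overhead; how the percentages combine is my interpretation, not his exact calculation):

```python
# Reproduce the guesstimate: if flood pinging costs the vanilla kernel 36 s
# on top of a 152 s lmbench run, and PREEMPT_RT spends ~4% more CPU per
# ping handled, the expected extra slowdown is on the order of 1 s, not 15 s.
plain = 152.0            # vanilla lmbench, no load (s)
with_ping = 188.0        # vanilla lmbench under flood ping (s)
rt_overhead = 0.04       # PREEMPT_RT's extra system time on ping handling

ping_cost = with_ping - plain                 # 36 s eaten by ping handling
rt_expected = plain + ping_cost * (1 + rt_overhead)
print(round(with_ping / plain - 1, 3))        # 0.237, Ingo's "+23.6%"
print(round(rt_expected - with_ping, 2))      # 1.44, "roughly +1 second"
```

Either way the predicted slowdown is a second or two, an order of magnitude short of the 15 seconds the published table shows.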
_And_ your own hd and stable-rate irq workloads suggest that PREEMPT_RT
and vanilla are very close to each other. Let me repeat the table, with
only the numbers included where there was no flood pinging going on:
LMbench running times:
+--------------------+-------+-------+-------+-------+-------+
| Kernel | plain | IRQ | | | IRQ & |
| | | test | | | hd |
+====================+=======+=======+=======+=======+=======+
| Vanilla-2.6.12 | 152 s | 150 s | | | 239 s |
+====================+=======+=======+=======+=======+=======+
| with RT-V0.7.51-02 | 152 s | 153 s | | | 239 s |
+====================+=======+=======+=======+=======+=======+
| with Ipipe-0.7 | 149 s | 150 s | | | 236 s |
+====================+=======+=======+=======+=======+=======+
these numbers suggest that outside of ping-flooding all IRQ overhead
results are within statistical error.
So why do your "ping flood" results show such difference? It really is
just another type of interrupt workload and has nothing special in it.
> but keep in mind that these are standard tests used as-is by others
> [...]
are you suggesting this is not really a benchmark but a way to test how
well a particular system holds up against extreme external load?
> For one thing, the heavy fluctuation in ping packets may actually
> induce a state in the monitored kernel which is more akin to the one
> we want to measure than if we had a steady flow of packets.
so you can see ping packet flow fluctuations in your tests? Then you
cannot use those results as any sort of benchmark metric.
under PREEMPT_RT, if you wish to tone down the effects of an interrupt
source then all you have to do is something like:
P=$(pidof "IRQ "$(grep eth1 /proc/interrupts | cut -d: -f1 | xargs echo))
chrt -o -p 0 $P # net irq thread
renice -n 19 $P
chrt -o -p 0 5 # softirq-tx
renice -n 19 5
chrt -o -p 0 6 # softirq-rx
renice -n 19 6
and from this point on you should see zero lmbench overhead from flood
pinging. Can vanilla or I-PIPE do that?
Ingo
* Re: PREEMPT_RT and I-PIPE: the numbers, part 4
2005-07-11 7:05 ` Ingo Molnar
@ 2005-07-11 11:25 ` Karim Yaghmour
0 siblings, 0 replies; 14+ messages in thread
From: Karim Yaghmour @ 2005-07-11 11:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Kristian Benoit, linux-kernel, paulmck, bhuey, tglx, pmarques,
bruce, nickpiggin, ak, sdietrich, dwalker, hch, akpm, rpm
Ingo Molnar wrote:
> So why do your "ping flood" results show such difference? It really is
> just another type of interrupt workload and has nothing special in it.
...
> are you suggesting this is not really a benchmark but a way to test how
> well a particular system withholds against extreme external load?
Look, you're basically splitting hairs. No matter how involved an explanation
you provide, it remains that both vanilla and I-pipe were subject to the
same load. If PREEMPT_RT consistently shows the same degradation under the
same setup, and that is indeed the case, then the problem is with PREEMPT_RT,
not the tests.
> so you can see ping packet flow fluctuations in your tests? Then you
> cannot use those results as any sort of benchmark metric.
I didn't say this. I said that if there is fluctuation, then maybe this
is something we want to see the effect of. In real-world applications,
interrupts may not come in at a steady pace, as you try to achieve in
your own tests.
> and from this point on you should see zero lmbench overhead from flood
> pinging. Can vanilla or I-PIPE do that?
Let's not get into what I-pipe can or cannot do, that's not what these
numbers are about. It's pretty darn amazing that we're even having this
conversation. The PREEMPT_RT stuff is being worked on by more than a
dozen developers spread across some of the most well-known Linux companies
out there (Red Hat, MontaVista, IBM, TimeSys, etc.). Yet, despite this
massive involvement, here we have a patch developed by a single guy,
Philippe, who's doing this work outside his regular work hours, and his
patch, which does provide guaranteed deterministic behavior, is:
a) Much smaller than PREEMPT_RT
b) Less intrusive than PREEMPT_RT
c) Performs very well, as good as if not sometimes better than PREEMPT_RT
Splitting hairs won't erase this reality. And again, before I get the
PREEMPT_RT mob on my back again, this is just for the sake of argument,
both approaches remain valid, and are not mutually exclusive.
Like I said before, others are free to publish their own numbers showing
differently from what we've found.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
end of thread, other threads:[~2005-07-11 11:31 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-08 23:01 PREEMPT_RT and I-PIPE: the numbers, part 4 Kristian Benoit
2005-07-09 1:28 ` Karim Yaghmour
2005-07-09 7:19 ` Ingo Molnar
2005-07-09 15:39 ` Karim Yaghmour
2005-07-09 15:53 ` Karim Yaghmour
2005-07-09 15:53 ` Karim Yaghmour
2005-07-11 7:05 ` Ingo Molnar
2005-07-11 11:25 ` Karim Yaghmour
2005-07-09 17:22 ` Daniel Walker
2005-07-09 23:37 ` Bill Huey
2005-07-09 9:01 ` Paul Rolland
2005-07-09 14:47 ` Karim Yaghmour
2005-07-09 15:22 ` Ingo Molnar
2005-07-11 5:24 ` Ingo Molnar