public inbox for linux-kernel@vger.kernel.org
* lmbench results for 2.4 and 2.5
@ 2003-03-22 16:11 Chris Friesen
  2003-03-22 16:31 ` William Lee Irwin III
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Chris Friesen @ 2003-03-22 16:11 UTC (permalink / raw)
  To: linux-kernel


My previous testing with unix sockets prompted me to do a few lmbench runs with 
2.4.19 and 2.5.65.  The results have me a bit concerned, as there is no area 
where 2.5 is faster and several where it is significantly slower.

In particular:

stat is 8 times worse
open/close are 7 times worse
fork is twice as expensive
tcp latency is 5 times worse
file deletion and mmap are both twice as expensive
tcp bandwidth is 5 times worse

Optimizing for multiple processors and heavy loads is nice, but this looks like 
it's happening at the cost of basic performance.  Is this really the route we 
should be taking?



                  L M B E N C H  2 . 0   S U M M A R Y
                  ------------------------------------

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host              OS  Mhz null null      open selct sig  sig  fork exec sh
                           call  I/O stat clos TCP   inst hndl proc proc proc
------ ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
doug    Linux 2.5.65  750 0.38 0.61 39.8 42.1       1.07 5.29 424. 2378 20.K
doug    Linux 2.5.65  750 0.38 0.54 40.2 44.2       1.07 5.31 439. 2386 20.K
doug    Linux 2.4.19  750 0.37 0.52 5.21 6.78  36.7 0.93 3.59 197. 1472 15.K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
doug       Linux 2.5.65 1.790 3.0300  118.7   46.1  158.3    46.5   158.2
doug       Linux 2.5.65 1.950 2.9800  122.6   46.3  159.5    47.1   158.7
doug       Linux 2.4.19 1.690 2.6700   92.9   44.4  155.2    45.0   155.8

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                         ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
doug       Linux 2.5.65 1.790 8.926 16.3  29.7  60.6 171.5 204.6 216.
doug       Linux 2.5.65 1.950 9.695 18.1  28.6  59.8 173.4 207.0 212.
doug       Linux 2.4.19 1.690 6.146 12.4  17.8  44.2  26.2  66.6 101.

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page
                         Create Delete Create Delete  Latency Fault   Fault
--------- ------------- ------ ------ ------ ------  ------- -----   -----
doug       Linux 2.5.65  110.2   65.0  242.5  100.7   3130.0 0.621 4.00000
doug       Linux 2.5.65  110.1   63.5  237.2   96.6   3284.0 0.741 4.00000
doug       Linux 2.4.19   82.5   32.4  187.5   47.9   1660.0 1.177 3.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                              UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
doug       Linux 2.5.65 167. 94.7 14.3  212.5  354.8  214.5  215.9 474. 328.4
doug       Linux 2.5.65 175. 86.3 14.2  216.3  354.1  211.4  210.9 474. 328.8
doug       Linux 2.4.19 220. 108. 86.4  238.2  369.1  215.5  215.0 496. 328.0
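
The headline ratios quoted at the top of this mail can be rechecked straight
from the table rows. A short Python sketch (not part of the original mail;
numbers copied by hand from the 2.5.65 second run and the 2.4.19 row):

```python
# Recompute the slowdown ratios from the lmbench summary tables above.
# Latencies in microseconds (smaller is better).
metrics_us = {
    # name: (2.5.65 value, 2.4.19 value)
    "stat":         (40.2, 5.21),
    "open/close":   (44.2, 6.78),
    "fork proc":    (439.0, 197.0),
    "TCP latency":  (173.4, 26.2),
    "10K delete":   (96.6, 47.9),
    "mmap latency": (3284.0, 1660.0),
}
for name, (v25, v24) in metrics_us.items():
    print(f"{name:13s} {v25 / v24:4.1f}x slower in 2.5.65")

# TCP bandwidth is in MB/s (bigger is better), so invert the ratio:
print(f"{'TCP bandwidth':13s} {86.4 / 14.2:4.1f}x slower in 2.5.65")
```

This reproduces the rough 8x/7x/2x factors quoted above, though the TCP
latency and bandwidth gaps come out closer to 6x than 5x.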





-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: lmbench results for 2.4 and 2.5
  2003-03-22 16:11 lmbench results for 2.4 and 2.5 Chris Friesen
@ 2003-03-22 16:31 ` William Lee Irwin III
  2003-03-22 16:37 ` Martin J. Bligh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: William Lee Irwin III @ 2003-03-22 16:31 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

On Sat, Mar 22, 2003 at 11:11:14AM -0500, Chris Friesen wrote:
> My previous testing with unix sockets prompted me to do a few lmbench runs 
> with 2.4.19 and 2.5.65.  The results have me a bit concerned, as there is 
> no area where 2.5 is faster and several where it is significantly slower.
> In particular:
> stat is 8 times worse
> open/close are 7 times worse
> fork is twice as expensive
> tcp latency is 5 times worse
> file deletion and mmap are both twice as expensive
> tcp bandwidth is 5 times worse
> Optimizing for multiple processors and heavy loads is nice, but this looks 
> like it's happening at the cost of basic performance.  Is this really the 
> route we should be taking?

These aren't terribly informative without profiles (esp. cache perfctrs).

TCP to localhost was explained to me as some excess checksumming that
will eventually get removed before 2.6.0.

It's unclear why open()/close()/stat()/unlink() should be any different.

fork() is just rmap stuff. Try 2.5.65-mm2 and 2.5.65-mm3.


-- wli


* Re: lmbench results for 2.4 and 2.5
  2003-03-22 16:11 lmbench results for 2.4 and 2.5 Chris Friesen
  2003-03-22 16:31 ` William Lee Irwin III
@ 2003-03-22 16:37 ` Martin J. Bligh
  2003-03-22 17:29 ` Alan Cox
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Martin J. Bligh @ 2003-03-22 16:37 UTC (permalink / raw)
  To: Chris Friesen, linux-kernel

> My previous testing with unix sockets prompted me to do a few lmbench 
> runs with 2.4.19 and 2.5.65.  The results have me a bit concerned, as 
> there is no area where 2.5 is faster and several where it is 
> significantly slower.
> 
> In particular:
> 
> stat is 8 times worse
> open/close are 7 times worse
> fork is twice as expensive
> tcp latency is 5 times worse
> file deletion and mmap are both twice as expensive
> tcp bandwidth is 5 times worse
> 
> Optimizing for multiple processors and heavy loads is nice, but this 
> looks like it's happening at the cost of basic performance.  Is this 
> really the route we should be taking?

I think you're jumping to conclusions about what causes this - let's
actually try to find the real root cause. These things have many different 
causes ... for instance, rmap has been found to be a problem in some 
workloads (especially things like the fork stuff). If you want to 
try 65-mjb1 with and without the shared pagetable stuff, you
may get some different results. (If you have stability problems, try
doing a patch -p1 -R of 400-shpte; it seems a little fragile right now.)

http://www.kernel.org/pub/linux/kernel/people/mbligh/2.5.65/

Also, if you can get kernel profiles for each test, that'd help to work
out the root cause.

M.



* Re: lmbench results for 2.4 and 2.5
  2003-03-22 16:11 lmbench results for 2.4 and 2.5 Chris Friesen
  2003-03-22 16:31 ` William Lee Irwin III
  2003-03-22 16:37 ` Martin J. Bligh
@ 2003-03-22 17:29 ` Alan Cox
  2003-03-23  5:27 ` Linus Torvalds
  2003-03-24  6:08 ` lmbench results for 2.4 and 2.5 -- updated results Chris Friesen
  4 siblings, 0 replies; 9+ messages in thread
From: Alan Cox @ 2003-03-22 17:29 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Linux Kernel Mailing List

On Sat, 2003-03-22 at 16:11, Chris Friesen wrote:
> My previous testing with unix sockets prompted me to do a few lmbench runs with 
> 2.4.19 and 2.5.65.  The results have me a bit concerned, as there is no area 
> where 2.5 is faster and several where it is significantly slower.

Are you building both with SMP off, and pre-empt off ? Also both with APM/ACPI off ?
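
A difference in any of those options can be caught mechanically before
benchmarking. A minimal sketch; the config fragments and option values below
are made up purely for illustration:

```python
# Sketch: diff the SMP/preempt/power-management options between two kernel
# .config files before comparing benchmark numbers.
import re

def read_options(text, prefixes=("CONFIG_SMP", "CONFIG_PREEMPT",
                                 "CONFIG_APM", "CONFIG_ACPI")):
    """Extract the named options from kernel .config text."""
    opts = {}
    for line in text.splitlines():
        m = re.match(r"(CONFIG_\w+)=(\S+)", line)
        if m and m.group(1).startswith(prefixes):
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"# (CONFIG_\w+) is not set", line)
        if m and m.group(1).startswith(prefixes):
            opts[m.group(1)] = "n"
    return opts

# Hypothetical config fragments, for illustration only:
cfg_24 = "# CONFIG_SMP is not set\n# CONFIG_PREEMPT is not set\nCONFIG_APM=y\n"
cfg_25 = "# CONFIG_SMP is not set\nCONFIG_PREEMPT=y\nCONFIG_APM=y\n"

old, new = read_options(cfg_24), read_options(cfg_25)
for name in sorted(set(old) | set(new)):
    if old.get(name, "n") != new.get(name, "n"):
        print(f"{name}: 2.4={old.get(name, 'n')} 2.5={new.get(name, 'n')}")
```

In practice you would read the two build trees' .config files instead of
inline strings.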



* Re: lmbench results for 2.4 and 2.5
@ 2003-03-22 18:10 rwhron
  0 siblings, 0 replies; 9+ messages in thread
From: rwhron @ 2003-03-22 18:10 UTC (permalink / raw)
  To: linux-kernel

If someone wants to go through individual lmbench metrics
and find regression points, I have some data that I believe
is mostly very good.

There is lmbench info for a lot of 2.4 and 2.5 kernels in
these pages:
http://home.earthlink.net/~rwhron/kernel/k6-2-475.html
http://home.earthlink.net/~rwhron/kernel/old-k6-2-475.html

They are from two different Linux OSes but the same piece
of hardware.  It would be best not to combine them, because
of the OS differences.

If anyone feels like grabbing any of the data in my web
pages and graphing it, feel free to do so.

If you have any specific questions or want even more
data/background let me know.  I'd love for the data
to be more useful.

There is another page with a slew of quad xeon benchmarks.
http://home.earthlink.net/~rwhron/kernel/bigbox.html
-- 
Randy Hron



* Re: lmbench results for 2.4 and 2.5
  2003-03-22 16:11 lmbench results for 2.4 and 2.5 Chris Friesen
                   ` (2 preceding siblings ...)
  2003-03-22 17:29 ` Alan Cox
@ 2003-03-23  5:27 ` Linus Torvalds
  2003-03-24  6:08 ` lmbench results for 2.4 and 2.5 -- updated results Chris Friesen
  4 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2003-03-23  5:27 UTC (permalink / raw)
  To: linux-kernel

In article <3E7C8B22.7020505@nortelnetworks.com>,
Chris Friesen  <cfriesen@nortelnetworks.com> wrote:
>
>My previous testing with unix sockets prompted me to do a few lmbench runs with 
>2.4.19 and 2.5.65.  The results have me a bit concerned, as there is no area 
>where 2.5 is faster and several where it is significantly slower.

Try it with a modern library (like the one in the RH phoebe beta), and
you'll see system calls have sped up by a factor of two.  Even on UP.

But there's certainly something wrong with your open/close/stat numbers.
I don't see anywhere _near_ those kinds of differences, and there are no
real SMP locking issues there either. Are you sure you're testing the
same setup?

Oh, and the TCP bandwidth thing is at least partly due to the fact that TCP
loopback does extra copies due to debugging code being enabled.

		Linus


* Re: lmbench results for 2.4 and 2.5 -- updated results
  2003-03-22 16:11 lmbench results for 2.4 and 2.5 Chris Friesen
                   ` (3 preceding siblings ...)
  2003-03-23  5:27 ` Linus Torvalds
@ 2003-03-24  6:08 ` Chris Friesen
  2003-03-24  8:39   ` Linus Torvalds
  4 siblings, 1 reply; 9+ messages in thread
From: Chris Friesen @ 2003-03-24  6:08 UTC (permalink / raw)
  To: linux-kernel


Okay, I'm somewhat chagrined but a bit relieved at the same time.  Linus' 
comment about being sure that I'm testing the same setup prompted me to go 
through and double-check my config.  Turns out that I had some debug stuff 
turned on.  Duh.

Here are the results of 2.4.20 and 2.5.65 with as close to matching configs as I 
could make them.

The ones that stand out are:
--fork/exec (due to rmap I assume?)
--mmap (also due to rmap?)
--select latency (any ideas?)
--udp latency (related to select latency?)
--page fault (is this significant?)
--tcp bandwidth (explained as debugging code)

Sorry about the bogus numbers last time around.

Chris


                  L M B E N C H  2 . 0   S U M M A R Y
                  ------------------------------------

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host            OS  Mhz null null      open selct sig  sig  fork exec sh
                         call  I/O stat clos TCP   inst hndl proc proc proc
---- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
doug  Linux 2.5.65  750 0.38 0.73 5.46 7.13  64.2 1.03 3.25 231. 1729 17.K
doug  Linux 2.4.20  750 0.37 0.50 3.84 5.48  17.5 0.96 3.36 185. 1373 15.K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host            OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                    ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
---- ------------- ----- ------ ------ ------ ------ ------- -------
doug  Linux 2.5.65 1.420 2.9700  108.7   46.6  157.6    46.7   157.5
doug  Linux 2.4.20 1.120 2.3400   91.5   43.5  155.5    45.2   156.0

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host            OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                    ctxsw       UNIX         UDP         TCP conn
---- ------------- ----- ----- ---- ----- ----- ----- ----- ----
doug  Linux 2.5.65 1.420 7.642 11.4  21.9  45.0  27.2  60.5 104.
doug  Linux 2.4.20 1.120 6.606 10.3  15.8  40.9  26.2  56.5 82.9

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host            OS   0K File      10K File      Mmap    Prot    Page
                    Create Delete Create Delete  Latency Fault   Fault
---- ------------- ------ ------ ------ ------  ------- -----   -----
doug  Linux 2.5.65   64.8   21.0  165.6   42.0   2550.0 0.946 4.00000
doug  Linux 2.4.20   66.1   20.5  192.8   51.6   1612.0 0.764 2.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host           OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                         UNIX      reread reread (libc) (hand) read write
---- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
doug  Linux 2.5.65 196. 107. 51.1  222.5  363.4  217.9  217.6 489. 326.0
doug  Linux 2.4.20 233. 111. 90.0  253.6  370.0  223.8  226.1 498. 328.9
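
With the debug options off, the same ratio arithmetic against these corrected
tables isolates what still stands out. A sketch (not part of the original
mail) with the new numbers copied in by hand:

```python
# Remaining 2.5.65-vs-2.4.20 gaps from the corrected tables above.
# Latencies in microseconds (smaller is better).
latency_us = {
    "fork proc":    (231.0, 185.0),
    "exec proc":    (1729.0, 1373.0),
    "select TCP":   (64.2, 17.5),
    "UDP latency":  (21.9, 15.8),
    "mmap latency": (2550.0, 1612.0),
}
for name, (v25, v24) in latency_us.items():
    print(f"{name:13s} {v25 / v24:4.2f}x slower in 2.5.65")

# TCP bandwidth is in MB/s (bigger is better), so invert the ratio:
print(f"{'TCP bandwidth':13s} {90.0 / 51.1:4.2f}x slower in 2.5.65")
```

fork/exec and mmap now sit around 1.2x to 1.6x, while the select-on-TCP
column remains roughly 3.7x worse.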



-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com




* Re: lmbench results for 2.4 and 2.5 -- updated results
  2003-03-24  6:08 ` lmbench results for 2.4 and 2.5 -- updated results Chris Friesen
@ 2003-03-24  8:39   ` Linus Torvalds
  2003-03-24  9:03     ` William Lee Irwin III
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2003-03-24  8:39 UTC (permalink / raw)
  To: linux-kernel

In article <3E7EA0F6.8000308@nortelnetworks.com>,
Chris Friesen  <cfriesen@nortelnetworks.com> wrote:
>
>Here are the results of 2.4.20 and 2.5.65 with as close to matching configs as I 
>could make them.
>
>The ones that stand out are:
>--fork/exec (due to rmap I assume?)
>--mmap (also due to rmap?)

Yes. You could try the objrmap patches, they are supposed to help. They
may be in -mm, I'm not sure.

>--select latency (any ideas?)

I think this is due to the extra TCP debugging, but it might be
something else. To disable the debugging, remove the setting of 
NETIF_F_TSO in linux/drivers/net/loopback.c, and re-test:

        /* Current netfilter will die with oom linearizing large skbs,
         * however this will be cured before 2.5.x is done.
         */
        dev->features          |= NETIF_F_TSO;

>--udp latency (related to select latency?)

I doubt it. But there might be some more overhead somewhere. You should
also run lmbench at least three times to get some feeling for the
variance of the numbers, it can be quite big.
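
The two 2.5.65 runs already posted in this thread give a rough feel for that
variance; a minimal sketch using their pipe-latency numbers:

```python
# Run-to-run spread of the 2.5.65 pipe-latency numbers from the first
# summary in this thread (microseconds).
from statistics import mean, stdev

runs = [8.926, 9.695]
m, s = mean(runs), stdev(runs)
print(f"mean={m:.3f} us  stdev={s:.3f} us  spread={100 * s / m:.1f}% of mean")
# With only two samples this is crude; three or more runs, as suggested
# above, give a much better estimate.
```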

>--page fault (is this significant?)

I don't think so; there's something strange with the lmbench pagefault
test: it only has one significant digit of accuracy, and I don't even
know what it is testing.  Because of that lack of precision, it's hard
to tell what the real change is.

>--tcp bandwidth (explained as debugging code)

See if the NETIF_F_TSO change makes any difference. If performance is
still bad, holler.

		Linus


* Re: lmbench results for 2.4 and 2.5 -- updated results
  2003-03-24  8:39   ` Linus Torvalds
@ 2003-03-24  9:03     ` William Lee Irwin III
  0 siblings, 0 replies; 9+ messages in thread
From: William Lee Irwin III @ 2003-03-24  9:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, rwhron

Chris Friesen  <cfriesen@nortelnetworks.com> wrote:
>> The ones that stand out are:
>> --fork/exec (due to rmap I assume?)
>> --mmap (also due to rmap?)

On Mon, Mar 24, 2003 at 08:39:34AM +0000, Linus Torvalds wrote:
> Yes. You could try the objrmap patches, they are supposed to help. They
> may be in -mm, I'm not sure.

I recently asked Randy Hron which 2.5.x patches made the biggest
difference in the tests he's done. He pasted the following:

kernel                null  null     stat  fstat open  signal  signal  fork     execve   /bin/sh
                      call  I/O                  close install handle  process  process  process
2.5.65                0.66  0.96298  3.60  1.48  5.31  1.92    3.89    1279     3233     13703
2.5.65-mm1            0.63  1.04114  3.65  1.57  6.39  2.29    3.92    1370     3621     13985
2.5.65-mm2            0.65  0.98654  3.64  1.46  6.88  1.91    3.94    1511     3676     13502
2.5.65-mm2-anobjrmap  0.66  0.96061  3.82  1.45  5.38  1.90    4.68    1414     3497     13169
2.2.23                0.42  0.80455  4.76  1.24  5.77  1.43    2.74    788      2303     30829
2.4.21-pre4aa3        0.62  0.72201  3.44  1.02  5.32  1.41    3.43    848      2114     10117
2.4.21-pre5           0.62  0.75284  3.18  1.02  5.35  1.41    3.25    927      2559     11884
2.4.21-pre5-akpm      0.61  0.73119  3.32  1.02  5.28  1.41    3.16    865      2421     11636
2.5.63-mjb1           0.66  1.12795  4.01  1.64  6.66  1.92    4.49    1125     2793     12475
2.5.62-mjb2           0.64  1.09703  4.12  1.66  5.77  1.89    4.05    1128     2888     12669
2.5.63-mjb2           0.67  1.03824  4.12  1.66  5.87  1.90    4.39    1144     2985     12650
2.5.62-mm3            0.62  0.95155  4.72  1.42  7.55  1.90    3.92    1164     3073     13101


-- wli
