From: Andrew Morton <akpm@digeo.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Andi Kleen <ak@suse.de>, davem@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)
Date: Sun, 05 Jan 2003 01:18:48 -0800
Message-ID: <3E17F878.21A363BF@digeo.com>
In-Reply-To: Pine.LNX.4.44.0301041930300.1388-100000@home.transmeta.com

Linus Torvalds wrote:
> 
> ...
> It doesn't show up on lmbench (insufficient precision), but your AIM9
> numbers are quite interesting. Are they stable?

OK, a closer look.  This is on a dual 1.7G P4, with HT disabled (involuntarily,
grr.)   Looks like an 8-10% hit on context-switch intensive stuff.


2.5.54+BK
=========

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.5.54    3      4     11     6     48      12      53

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
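Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.5.54     3    14   22    26          30         57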

tbench 32:			(85k switches/sec)

Throughput 114.633 MB/sec (NB=143.291 MB/sec  1146.33 MBit/sec)
Throughput 114.157 MB/sec (NB=142.696 MB/sec  1141.57 MBit/sec)
Throughput 115.095 MB/sec (NB=143.869 MB/sec  1150.95 MBit/sec)

pollbench 1 100 5000		(118k switches/sec)
  result with handles 1 processes 100 loops 5000:time  8.371942 sec.
  result with handles 1 processes 100 loops 5000:time  8.381814 sec.
  result with handles 1 processes 100 loops 5000:time  8.367576 sec.
pollbench 2 100 2000		(105k switches/sec)
  result with handles 2 processes 100 loops 2000:time  3.694412 sec.
  result with handles 2 processes 100 loops 2000:time  3.672226 sec.
  result with handles 2 processes 100 loops 2000:time  3.657455 sec.
pollbench 5 100 2000		(79k switches/sec)
  result with handles 5 processes 100 loops 2000:time  4.564727 sec.
  result with handles 5 processes 100 loops 2000:time  4.783192 sec.
  result with handles 5 processes 100 loops 2000:time  4.561067 sec.

2.5.54+BK+broken-wrmsr-backout-patch:
=====================================


Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.5.54    3      4     11     6     48      12      53
i686-linu  Linux 2.5.54    1      3      8     4     40      10      51

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.5.54     3    14   22    26          30         57
i686-linu  Linux 2.5.54     1    12   28    22          32         58


tbench 32:

Throughput 121.701 MB/sec (NB=152.126 MB/sec  1217.01 MBit/sec)
Throughput 124.958 MB/sec (NB=156.197 MB/sec  1249.58 MBit/sec)
Throughput 124.086 MB/sec (NB=155.107 MB/sec  1240.86 MBit/sec)

pollbench 1 100 5000
  result with handles 1 processes 100 loops 5000:time  7.306432 sec.
  result with handles 1 processes 100 loops 5000:time  7.352913 sec.
  result with handles 1 processes 100 loops 5000:time  7.337019 sec.
pollbench 2 100 2000
  result with handles 2 processes 100 loops 2000:time  3.184550 sec.
  result with handles 2 processes 100 loops 2000:time  3.251854 sec.
  result with handles 2 processes 100 loops 2000:time  3.209147 sec.
pollbench 5 100 2000
  result with handles 5 processes 100 loops 2000:time  4.135773 sec.
  result with handles 5 processes 100 loops 2000:time  4.117304 sec.
  result with handles 5 processes 100 loops 2000:time  4.119047 sec.


The tbench changes should probably be ignored.  After profiling tbench
I can say that this throughput difference is _not_ due to the task switcher
change (__switch_to is only 1%).  I left the numbers here to show what
the effect of simply relinking and rebooting the kernel can be.
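
For anyone wanting to turn the raw runs above into percentages, here is a minimal sketch. The numbers are copied from the pollbench and tbench output above. Averaging the three runs and expressing the change relative to the other kernel is my own choice of measure, not necessarily how the figures quoted in this mail were derived.

```python
from statistics import mean

# pollbench times in seconds: (2.5.54+BK runs, backout-patch runs)
pollbench = {
    "1 100 5000": ([8.371942, 8.381814, 8.367576], [7.306432, 7.352913, 7.337019]),
    "2 100 2000": ([3.694412, 3.672226, 3.657455], [3.184550, 3.251854, 3.209147]),
    "5 100 2000": ([4.564727, 4.783192, 4.561067], [4.135773, 4.117304, 4.119047]),
}
# tbench 32 throughput in MB/sec: (2.5.54+BK runs, backout-patch runs)
tbench = ([114.633, 114.157, 115.095], [121.701, 124.958, 124.086])


def pct_change(new, old):
    """Percentage change of mean(new) relative to mean(old)."""
    return (mean(new) - mean(old)) / mean(old) * 100.0


# Extra run time of 2.5.54+BK relative to the backout kernel, per test.
slowdown = {args: pct_change(bk, backout) for args, (bk, backout) in pollbench.items()}
# Extra throughput of the backout kernel relative to 2.5.54+BK.
tbench_gain = pct_change(tbench[1], tbench[0])

for args, pct in slowdown.items():
    print(f"pollbench {args}: 2.5.54+BK takes {pct:.1f}% longer")
print(f"tbench 32: backout kernel moves {tbench_gain:.1f}% more data")
```

With only three runs per configuration this gives a rough comparison, not a statistically careful one.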


BTW, the P4 pollbench numbers are not stunningly better than those from the 500MHz PIII:
pollbench 1 100 5000
  result with handles 1 processes 100 loops 5000:time  9.609487 sec.
pollbench 2 100 2000
  result with handles 2 processes 100 loops 2000:time  4.016496 sec.
pollbench 5 100 2000
  result with handles 5 processes 100 loops 2000:time  4.917921 sec.

I didn't profile the P4.  John has promised P4 oprofile support for
next week, which will be nice.

I did profile Manfred's pollbench on the PIII, uniprocessor build.  Note
that there is only a 5% throughput difference on this machine.  It's all
in __switch_to().   Here the PIII is doing 70k switches/sec.
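
As a back-of-envelope illustration of what those two figures imply per switch, the sketch below divides the 5% difference across 70k switches/sec. It assumes the whole difference is per-switch cost and that this PIII is the 500MHz machine mentioned earlier; both are assumptions, and the result is an estimate, not a measurement.

```python
SWITCHES_PER_SEC = 70_000   # from the text above
THROUGHPUT_DIFF = 0.05      # "only a 5% throughput difference"
CPU_HZ = 500e6              # assumption: the 500MHz PIII

# Average wall-clock time between context switches, in microseconds.
interval_us = 1e6 / SWITCHES_PER_SEC
# Share of that interval attributed to the task-switch change.
extra_us = interval_us * THROUGHPUT_DIFF
# The same cost expressed in CPU cycles at the assumed clock rate.
extra_cycles = extra_us * 1e-6 * CPU_HZ

print(f"{interval_us:.1f} us between switches")
print(f"~{extra_us:.2f} us (~{extra_cycles:.0f} cycles) extra per switch")
```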

2.5.54+BK:

c012abbc 534      2.69888     buffered_rmqueue
c0116714 617      3.11837     __wake_up_common
c010a606 635      3.20934     restore_all
c014b038 745      3.76529     do_poll
c013d4dc 757      3.82594     fget
c014551c 766      3.87142     pipe_write
c010a5c4 1249     6.31254     system_call
c014b0f0 1273     6.43384     sys_poll
c01090a4 1775     8.97099     __switch_to
c0116484 1922     9.71394     schedule

2.5.54+BK+backout-patch:

c012abbc 768      3.1024      buffered_rmqueue
c0116714 790      3.19127     __wake_up_common
c010a5e6 809      3.26803     restore_all
c013d4dc 918      3.70834     fget
c014551c 936      3.78105     pipe_write
c014b038 977      3.94668     do_poll
c01090a4 1070     4.32236     __switch_to
c014b0f0 1606     6.48758     sys_poll
c010a5a4 1678     6.77843     system_call
c0116484 2542     10.2686     schedule
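
The two profiles have different total sample counts, so the raw counts are not directly comparable; the percentage column is. A small sketch of that comparison follows, assuming the columns are address, sample count, percent of total samples, and symbol. Only the `__switch_to` and `schedule` lines are copied in; `parse_profile` and `total_samples` are illustrative helper names.

```python
def parse_profile(text):
    """Map symbol -> (samples, percent) from 'addr samples percent symbol' lines."""
    profile = {}
    for line in text.strip().splitlines():
        _addr, samples, percent, symbol = line.split()
        profile[symbol] = (int(samples), float(percent))
    return profile


def total_samples(profile, symbol):
    """Back out the total sample count of a run from one symbol's line."""
    samples, percent = profile[symbol]
    return round(samples / (percent / 100.0))


bk = parse_profile("""
c01090a4 1775     8.97099     __switch_to
c0116484 1922     9.71394     schedule
""")
backout = parse_profile("""
c01090a4 1070     4.32236     __switch_to
c0116484 2542     10.2686     schedule
""")

# Difference in __switch_to's share of the profile, in percentage points.
delta_points = bk["__switch_to"][1] - backout["__switch_to"][1]

print(f"total samples: BK={total_samples(bk, '__switch_to')}, "
      f"backout={total_samples(backout, '__switch_to')}")
print(f"__switch_to share drops by {delta_points:.2f} percentage points")
```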
