public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4
@ 2006-02-23 19:55 Gautam H Thaker
  2006-02-23 20:15 ` Benjamin LaHaise
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Gautam H Thaker @ 2006-02-23 19:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Gautam H. Thaker - LM ATL, Ingo Molnar

The real-time patches at the URL below do a great job of endowing Linux with
real-time capabilities.

http://people.redhat.com/mingo/realtime-preempt/

It has been documented before (and accepted) that this patch turns Linux into
an RT kernel but considerably slows down the code paths, especially through the
I/O subsystem. I want to provide some additional measurements and seek opinions
on whether it might ever be possible to improve on this situation.

In my tests I used 20 3 GHz Intel Xeon PCs on an isolated gigabit network.
One of the nodes runs a "monitor" process that listens for incoming UDP packets
from the other 19 nodes. Each node sends approximately 2000 UDP
packets/sec to the monitor process, for a total of about 38,000 incoming UDP
packets/sec. These UDP packets are small, with an application payload of ~10
bytes, for total bandwidth usage of less than 4 Mbits/sec at the application
level and less than 15 Mbits/sec counting all headers. (Total bandwidth usage
is not high, but a large number of packets are coming in.) The monitor process
does some fairly simple processing per packet.

I measured the CPU usage of the "monitor" process when the testbed was run
under two different operating systems. The monitor process is the "nalive.p"
process in the "top" output below. The CPU load is fairly stable and "top"
gives the following information:

::::::::::::::
top:  2.6.12-1.1390_FC4    # STANDARD KERNEL
::::::::::::::
top - 14:34:39 up  2:32,  2 users,  load average: 0.10, 0.05, 0.01
Tasks:  56 total,   2 running,  54 sleeping,   0 stopped,   0 zombie
top - 14:35:32 up  2:33,  2 users,  load average: 0.11, 0.06, 0.01
Tasks:  56 total,   2 running,  54 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.4% us,  7.0% sy,  0.0% ni, 80.8% id,  0.2% wa,  7.0% hi,  3.6% si
Mem:   2076008k total,   100292k used,  1975716k free,    16192k buffers
Swap:   128512k total,        0k used,   128512k free,    50376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4823 root     -66   0 22712 2236 1484 S  8.4  0.1   0:37.74 nalive.p
 4860 gthaker   16   0  7396 2380 1904 R  0.2  0.1   0:00.04 sshd
    1 root      16   0  1748  572  492 S  0.0  0.0   0:01.06 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0


::::::::::::::
top:  2.6.15-rt15-smp.out   # REAL_TIME KERNEL
::::::::::::::
node0> top
top - 09:52:48 up  1:47,  3 users,  load average: 0.91, 1.05, 1.02
Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5% us, 41.8% sy,  0.0% ni, 55.6% id,  0.1% wa,  0.0% hi,  0.0% si
Mem:   2058608k total,    88104k used,  1970504k free,     9072k buffers
Swap:   128512k total,        0k used,   128512k free,    39208k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2906 root     -66   0 18624 2244 1480 S 41.4  0.1  27:11.21 nalive.p
    6 root     -91   0     0    0    0 S 32.3  0.0  21:04.53 softirq-net-rx/
 1379 root     -40  -5     0    0    0 S 14.5  0.0   9:54.76 IRQ 23
  400 root      15   0     0    0    0 S  0.2  0.0   0:00.13 kjournald
    1 root      16   0  1740  564  488 S  0.0  0.0   0:04.03 init

The %CPU is at about 8% for the non-real-time, uniprocessor kernel, while it
is at least 41% (and may be as high as 41.4% + 32.3% + 14.5% = 88.2%) for the
real-time SMP kernel.


My question is this: how much improvement in raw efficiency is possible for
the real-time patches? We take a very long view, so if there is a belief that
in 5 years the penalty will be reduced from 5-10x in this application to less
than 2x, that would be great. If we think this is about as well as can be
done, it helps to know that too.

There is nothing else going on on these machines; all code paths should be
going down the "happy path" with no contention or blocking. My naive view is
that a 2x overhead is plausible, but 5-10x seems harder to understand. Nor is
this a case of hitting some large non-preemptible region - the real-time
performance itself is excellent. The question is simply why the code paths
seem so "heavy".

Gautam Thaker


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4
@ 2006-07-11 18:08 Jonathan Walsh
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Walsh @ 2006-07-11 18:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Gautam H. Thaker, mingo

As a follow-up to previous emails (Gautam Thaker, Ingo Molnar, Ted Ts'o, et al.) on the subject of large CPU overhead in the RT kernel under heavy network load, I ran the following test in order to get more reasonable data.  I have 19 nodes, each with 20 "virtual" node processes sending UDP messages to a single host at a rate of 100 Hz, for 38,000 packets per second in total.  Using cyclesoak to determine CPU usage (over 240 samples, 1 sample per second), I found the following results:
 
RT kernel: linux-2.6.17-rt1-uni
Mean: 48.9%
Variance: 5.91
Standard Deviation: 2.43
 
Standard kernel: Standard Fedora Core 4
Mean: 23.2%
Variance: 0.237
Standard Deviation: 0.4867
 
Thus I found the average CPU load on the RT kernel to be 2.11 times that of the standard kernel.  Hopefully this information will be of some use.

-Jonathan Walsh
Distributed Processing Lab; Lockheed Martin Adv. Tech. Labs


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2006-07-11 18:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-23 19:55 ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4 Gautam H Thaker
2006-02-23 20:15 ` Benjamin LaHaise
2006-02-23 20:58 ` Ingo Molnar
2006-02-23 21:06   ` Nish Aravamudan
2006-02-23 21:08     ` Ingo Molnar
2006-02-23 21:14       ` Nish Aravamudan
2006-02-23 22:07         ` Esben Nielsen
2006-02-24  8:03       ` Jan Engelhardt
2006-02-24 12:11   ` Andrew Morton
2006-02-24 20:06     ` Gautam H Thaker
2006-02-24 20:31       ` Andrew Morton
2006-02-24 20:44         ` Gautam H Thaker
2006-02-24 16:52 ` Theodore Ts'o
2006-02-24 19:25   ` Gautam H Thaker
2006-02-28 19:27 ` Matt Mackall
2006-02-28 22:19   ` Gautam H Thaker
  -- strict thread matches above, loose matches on Subject: below --
2006-07-11 18:08 Jonathan Walsh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox