From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: Loopback performance from kernel 2.6.12 to 2.6.37
Date: Wed, 10 Nov 2010 12:24:16 +0100
Message-ID: <1289388256.15004.66.camel@firesoul.comx.local>
References: <1288954189.28003.178.camel@firesoul.comx.local>
	<1288988955.2665.297.camel@edumazet-laptop>
	<1289213926.15004.19.camel@firesoul.comx.local>
	<1289214289.2820.188.camel@edumazet-laptop>
	<1289228785.2820.203.camel@edumazet-laptop>
	<1289311159.17448.9.camel@traveldev.cxnet.dk>
	<1289312178.17448.20.camel@traveldev.cxnet.dk>
	<1289313516.17448.28.camel@traveldev.cxnet.dk>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: netdev, acme@redhat.com
To: Eric Dumazet
Return-path:
Received: from lanfw001a.cxnet.dk ([87.72.215.196]:55101 "EHLO lanfw001a.cxnet.dk"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752402Ab0KJLYR (ORCPT ); Wed, 10 Nov 2010 06:24:17 -0500
In-Reply-To: <1289313516.17448.28.camel@traveldev.cxnet.dk>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2010-11-09 at 15:38 +0100, Jesper Dangaard Brouer wrote:
> On Tue, 2010-11-09 at 15:16 +0100, Jesper Dangaard Brouer wrote:
> > On Tue, 2010-11-09 at 14:59 +0100, Jesper Dangaard Brouer wrote:
> > > On Mon, 2010-11-08 at 16:06 +0100, Eric Dumazet wrote:
> > > ...
> >
> > To fix this I added "-q 0" to netcat. Thus my working commands are:
> >
> > netcat -l -p 9999 >/dev/null &
> > time dd if=/dev/zero bs=1M count=10000 | netcat -q0 127.0.0.1 9999
> > Running this on my "big" 10G testlab system, Dual Xeon 5550 2.67GHz,
> > kernel version 2.6.32-5-amd64 (which I usually don't use)
> > The results are 7.487 sec
>
> Using kernel 2.6.35.8-comx01+ (which is 35-stable with some minor
> patches of my own) on the same type of hardware (our preprod server).
> The result is 12 sec.
> > time dd if=/dev/zero bs=1M count=10000 | netcat -q0 127.0.0.1 9999
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 12,0805 s, 868 MB/s
>
> real	0m12.082s
> user	0m0.311s
> sys	0m15.896s

On the same system I can get better performance IF I pin the processes
on different CPUs. BUT the trick here is that I choose CPUs with
different "core id", thus avoiding the HT sibling CPUs in the system
(hint: look at the "core id" field in /proc/cpuinfo when choosing the
CPUs).

Commands:
 taskset 16 netcat -lv -p 9999 >/dev/null &
 time taskset 1 dd if=/dev/zero bs=1M count=10000 | taskset 4 netcat -q0 127.0.0.1 9999

Result:
 10485760000 bytes (10 GB) copied, 8,74021 s, 1,2 GB/s

 real	0m8.742s
 user	0m0.208s
 sys	0m11.426s

So, perhaps the Core i7 has a problem with the HT CPUs under this
workload?

Forcing dd and netcat onto HT siblings of the same core gives a result
of approx 18 sec!

Commands:
 taskset 16 netcat -lv -p 9999 >/dev/null &
 time taskset 1 dd if=/dev/zero bs=1M count=10000 | taskset 2 netcat -q0 127.0.0.1 9999

Result:
 10485760000 bytes (10 GB) copied, 18,6575 s, 562 MB/s

 real	0m18.659s
 user	0m0.341s
 sys	0m18.969s

> BUT perf top reveals that its probably related to the function
> 'find_busiest_group' ... any kernel config hints how I get rid of that?

The 'find_busiest_group' entry seems to be an artifact of "perf top":
if I use "perf record" instead, the 'find_busiest_group' function
disappears. Which is kind of strange, as 'find_busiest_group' seems to
be part of sched_fair.c.

 perf --version
 perf version 2.6.35.7.1.g60d9c

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer
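
[Editor's note: the "look in /proc/cpuinfo" hint above can be sketched as a
small shell helper. This is only a sketch, assuming that logical CPUs sharing
the same (physical id, core id) pair are HT siblings; the `pick_cores` name
and the 4-CPU sample topology are made up for illustration — on a real box
you would feed it /proc/cpuinfo instead.]

```shell
# Sketch: print one logical CPU per physical core, so taskset can be
# pointed at distinct cores instead of HT siblings.  Assumption: logical
# CPUs that share a (physical id, core id) pair are HT siblings.
pick_cores() {
  awk -F': *' '
    /^processor/   { cpu = $2 }               # current logical CPU number
    /^physical id/ { pkg = $2 }               # socket number
    /^core id/     { if (!seen[pkg "," $2]++) print cpu }  # first CPU per core
  '
}

# Real usage: pick_cores < /proc/cpuinfo
# Fabricated sample: one socket, two cores, two HT threads per core.
pick_cores <<'EOF'
processor : 0
physical id : 0
core id : 0
processor : 1
physical id : 0
core id : 1
processor : 2
physical id : 0
core id : 0
processor : 3
physical id : 0
core id : 1
EOF
# prints 0 and 1 (CPUs 2 and 3 are the HT siblings of 0 and 1)
```

Note that a plain `taskset 16` takes a hexadecimal CPU bitmask, so the CPU
numbers printed above still have to be turned into mask bits (or passed via
`taskset -c <cpu>`).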