From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Date: Wed, 29 Apr 2009 15:53:53 +0200
Message-ID: <49F85BF1.1020501@cosmosbay.com>
References: <20090415.164248.188350673.davem@davemloft.net>
 <20090416085022.GA19731@gondor.apana.org.au>
 <49EE1C32.1060202@myri.com>
 <20090422104811.GA30981@gondor.apana.org.au>
 <49EF39B4.1040607@myri.com>
 <20090424054557.GA24575@gondor.apana.org.au>
 <49F1E5C8.7010303@myri.com>
 <20090427080501.GA21433@gondor.apana.org.au>
 <20090428061225.GA1591@gondor.apana.org.au>
 <49F71A00.5090701@myri.com>
 <20090428152047.GB7549@gondor.apana.org.au>
 <49F77134.9030907@myri.com>
 <49F85945.7030900@myri.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Herbert Xu, David Miller, brice@myri.com, sgruszka@redhat.com, netdev@vger.kernel.org
To: Andrew Gallatin
Return-path:
Received: from gw1.cosmosbay.com ([212.99.114.194]:35053 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751526AbZD2Ny2 convert rfc822-to-8bit (ORCPT ); Wed, 29 Apr 2009 09:54:28 -0400
In-Reply-To: <49F85945.7030900@myri.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

Andrew Gallatin wrote:
> Andrew Gallatin wrote:
>> For variety, I grabbed a different "slow" receiver.  This is another
>> 2 CPU machine, but a dual-socket single-core opteron (Tyan S2895)
>>
>> processor       : 0
>> vendor_id       : AuthenticAMD
>> cpu family      : 15
>> model           : 37
>> model name      : AMD Opteron(tm) Processor 252
> <...>
>> The sender was an identical machine running an ancient RHEL4 kernel
>> (2.6.9-42.ELsmp) and our downloadable (backported) driver.
>> (http://www.myri.com/ftp/pub/Myri10GE/myri10ge-linux.1.4.4.tgz)
>> I disabled LRO, on the sender.
>>
>> Binding the IRQ to CPU0, and the netserver to CPU1 I see 8.1Gb/s with
>> LRO and 8.0Gb/s with GRO.
>
> With the recent patch to fix idle CPU time accounting from LKML applied,
> it is again possible to trust netperf's service demand (based on %CPU).
> So here is raw netperf output for LRO and GRO, bound as above.
>
> TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> hail1-m.sw.myri.com (10.0.130.167) port 0 AF_INET : cpu bind
> Recv   Send   Send                         Utilization      Service Demand
> Socket Socket Message Elapsed              Send    Recv     Send    Recv
> Size   Size   Size    Time    Throughput   local   remote   local   remote
> bytes  bytes  bytes   secs.   10^6bits/s   % S     % S      us/KB   us/KB
>
> LRO:
> 87380  65536  65536   60.00   8279.36      8.10    77.55    0.160   1.535
> GRO:
> 87380  65536  65536   60.00   8053.19      7.86    85.47    0.160   1.739
>
> The difference is bigger if you disable TCP timestamps (and thus shrink
> the packets headers down so they require fewer cachelines):
> LRO:
> 87380  65536  65536   60.02   7753.55      8.01    74.06    0.169   1.565
> GRO:
> 87380  65536  65536   60.02   7535.12      7.27    84.57    0.158   1.839
>
>
> As you can see, even though the raw bandwidth is very close, the
> service demand makes it clear that GRO is more expensive
> than LRO.  I just wish I understood why.
>

What are the "vmstat 1" outputs for both tests? Any difference in, say,
context switches?
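
For reference, the service-demand column quoted above can be recomputed
from the throughput and %CPU columns. The sketch below is an illustration,
not taken from the netperf sources. It assumes that the utilization figure
is averaged over all CPUs of the host (two here, per the machine
description) and that 1 KB means 1024 bytes. The function name and the
RUNS table are hypothetical helpers introduced only for this example.

```python
# Sketch: recompute netperf-style service demand (us/KB) from the
# throughput and %CPU columns quoted in this thread.
# Assumptions (not stated in the thread): utilization is averaged over
# all CPUs of the host, and 1 KB = 1024 bytes.

NUM_CPUS = 2  # both hosts are described as 2-CPU machines


def service_demand_us_per_kb(throughput_mbit_s, cpu_percent, num_cpus=NUM_CPUS):
    """Return CPU microseconds consumed per KB transferred."""
    # Throughput is reported in 10^6 bits/s; convert to KB/s.
    kb_per_sec = throughput_mbit_s * 1e6 / 8 / 1024
    # CPU time burned per wall-clock second, summed over all CPUs.
    cpu_us_per_sec = cpu_percent / 100.0 * num_cpus * 1e6
    return cpu_us_per_sec / kb_per_sec


# (label, throughput in 10^6 bits/s, receiver %CPU) from the quoted runs
RUNS = [
    ("LRO, timestamps on", 8279.36, 77.55),
    ("GRO, timestamps on", 8053.19, 85.47),
    ("LRO, timestamps off", 7753.55, 74.06),
    ("GRO, timestamps off", 7535.12, 84.57),
]

if __name__ == "__main__":
    for label, mbps, recv_cpu in RUNS:
        sd = service_demand_us_per_kb(mbps, recv_cpu)
        print(f"{label:20s} recv service demand = {sd:.3f} us/KB")
```

Under those assumptions the computed values agree with the quoted receive
service demand to three decimals. By that measure GRO costs about 13% more
receiver CPU per KB than LRO with timestamps enabled, and about 17.5% more
with timestamps disabled.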