From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick Jones
Subject: Re: Question about LRO/GRO and TCP acknowledgements
Date: Mon, 13 Jun 2011 10:55:26 -0700
Message-ID: <1307987726.8149.3312.camel@tardy>
References: <20110611215919.5fc29c27@konijn>
	<1307850224.22348.626.camel@localhost>
	<20110612095131.6d924082@konijn>
	<1307869632.2872.106.camel@edumazet-laptop>
	<20110612113004.79f48f40@konijn>
	<1307875698.2872.130.camel@edumazet-laptop>
	<20110612132428.3e1a4593@konijn>
	<1307890657.2872.158.camel@edumazet-laptop>
Reply-To: rick.jones2@hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Joris van Rantwijk, netdev@vger.kernel.org
To: Eric Dumazet
In-Reply-To: <1307890657.2872.158.camel@edumazet-laptop>
Sender: netdev-owner@vger.kernel.org

On Sun, 2011-06-12 at 16:57 +0200, Eric Dumazet wrote:
> On Sunday, 12 June 2011 at 13:24 +0200, Joris van Rantwijk wrote:
> > On 2011-06-12, Eric Dumazet wrote:
> > > So your concern is more a Sender side implementation missing this
> > > recommendation, not GRO per se...
> >
> > Not really. The same RFC says:
> >    Specifically, an ACK SHOULD be generated for at least every
> >    second full-sized segment, ...
>
> Well, SHOULD is not MUST.
>
> > I can see how the world may have been a better place if every sender
> > implemented Appropriate Byte Counting and TCP receivers were allowed to
> > send fewer ACKs. However, current reality is that ABC is optional,
> > disabled by default in Linux, and receivers are recommended to send one
> > ACK per two segments.
>
> ABC might be nice for stacks that use byte counters for cwnd. We use
> segments.
>
> > I suspect that GRO currently hurts throughput of isolated TCP
> > connections. This is based on a purely theoretic argument. I may be
> > wrong and I have absolutely no data to confirm my suspicion.
> >
> > If you can point out the flaw in my reasoning, I would be greatly
> > relieved. Until then, I remain concerned that there may be something
> > wrong with GRO and TCP ACKs.
>
> Think of GRO being a receiver facility against stress/load, typically in
> datacenter.
>
> Only when receiver is overloaded, GRO kicks in and can coalesce several
> frames before being handled in TCP stack in one run.

How is that affected by interrupt coalescing in the NIC and the sending
side doing TSO (and so, ostensibly, sending back-to-back frames)? Are we
assured that a NIC is updating its completion pointer on the rx ring
continuously rather than just before a coalesced interrupt? Does GRO
"never" kick in over a 1GbE link (making the handwaving assumption that
cores today are >> faster than a 1GbE link on a bulk transfer)?
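
Just to make the "fewer ACKs means slower cwnd growth" concern concrete,
here is a strictly back-of-the-envelope sketch (Python, nothing measured).
It assumes the classic segment-counted slow start of one additional segment
of cwnd per ACK received, and the aggregation factor is simply a made-up
knob standing in for however many frames GRO might happen to merge in a
NAPI run; it is not a model of what the stack literally does:

def rtts_to_reach(target_cwnd, segs_per_ack, init_cwnd=3):
    # Round trips for a segment-counted cwnd to reach target_cwnd in
    # slow start, adding one segment of cwnd per ACK and generating one
    # ACK for every segs_per_ack data segments (never fewer than one
    # ACK per round trip).
    cwnd, rtts = init_cwnd, 0
    while cwnd < target_cwnd:
        acks = max(1, cwnd // segs_per_ack)
        cwnd += acks
        rtts += 1
    return rtts

# segs_per_ack=2 is plain delayed ACK; the larger values are purely
# hypothetical GRO aggregation factors.
for segs_per_ack in (2, 4, 8, 16):
    print(segs_per_ack, "segments/ACK ->",
          rtts_to_reach(1000, segs_per_ack), "RTTs")

That only speaks to the ramp-up of an unloaded connection, of course;
whether GRO actually sustains aggregation factors like that on an
otherwise idle 1GbE receiver is exactly what the questions above are
getting at.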
It was just a quick and dirty test, but it does seem there is a positive
hit from GRO being enabled on a 1GbE link on a system with "fast
processors":

raj@tardy:~/netperf2_trunk$ sudo ethtool -K eth1 gro off
raj@tardy:~/netperf2_trunk$ src/netperf -t TCP_MAERTS -H 192.168.1.3 -i 10,3 -c -- -k foo
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.3 (192.168.1.3) port 0 AF_INET : +/-2.500% @ 99% conf. : histogram : demo
THROUGHPUT=935.07
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.64
LOCAL_SD=5.830

raj@tardy:~/netperf2_trunk$ sudo ethtool -K eth1 gro on
raj@tardy:~/netperf2_trunk$ src/netperf -t TCP_MAERTS -H 192.168.1.3 -i 10,3 -c -- -k foo
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.3 (192.168.1.3) port 0 AF_INET : +/-2.500% @ 99% conf. : histogram : demo
THROUGHPUT=934.81
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.21
LOCAL_SD=5.684

raj@tardy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

The receiver system here has a 3.07 GHz W3550 in it and eth1 is a port
on an Intel 82571EB-based four-port card.

raj@tardy:~/netperf2_trunk$ ethtool -i eth1
driver: e1000e
version: 1.0.2-k4
firmware-version: 5.10-2
bus-info: 0000:2a:00.0

> If receiver is so loaded that more than 2 frames are coalesced in a NAPI
> run, it certainly helps to not allow sender to increase its cwnd more
> than one SMSS. We probably are right before packet drops anyway.

If we are indeed statistically certain we are right before packet drops
(or, I suppose, asserting pause) then shouldn't ECN get set by the GRO
code?

rick