From: David Miller
Subject: Re: Loopback performance from kernel 2.6.12 to 2.6.37
Date: Thu, 18 Nov 2010 09:48:37 -0800 (PST)
To: eric.dumazet@gmail.com
Cc: jdb@comx.dk, netdev@vger.kernel.org, lawrence@brakmo.org

From: Eric Dumazet
Date: Thu, 18 Nov 2010 18:41:53 +0100

> On Thursday, 18 November 2010 at 14:52 +0100, Eric Dumazet wrote:
>> On Tuesday, 9 November 2010 at 15:25 +0100, Eric Dumazet wrote:
>>
>> My tests show a problem with backlog processing and too-big TCP
>> windows (at least on loopback and with wild senders).
>>
>> Basically, with the huge TCP windows we have now (default 4 Mbytes),
>> the reader process can have to process up to 4 Mbytes of backlogged data
>> in __release_sock() before returning from its 'small' read(fd, buffer,
>> 1024) done by netcat.
>>
>> While it processes this backlog, it sends TCP ACKs, allowing the sender to
>> send new frames that might be dropped because of sk_rcvqueues_full(), or
>> to continue filling the receive queue up to the receiver window, feeding the
>> task in __release_sock() [loop]
>>
>> This blows CPU caches completely [data is queued, and the dequeue is
>> done long after], and the latency of a single read() can be very high. This
>> eventually stalls the pipeline of user processing.
Thanks for looking into this, Eric. We definitely need some kind of choke point so that TCP never significantly exceeds the point at which growing the congestion window stops increasing throughput and only increases latency. One idea is that when we integrate Lawrence Brakmo's TCP-NV congestion control algorithm, we can try enabling it by default over loopback. Loopback is an interesting instance of the problem scenario Lawrence is trying to solve.