From: Andi Kleen
Subject: Re: Socket buffer sizes with autotuning
Date: Fri, 25 Apr 2008 09:06:48 +0200
Message-ID: <48118308.1090407@firstfloor.org>
To: Jerry Chu, davem@davemloft.net, johnwheffner@gmail.com, rick.jones2@hp.com, netdev@vger.kernel.org

[fixed cc and subject]

Jerry Chu wrote:
> On Thu, Apr 24, 2008 at 3:21 PM, Andi Kleen wrote:
>> David Miller writes:
>>
>>>> What is your interface txqueuelen and mtu? If you have a very large
>>>> interface queue, TCP will happily fill it up unless you are using a
>>>> delay-based congestion controller.
>>>
>>> Yes, that's the fundamental problem with loss based congestion
>>> control. If there are any queues in the path, TCP will fill them up.
>>
>> That just means Linux does too much queueing by default. Perhaps that
>> should be fixed. On Ethernet hardware the NIC TX queue should usually
>> be sufficient anyway, I would guess. Do we really need the long qdisc
>> queue too?
>
> I think we really need the large xmit queue, especially when the CPU speed,
> or the aggregated CPU bandwidth in the case of multi-cores, is >> NIC speed,
> for the following reason:
>
> If the qdisc and/or NIC queue is not large enough, it may not absorb the
> high burst rate from the much faster CPU xmit threads, hence causing pkts
> to be dropped before they hit the wire.

sendmsg should just be a little smarter about when to block, depending on
the state of the interface. There is already some minor code for that, as
you'll have noted. Then the bursts would be much less of a problem. We
already had this discussion recently, together with better behaviour on
bounding.

The only big problem then would be if there are more submitting threads
than packets in the TX queue, but I would consider that unlikely for GB+
NICs at least (it might be an issue for older designs with smaller queues).

> Here the CPU/NIC relation is much like a router

It doesn't need to be. Unlike in a true network, it is very cheap here to
do direct feedback.

> Removing the unnecessary cwnd growth by counting out those pkts that are
> still stuck in the host queue may be a simpler solution. I'll find out
> how well it works soon.

I think that's a great start, but probably not enough.

-Andi
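
P.S.: For anyone who wants to sanity-check the txqueuelen/mtu question from
the top of the thread, both values are exported through sysfs. A minimal
userspace sketch follows; it is purely illustrative ("eth0" is just an
example interface name, and nothing kernel-side is assumed):

/* Print a NIC's tx_queue_len and mtu as seen by the kernel.
 * Reads /sys/class/net/<ifname>/tx_queue_len and .../mtu.
 */
#include <stdio.h>

static long read_net_attr(const char *ifname, const char *attr)
{
        char path[256];
        long val = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/sys/class/net/%s/%s", ifname, attr);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%ld", &val) != 1)
                val = -1;
        fclose(f);
        return val;
}

int main(int argc, char **argv)
{
        /* Interface name defaults to "eth0"; pass another name as argv[1]. */
        const char *ifname = argc > 1 ? argv[1] : "eth0";

        printf("%s: tx_queue_len=%ld mtu=%ld\n", ifname,
               read_net_attr(ifname, "tx_queue_len"),
               read_net_attr(ifname, "mtu"));
        return 0;
}

Build it with gcc and pass the interface name as the first argument; -1 in
the output just means the sysfs attribute could not be read.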