From: Rick Jones
Subject: Re: TCP funny-ness when over-driving a 1Gbps link.
Date: Thu, 19 May 2011 17:05:40 -0700
Reply-To: rick.jones2@hp.com
Cc: Stephen Hemminger, netdev
To: Ben Greear

On Thu, 2011-05-19 at 16:42 -0700, Ben Greear wrote:
> On 05/19/2011 04:20 PM, Ben Greear wrote:
> > On 05/19/2011 04:18 PM, Stephen Hemminger wrote:
> >> If you overdrive, TCP expects your network emulator to have
> >> some but limited queueing (like a real router).
> >
> > The emulator is fine; it's not being over-driven (and it has limited
> > queueing in case it were). The queues that are backing up are in the
> > TCP sockets on the sending machine.
> >
> > But, just to make sure, I'll re-run the test with a looped-back cable...
>
> Well, with a looped-back cable, it isn't so bad. I still see a small drop
> in aggregate throughput (around 900Mbps instead of 950Mbps), and latency
> goes above 600ms, but it still performs better than when going through
> the emulator.
>
> At 950+Mbps, the emulator is going to impart 1-2 ms of latency even when
> configured for wide-open.
>
> If I use a bridge in place of the emulator, it seems to settle at around
> 450Mbps in one direction and 945Mbps in the other (on the wire), with
> round-trip latencies often over 5 seconds (user-space to user-space),
> and a consistently large chunk of data in the socket send buffers:
>
> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
> tcp   0        0 8.1.1.1:33038  0.0.0.0:*      LISTEN
> tcp   0        0 8.1.1.1:33040  0.0.0.0:*      LISTEN
> tcp   0        0 8.1.1.1:33042  0.0.0.0:*      LISTEN
> tcp   0  9328612 8.1.1.2:33039  8.1.1.1:33040  ESTABLISHED
> tcp   0 17083176 8.1.1.1:33038  8.1.1.2:33037  ESTABLISHED
> tcp   0  9437340 8.1.1.2:33037  8.1.1.1:33038  ESTABLISHED
> tcp   0 17024620 8.1.1.1:33040  8.1.1.2:33039  ESTABLISHED
> tcp   0 19557040 8.1.1.1:33042  8.1.1.2:33041  ESTABLISHED
> tcp   0  9416600 8.1.1.2:33041  8.1.1.1:33042  ESTABLISHED

I take it your system has tcp_wmem values higher than these defaults:

net.ipv4.tcp_wmem = 4096 16384 4194304

and that whatever is creating the TCP connections is not making explicit
setsockopt() calls to set SO_*BUF.

rick jones
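
The distinction matters because Linux autotunes a TCP send buffer only
while the application has left it alone: an explicit SO_SNDBUF pins the
buffer and the kernel stops growing it, whereas an untouched socket
starts at the middle tcp_wmem value and can grow toward the max. A
minimal sketch of both regimes (my illustration, not code from this
thread; the byte count passed on the command line is an arbitrary
example value):

/*
 * Illustration only -- shows the difference between leaving the send
 * buffer to tcp_wmem autotuning and pinning it with an explicit
 * SO_SNDBUF, which disables autotuning for that socket on Linux.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	if (argc > 1) {
		/* Explicit cap, in bytes.  Once set, the kernel locks
		 * the send buffer and no longer autotunes it; it also
		 * doubles the requested value internally to account
		 * for bookkeeping overhead. */
		int sndbuf = atoi(argv[1]);
		if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
			       &sndbuf, sizeof(sndbuf)) < 0)
			perror("setsockopt(SO_SNDBUF)");
	}

	/* Without the setsockopt() above, this reports the middle
	 * tcp_wmem value (16384 with the stock sysctl), and the
	 * buffer may later grow toward the tcp_wmem max as the
	 * connection runs. */
	int val;
	socklen_t len = sizeof(val);
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) == 0)
		printf("effective SO_SNDBUF: %d bytes\n", val);

	close(fd);
	return 0;
}

Run with no argument to see the autotuned starting size; run with a
byte count to see the doubled, locked value getsockopt() reports once
the buffer is set explicitly.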