From mboxrd@z Thu Jan 1 00:00:00 1970
From: dormando
Subject: 3 packet TCP window limit?
Date: Wed, 5 May 2010 02:10:49 -0700 (PDT)
Message-ID:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
To: netdev@vger.kernel.org
Return-path:
Received: from rydia.net ([216.218.163.68]:35393 "EHLO mail.rydia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755064Ab0EEJQb (ORCPT ); Wed, 5 May 2010 05:16:31 -0400
Received: from [192.168.0.12] (c-24-7-50-3.hsd1.ca.comcast.net [24.7.50.3]) by mail.rydia.net (Postfix) with ESMTPA id D02573D1DF6 for ; Wed, 5 May 2010 02:10:49 -0700 (PDT)
Sender: netdev-owner@vger.kernel.org
List-ID:

Hey,

Noticed in Linux that no matter what sysctl variable I twiddle, or what TCP congestion algorithm is running, TCP will wait for remote acks after sending the first 3 packets. After that it's normal. Apologies, it's hard to describe:

Linux server listening.
Remote -> SYN
(RTT wait)
Linux -> SYN/ACK
Remote -> ACK
Remote -> Packet (small HTTP request)
(RTT wait)
Linux -> Packet (x 3)
Remote -> (returning acks per packet)
(RTT wait)
Linux -> More packets (up to window size)

If the response to the request fits in 3 packets or less, that third RTT wait never happens. The remote client gets all its data, and sends back all the FIN/ACK packets for closing the connection.

What's bizarre is that this 3 packet/4 packet barrier applies regardless of how much data there is to send. I can cause the extra RTT to flip on or off by sending exactly +/- 1 byte to cause an extra packet.

Holding the connection open and repeating the request any number of times runs just fine, after the initial request.

You can pretty easily see this by:

    tc qdisc add dev eth0 root netem delay 100ms

... then fetching a 3k file, then a 4k file from an http server running Linux.

Well, at least I can see this easily. I tried on a half dozen boxes (2.6.11 through 2.6.32).

I'm trying to track down where in the code this is, or why my sysctl tuning isn't affecting it.
I can't discern its purpose. The lag it causes is pretty awful for faraway clients; adding 300ms of latency will make a small request take close to a full second, instead of 600ms.

I'm slogging through the code but any insight would be greatly appreciated!

-Dormando
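
The latency arithmetic above can be sketched in a few lines of Python. This is only a rough model of the timeline described in this message, not of anything in the kernel: the function name `estimated_request_ms` and the `first_burst` parameter are made up for illustration, and it assumes that whatever does not fit in the first burst of 3 packets goes out in a single further flight.

```python
def estimated_request_ms(rtt_ms, response_packets, first_burst=3):
    """Rough time from the client's SYN until the last response packet arrives.

    Hypothetical helper modelling the observed timeline only:
    first_burst is the number of packets sent before the server
    waits for ACKs (3 in the behaviour described above).
    """
    if response_packets < 1:
        raise ValueError("response_packets must be at least 1")

    # One round trip for SYN -> SYN/ACK, one for request -> first burst.
    round_trips = 2

    if response_packets > first_burst:
        # The server waits for ACKs of the first burst before sending more.
        # Assumption: the remainder then fits in one flight.
        round_trips += 1

    return round_trips * rtt_ms


for packets in (3, 4):
    print(packets, "packets:", estimated_request_ms(300, packets), "ms")
```

With a 300ms round trip this gives 600ms for a 3-packet response and 900ms for a 4-packet one, which is the jump a single extra byte can trigger.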