From: Eric Dumazet
Subject: Re: Linux TCP's Robustness to Multipath Packet Reordering
Date: Mon, 25 Apr 2011 13:25:01 +0200
Message-ID: <1303730701.2747.110.camel@edumazet-laptop>
To: Dominik Kaspar
Cc: netdev@vger.kernel.org

On Monday, 25 April 2011 at 12:37 +0200, Dominik Kaspar wrote:
> Hello,
>
> Knowing how critical packet reordering is for standard TCP, I am
> currently testing how robust Linux TCP is when packets are forwarded
> over multiple paths (with different bandwidth and RTT). Since Linux
> TCP adapts its "dupACK threshold" to an estimated level of packet
> reordering, I expect it to be much more robust than a standard TCP
> that strictly follows the RFCs. Indeed, as you can see in the
> following plot, my experiments show a step-wise adaptation of Linux
> TCP to heavy reordering. After many minutes, Linux TCP finally reaches
> a data throughput close to the perfect aggregated data rate of two
> paths (emulated with characteristics similar to an IEEE 802.11b (WLAN)
> link and a 3G (HSPA) link):
>
> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png
>
> Does anyone have clues about what's going on here? Why does the
> aggregated throughput increase in steps? And what could be the reason
> it takes minutes to adapt to the full capacity, when in other cases
> Linux TCP adapts much faster (for example, if the bandwidths of both
> paths are equal)? I would highly appreciate some advice from the
> netdev community.
>
> Implementation details:
> This multipath TCP experiment ran between a sending machine with a
> single Ethernet interface (eth0) and a client with two Ethernet
> interfaces (eth1, eth2). The machines are connected through a switch,
> and tc/netem is used to emulate the bandwidth and RTT of both paths.
> TCP connections are established using iperf between eth0 and eth1
> (the primary path). At the sender, an iptables NFQUEUE is used to
> "spoof" the destination IP address of outgoing packets and force some
> of them to travel to eth2 instead of eth1 (the secondary path). This
> multipath scheduling happens in proportion to the emulated
> bandwidths, so if the paths are set to 500 and 1000 KB/s, packets are
> distributed in a 1:2 ratio. At the client, iptables' RAWDNAT is used
> to translate the spoofed IP addresses back to their originals, so
> that all packets end up at eth1, although a portion actually
> travelled over eth2. ACKs are not scheduled over multiple paths, but
> always travel back on the primary path. TCP does not notice anything
> of the multipath forwarding, except the side effect of packet
> reordering, which can be huge if the path RTTs are set very
> differently.
>

Hi Dominik,

The implementation details of the tc/netem stages are important to
fully understand how the TCP stack can react. Is TSO active at the
sender side, for example?

Your results show that only some exceptional events make the bandwidth
really change.

A tcpdump/pcap of the first ~10,000 packets would be nice to provide
(not on the mailing list, but on your web site).
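For readers following the thread: the "dupACK threshold" adaptation
Dominik refers to can be sketched as a toy model. This is purely
illustrative (the class and method names are invented, and the real
kernel logic is considerably more involved); it only shows the idea
that the stack learns the observed reordering depth and stops treating
that much reordering as loss:

```python
# Toy model of a reordering-adaptive fast-retransmit threshold.
# Illustration only, NOT the Linux kernel implementation: the idea is
# that the stack measures how far out of order segments actually
# arrive and raises its dupACK threshold to at least that distance,
# so a learned level of reordering no longer triggers spurious
# fast retransmits.

class AdaptiveDupackThreshold:
    def __init__(self, initial=3):
        # RFC 5681 standard TCP uses a fixed threshold of 3 dupACKs.
        self.threshold = initial

    def on_reordering_observed(self, distance):
        # A segment arrived 'distance' packets later than in-order
        # delivery would predict; make sure that much reordering no
        # longer triggers fast retransmit.
        self.threshold = max(self.threshold, distance)

    def should_fast_retransmit(self, dup_acks):
        return dup_acks >= self.threshold

est = AdaptiveDupackThreshold()
assert est.should_fast_retransmit(3)      # standard TCP: 3 dupACKs
est.on_reordering_observed(10)            # heavy reordering observed
assert not est.should_fast_retransmit(3)  # 3 dupACKs no longer enough
assert est.should_fast_retransmit(10)
```

This also hints at why adaptation can be slow: the threshold only
grows when reordering is actually detected and survived.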
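The proportional multipath scheduling described in the setup (packets
split across paths in proportion to the emulated bandwidths) can be
modeled as a deterministic weighted round-robin. A small sketch,
assuming the scheduler's only job is to keep per-path packet counts
proportional to the configured rates (the real experiment does this
inside an NFQUEUE handler by rewriting destination IPs):

```python
# Sketch of proportional multipath packet scheduling: a deterministic
# weighted round-robin that keeps the per-path packet counts
# proportional to the path bandwidths. Illustrative only; function
# name and structure are assumptions, not the actual NFQUEUE code.

def schedule(num_packets, weights):
    """Distribute num_packets over len(weights) paths so that the
    final counts are proportional to 'weights'."""
    counts = [0] * len(weights)
    for _ in range(num_packets):
        # Send the next packet on the path that is furthest behind
        # its bandwidth share.
        i = min(range(len(weights)),
                key=lambda k: counts[k] / weights[k])
        counts[i] += 1
    return counts

# Two paths emulated at 500 and 1000 KB/s -> packets split 1:2.
counts = schedule(300, [500, 1000])
assert counts == [100, 200]
```

Note that with very different path RTTs, this 1:2 interleaving is
exactly what produces the large reordering depth the receiver sees.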