* Linux TCP's Robustness to Multipath Packet Reordering @ 2011-04-25 10:37 Dominik Kaspar 2011-04-25 11:25 ` Eric Dumazet 2011-04-25 12:59 ` Carsten Wolff 0 siblings, 2 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-25 10:37 UTC (permalink / raw) To: netdev Hello, Knowing how critical packet reordering is for standard TCP, I am currently testing how robust Linux TCP is when packets are forwarded over multiple paths (with different bandwidth and RTT). Since Linux TCP adapts its "dupAck threshold" to an estimated level of packet reordering, I expect it to be much more robust than a standard TCP that strictly follows the RFCs. Indeed, as you can see in the following plot, my experiments show a step-wise adaptation of Linux TCP to heavy reordering. After many minutes, Linux TCP finally reaches a data throughput close to the perfect aggregated data rate of two paths (emulated with characteristics similar to IEEE 802.11b (WLAN) and a 3G link (HSPA)): http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png Does anyone have clues what's going on here? Why does the aggregated throughput increase in steps? And what could be the reason it takes minutes to adapt to the full capacity, when in other cases, Linux TCP adapts much faster (for example if the bandwidth of both paths are equal). I would highly appreciate some advice from the netdev community. Implementation details: This multipath TCP experiment ran between a sending machine with a single Ethernet interface (eth0) and a client with two Ethernet interfaces (eth1, eth2). The machines are connected through a switch and tc/netem is used to emulate the bandwidth and RTT of both paths. TCP connections are established using iperf between eth0 and eth1 (the primary path). At the sender, an iptables' NFQUEUE is used to "spoof" the destination IP address of outgoing packets and force some to travel to eth2 instead of eth1 (the secondary path). This multipath scheduling happens in proportion to the emulated bandwidths, so if the paths are set to 500 and 1000 KB/s, then packets are distributed in a 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the spoofed IP addresses back to their original, so that all packets end up at eth1, although a portion actually travelled to eth2. ACKs are not scheduled over multiple paths, but always travel back on the primary path. TCP does not notice anything of the multipath forwarding, except the side-effect of packet reordering, which can be huge if the path RTTs are set very differently. Best regards, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
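For reference, a minimal sketch of the emulation setup described above (interface names, rates, and iperf's default port 5001 are assumptions; Dominik's actual netem.sh and the userspace NFQUEUE scheduler that rewrites destination addresses are separate programs and not shown):

    # Emulate the two paths, roughly WLAN-like and HSPA-like (netem for delay,
    # tbf as child qdisc for the rate limit); the real netem.sh may differ,
    # e.g. it may shape on ingress via ifb instead.
    tc qdisc add dev eth1 root handle 1:0 netem delay 10ms
    tc qdisc add dev eth1 parent 1:1 handle 10: tbf rate 4800kbit buffer 20kb limit 60kb
    tc qdisc add dev eth2 root handle 1:0 netem delay 100ms
    tc qdisc add dev eth2 parent 1:1 handle 10: tbf rate 3200kbit buffer 20kb limit 60kb

    # Client: iperf server listening on the primary interface.
    iperf -s

    # Sender: divert outgoing iperf segments to a userspace queue so a scheduler
    # (not shown) can rewrite the destination of a share of them towards the
    # secondary path; on the client, xtables-addons' RAWDNAT (raw/PREROUTING,
    # --to-destination) translates the spoofed addresses back.
    iptables -t mangle -A POSTROUTING -p tcp --dport 5001 -j NFQUEUE --queue-num 0

    # Sender: start the transfer towards the primary address.
    iperf -c <client-eth1-address> -t 1200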
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 10:37 Linux TCP's Robustness to Multipath Packet Reordering Dominik Kaspar @ 2011-04-25 11:25 ` Eric Dumazet 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 12:59 ` Carsten Wolff 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-25 11:25 UTC (permalink / raw) To: Dominik Kaspar; +Cc: netdev Le lundi 25 avril 2011 à 12:37 +0200, Dominik Kaspar a écrit : > Hello, > > Knowing how critical packet reordering is for standard TCP, I am > currently testing how robust Linux TCP is when packets are forwarded > over multiple paths (with different bandwidth and RTT). Since Linux > TCP adapts its "dupAck threshold" to an estimated level of packet > reordering, I expect it to be much more robust than a standard TCP > that strictly follows the RFCs. Indeed, as you can see in the > following plot, my experiments show a step-wise adaptation of Linux > TCP to heavy reordering. After many minutes, Linux TCP finally reaches > a data throughput close to the perfect aggregated data rate of two > paths (emulated with characteristics similar to IEEE 802.11b (WLAN) > and a 3G link (HSPA)): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png > > Does anyone have clues what's going on here? Why does the aggregated > throughput increase in steps? And what could be the reason it takes > minutes to adapt to the full capacity, when in other cases, Linux TCP > adapts much faster (for example if the bandwidth of both paths are > equal). I would highly appreciate some advice from the netdev > community. > > Implementation details: > This multipath TCP experiment ran between a sending machine with a > single Ethernet interface (eth0) and a client with two Ethernet > interfaces (eth1, eth2). The machines are connected through a switch > and tc/netem is used to emulate the bandwidth and RTT of both paths. > TCP connections are established using iperf between eth0 and eth1 (the > primary path). At the sender, an iptables' NFQUEUE is used to "spoof" > the destination IP address of outgoing packets and force some to > travel to eth2 instead of eth1 (the secondary path). This multipath > scheduling happens in proportion to the emulated bandwidths, so if the > paths are set to 500 and 1000 KB/s, then packets are distributed in a > 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the > spoofed IP addresses back to their original, so that all packets end > up at eth1, although a portion actually travelled to eth2. ACKs are > not scheduled over multiple paths, but always travel back on the > primary path. TCP does not notice anything of the multipath > forwarding, except the side-effect of packet reordering, which can be > huge if the path RTTs are set very differently. > Hi Dominik Implementation details of the tc/netem stages are important to fully understand how TCP stack can react. Is TSO active at sender side for example ? Your results show that only some exceptional events make bandwidth really change. A tcpdump/pcap of ~10.000 first packets would be nice to provide (not on mailing list, but on your web site) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 11:25 ` Eric Dumazet @ 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 15:38 ` Eric Dumazet 2011-04-26 20:43 ` Eric Dumazet 0 siblings, 2 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-25 14:35 UTC (permalink / raw) To: Eric Dumazet, Carsten Wolff; +Cc: netdev Hi Eric and Carsten, Thanks a lot for your quick replies. I don't have a tcpdump of this experiment, but here is the tcp_probe log that the plot is based on (I'll run a new test using tcpdump if you think that's more useful): http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.log I have also noticed what Carsten mentions, the tcp_reordering value is essential for this whole behavior. When I start an experiment and increase sysctl.net.ipv4.tcp_reordering during the running connection, the TCP throughput immediately jumps close to the aggregate of both paths. Without intervention, as in this experiment, tcp_reordering starts out as 3 and then makes small oscillations between 3 and 12 for more than 2 minutes. At about second 141, TCP somehow finds a new highest reordering value (23) and at the same time, the throughput jumps up "to the next level". The value of 23 is then used all the way until second 603, when the reordering value becomes 32 and the throughput again jumps up a level. I understand that tp->reordering is increased when reordering is detected, but what causes tp->reordering to sometimes be decreased back to 3? Also, why does a decrease back to 3 not make the whole procedure start all over again? For example, at second 1013.64, tp->reordering falls from 127 down to 3. A second later (1014.93) it then suddenly increases from 3 up to 32 without considering any numbers in between. Why it is now suddenly so fast? At the very beginning, it took 600 seconds to grow from 3 to 32 and afterward it just takes a second...? For the experiments, all default TCP options were used, meaning that SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off TSO... so that is probably enabled, too. Path emulation is done with tc/netem at the receiver interfaces (eth1, eth2) with this script: http://home.simula.no/~kaspar/static/netem.sh Greetings, Dominik On Mon, Apr 25, 2011 at 1:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le lundi 25 avril 2011 à 12:37 +0200, Dominik Kaspar a écrit : >> Hello, >> >> Knowing how critical packet reordering is for standard TCP, I am >> currently testing how robust Linux TCP is when packets are forwarded >> over multiple paths (with different bandwidth and RTT). Since Linux >> TCP adapts its "dupAck threshold" to an estimated level of packet >> reordering, I expect it to be much more robust than a standard TCP >> that strictly follows the RFCs. Indeed, as you can see in the >> following plot, my experiments show a step-wise adaptation of Linux >> TCP to heavy reordering. After many minutes, Linux TCP finally reaches >> a data throughput close to the perfect aggregated data rate of two >> paths (emulated with characteristics similar to IEEE 802.11b (WLAN) >> and a 3G link (HSPA)): >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png >> >> Does anyone have clues what's going on here? Why does the aggregated >> throughput increase in steps? And what could be the reason it takes >> minutes to adapt to the full capacity, when in other cases, Linux TCP >> adapts much faster (for example if the bandwidth of both paths are >> equal). 
I would highly appreciate some advice from the netdev >> community. >> >> Implementation details: >> This multipath TCP experiment ran between a sending machine with a >> single Ethernet interface (eth0) and a client with two Ethernet >> interfaces (eth1, eth2). The machines are connected through a switch >> and tc/netem is used to emulate the bandwidth and RTT of both paths. >> TCP connections are established using iperf between eth0 and eth1 (the >> primary path). At the sender, an iptables' NFQUEUE is used to "spoof" >> the destination IP address of outgoing packets and force some to >> travel to eth2 instead of eth1 (the secondary path). This multipath >> scheduling happens in proportion to the emulated bandwidths, so if the >> paths are set to 500 and 1000 KB/s, then packets are distributed in a >> 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the >> spoofed IP addresses back to their original, so that all packets end >> up at eth1, although a portion actually travelled to eth2. ACKs are >> not scheduled over multiple paths, but always travel back on the >> primary path. TCP does not notice anything of the multipath >> forwarding, except the side-effect of packet reordering, which can be >> huge if the path RTTs are set very differently. >> > > Hi Dominik > > Implementation details of the tc/netem stages are important to fully > understand how TCP stack can react. > > Is TSO active at sender side for example ? > > Your results show that only some exceptional events make bandwidth > really change. > > A tcpdump/pcap of ~10.000 first packets would be nice to provide (not on > mailing list, but on your web site) ^ permalink raw reply [flat|nested] 33+ messages in thread
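The tcp_probe log and the mid-connection change of tcp_reordering that Dominik describes can be reproduced along these lines (iperf's default port 5001 assumed; tcp_probe's port/full module parameters are standard, and the stock module logs cwnd/ssthresh/srtt and similar fields per ACK):

    # Sender: log per-ACK connection state for the iperf flow.
    modprobe tcp_probe port=5001 full=1
    cat /proc/net/tcpprobe > /tmp/tcpprobe.log &

    # Raise the reordering degree while the test runs; the sysctl seeds
    # tp->reordering for new connections and is also the value an established
    # connection falls back to when it enters the loss state.
    sysctl -w net.ipv4.tcp_reordering=127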
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 14:35 ` Dominik Kaspar @ 2011-04-25 15:38 ` Eric Dumazet 2011-04-26 16:58 ` Dominik Kaspar 2011-04-26 20:43 ` Eric Dumazet 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-25 15:38 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > Hi Eric and Carsten, > > Thanks a lot for your quick replies. I don't have a tcpdump of this > experiment, but here is the tcp_probe log that the plot is based on > (I'll run a new test using tcpdump if you think that's more useful): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.log > > I have also noticed what Carsten mentions, the tcp_reordering value is > essential for this whole behavior. When I start an experiment and > increase sysctl.net.ipv4.tcp_reordering during the running connection, > the TCP throughput immediately jumps close to the aggregate of both > paths. Without intervention, as in this experiment, tcp_reordering > starts out as 3 and then makes small oscillations between 3 and 12 for > more than 2 minutes. At about second 141, TCP somehow finds a new > highest reordering value (23) and at the same time, the throughput > jumps up "to the next level". The value of 23 is then used all the way > until second 603, when the reordering value becomes 32 and the > throughput again jumps up a level. > > I understand that tp->reordering is increased when reordering is > detected, but what causes tp->reordering to sometimes be decreased > back to 3? Also, why does a decrease back to 3 not make the whole > procedure start all over again? For example, at second 1013.64, > tp->reordering falls from 127 down to 3. A second later (1014.93) it > then suddenly increases from 3 up to 32 without considering any > numbers in between. Why it is now suddenly so fast? At the very > beginning, it took 600 seconds to grow from 3 to 32 and afterward it > just takes a second...? > > For the experiments, all default TCP options were used, meaning that > SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off > TSO... so that is probably enabled, too. Path emulation is done with > tc/netem at the receiver interfaces (eth1, eth2) with this script: > Since you have at sender a rule to spoof destination address of packets, you should make sure you dont send "super packets (up to 64Kbytes)", because it would stress the multipath more than you wanted to. This way, you send only normal packets (1500 MTU). ethtool -K eth0 tso off ethtool -K eth0 gso off I am pretty sure it should help your (atypic) workload. > http://home.simula.no/~kaspar/static/netem.sh > > Greetings, > Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
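A slightly fuller version of Eric's suggestion, verified afterwards; GRO on the receiving interfaces is included on the assumption that it could re-coalesce segments:

    # Run for eth0 on the sender, and for eth1/eth2 on the client.
    for dev in eth0 eth1 eth2; do
        ethtool -K $dev tso off gso off gro off
        ethtool -k $dev | egrep 'segmentation|receive-offload'
    done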
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 15:38 ` Eric Dumazet @ 2011-04-26 16:58 ` Dominik Kaspar 2011-04-26 17:10 ` Eric Dumazet 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 16:58 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev Hi Eric, On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Since you have at sender a rule to spoof destination address of packets, > you should make sure you dont send "super packets (up to 64Kbytes)", > because it would stress the multipath more than you wanted to. This way, > you send only normal packets (1500 MTU). > > ethtool -K eth0 tso off > ethtool -K eth0 gso off > > I am pretty sure it should help your (atypic) workload. I made new experiments with the exact same multipath setup as before, but disabled TSO and GSO on all involved Ethernet interfaces. However, this did not seem to change much about TCP's behavior when packets are striped over heterogeneous paths. You can see the results of four 20-minute experiments on this plot: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png Cheers, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 16:58 ` Dominik Kaspar @ 2011-04-26 17:10 ` Eric Dumazet 2011-04-26 18:00 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 17:10 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : > Hi Eric, > > On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > Since you have at sender a rule to spoof destination address of packets, > > you should make sure you dont send "super packets (up to 64Kbytes)", > > because it would stress the multipath more than you wanted to. This way, > > you send only normal packets (1500 MTU). > > > > ethtool -K eth0 tso off > > ethtool -K eth0 gso off > > > > I am pretty sure it should help your (atypic) workload. > > I made new experiments with the exact same multipath setup as before, > but disabled TSO and GSO on all involved Ethernet interfaces. However, > this did not seem to change much about TCP's behavior when packets are > striped over heterogeneous paths. You can see the results of four > 20-minute experiments on this plot: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png > > Cheers, > Dominik Hi Dominik Any chance to have a pcap file from sender side, of say first 10.000 packets ? ^ permalink raw reply [flat|nested] 33+ messages in thread
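The capture Eric asks for can be taken at the sender with something like the following (iperf's default port 5001 assumed; the snaplen keeps only headers so the file stays small):

    tcpdump -i eth0 -s 128 -c 10000 -w mptcp-sender-first10k.pcap 'tcp port 5001'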
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 17:10 ` Eric Dumazet @ 2011-04-26 18:00 ` Dominik Kaspar 2011-04-26 20:16 ` John Heffner 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 18:00 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev Hi Eric, Here are the tcpdump files for the first TSO-disabled experiment, in a full version and a short version with only the first 10000 packets: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap By the way, the packets are sent from the server (x.x.x.189) to the client interfaces (x.x.x.74) and (x.x.x.216) with the following pattern (which is a non-bursty 128-bit approximation of scheduling with a 600:400 ratio over primary path 0 and secondary path 1): 0010010100101001010010100101001010010100101001010010100101001010 0101001010010100101001010010100101001010010100101001010010100101 Greetings, Dominik On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >> Hi Eric, >> >> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > >> > Since you have at sender a rule to spoof destination address of packets, >> > you should make sure you dont send "super packets (up to 64Kbytes)", >> > because it would stress the multipath more than you wanted to. This way, >> > you send only normal packets (1500 MTU). >> > >> > ethtool -K eth0 tso off >> > ethtool -K eth0 gso off >> > >> > I am pretty sure it should help your (atypic) workload. >> >> I made new experiments with the exact same multipath setup as before, >> but disabled TSO and GSO on all involved Ethernet interfaces. However, >> this did not seem to change much about TCP's behavior when packets are >> striped over heterogeneous paths. You can see the results of four >> 20-minute experiments on this plot: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >> >> Cheers, >> Dominik > > Hi Dominik > > Any chance to have a pcap file from sender side, of say first 10.000 > packets ? > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
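A non-bursty 0/1 striping pattern for a given bandwidth ratio can be generated with a simple accumulator in the spirit of weighted round-robin (Bresenham-style); a sketch, which need not match the exact generator behind the pattern above:

    awk -v w0=600 -v w1=400 'BEGIN {
        prev = 0
        for (i = 1; i <= 128; i++) {
            cur = int(i * w1 / (w0 + w1))       # path-1 slots due after i packets
            printf "%d", (cur > prev) ? 1 : 0   # 1 = secondary path, 0 = primary
            prev = cur
            if (i % 64 == 0) printf "\n"
        }
    }'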
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 18:00 ` Dominik Kaspar @ 2011-04-26 20:16 ` John Heffner 2011-04-26 21:27 ` Dominik Kaspar 2011-04-27 9:57 ` Carsten Wolff 0 siblings, 2 replies; 33+ messages in thread From: John Heffner @ 2011-04-26 20:16 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Eric Dumazet, Carsten Wolff, netdev First, TCP is definitely not designed to work under such conditions. For example, assumptions behind RTO calculation and fast retransmit heuristics are violated. However, in this particular case my first guess is that you are being limited by "cwnd moderation," which was the topic of recent discussion here. Under persistent reordering, cwnd moderation can inhibit the ability of cwnd to grow. Thanks, -John On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > Hi Eric, > > Here are the tcpdump files for the first TSO-disabled experiment, in a > full version and a short version with only the first 10000 packets: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap > > By the way, the packets are sent from the server (x.x.x.189) to the > client interfaces (x.x.x.74) and (x.x.x.216) with the following > pattern (which is a non-bursty 128-bit approximation of scheduling > with a 600:400 ratio over primary path 0 and secondary path 1): > > 0010010100101001010010100101001010010100101001010010100101001010 > 0101001010010100101001010010100101001010010100101001010010100101 > > Greetings, > Dominik > > On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >>> Hi Eric, >>> >>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> > >>> > Since you have at sender a rule to spoof destination address of packets, >>> > you should make sure you dont send "super packets (up to 64Kbytes)", >>> > because it would stress the multipath more than you wanted to. This way, >>> > you send only normal packets (1500 MTU). >>> > >>> > ethtool -K eth0 tso off >>> > ethtool -K eth0 gso off >>> > >>> > I am pretty sure it should help your (atypic) workload. >>> >>> I made new experiments with the exact same multipath setup as before, >>> but disabled TSO and GSO on all involved Ethernet interfaces. However, >>> this did not seem to change much about TCP's behavior when packets are >>> striped over heterogeneous paths. You can see the results of four >>> 20-minute experiments on this plot: >>> >>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >>> >>> Cheers, >>> Dominik >> >> Hi Dominik >> >> Any chance to have a pcap file from sender side, of say first 10.000 >> packets ? >> >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:16 ` John Heffner @ 2011-04-26 21:27 ` Dominik Kaspar 2011-04-27 9:57 ` Carsten Wolff 1 sibling, 0 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:27 UTC (permalink / raw) To: John Heffner; +Cc: Eric Dumazet, Carsten Wolff, netdev Hi John, Thanks for your advice. I am very well aware that TCP is not designed to work under such conditions. I am still surprised how well Linux TCP handles many situations of excessive, persistent packet reordering. In scenarios of fairly heterogeneous path characteristics, Linux TCP aggregates multiple paths close to ideally :-) If I'm not mistaken, cwnd moderation is a measure to prevent TCP from sending large bursts if a single ACK covers many segments. In what way can cwnd moderation prevent TCP from increasing its estimate of packet reordering? Greetings, Dominik On Tue, Apr 26, 2011 at 10:16 PM, John Heffner <johnwheffner@gmail.com> wrote: > First, TCP is definitely not designed to work under such conditions. > For example, assumptions behind RTO calculation and fast retransmit > heuristics are violated. However, in this particular case my first > guess is that you are being limited by "cwnd moderation," which was > the topic of recent discussion here. Under persistent reordering, > cwnd moderation can inhibit the ability of cwnd to grow. > > Thanks, > -John > > > On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> Hi Eric, >> >> Here are the tcpdump files for the first TSO-disabled experiment, in a >> full version and a short version with only the first 10000 packets: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap >> >> By the way, the packets are sent from the server (x.x.x.189) to the >> client interfaces (x.x.x.74) and (x.x.x.216) with the following >> pattern (which is a non-bursty 128-bit approximation of scheduling >> with a 600:400 ratio over primary path 0 and secondary path 1): >> >> 0010010100101001010010100101001010010100101001010010100101001010 >> 0101001010010100101001010010100101001010010100101001010010100101 >> >> Greetings, >> Dominik >> >> On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >>>> Hi Eric, >>>> >>>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>>> > >>>> > Since you have at sender a rule to spoof destination address of packets, >>>> > you should make sure you dont send "super packets (up to 64Kbytes)", >>>> > because it would stress the multipath more than you wanted to. This way, >>>> > you send only normal packets (1500 MTU). >>>> > >>>> > ethtool -K eth0 tso off >>>> > ethtool -K eth0 gso off >>>> > >>>> > I am pretty sure it should help your (atypic) workload. >>>> >>>> I made new experiments with the exact same multipath setup as before, >>>> but disabled TSO and GSO on all involved Ethernet interfaces. However, >>>> this did not seem to change much about TCP's behavior when packets are >>>> striped over heterogeneous paths. You can see the results of four >>>> 20-minute experiments on this plot: >>>> >>>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >>>> >>>> Cheers, >>>> Dominik >>> >>> Hi Dominik >>> >>> Any chance to have a pcap file from sender side, of say first 10.000 >>> packets ? 
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:16 ` John Heffner 2011-04-26 21:27 ` Dominik Kaspar @ 2011-04-27 9:57 ` Carsten Wolff 2011-04-27 16:22 ` Dominik Kaspar 1 sibling, 1 reply; 33+ messages in thread From: Carsten Wolff @ 2011-04-27 9:57 UTC (permalink / raw) To: John Heffner Cc: Dominik Kaspar, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi all, On Tuesday 26 April 2011, John Heffner wrote: > First, TCP is definitely not designed to work under such conditions. > For example, assumptions behind RTO calculation and fast retransmit > heuristics are violated. However, in this particular case my first > guess is that you are being limited by "cwnd moderation," which was > the topic of recent discussion here. Under persistent reordering, > cwnd moderation can inhibit the ability of cwnd to grow. it's not just cwnd moderation (of which I'm still in favor, even though I lost the argument by inactivity ;-)). Anyway, there are a lot of things in reordering handling that can be improved. Our group (Alexander, Lennart, Arnd, myself and others) has worked on the problem for a long time now. This work resulted in an algorithm that is in large parts TCP-NCR (RFC4653), but also utilizes information gathered by reordering detection for determination of a good DupThresh, fixes a few problems in RFC4653 and improves on the reordering detection in Linux when the connection has no timestamps option. We implemented "pure" TCP-NCR and our own variant in Linux using a modular framework similar to the congestion control modules. A lot of measurements and evaluation have gone into the comparison of the three algorithms. We are now very close(TM) to a final patch, that is more suited for publication on this list and integrates our algorithm into tcp*. [hc] without introducing the overhead of that modular framework. Greetings, Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 9:57 ` Carsten Wolff @ 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-27 16:22 UTC (permalink / raw) To: Carsten Wolff Cc: John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Carsten, Thanks for your feedback. I made some new tests with the same setup of packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + (400 KB/s, 100 ms). In the first experiments, which showed a step-wise adaptation to reordering, SACK, DSACK, and Timestamps were all enabled. In the experiments, I individually disabled these three mechanisms and saw the following: - Disabling timestamps causes TCP to never adjust to reordering at all. - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). - Disabling DSACK has no obvious impact (still a step-wise throughput). Is there an explanation for why turning off SACK can be beneficial in the presence of packet reordering? That sounds pretty counter-intuitive to me... I thought SACK=1 always performs better than SACK=0. The results are also illustrated in the following plot. For each setting, there are three runs, which all exhibit a similar behavior: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png Greetings, Dominik On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: > Hi all, > > On Tuesday 26 April 2011, John Heffner wrote: >> First, TCP is definitely not designed to work under such conditions. >> For example, assumptions behind RTO calculation and fast retransmit >> heuristics are violated. However, in this particular case my first >> guess is that you are being limited by "cwnd moderation," which was >> the topic of recent discussion here. Under persistent reordering, >> cwnd moderation can inhibit the ability of cwnd to grow. > > it's not just cwnd moderation (of which I'm still in favor, even though I lost > the argument by inactivity ;-)). > > Anyway, there are a lot of things in reordering handling that can be improved. > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the > problem for a long time now. This work resulted in an algorithm that is in > large parts TCP-NCR (RFC4653), but also utilizes information gathered by > reordering detection for determination of a good DupThresh, fixes a few > problems in RFC4653 and improves on the reordering detection in Linux when the > connection has no timestamps option. We implemented "pure" TCP-NCR and our own > variant in Linux using a modular framework similar to the congestion control > modules. A lot of measurements and evaluation have gone into the comparison of > the three algorithms. We are now very close(TM) to a final patch, that is more > suited for publication on this list and integrates our algorithm into tcp*. > [hc] without introducing the overhead of that modular framework. > > Greetings, > Carsten > ^ permalink raw reply [flat|nested] 33+ messages in thread
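For completeness, the three switches being toggled (they take effect for connections established after the change, since SACK and timestamps are negotiated in the SYN exchange):

    sysctl -w net.ipv4.tcp_timestamps=0   # no timestamps -> no timestamp-based reordering detection
    sysctl -w net.ipv4.tcp_sack=0         # no SACK -> NewReno-style detection only
    sysctl -w net.ipv4.tcp_dsack=0        # DSACK only matters while SACK stays enabled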
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar @ 2011-04-27 16:36 ` Alexander Zimmermann 2011-06-21 11:25 ` Ilpo Järvinen 2011-04-27 16:48 ` Eric Dumazet 2011-04-27 17:39 ` Yuchung Cheng 2 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-27 16:36 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1835 bytes --] Hi, Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. Reordering detection with DSACK is broken in Linux. We will fix that in a couple of weeks... > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). If you disable SACK, you will use the NewReno detection > - Disabling DSACK has no obvious impact (still a step-wise throughput). If Timestamps are enabled, Linux use Timestamps for detection. Regardless of DSACK. Timestamp detection is quicker. See RFC3522. (However, in case of an spurious FRet it's not so dramatical. In case of an Spurious RTO, you can avoid the go-back-n behavior) > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > > Greetings, > Dominik > // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:36 ` Alexander Zimmermann @ 2011-06-21 11:25 ` Ilpo Järvinen 2011-06-21 11:34 ` Carsten Wolff 0 siblings, 1 reply; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:25 UTC (permalink / raw) To: Alexander Zimmermann Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > Hi, > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > Hi Carsten, > > > > Thanks for your feedback. I made some new tests with the same setup of > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > enabled. In the experiments, I individually disabled these three > > mechanisms and saw the following: > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > Reordering detection with DSACK is broken in Linux. We will fix that in > a couple of weeks... > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > If you disable SACK, you will use the NewReno detection Which probably has some reordering over-estimate bugs on its own... (but I've forgotten details of my suspicion long time ago so please don't ask for the them). -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-21 11:25 ` Ilpo Järvinen @ 2011-06-21 11:34 ` Carsten Wolff 2011-06-21 11:46 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Carsten Wolff @ 2011-06-21 11:34 UTC (permalink / raw) To: Ilpo Järvinen Cc: Alexander Zimmermann, Dominik Kaspar, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann Hi, On Tuesday 21 June 2011, Ilpo Järvinen wrote: > On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > Hi Carsten, > > > > > > Thanks for your feedback. I made some new tests with the same setup of > > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > > enabled. In the experiments, I individually disabled these three > > > mechanisms and saw the following: > > > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > > > Reordering detection with DSACK is broken in Linux. We will fix that in > > a couple of weeks... > > > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" > > > aggregation!). > > > > If you disable SACK, you will use the NewReno detection > > Which probably has some reordering over-estimate bugs on its own... > (but I've forgotten details of my suspicion long time ago so please don't > ask for the them). the NewReno detection is clever, but there's no exact information it could utilize for a good metric, because it detects the event too late, when the information is already gone. In my experiments it always under-estimated the reordering extent, though. I also remmember thinking that the metric of the Eifel-detection has an off-by-one bug. Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-21 11:34 ` Carsten Wolff @ 2011-06-21 11:46 ` Ilpo Järvinen 0 siblings, 0 replies; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:46 UTC (permalink / raw) To: Carsten Wolff Cc: Alexander Zimmermann, Dominik Kaspar, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: TEXT/PLAIN, Size: 1783 bytes --] On Tue, 21 Jun 2011, Carsten Wolff wrote: > On Tuesday 21 June 2011, Ilpo Järvinen wrote: > > On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > > Hi Carsten, > > > > > > > > Thanks for your feedback. I made some new tests with the same setup of > > > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > > > enabled. In the experiments, I individually disabled these three > > > > mechanisms and saw the following: > > > > > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > > > > > Reordering detection with DSACK is broken in Linux. We will fix that in > > > a couple of weeks... > > > > > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" > > > > aggregation!). > > > > > > If you disable SACK, you will use the NewReno detection > > > > Which probably has some reordering over-estimate bugs on its own... > > (but I've forgotten details of my suspicion long time ago so please don't > > ask for the them). > > the NewReno detection is clever, but there's no exact information it could > utilize for a good metric, because it detects the event too late, when the > information is already gone. In my experiments it always under-estimated the > reordering extent, though. I also remmember thinking that the metric of the > Eifel-detection has an off-by-one bug. That might be true for most of the cases but IIRC I figured out a a scenario where it miscalculates RTT worth of extra into the reordering (but I never really confirmed that in real tests or so, just figured it a bit). -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann @ 2011-04-27 16:48 ` Eric Dumazet 2011-04-27 17:39 ` Yuchung Cheng 2 siblings, 0 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-27 16:48 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Le mercredi 27 avril 2011 à 18:22 +0200, Dominik Kaspar a écrit : > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > - Disabling DSACK has no obvious impact (still a step-wise throughput). > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > SACK is a win in a normal environnement, with few reorders, but some percents of losses ;) Given the limit of 3 blocks in SACK option, and your pretty asymetric paths (10ms and 100ms), SACK is useless and consume 12 bytes per frame... You really should add traces to every tp->reordering changes done in our TCP stack, its a 20 minutes patch, and would help you to understand where/when its increased/decreased. ^ permalink raw reply [flat|nested] 33+ messages in thread
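Eric's suggestion amounts to a printk at every place tcp_input.c assigns tp->reordering. A less intrusive rough equivalent, assuming kernel debuginfo is available and the function is not inlined, is a dynamic probe on the main site where the estimate is raised (decreases, e.g. on RTO, happen elsewhere and would need their own probes):

    # Locate the assignment sites in the source first:
    grep -n 'reordering =' net/ipv4/tcp_input.c

    # Trace the increase path and the metric it sees:
    perf probe --add 'tcp_update_reordering metric ts'
    perf record -e probe:tcp_update_reordering -a -- sleep 1200
    perf script | head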
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann 2011-04-27 16:48 ` Eric Dumazet @ 2011-04-27 17:39 ` Yuchung Cheng 2011-04-27 17:53 ` Alexander Zimmermann 2011-04-27 19:56 ` Dominik Kaspar 2 siblings, 2 replies; 33+ messages in thread From: Yuchung Cheng @ 2011-04-27 17:39 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Dominik, On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). Did you enable tcp_fack when sack is enabled? this may make a (big) difference. FACK assumes little network reordering and mark packet losses more aggressively. > - Disabling DSACK has no obvious impact (still a step-wise throughput). > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > > Greetings, > Dominik > > On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: > > Hi all, > > > > On Tuesday 26 April 2011, John Heffner wrote: > >> First, TCP is definitely not designed to work under such conditions. > >> For example, assumptions behind RTO calculation and fast retransmit > >> heuristics are violated. However, in this particular case my first > >> guess is that you are being limited by "cwnd moderation," which was > >> the topic of recent discussion here. Under persistent reordering, > >> cwnd moderation can inhibit the ability of cwnd to grow. > > > > it's not just cwnd moderation (of which I'm still in favor, even though I lost > > the argument by inactivity ;-)). > > > > Anyway, there are a lot of things in reordering handling that can be improved. > > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the > > problem for a long time now. This work resulted in an algorithm that is in > > large parts TCP-NCR (RFC4653), but also utilizes information gathered by > > reordering detection for determination of a good DupThresh, fixes a few > > problems in RFC4653 and improves on the reordering detection in Linux when the > > connection has no timestamps option. We implemented "pure" TCP-NCR and our own > > variant in Linux using a modular framework similar to the congestion control > > modules. A lot of measurements and evaluation have gone into the comparison of > > the three algorithms. We are now very close(TM) to a final patch, that is more > > suited for publication on this list and integrates our algorithm into tcp*. > > [hc] without introducing the overhead of that modular framework. 
> > > > Greetings, > > Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 17:39 ` Yuchung Cheng @ 2011-04-27 17:53 ` Alexander Zimmermann 2011-04-27 19:56 ` Dominik Kaspar 1 sibling, 0 replies; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-27 17:53 UTC (permalink / raw) To: Yuchung Cheng Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1323 bytes --] Hi, Am 27.04.2011 um 19:39 schrieb Yuchung Cheng: > Hi Dominik, > > On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> >> Hi Carsten, >> >> Thanks for your feedback. I made some new tests with the same setup of >> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >> adaptation to reordering, SACK, DSACK, and Timestamps were all >> enabled. In the experiments, I individually disabled these three >> mechanisms and saw the following: >> >> - Disabling timestamps causes TCP to never adjust to reordering at all. >> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > Did you enable tcp_fack when sack is enabled? this may make a (big) > difference. FACK assumes little network reordering and mark packet > losses more aggressively. It's not necessary to do it manually. Linux will disable FACK as soon as it will detected reordering Alex // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 17:39 ` Yuchung Cheng 2011-04-27 17:53 ` Alexander Zimmermann @ 2011-04-27 19:56 ` Dominik Kaspar 2011-04-27 21:41 ` Yuchung Cheng 1 sibling, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-27 19:56 UTC (permalink / raw) To: Yuchung Cheng Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Yuchung, Yes, FACK was enabled (as it is by default), but as Alexander already pointed out, it should be disabled automatically when TCP detects reordering. However, I am not so sure how well this automatic turning off FACK is done by Linux... I see a tendency that in situations with persistent packet reordering, TCP with FACK enabled gets a lower performance than if FACK is disabled right from the beginning of a connection. Greetings, Dominik On Wed, Apr 27, 2011 at 7:39 PM, Yuchung Cheng <ycheng@google.com> wrote: > Hi Dominik, > > On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> >> Hi Carsten, >> >> Thanks for your feedback. I made some new tests with the same setup of >> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >> adaptation to reordering, SACK, DSACK, and Timestamps were all >> enabled. In the experiments, I individually disabled these three >> mechanisms and saw the following: >> >> - Disabling timestamps causes TCP to never adjust to reordering at all. >> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > Did you enable tcp_fack when sack is enabled? this may make a (big) > difference. FACK assumes little network reordering and mark packet > losses more aggressively. > >> - Disabling DSACK has no obvious impact (still a step-wise throughput). >> >> Is there an explanation for why turning off SACK can be beneficial in >> the presence of packet reordering? That sounds pretty >> counter-intuitive to me... I thought SACK=1 always performs better >> than SACK=0. The results are also illustrated in the following plot. >> For each setting, there are three runs, which all exhibit a similar >> behavior: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png >> >> Greetings, >> Dominik >> >> On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: >> > Hi all, >> > >> > On Tuesday 26 April 2011, John Heffner wrote: >> >> First, TCP is definitely not designed to work under such conditions. >> >> For example, assumptions behind RTO calculation and fast retransmit >> >> heuristics are violated. However, in this particular case my first >> >> guess is that you are being limited by "cwnd moderation," which was >> >> the topic of recent discussion here. Under persistent reordering, >> >> cwnd moderation can inhibit the ability of cwnd to grow. >> > >> > it's not just cwnd moderation (of which I'm still in favor, even though I lost >> > the argument by inactivity ;-)). >> > >> > Anyway, there are a lot of things in reordering handling that can be improved. >> > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the >> > problem for a long time now. 
This work resulted in an algorithm that is in >> > large parts TCP-NCR (RFC4653), but also utilizes information gathered by >> > reordering detection for determination of a good DupThresh, fixes a few >> > problems in RFC4653 and improves on the reordering detection in Linux when the >> > connection has no timestamps option. We implemented "pure" TCP-NCR and our own >> > variant in Linux using a modular framework similar to the congestion control >> > modules. A lot of measurements and evaluation have gone into the comparison of >> > the three algorithms. We are now very close(TM) to a final patch, that is more >> > suited for publication on this list and integrates our algorithm into tcp*. >> > [hc] without introducing the overhead of that modular framework. >> > >> > Greetings, >> > Carsten >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 33+ messages in thread
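Whether FACK-based marking is still in effect is hard to observe directly, but the sender exposes which detection mechanism fired and the per-socket reordering estimate, which helps when comparing the FACK-on and FACK-off runs:

    # SNMP counters: reordering detected via SACK / timestamps / reno / FACK.
    netstat -s | grep -i reorder

    # Per-connection view: look for the reordering:N field in the output.
    ss -nit dst <client-eth1-address>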
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 19:56 ` Dominik Kaspar @ 2011-04-27 21:41 ` Yuchung Cheng 2011-04-28 6:11 ` Alexander Zimmermann 0 siblings, 1 reply; 33+ messages in thread From: Yuchung Cheng @ 2011-04-27 21:41 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann AFAIK, FACK is disabled throughout the life of the connection after sender detects reordering degree > 3. But Alex said the reordering has some bugs. I suspect these bugs may affect FACK/sack auto-tuning. Maybe Alex could describe the reordering bugs? On Wed, Apr 27, 2011 at 12:56 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > Hi Yuchung, > > Yes, FACK was enabled (as it is by default), but as Alexander already > pointed out, it should be disabled automatically when TCP detects > reordering. > > However, I am not so sure how well this automatic turning off FACK is > done by Linux... I see a tendency that in situations with persistent > packet reordering, TCP with FACK enabled gets a lower performance than > if FACK is disabled right from the beginning of a connection. > > Greetings, > Dominik > > On Wed, Apr 27, 2011 at 7:39 PM, Yuchung Cheng <ycheng@google.com> wrote: >> Hi Dominik, >> >> On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >>> >>> Hi Carsten, >>> >>> Thanks for your feedback. I made some new tests with the same setup of >>> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >>> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >>> adaptation to reordering, SACK, DSACK, and Timestamps were all >>> enabled. In the experiments, I individually disabled these three >>> mechanisms and saw the following: >>> >>> - Disabling timestamps causes TCP to never adjust to reordering at all. >>> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). >> >> Did you enable tcp_fack when sack is enabled? this may make a (big) >> difference. FACK assumes little network reordering and mark packet >> losses more aggressively. >> >>> - Disabling DSACK has no obvious impact (still a step-wise throughput). >>> >>> Is there an explanation for why turning off SACK can be beneficial in >>> the presence of packet reordering? That sounds pretty >>> counter-intuitive to me... I thought SACK=1 always performs better >>> than SACK=0. The results are also illustrated in the following plot. >>> For each setting, there are three runs, which all exhibit a similar >>> behavior: >>> >>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png >>> >>> Greetings, >>> Dominik >>> >>> On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: >>> > Hi all, >>> > >>> > On Tuesday 26 April 2011, John Heffner wrote: >>> >> First, TCP is definitely not designed to work under such conditions. >>> >> For example, assumptions behind RTO calculation and fast retransmit >>> >> heuristics are violated. However, in this particular case my first >>> >> guess is that you are being limited by "cwnd moderation," which was >>> >> the topic of recent discussion here. Under persistent reordering, >>> >> cwnd moderation can inhibit the ability of cwnd to grow. >>> > >>> > it's not just cwnd moderation (of which I'm still in favor, even though I lost >>> > the argument by inactivity ;-)). >>> > >>> > Anyway, there are a lot of things in reordering handling that can be improved. 
>>> > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the >>> > problem for a long time now. This work resulted in an algorithm that is in >>> > large parts TCP-NCR (RFC4653), but also utilizes information gathered by >>> > reordering detection for determination of a good DupThresh, fixes a few >>> > problems in RFC4653 and improves on the reordering detection in Linux when the >>> > connection has no timestamps option. We implemented "pure" TCP-NCR and our own >>> > variant in Linux using a modular framework similar to the congestion control >>> > modules. A lot of measurements and evaluation have gone into the comparison of >>> > the three algorithms. We are now very close(TM) to a final patch, that is more >>> > suited for publication on this list and integrates our algorithm into tcp*. >>> > [hc] without introducing the overhead of that modular framework. >>> > >>> > Greetings, >>> > Carsten >>> > >>> -- >>> To unsubscribe from this list: send the line "unsubscribe netdev" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 21:41 ` Yuchung Cheng @ 2011-04-28 6:11 ` Alexander Zimmermann 2011-06-19 15:22 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-28 6:11 UTC (permalink / raw) To: Yuchung Cheng Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1240 bytes --] Am 27.04.2011 um 23:41 schrieb Yuchung Cheng: > AFAIK, FACK is disabled throughout the life of the connection after > sender detects reordering degree > 3. Right. > > But Alex said the reordering has some bugs. I suspect these bugs may > affect FACK/sack auto-tuning. No. It affects reordering detection only. > Maybe Alex could describe the reordering > bugs? > Yes, I can. With DSACK, you have two cases. DSACK below and above SEG.ACK. DSACK below SEG.ACK is the come case. However, Linux doesn't calculate a reordering extent in this case. We will fix this. In the other case, DSACK above SEG.ACK, Linux quantifies the reordering as the distance between the received DSACK and snd_fack. However, this is only correct if the reordering delay is greater the RTT. We will fix that too. As a result, in the first case, we waste the opportunity to calculate an extent (loosing performance). In the second case we overestimate the reordering. Alex // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-28 6:11 ` Alexander Zimmermann @ 2011-06-19 15:22 ` Dominik Kaspar 2011-06-19 15:38 ` Alexander Zimmermann 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-19 15:22 UTC (permalink / raw) To: netdev Cc: Alexander Zimmermann, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hello again, I have another question about Linux TCP and packet reordering. What exactly happens when a packet is delayed so much (without causing a timeout) that it gets overtaken by a retransmitted version of itself? It seems to me that this results in "SACK reneging", but I don't really understand why... The simplified situation goes like this: - Segment A gets sent and very much delayed (but not causing RTO) - Segments B, C, D cause dupACKs - Segment A_ret is retransmitted and ACKed (sent over new path) - Some more segments E, F, ... are sent and ACKed - Segment A (the delayed one) arrives at the receiver. - Now what exactly happens next...? I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, ...) and the above-described series of events is caused by transparently forwarding IP packets over multiple paths with RTTs of 10 and 100 milliseconds. I'd appreciate your help - best regards, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 15:22 ` Dominik Kaspar @ 2011-06-19 15:38 ` Alexander Zimmermann 2011-06-19 16:25 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-06-19 15:38 UTC (permalink / raw) To: Dominik Kaspar Cc: netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1631 bytes --] Hi, Am 19.06.2011 um 17:22 schrieb Dominik Kaspar: > Hello again, > > I have another question to Linux TCP and packet reordering. What > exactly happens, when a packet is so much delayed (but not causing a > timeout), that it gets overtaken by a retransmitted version of itself? > It seems to me that this results in "SACK reneging", but I don't > really understand why... In theory, you can detect this case with a combination of DSACK and timestamps. However, in practice a reordering delay greater than the RTT will likely cause an RTO (see RFC4653). IMO, if you have packet reordering with a delay greater than the RTT, you have much bigger problems than SACK reneging. > > The simplified situation goes this: > - Segment A gets sent and very much delayed (but not causing RTO) > - Segments B, C, D cause dupACKs > - Segment A_ret is retransmitted and ACKed (sent over new path) > - Some more segments E, F, ... are sent and ACKed > - Segment A (the delayed one) arrives at the receiver. > - Now what exactly happens next...? The receiver sends a DSACK. > > I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, > ...) and the above described series of events is cause why > transparently forwarding IP packets over multiple paths with RTTs of > 10 and 100 milliseconds. > > I'd appreciate your help - best regards, > Dominik // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
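[A minimal receiver-side sketch of that last step, as an illustration only and with assumed variable names (see RFC 2883 for the actual D-SACK rules): when the late original finally arrives, its data is already covered because the retransmission filled the hole, so the receiver leaves the cumulative ACK unchanged and reports the duplicate range in a D-SACK block.]

def ack_for_incoming(seg_start, seg_end, rcv_nxt):
    """Return (cumulative_ack, dsack_block) for one incoming segment
    (out-of-order queue handling is omitted to keep the sketch short)."""
    if seg_end <= rcv_nxt:
        # Entirely duplicate data: the ACK stays at rcv_nxt, and the first
        # SACK block is a D-SACK describing the duplicate segment.
        return rcv_nxt, (seg_start, seg_end)
    if seg_start <= rcv_nxt:
        return seg_end, None      # in-order data advances the cumulative ACK
    return rcv_nxt, None          # out-of-order data would be SACKed instead

# Delayed original A carried bytes 10000..11448; the retransmission and the
# following segments already pushed rcv_nxt to 20000.
print(ack_for_incoming(10000, 11448, rcv_nxt=20000))  # (20000, (10000, 11448))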
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 15:38 ` Alexander Zimmermann @ 2011-06-19 16:25 ` Dominik Kaspar 2011-06-20 10:42 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-19 16:25 UTC (permalink / raw) To: Alexander Zimmermann Cc: netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hi Alexander, Ah... the receiver DSACKs the "original" packet. However, the sender already received an ACK for its retransmission and advances SND.UNA. When the DSACK finally arrives, it is actually outside of the SND.UNA - SND.NXT range, which causes the DSACK to trigger "SACK reneging". Did I get that right? :-) Cheers, Dominik On Sun, Jun 19, 2011 at 5:38 PM, Alexander Zimmermann <alexander.zimmermann@comsys.rwth-aachen.de> wrote: > Hi, > > Am 19.06.2011 um 17:22 schrieb Dominik Kaspar: > >> Hello again, >> >> I have another question to Linux TCP and packet reordering. What >> exactly happens, when a packet is so much delayed (but not causing a >> timeout), that it gets overtaken by a retransmitted version of itself? >> It seems to me that this results in "SACK reneging", but I don't >> really understand why... > > in theory, you can detect this case with a combination of DSACK > and timestamps. However, in practice a reordering delay greater than > RTT will likely case an RTO (see RFC4653). IMO, if you have an packet > reordering with an delay greater that the RTT, you have much more problems > that SACK reneging > >> >> The simplified situation goes this: >> - Segment A gets sent and very much delayed (but not causing RTO) >> - Segments B, C, D cause dupACKs >> - Segment A_ret is retransmitted and ACKed (sent over new path) >> - Some more segments E, F, ... are sent and ACKed >> - Segment A (the delayed one) arrives at the receiver. >> - Now what exactly happens next...? > > Receiver sends a DSACK > >> >> I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, >> ...) and the above described series of events is cause why >> transparently forwarding IP packets over multiple paths with RTTs of >> 10 and 100 milliseconds. >> >> I'd appreciate your help - best regards, >> Dominik > > // > // Dipl.-Inform. Alexander Zimmermann > // Department of Computer Science, Informatik 4 > // RWTH Aachen University > // Ahornstr. 55, 52056 Aachen, Germany > // phone: (49-241) 80-21422, fax: (49-241) 80-22222 > // email: zimmermann@cs.rwth-aachen.de > // web: http://www.umic-mesh.net > // > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 16:25 ` Dominik Kaspar @ 2011-06-20 10:42 ` Ilpo Järvinen 2011-06-20 12:52 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-20 10:42 UTC (permalink / raw) To: Dominik Kaspar Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann On Sun, 19 Jun 2011, Dominik Kaspar wrote: > Ah... the receiver DSACKs the "original" packet. However, the sender > already received an ACK for its retransmission and advances SND.UNA. > When the DSACK finally arrives, it is actually outside of the SND.UNA > - SND.NXT range, which causes the DSACK to trigger "SACK reneging". > Did I get that right? :-) Where did you get this idea of reneging?!? Reneging has nothing to do with DSACKs; instead it is only detected if the cumulative ACK stops at such a boundary where the _next_ segment is SACKed (i.e., for some reason the receiver "didn't bother" to cumulatively ACK that too). ...That certainly does not happen (ever) for out-of-window DSACKs. -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
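[A compact way to state the check Ilpo describes, sender side and as an illustration only; in the kernel this roughly corresponds to noticing that the segment at the head of the retransmit queue, i.e. the one starting at SND.UNA, carries a SACKed mark.]

def looks_like_reneging(snd_una, sacked_ranges):
    # Reneging is suspected only when the segment that starts exactly at
    # SND.UNA (the next one the cumulative ACK should have covered) has
    # previously been SACKed by the receiver.
    return any(start == snd_una for start, _end in sacked_ranges)

print(looks_like_reneging(5000, [(5000, 6448), (7896, 9344)]))  # True: suspected
print(looks_like_reneging(5000, [(6448, 7896)]))                # False: ordinary hole
# A DSACK for data below SND.UNA never starts at SND.UNA, so an
# out-of-window DSACK cannot trigger this condition.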
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-20 10:42 ` Ilpo Järvinen @ 2011-06-20 12:52 ` Dominik Kaspar 2011-06-21 11:35 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-20 12:52 UTC (permalink / raw) To: Ilpo Järvinen Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hi Ilpo, > Where did you get this idea of reneging?!? I observed that my scenario of a retransmitted packet overtaking the original somehow causes TCP to enter the "Loss" state although no RTO was caused. And since the Loss state seems to be only entered due to RTO timeout or SACK reneging, I got the idea that reneging must be occurring. > Reneging has nothing to do with DSACKs, > instead it is only detected if the cumulative ACK stops to such > boundary where the _next_ segment is SACKed (i.e., some reason > the receiver "didn't bother" to cumulatively ACK for that too). ... > That certainly does not happen (ever) for out of window DSACKs. You are right. If I turn off DSACK, the same thing happens: TCP enters the Loss state without timeouts occurring. Isn't that a sign of reneging happening? What else can it be? Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-20 12:52 ` Dominik Kaspar @ 2011-06-21 11:35 ` Ilpo Järvinen 0 siblings, 0 replies; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:35 UTC (permalink / raw) To: Dominik Kaspar Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann On Mon, 20 Jun 2011, Dominik Kaspar wrote: > > Where did you get this idea of reneging?!? > > I observed that my scenario of a retransmitted packet overtaking the > original somehow causes TCP to enter the "Loss" state although no RTO > was caused. And since the Loss state seems to be only entered due to > RTO timeout or SACK reneging, I got the idea that reneging must be > occurring. > > > Reneging has nothing to do with DSACKs, > > instead it is only detected if the cumulative ACK stops to such > > boundary where the _next_ segment is SACKed (i.e., some reason > > the receiver "didn't bother" to cumulatively ACK for that too). ... > > That certainly does not happen (ever) for out of window DSACKs. > > You are right. If I turn off DSACK, the same thing happens: TCP enters > the Loss state without timeouts occurring. Isn't that a sign of > reneging happening? What else can it be? There's a MIB for reneging from where you should be able to confirm that it did(n't) happen... Please note that tcpprobe is only run per ACK (not on timeouts), and FRTO (enabled by default) doesn't even cause CA_Loss entry immediately but slightly later on once it has figured out that the timeout doesn't seem to be spurious. -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
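[One way to check the counter Ilpo mentions on the test machines, assuming the TcpExt counter is named TCPSACKReneging (verify the exact name in /proc/net/netstat on the kernel in use):]

def tcpext_counters(path="/proc/net/netstat"):
    # /proc/net/netstat comes in pairs of lines: a header line with counter
    # names followed by a line with the corresponding values.
    counters = {}
    with open(path) as f:
        lines = f.readlines()
    for names_line, values_line in zip(lines[::2], lines[1::2]):
        if names_line.startswith("TcpExt:"):
            names = names_line.split()[1:]
            values = [int(v) for v in values_line.split()[1:]]
            counters.update(zip(names, values))
    return counters

before = tcpext_counters().get("TCPSACKReneging", 0)
# ... run the iperf transfer over the two emulated paths ...
after = tcpext_counters().get("TCPSACKReneging", 0)
print("SACK reneging events during the run:", after - before)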
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 15:38 ` Eric Dumazet @ 2011-04-26 20:43 ` Eric Dumazet 2011-04-26 21:04 ` Dominik Kaspar 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 20:43 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > For the experiments, all default TCP options were used, meaning that > SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off > TSO... so that is probably enabled, too. Path emulation is done with > tc/netem at the receiver interfaces (eth1, eth2) with this script: > > http://home.simula.no/~kaspar/static/netem.sh > What are the exact parameters ? (queue size for instance) It would be nice to give detailed stats after one run, on receiver (since you have netem on ingress side) tc -s -d qdisc ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:43 ` Eric Dumazet @ 2011-04-26 21:04 ` Dominik Kaspar 2011-04-26 21:08 ` Eric Dumazet 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev On Tue, Apr 26, 2011 at 10:43 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > >> For the experiments, all default TCP options were used, meaning that >> SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off >> TSO... so that is probably enabled, too. Path emulation is done with >> tc/netem at the receiver interfaces (eth1, eth2) with this script: >> >> http://home.simula.no/~kaspar/static/netem.sh >> > > What are the exact parameters ? (queue size for instance) > > It would be nice to give detailed stats after one run, on receiver > (since you have netem on ingress side) > > tc -s -d qdisc In these experiments, a queue size of 1000 packets was specified. I am aware that this is typically referred to as "buffer bloat" and causes the RTT and the cwnd to grow excessively. The smaller I configure the queues, the more time it takes for TCP to "level up" to the aggregate throughput. By keeping the queues so large, I hope to more quickly identify the reason why TCP is actually able to adjust to the immense multipath reordering. What parameters could be highly relevant, other than the queue size? Thanks for the tip about printing tc/netem statistics after each run, I will use "tc -s -d qdisc" next time. Greetings, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
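[As a rough sanity check of what that 1000-packet limit does to the RTT, using the 600/400 KB/s path rates from the later tests in this thread and assuming roughly 1500-byte packets:]

PKT_BYTES = 1500
LIMIT = 1000                      # netem queue limit in packets

for rate_kB_s in (600, 400):
    rate = rate_kB_s * 1000       # bytes per second
    max_queue_delay = LIMIT * PKT_BYTES / rate
    print(f"{rate_kB_s} KB/s path: up to {max_queue_delay:.2f} s of queueing delay")

# Roughly 2.5 s and 3.75 s respectively, i.e. once the queues fill, they,
# and not the configured 10/100 ms netem delays, dominate the measured RTT.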
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:04 ` Dominik Kaspar @ 2011-04-26 21:08 ` Eric Dumazet 2011-04-26 21:16 ` Dominik Kaspar 2011-04-26 21:17 ` Eric Dumazet 0 siblings, 2 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 21:08 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > In these experiments, a queue size of 1000 packets was specified. I am > aware that this is typically referred to as "buffer bloat" and causes > the RTT and the cwnd to grow excessively. The smaller I configure the > queues, the more time it takes for TCP to "level up" to the aggregate > throughput. By keeping the queues so large, I hope to more quickly > identify the reason why TCP is actually able to adjust to the immense > multipath reordering. What parameters could be highly relevant, other > than the queue size? > losses of course ;) Real internet is full of packet losses, and probability of these losses depends on queue sizes (RED like AQM) > Thanks for the tip about printing tc/netem statistics after each run, > I will use "tc -s -d qdisc" next time. > ^ permalink raw reply [flat|nested] 33+ messages in thread
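[For context on Eric's point, a minimal sketch of the RED-style idea that drop probability grows with the average queue occupancy; this is textbook RED, not any specific qdisc implementation, and the thresholds are arbitrary example values.]

def red_drop_probability(avg_queue, min_th=50, max_th=150, max_p=0.1):
    # Below min_th nothing is dropped; between the thresholds the drop
    # probability ramps up linearly; above max_th everything is dropped.
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (20, 80, 140, 200):
    print(q, red_drop_probability(q))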
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:08 ` Eric Dumazet @ 2011-04-26 21:16 ` Dominik Kaspar 2011-04-26 21:17 ` Eric Dumazet 1 sibling, 0 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev On Tue, Apr 26, 2011 at 11:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > >> In these experiments, a queue size of 1000 packets was specified. I am >> aware that this is typically referred to as "buffer bloat" and causes >> the RTT and the cwnd to grow excessively. The smaller I configure the >> queues, the more time it takes for TCP to "level up" to the aggregate >> throughput. By keeping the queues so large, I hope to more quickly >> identify the reason why TCP is actually able to adjust to the immense >> multipath reordering. What parameters could be highly relevant, other >> than the queue size? >> > > losses of course ;) > > Real internet is full of packet losses, and probability of these losses > depends on queue sizes (RED like AQM) > No additional random loss is introduced (yet), so packet loss happens only when the queue size of 1000 packets is hit. Since the queues are configured overly large, packet loss rarely happens at all... of course at the cost of a large RTT. I suspect that artificially bloating the RTT somehow allows TCP to better adjust to multipath reordering... just haven't got a clue why. Cheers, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
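[One plausible connection between the bloated queues and the faster adaptation, as an estimate rather than a measurement: Linux can only sample a reordering extent up to roughly what is in flight, and the flight size grows with the queue-inflated RTT. The MSS and the 1000 KB/s aggregate rate below are assumptions based on the emulated paths.]

MSS = 1448                              # payload bytes per segment (assumption)
AGG_RATE = 1000 * 1000                  # ~1000 KB/s aggregate of both paths

for rtt in (0.11, 3.0):                 # base RTT vs. queue-inflated RTT (seconds)
    flight = AGG_RATE * rtt / MSS
    print(f"RTT {rtt:5.2f} s -> ~{flight:.0f} segments in flight")

# ~76 segments at the base RTT versus ~2000 with multi-second queues, so a
# bloated RTT lets the sender observe much larger reordering extents per
# detection event (up to the kernel's internal cap on tp->reordering).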
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:08 ` Eric Dumazet 2011-04-26 21:16 ` Dominik Kaspar @ 2011-04-26 21:17 ` Eric Dumazet 1 sibling, 0 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 21:17 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 23:08 +0200, Eric Dumazet a écrit : > Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > > > In these experiments, a queue size of 1000 packets was specified. I am > > aware that this is typically referred to as "buffer bloat" and causes > > the RTT and the cwnd to grow excessively. The smaller I configure the > > queues, the more time it takes for TCP to "level up" to the aggregate > > throughput. By keeping the queues so large, I hope to more quickly > > identify the reason why TCP is actually able to adjust to the immense > > multipath reordering. What parameters could be highly relevant, other > > than the queue size? > > > > losses of course ;) > > Real internet is full of packet losses, and probability of these losses > depends on queue sizes (RED like AQM) > > BTW, netem in linux-2.6.39 contains a lot of changes in the netem module: commit 661b79725fea030803a89a16cda (netem: revised correlated loss generator). This is a patch that originated with Stefano Salsano and Fabio Ludovici. It provides several alternative loss models for use with netem. This patch adds two state-machine-based loss models. http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG ^ permalink raw reply [flat|nested] 33+ messages in thread
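[The state-machine loss models that commit adds are in the spirit of a Gilbert-Elliott channel: long mostly-lossless periods interrupted by short bursty ones. A minimal simulation of that idea follows; the transition and loss probabilities are arbitrary examples, not the netem defaults.]

import random

def gilbert_elliott(n, p_good_to_bad=0.01, p_bad_to_good=0.3,
                    loss_in_good=0.0, loss_in_bad=0.5, seed=1):
    # Two-state Markov chain deciding, per packet, whether the channel is
    # in the "good" or "bad" state, and dropping with a state-dependent
    # probability.
    rng = random.Random(seed)
    bad = False
    losses = 0
    for _ in range(n):
        stay_or_enter_bad = (1 - p_bad_to_good) if bad else p_good_to_bad
        bad = rng.random() < stay_or_enter_bad
        if rng.random() < (loss_in_bad if bad else loss_in_good):
            losses += 1
    return losses

print(gilbert_elliott(100000), "losses out of 100000 packets")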
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 10:37 Linux TCP's Robustness to Multipath Packet Reordering Dominik Kaspar 2011-04-25 11:25 ` Eric Dumazet @ 2011-04-25 12:59 ` Carsten Wolff 1 sibling, 0 replies; 33+ messages in thread From: Carsten Wolff @ 2011-04-25 12:59 UTC (permalink / raw) To: Dominik Kaspar; +Cc: netdev Hi Dominik, On Monday 25 April 2011, Dominik Kaspar wrote: > Hello, > > Knowing how critical packet reordering is for standard TCP, I am > currently testing how robust Linux TCP is when packets are forwarded > over multiple paths (with different bandwidth and RTT). Since Linux > TCP adapts its "dupAck threshold" to an estimated level of packet > reordering, I expect it to be much more robust than a standard TCP > that strictly follows the RFCs. Indeed, as you can see in the > following plot, my experiments show a step-wise adaptation of Linux > TCP to heavy reordering. After many minutes, Linux TCP finally reaches > a data throughput close to the perfect aggregated data rate of two > paths (emulated with characteristics similar to IEEE 802.11b (WLAN) > and a 3G link (HSPA)): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png > > Does anyone have clues what's going on here? Why does the aggregated > throughput increase in steps? And what could be the reason it takes > minutes to adapt to the full capacity, when in other cases, Linux TCP > adapts much faster (for example if the bandwidth of both paths are > equal). I would highly appreciate some advice from the netdev > community. the throughput increase in steps is most likely caused by Linux's reordering detection and quantization. The DupThresh (tp->reordering) is only increased when reordering is detected and is then set to a value that depends on current inflight/pipe. This means, on a path with only reordering and no loss, where a very large DupThresh is best, you will see those steps in the throughput everytime when Linux detects reordering during a time where cwnd is large. This on the other hand depends purely on timing/luck. Linux is also only able to quantize reordering during disorder state, which leaves out many possible quantization samples, escpecially the larger ones, which would increase DupThresh to higher values. Also, reordering detection depends very much on TCP options. Which TCP Options were enabled in your test? Timestamps? D-SACK? Carsten > > Implementation details: > This multipath TCP experiment ran between a sending machine with a > single Ethernet interface (eth0) and a client with two Ethernet > interfaces (eth1, eth2). The machines are connected through a switch > and tc/netem is used to emulate the bandwidth and RTT of both paths. > TCP connections are established using iperf between eth0 and eth1 (the > primary path). At the sender, an iptables' NFQUEUE is used to "spoof" > the destination IP address of outgoing packets and force some to > travel to eth2 instead of eth1 (the secondary path). This multipath > scheduling happens in proportion to the emulated bandwidths, so if the > paths are set to 500 and 1000 KB/s, then packets are distributed in a > 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the > spoofed IP addresses back to their original, so that all packets end > up at eth1, although a portion actually travelled to eth2. ACKs are > not scheduled over multiple paths, but always travel back on the > primary path. 
TCP does not notice anything of the multipath > forwarding, except the side-effect of packet reordering, which can be > huge if the path RTTs are set very differently. > > Best regards, > Dominik > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- /\-´-/\ ( @ @ ) ________o0O___^___O0o________ ^ permalink raw reply [flat|nested] 33+ messages in thread
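[A toy model of the ratchet Carsten describes; all constants are arbitrary and purely illustrative, and the cap is an assumption about kernels of that era. The point is only the shape of the curve: each detection event can at most raise the DupThresh to something bounded by the data in flight at that moment, and a larger DupThresh in turn lets the window, and the next observable sample, grow, so throughput climbs in discrete steps. An approach like the TCP-NCR variant mentioned earlier, which ties DupThresh to the current flight size directly, would skip this slow ratchet.]

MAX_REORDERING = 127   # assumed cap on tp->reordering in kernels of that era

dupthresh = 3
for step in range(6):
    # Purely illustrative: assume spurious fast retransmits keep collapsing
    # the window, so the flight size seen at the next detection event is
    # only a small multiple of the current DupThresh.
    flight_at_detection = 4 * dupthresh
    dupthresh = min(MAX_REORDERING, flight_at_detection)
    print(f"detection {step}: dupthresh ratchets up to {dupthresh}")

# -> 12, 48, 127, 127, ... : several lucky detection events are needed before
# the threshold is large enough for the emulated reordering, which matches
# the step-wise throughput plots earlier in the thread.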