* Linux TCP's Robustness to Multipath Packet Reordering @ 2011-04-25 10:37 Dominik Kaspar 2011-04-25 11:25 ` Eric Dumazet 2011-04-25 12:59 ` Carsten Wolff 0 siblings, 2 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-25 10:37 UTC (permalink / raw) To: netdev Hello, Knowing how critical packet reordering is for standard TCP, I am currently testing how robust Linux TCP is when packets are forwarded over multiple paths (with different bandwidth and RTT). Since Linux TCP adapts its "dupAck threshold" to an estimated level of packet reordering, I expect it to be much more robust than a standard TCP that strictly follows the RFCs. Indeed, as you can see in the following plot, my experiments show a step-wise adaptation of Linux TCP to heavy reordering. After many minutes, Linux TCP finally reaches a data throughput close to the perfect aggregated data rate of two paths (emulated with characteristics similar to IEEE 802.11b (WLAN) and a 3G link (HSPA)): http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png Does anyone have clues what's going on here? Why does the aggregated throughput increase in steps? And what could be the reason it takes minutes to adapt to the full capacity, when in other cases, Linux TCP adapts much faster (for example if the bandwidth of both paths are equal). I would highly appreciate some advice from the netdev community. Implementation details: This multipath TCP experiment ran between a sending machine with a single Ethernet interface (eth0) and a client with two Ethernet interfaces (eth1, eth2). The machines are connected through a switch and tc/netem is used to emulate the bandwidth and RTT of both paths. TCP connections are established using iperf between eth0 and eth1 (the primary path). At the sender, an iptables' NFQUEUE is used to "spoof" the destination IP address of outgoing packets and force some to travel to eth2 instead of eth1 (the secondary path). This multipath scheduling happens in proportion to the emulated bandwidths, so if the paths are set to 500 and 1000 KB/s, then packets are distributed in a 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the spoofed IP addresses back to their original, so that all packets end up at eth1, although a portion actually travelled to eth2. ACKs are not scheduled over multiple paths, but always travel back on the primary path. TCP does not notice anything of the multipath forwarding, except the side-effect of packet reordering, which can be huge if the path RTTs are set very differently. Best regards, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
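For reference, a minimal sketch of the emulation setup described above (interface names, rates, and iperf's default port 5001 are assumptions; Dominik's actual netem.sh and the userspace NFQUEUE scheduler that rewrites destination addresses are separate programs and not shown):

    # Emulate the two paths, roughly WLAN-like and HSPA-like (netem for delay,
    # tbf as child qdisc for the rate limit); the real netem.sh may differ,
    # e.g. it may shape on ingress via ifb instead.
    tc qdisc add dev eth1 root handle 1:0 netem delay 10ms
    tc qdisc add dev eth1 parent 1:1 handle 10: tbf rate 4800kbit buffer 20kb limit 60kb
    tc qdisc add dev eth2 root handle 1:0 netem delay 100ms
    tc qdisc add dev eth2 parent 1:1 handle 10: tbf rate 3200kbit buffer 20kb limit 60kb

    # Client: iperf server listening on the primary interface.
    iperf -s

    # Sender: divert outgoing iperf segments to a userspace queue so a scheduler
    # (not shown) can rewrite the destination of a share of them towards the
    # secondary path; on the client, xtables-addons' RAWDNAT (raw/PREROUTING,
    # --to-destination) translates the spoofed addresses back.
    iptables -t mangle -A POSTROUTING -p tcp --dport 5001 -j NFQUEUE --queue-num 0

    # Sender: start the transfer towards the primary address.
    iperf -c <client-eth1-address> -t 1200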
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 10:37 Linux TCP's Robustness to Multipath Packet Reordering Dominik Kaspar @ 2011-04-25 11:25 ` Eric Dumazet 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 12:59 ` Carsten Wolff 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-25 11:25 UTC (permalink / raw) To: Dominik Kaspar; +Cc: netdev Le lundi 25 avril 2011 à 12:37 +0200, Dominik Kaspar a écrit : > Hello, > > Knowing how critical packet reordering is for standard TCP, I am > currently testing how robust Linux TCP is when packets are forwarded > over multiple paths (with different bandwidth and RTT). Since Linux > TCP adapts its "dupAck threshold" to an estimated level of packet > reordering, I expect it to be much more robust than a standard TCP > that strictly follows the RFCs. Indeed, as you can see in the > following plot, my experiments show a step-wise adaptation of Linux > TCP to heavy reordering. After many minutes, Linux TCP finally reaches > a data throughput close to the perfect aggregated data rate of two > paths (emulated with characteristics similar to IEEE 802.11b (WLAN) > and a 3G link (HSPA)): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png > > Does anyone have clues what's going on here? Why does the aggregated > throughput increase in steps? And what could be the reason it takes > minutes to adapt to the full capacity, when in other cases, Linux TCP > adapts much faster (for example if the bandwidth of both paths are > equal). I would highly appreciate some advice from the netdev > community. > > Implementation details: > This multipath TCP experiment ran between a sending machine with a > single Ethernet interface (eth0) and a client with two Ethernet > interfaces (eth1, eth2). The machines are connected through a switch > and tc/netem is used to emulate the bandwidth and RTT of both paths. > TCP connections are established using iperf between eth0 and eth1 (the > primary path). At the sender, an iptables' NFQUEUE is used to "spoof" > the destination IP address of outgoing packets and force some to > travel to eth2 instead of eth1 (the secondary path). This multipath > scheduling happens in proportion to the emulated bandwidths, so if the > paths are set to 500 and 1000 KB/s, then packets are distributed in a > 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the > spoofed IP addresses back to their original, so that all packets end > up at eth1, although a portion actually travelled to eth2. ACKs are > not scheduled over multiple paths, but always travel back on the > primary path. TCP does not notice anything of the multipath > forwarding, except the side-effect of packet reordering, which can be > huge if the path RTTs are set very differently. > Hi Dominik Implementation details of the tc/netem stages are important to fully understand how TCP stack can react. Is TSO active at sender side for example ? Your results show that only some exceptional events make bandwidth really change. A tcpdump/pcap of ~10.000 first packets would be nice to provide (not on mailing list, but on your web site) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 11:25 ` Eric Dumazet @ 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 15:38 ` Eric Dumazet 2011-04-26 20:43 ` Eric Dumazet 0 siblings, 2 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-25 14:35 UTC (permalink / raw) To: Eric Dumazet, Carsten Wolff; +Cc: netdev Hi Eric and Carsten, Thanks a lot for your quick replies. I don't have a tcpdump of this experiment, but here is the tcp_probe log that the plot is based on (I'll run a new test using tcpdump if you think that's more useful): http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.log I have also noticed what Carsten mentions, the tcp_reordering value is essential for this whole behavior. When I start an experiment and increase sysctl.net.ipv4.tcp_reordering during the running connection, the TCP throughput immediately jumps close to the aggregate of both paths. Without intervention, as in this experiment, tcp_reordering starts out as 3 and then makes small oscillations between 3 and 12 for more than 2 minutes. At about second 141, TCP somehow finds a new highest reordering value (23) and at the same time, the throughput jumps up "to the next level". The value of 23 is then used all the way until second 603, when the reordering value becomes 32 and the throughput again jumps up a level. I understand that tp->reordering is increased when reordering is detected, but what causes tp->reordering to sometimes be decreased back to 3? Also, why does a decrease back to 3 not make the whole procedure start all over again? For example, at second 1013.64, tp->reordering falls from 127 down to 3. A second later (1014.93) it then suddenly increases from 3 up to 32 without considering any numbers in between. Why it is now suddenly so fast? At the very beginning, it took 600 seconds to grow from 3 to 32 and afterward it just takes a second...? For the experiments, all default TCP options were used, meaning that SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off TSO... so that is probably enabled, too. Path emulation is done with tc/netem at the receiver interfaces (eth1, eth2) with this script: http://home.simula.no/~kaspar/static/netem.sh Greetings, Dominik On Mon, Apr 25, 2011 at 1:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le lundi 25 avril 2011 à 12:37 +0200, Dominik Kaspar a écrit : >> Hello, >> >> Knowing how critical packet reordering is for standard TCP, I am >> currently testing how robust Linux TCP is when packets are forwarded >> over multiple paths (with different bandwidth and RTT). Since Linux >> TCP adapts its "dupAck threshold" to an estimated level of packet >> reordering, I expect it to be much more robust than a standard TCP >> that strictly follows the RFCs. Indeed, as you can see in the >> following plot, my experiments show a step-wise adaptation of Linux >> TCP to heavy reordering. After many minutes, Linux TCP finally reaches >> a data throughput close to the perfect aggregated data rate of two >> paths (emulated with characteristics similar to IEEE 802.11b (WLAN) >> and a 3G link (HSPA)): >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png >> >> Does anyone have clues what's going on here? Why does the aggregated >> throughput increase in steps? And what could be the reason it takes >> minutes to adapt to the full capacity, when in other cases, Linux TCP >> adapts much faster (for example if the bandwidth of both paths are >> equal). 
I would highly appreciate some advice from the netdev >> community. >> >> Implementation details: >> This multipath TCP experiment ran between a sending machine with a >> single Ethernet interface (eth0) and a client with two Ethernet >> interfaces (eth1, eth2). The machines are connected through a switch >> and tc/netem is used to emulate the bandwidth and RTT of both paths. >> TCP connections are established using iperf between eth0 and eth1 (the >> primary path). At the sender, an iptables' NFQUEUE is used to "spoof" >> the destination IP address of outgoing packets and force some to >> travel to eth2 instead of eth1 (the secondary path). This multipath >> scheduling happens in proportion to the emulated bandwidths, so if the >> paths are set to 500 and 1000 KB/s, then packets are distributed in a >> 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the >> spoofed IP addresses back to their original, so that all packets end >> up at eth1, although a portion actually travelled to eth2. ACKs are >> not scheduled over multiple paths, but always travel back on the >> primary path. TCP does not notice anything of the multipath >> forwarding, except the side-effect of packet reordering, which can be >> huge if the path RTTs are set very differently. >> > > Hi Dominik > > Implementation details of the tc/netem stages are important to fully > understand how TCP stack can react. > > Is TSO active at sender side for example ? > > Your results show that only some exceptional events make bandwidth > really change. > > A tcpdump/pcap of ~10.000 first packets would be nice to provide (not on > mailing list, but on your web site) ^ permalink raw reply [flat|nested] 33+ messages in thread
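The tcp_probe log and the mid-connection change of tcp_reordering that Dominik describes can be reproduced along these lines (iperf's default port 5001 assumed; tcp_probe's port/full module parameters are standard, and the stock module logs cwnd/ssthresh/srtt and similar fields per ACK):

    # Sender: log per-ACK connection state for the iperf flow.
    modprobe tcp_probe port=5001 full=1
    cat /proc/net/tcpprobe > /tmp/tcpprobe.log &

    # Raise the reordering degree while the test runs; the sysctl seeds
    # tp->reordering for new connections and is also the value an established
    # connection falls back to when it enters the loss state.
    sysctl -w net.ipv4.tcp_reordering=127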
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 14:35 ` Dominik Kaspar @ 2011-04-25 15:38 ` Eric Dumazet 2011-04-26 16:58 ` Dominik Kaspar 2011-04-26 20:43 ` Eric Dumazet 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-25 15:38 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > Hi Eric and Carsten, > > Thanks a lot for your quick replies. I don't have a tcpdump of this > experiment, but here is the tcp_probe log that the plot is based on > (I'll run a new test using tcpdump if you think that's more useful): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.log > > I have also noticed what Carsten mentions, the tcp_reordering value is > essential for this whole behavior. When I start an experiment and > increase sysctl.net.ipv4.tcp_reordering during the running connection, > the TCP throughput immediately jumps close to the aggregate of both > paths. Without intervention, as in this experiment, tcp_reordering > starts out as 3 and then makes small oscillations between 3 and 12 for > more than 2 minutes. At about second 141, TCP somehow finds a new > highest reordering value (23) and at the same time, the throughput > jumps up "to the next level". The value of 23 is then used all the way > until second 603, when the reordering value becomes 32 and the > throughput again jumps up a level. > > I understand that tp->reordering is increased when reordering is > detected, but what causes tp->reordering to sometimes be decreased > back to 3? Also, why does a decrease back to 3 not make the whole > procedure start all over again? For example, at second 1013.64, > tp->reordering falls from 127 down to 3. A second later (1014.93) it > then suddenly increases from 3 up to 32 without considering any > numbers in between. Why it is now suddenly so fast? At the very > beginning, it took 600 seconds to grow from 3 to 32 and afterward it > just takes a second...? > > For the experiments, all default TCP options were used, meaning that > SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off > TSO... so that is probably enabled, too. Path emulation is done with > tc/netem at the receiver interfaces (eth1, eth2) with this script: > Since you have at sender a rule to spoof destination address of packets, you should make sure you dont send "super packets (up to 64Kbytes)", because it would stress the multipath more than you wanted to. This way, you send only normal packets (1500 MTU). ethtool -K eth0 tso off ethtool -K eth0 gso off I am pretty sure it should help your (atypic) workload. > http://home.simula.no/~kaspar/static/netem.sh > > Greetings, > Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
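A slightly fuller version of Eric's suggestion, verified afterwards; GRO on the receiving interfaces is included on the assumption that it could re-coalesce segments:

    # Run for eth0 on the sender, and for eth1/eth2 on the client.
    for dev in eth0 eth1 eth2; do
        ethtool -K $dev tso off gso off gro off
        ethtool -k $dev | egrep 'segmentation|receive-offload'
    done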
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 15:38 ` Eric Dumazet @ 2011-04-26 16:58 ` Dominik Kaspar 2011-04-26 17:10 ` Eric Dumazet 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 16:58 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev Hi Eric, On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Since you have at sender a rule to spoof destination address of packets, > you should make sure you dont send "super packets (up to 64Kbytes)", > because it would stress the multipath more than you wanted to. This way, > you send only normal packets (1500 MTU). > > ethtool -K eth0 tso off > ethtool -K eth0 gso off > > I am pretty sure it should help your (atypic) workload. I made new experiments with the exact same multipath setup as before, but disabled TSO and GSO on all involved Ethernet interfaces. However, this did not seem to change much about TCP's behavior when packets are striped over heterogeneous paths. You can see the results of four 20-minute experiments on this plot: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png Cheers, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 16:58 ` Dominik Kaspar @ 2011-04-26 17:10 ` Eric Dumazet 2011-04-26 18:00 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 17:10 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : > Hi Eric, > > On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > Since you have at sender a rule to spoof destination address of packets, > > you should make sure you dont send "super packets (up to 64Kbytes)", > > because it would stress the multipath more than you wanted to. This way, > > you send only normal packets (1500 MTU). > > > > ethtool -K eth0 tso off > > ethtool -K eth0 gso off > > > > I am pretty sure it should help your (atypic) workload. > > I made new experiments with the exact same multipath setup as before, > but disabled TSO and GSO on all involved Ethernet interfaces. However, > this did not seem to change much about TCP's behavior when packets are > striped over heterogeneous paths. You can see the results of four > 20-minute experiments on this plot: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png > > Cheers, > Dominik Hi Dominik Any chance to have a pcap file from sender side, of say first 10.000 packets ? ^ permalink raw reply [flat|nested] 33+ messages in thread
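The capture Eric asks for can be taken at the sender with something like the following (iperf's default port 5001 assumed; the snaplen keeps only headers so the file stays small):

    tcpdump -i eth0 -s 128 -c 10000 -w mptcp-sender-first10k.pcap 'tcp port 5001'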
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 17:10 ` Eric Dumazet @ 2011-04-26 18:00 ` Dominik Kaspar 2011-04-26 20:16 ` John Heffner 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 18:00 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev Hi Eric, Here are the tcpdump files for the first TSO-disabled experiment, in a full version and a short version with only the first 10000 packets: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap By the way, the packets are sent from the server (x.x.x.189) to the client interfaces (x.x.x.74) and (x.x.x.216) with the following pattern (which is a non-bursty 128-bit approximation of scheduling with a 600:400 ratio over primary path 0 and secondary path 1): 0010010100101001010010100101001010010100101001010010100101001010 0101001010010100101001010010100101001010010100101001010010100101 Greetings, Dominik On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >> Hi Eric, >> >> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > >> > Since you have at sender a rule to spoof destination address of packets, >> > you should make sure you dont send "super packets (up to 64Kbytes)", >> > because it would stress the multipath more than you wanted to. This way, >> > you send only normal packets (1500 MTU). >> > >> > ethtool -K eth0 tso off >> > ethtool -K eth0 gso off >> > >> > I am pretty sure it should help your (atypic) workload. >> >> I made new experiments with the exact same multipath setup as before, >> but disabled TSO and GSO on all involved Ethernet interfaces. However, >> this did not seem to change much about TCP's behavior when packets are >> striped over heterogeneous paths. You can see the results of four >> 20-minute experiments on this plot: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >> >> Cheers, >> Dominik > > Hi Dominik > > Any chance to have a pcap file from sender side, of say first 10.000 > packets ? > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
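A non-bursty 0/1 striping pattern for a given bandwidth ratio can be generated with a simple accumulator in the spirit of weighted round-robin (Bresenham-style); a sketch, which need not match the exact generator behind the pattern above:

    awk -v w0=600 -v w1=400 'BEGIN {
        prev = 0
        for (i = 1; i <= 128; i++) {
            cur = int(i * w1 / (w0 + w1))       # path-1 slots due after i packets
            printf "%d", (cur > prev) ? 1 : 0   # 1 = secondary path, 0 = primary
            prev = cur
            if (i % 64 == 0) printf "\n"
        }
    }'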
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 18:00 ` Dominik Kaspar @ 2011-04-26 20:16 ` John Heffner 2011-04-26 21:27 ` Dominik Kaspar 2011-04-27 9:57 ` Carsten Wolff 0 siblings, 2 replies; 33+ messages in thread From: John Heffner @ 2011-04-26 20:16 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Eric Dumazet, Carsten Wolff, netdev First, TCP is definitely not designed to work under such conditions. For example, assumptions behind RTO calculation and fast retransmit heuristics are violated. However, in this particular case my first guess is that you are being limited by "cwnd moderation," which was the topic of recent discussion here. Under persistent reordering, cwnd moderation can inhibit the ability of cwnd to grow. Thanks, -John On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > Hi Eric, > > Here are the tcpdump files for the first TSO-disabled experiment, in a > full version and a short version with only the first 10000 packets: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap > > By the way, the packets are sent from the server (x.x.x.189) to the > client interfaces (x.x.x.74) and (x.x.x.216) with the following > pattern (which is a non-bursty 128-bit approximation of scheduling > with a 600:400 ratio over primary path 0 and secondary path 1): > > 0010010100101001010010100101001010010100101001010010100101001010 > 0101001010010100101001010010100101001010010100101001010010100101 > > Greetings, > Dominik > > On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >>> Hi Eric, >>> >>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> > >>> > Since you have at sender a rule to spoof destination address of packets, >>> > you should make sure you dont send "super packets (up to 64Kbytes)", >>> > because it would stress the multipath more than you wanted to. This way, >>> > you send only normal packets (1500 MTU). >>> > >>> > ethtool -K eth0 tso off >>> > ethtool -K eth0 gso off >>> > >>> > I am pretty sure it should help your (atypic) workload. >>> >>> I made new experiments with the exact same multipath setup as before, >>> but disabled TSO and GSO on all involved Ethernet interfaces. However, >>> this did not seem to change much about TCP's behavior when packets are >>> striped over heterogeneous paths. You can see the results of four >>> 20-minute experiments on this plot: >>> >>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >>> >>> Cheers, >>> Dominik >> >> Hi Dominik >> >> Any chance to have a pcap file from sender side, of say first 10.000 >> packets ? >> >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:16 ` John Heffner @ 2011-04-26 21:27 ` Dominik Kaspar 2011-04-27 9:57 ` Carsten Wolff 1 sibling, 0 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:27 UTC (permalink / raw) To: John Heffner; +Cc: Eric Dumazet, Carsten Wolff, netdev Hi John, Thanks for your advice. I am very well aware that TCP is not designed to work under such conditions. I am still surprised how well Linux TCP handles many situations of excessive, persistent packet reordering. In scenarios of fairly heterogeneous path characteristics, Linux TCP aggregates multiple paths close to ideally :-) If I'm not mistaken, cwnd moderation is a measure to prevent TCP from sending large bursts if a single ACK covers many segments. In what way can cwnd moderation prevent TCP from increasing its estimate of packet reordering? Greetings, Dominik On Tue, Apr 26, 2011 at 10:16 PM, John Heffner <johnwheffner@gmail.com> wrote: > First, TCP is definitely not designed to work under such conditions. > For example, assumptions behind RTO calculation and fast retransmit > heuristics are violated. However, in this particular case my first > guess is that you are being limited by "cwnd moderation," which was > the topic of recent discussion here. Under persistent reordering, > cwnd moderation can inhibit the ability of cwnd to grow. > > Thanks, > -John > > > On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> Hi Eric, >> >> Here are the tcpdump files for the first TSO-disabled experiment, in a >> full version and a short version with only the first 10000 packets: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap >> >> By the way, the packets are sent from the server (x.x.x.189) to the >> client interfaces (x.x.x.74) and (x.x.x.216) with the following >> pattern (which is a non-bursty 128-bit approximation of scheduling >> with a 600:400 ratio over primary path 0 and secondary path 1): >> >> 0010010100101001010010100101001010010100101001010010100101001010 >> 0101001010010100101001010010100101001010010100101001010010100101 >> >> Greetings, >> Dominik >> >> On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit : >>>> Hi Eric, >>>> >>>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>>> > >>>> > Since you have at sender a rule to spoof destination address of packets, >>>> > you should make sure you dont send "super packets (up to 64Kbytes)", >>>> > because it would stress the multipath more than you wanted to. This way, >>>> > you send only normal packets (1500 MTU). >>>> > >>>> > ethtool -K eth0 tso off >>>> > ethtool -K eth0 gso off >>>> > >>>> > I am pretty sure it should help your (atypic) workload. >>>> >>>> I made new experiments with the exact same multipath setup as before, >>>> but disabled TSO and GSO on all involved Ethernet interfaces. However, >>>> this did not seem to change much about TCP's behavior when packets are >>>> striped over heterogeneous paths. You can see the results of four >>>> 20-minute experiments on this plot: >>>> >>>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png >>>> >>>> Cheers, >>>> Dominik >>> >>> Hi Dominik >>> >>> Any chance to have a pcap file from sender side, of say first 10.000 >>> packets ? 
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:16 ` John Heffner 2011-04-26 21:27 ` Dominik Kaspar @ 2011-04-27 9:57 ` Carsten Wolff 2011-04-27 16:22 ` Dominik Kaspar 1 sibling, 1 reply; 33+ messages in thread From: Carsten Wolff @ 2011-04-27 9:57 UTC (permalink / raw) To: John Heffner Cc: Dominik Kaspar, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi all, On Tuesday 26 April 2011, John Heffner wrote: > First, TCP is definitely not designed to work under such conditions. > For example, assumptions behind RTO calculation and fast retransmit > heuristics are violated. However, in this particular case my first > guess is that you are being limited by "cwnd moderation," which was > the topic of recent discussion here. Under persistent reordering, > cwnd moderation can inhibit the ability of cwnd to grow. it's not just cwnd moderation (of which I'm still in favor, even though I lost the argument by inactivity ;-)). Anyway, there are a lot of things in reordering handling that can be improved. Our group (Alexander, Lennart, Arnd, myself and others) has worked on the problem for a long time now. This work resulted in an algorithm that is in large parts TCP-NCR (RFC4653), but also utilizes information gathered by reordering detection for determination of a good DupThresh, fixes a few problems in RFC4653 and improves on the reordering detection in Linux when the connection has no timestamps option. We implemented "pure" TCP-NCR and our own variant in Linux using a modular framework similar to the congestion control modules. A lot of measurements and evaluation have gone into the comparison of the three algorithms. We are now very close(TM) to a final patch, that is more suited for publication on this list and integrates our algorithm into tcp*. [hc] without introducing the overhead of that modular framework. Greetings, Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 9:57 ` Carsten Wolff @ 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-27 16:22 UTC (permalink / raw) To: Carsten Wolff Cc: John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Carsten, Thanks for your feedback. I made some new tests with the same setup of packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + (400 KB/s, 100 ms). In the first experiments, which showed a step-wise adaptation to reordering, SACK, DSACK, and Timestamps were all enabled. In the experiments, I individually disabled these three mechanisms and saw the following: - Disabling timestamps causes TCP to never adjust to reordering at all. - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). - Disabling DSACK has no obvious impact (still a step-wise throughput). Is there an explanation for why turning off SACK can be beneficial in the presence of packet reordering? That sounds pretty counter-intuitive to me... I thought SACK=1 always performs better than SACK=0. The results are also illustrated in the following plot. For each setting, there are three runs, which all exhibit a similar behavior: http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png Greetings, Dominik On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: > Hi all, > > On Tuesday 26 April 2011, John Heffner wrote: >> First, TCP is definitely not designed to work under such conditions. >> For example, assumptions behind RTO calculation and fast retransmit >> heuristics are violated. However, in this particular case my first >> guess is that you are being limited by "cwnd moderation," which was >> the topic of recent discussion here. Under persistent reordering, >> cwnd moderation can inhibit the ability of cwnd to grow. > > it's not just cwnd moderation (of which I'm still in favor, even though I lost > the argument by inactivity ;-)). > > Anyway, there are a lot of things in reordering handling that can be improved. > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the > problem for a long time now. This work resulted in an algorithm that is in > large parts TCP-NCR (RFC4653), but also utilizes information gathered by > reordering detection for determination of a good DupThresh, fixes a few > problems in RFC4653 and improves on the reordering detection in Linux when the > connection has no timestamps option. We implemented "pure" TCP-NCR and our own > variant in Linux using a modular framework similar to the congestion control > modules. A lot of measurements and evaluation have gone into the comparison of > the three algorithms. We are now very close(TM) to a final patch, that is more > suited for publication on this list and integrates our algorithm into tcp*. > [hc] without introducing the overhead of that modular framework. > > Greetings, > Carsten > ^ permalink raw reply [flat|nested] 33+ messages in thread
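For completeness, the three switches being toggled (they take effect for connections established after the change, since SACK and timestamps are negotiated in the SYN exchange):

    sysctl -w net.ipv4.tcp_timestamps=0   # no timestamps -> no timestamp-based reordering detection
    sysctl -w net.ipv4.tcp_sack=0         # no SACK -> NewReno-style detection only
    sysctl -w net.ipv4.tcp_dsack=0        # DSACK only matters while SACK stays enabled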
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar @ 2011-04-27 16:36 ` Alexander Zimmermann 2011-06-21 11:25 ` Ilpo Järvinen 2011-04-27 16:48 ` Eric Dumazet 2011-04-27 17:39 ` Yuchung Cheng 2 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-27 16:36 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1835 bytes --] Hi, Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. Reordering detection with DSACK is broken in Linux. We will fix that in a couple of weeks... > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). If you disable SACK, you will use the NewReno detection > - Disabling DSACK has no obvious impact (still a step-wise throughput). If Timestamps are enabled, Linux use Timestamps for detection. Regardless of DSACK. Timestamp detection is quicker. See RFC3522. (However, in case of an spurious FRet it's not so dramatical. In case of an Spurious RTO, you can avoid the go-back-n behavior) > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > > Greetings, > Dominik > // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:36 ` Alexander Zimmermann @ 2011-06-21 11:25 ` Ilpo Järvinen 2011-06-21 11:34 ` Carsten Wolff 0 siblings, 1 reply; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:25 UTC (permalink / raw) To: Alexander Zimmermann Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > Hi, > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > Hi Carsten, > > > > Thanks for your feedback. I made some new tests with the same setup of > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > enabled. In the experiments, I individually disabled these three > > mechanisms and saw the following: > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > Reordering detection with DSACK is broken in Linux. We will fix that in > a couple of weeks... > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > If you disable SACK, you will use the NewReno detection Which probably has some reordering over-estimate bugs on its own... (but I've forgotten details of my suspicion long time ago so please don't ask for the them). -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-21 11:25 ` Ilpo Järvinen @ 2011-06-21 11:34 ` Carsten Wolff 2011-06-21 11:46 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Carsten Wolff @ 2011-06-21 11:34 UTC (permalink / raw) To: Ilpo Järvinen Cc: Alexander Zimmermann, Dominik Kaspar, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann Hi, On Tuesday 21 June 2011, Ilpo Järvinen wrote: > On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > Hi Carsten, > > > > > > Thanks for your feedback. I made some new tests with the same setup of > > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > > enabled. In the experiments, I individually disabled these three > > > mechanisms and saw the following: > > > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > > > Reordering detection with DSACK is broken in Linux. We will fix that in > > a couple of weeks... > > > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" > > > aggregation!). > > > > If you disable SACK, you will use the NewReno detection > > Which probably has some reordering over-estimate bugs on its own... > (but I've forgotten details of my suspicion long time ago so please don't > ask for the them). the NewReno detection is clever, but there's no exact information it could utilize for a good metric, because it detects the event too late, when the information is already gone. In my experiments it always under-estimated the reordering extent, though. I also remmember thinking that the metric of the Eifel-detection has an off-by-one bug. Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-21 11:34 ` Carsten Wolff @ 2011-06-21 11:46 ` Ilpo Järvinen 0 siblings, 0 replies; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:46 UTC (permalink / raw) To: Carsten Wolff Cc: Alexander Zimmermann, Dominik Kaspar, John Heffner, Eric Dumazet, Netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: TEXT/PLAIN, Size: 1783 bytes --] On Tue, 21 Jun 2011, Carsten Wolff wrote: > On Tuesday 21 June 2011, Ilpo Järvinen wrote: > > On Wed, 27 Apr 2011, Alexander Zimmermann wrote: > > > Am 27.04.2011 um 18:22 schrieb Dominik Kaspar: > > > > Hi Carsten, > > > > > > > > Thanks for your feedback. I made some new tests with the same setup of > > > > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > > > > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > > > > adaptation to reordering, SACK, DSACK, and Timestamps were all > > > > enabled. In the experiments, I individually disabled these three > > > > mechanisms and saw the following: > > > > > > > > - Disabling timestamps causes TCP to never adjust to reordering at all. > > > > > > Reordering detection with DSACK is broken in Linux. We will fix that in > > > a couple of weeks... > > > > > > > - Disabling SACK allows TCP to adapt very rapidly ("perfect" > > > > aggregation!). > > > > > > If you disable SACK, you will use the NewReno detection > > > > Which probably has some reordering over-estimate bugs on its own... > > (but I've forgotten details of my suspicion long time ago so please don't > > ask for the them). > > the NewReno detection is clever, but there's no exact information it could > utilize for a good metric, because it detects the event too late, when the > information is already gone. In my experiments it always under-estimated the > reordering extent, though. I also remmember thinking that the metric of the > Eifel-detection has an off-by-one bug. That might be true for most of the cases but IIRC I figured out a a scenario where it miscalculates RTT worth of extra into the reordering (but I never really confirmed that in real tests or so, just figured it a bit). -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann @ 2011-04-27 16:48 ` Eric Dumazet 2011-04-27 17:39 ` Yuchung Cheng 2 siblings, 0 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-27 16:48 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Le mercredi 27 avril 2011 à 18:22 +0200, Dominik Kaspar a écrit : > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > - Disabling DSACK has no obvious impact (still a step-wise throughput). > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > SACK is a win in a normal environnement, with few reorders, but some percents of losses ;) Given the limit of 3 blocks in SACK option, and your pretty asymetric paths (10ms and 100ms), SACK is useless and consume 12 bytes per frame... You really should add traces to every tp->reordering changes done in our TCP stack, its a 20 minutes patch, and would help you to understand where/when its increased/decreased. ^ permalink raw reply [flat|nested] 33+ messages in thread
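Eric's suggestion amounts to a printk at every place tcp_input.c assigns tp->reordering. A less intrusive rough equivalent, assuming kernel debuginfo is available and the function is not inlined, is a dynamic probe on the main site where the estimate is raised (decreases, e.g. on RTO, happen elsewhere and would need their own probes):

    # Locate the assignment sites in the source first:
    grep -n 'reordering =' net/ipv4/tcp_input.c

    # Trace the increase path and the metric it sees:
    perf probe --add 'tcp_update_reordering metric ts'
    perf record -e probe:tcp_update_reordering -a -- sleep 1200
    perf script | head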
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 16:22 ` Dominik Kaspar 2011-04-27 16:36 ` Alexander Zimmermann 2011-04-27 16:48 ` Eric Dumazet @ 2011-04-27 17:39 ` Yuchung Cheng 2011-04-27 17:53 ` Alexander Zimmermann 2011-04-27 19:56 ` Dominik Kaspar 2 siblings, 2 replies; 33+ messages in thread From: Yuchung Cheng @ 2011-04-27 17:39 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Dominik, On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > > Hi Carsten, > > Thanks for your feedback. I made some new tests with the same setup of > packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + > (400 KB/s, 100 ms). In the first experiments, which showed a step-wise > adaptation to reordering, SACK, DSACK, and Timestamps were all > enabled. In the experiments, I individually disabled these three > mechanisms and saw the following: > > - Disabling timestamps causes TCP to never adjust to reordering at all. > - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). Did you enable tcp_fack when sack is enabled? this may make a (big) difference. FACK assumes little network reordering and mark packet losses more aggressively. > - Disabling DSACK has no obvious impact (still a step-wise throughput). > > Is there an explanation for why turning off SACK can be beneficial in > the presence of packet reordering? That sounds pretty > counter-intuitive to me... I thought SACK=1 always performs better > than SACK=0. The results are also illustrated in the following plot. > For each setting, there are three runs, which all exhibit a similar > behavior: > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png > > Greetings, > Dominik > > On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: > > Hi all, > > > > On Tuesday 26 April 2011, John Heffner wrote: > >> First, TCP is definitely not designed to work under such conditions. > >> For example, assumptions behind RTO calculation and fast retransmit > >> heuristics are violated. However, in this particular case my first > >> guess is that you are being limited by "cwnd moderation," which was > >> the topic of recent discussion here. Under persistent reordering, > >> cwnd moderation can inhibit the ability of cwnd to grow. > > > > it's not just cwnd moderation (of which I'm still in favor, even though I lost > > the argument by inactivity ;-)). > > > > Anyway, there are a lot of things in reordering handling that can be improved. > > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the > > problem for a long time now. This work resulted in an algorithm that is in > > large parts TCP-NCR (RFC4653), but also utilizes information gathered by > > reordering detection for determination of a good DupThresh, fixes a few > > problems in RFC4653 and improves on the reordering detection in Linux when the > > connection has no timestamps option. We implemented "pure" TCP-NCR and our own > > variant in Linux using a modular framework similar to the congestion control > > modules. A lot of measurements and evaluation have gone into the comparison of > > the three algorithms. We are now very close(TM) to a final patch, that is more > > suited for publication on this list and integrates our algorithm into tcp*. > > [hc] without introducing the overhead of that modular framework. 
> > > > Greetings, > > Carsten ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 17:39 ` Yuchung Cheng @ 2011-04-27 17:53 ` Alexander Zimmermann 2011-04-27 19:56 ` Dominik Kaspar 1 sibling, 0 replies; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-27 17:53 UTC (permalink / raw) To: Yuchung Cheng Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1323 bytes --] Hi, Am 27.04.2011 um 19:39 schrieb Yuchung Cheng: > Hi Dominik, > > On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> >> Hi Carsten, >> >> Thanks for your feedback. I made some new tests with the same setup of >> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >> adaptation to reordering, SACK, DSACK, and Timestamps were all >> enabled. In the experiments, I individually disabled these three >> mechanisms and saw the following: >> >> - Disabling timestamps causes TCP to never adjust to reordering at all. >> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > Did you enable tcp_fack when sack is enabled? this may make a (big) > difference. FACK assumes little network reordering and mark packet > losses more aggressively. It's not necessary to do it manually. Linux will disable FACK as soon as it will detected reordering Alex // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 17:39 ` Yuchung Cheng 2011-04-27 17:53 ` Alexander Zimmermann @ 2011-04-27 19:56 ` Dominik Kaspar 2011-04-27 21:41 ` Yuchung Cheng 1 sibling, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-27 19:56 UTC (permalink / raw) To: Yuchung Cheng Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann Hi Yuchung, Yes, FACK was enabled (as it is by default), but as Alexander already pointed out, it should be disabled automatically when TCP detects reordering. However, I am not so sure how well this automatic turning off FACK is done by Linux... I see a tendency that in situations with persistent packet reordering, TCP with FACK enabled gets a lower performance than if FACK is disabled right from the beginning of a connection. Greetings, Dominik On Wed, Apr 27, 2011 at 7:39 PM, Yuchung Cheng <ycheng@google.com> wrote: > Hi Dominik, > > On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >> >> Hi Carsten, >> >> Thanks for your feedback. I made some new tests with the same setup of >> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >> adaptation to reordering, SACK, DSACK, and Timestamps were all >> enabled. In the experiments, I individually disabled these three >> mechanisms and saw the following: >> >> - Disabling timestamps causes TCP to never adjust to reordering at all. >> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). > > Did you enable tcp_fack when sack is enabled? this may make a (big) > difference. FACK assumes little network reordering and mark packet > losses more aggressively. > >> - Disabling DSACK has no obvious impact (still a step-wise throughput). >> >> Is there an explanation for why turning off SACK can be beneficial in >> the presence of packet reordering? That sounds pretty >> counter-intuitive to me... I thought SACK=1 always performs better >> than SACK=0. The results are also illustrated in the following plot. >> For each setting, there are three runs, which all exhibit a similar >> behavior: >> >> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png >> >> Greetings, >> Dominik >> >> On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: >> > Hi all, >> > >> > On Tuesday 26 April 2011, John Heffner wrote: >> >> First, TCP is definitely not designed to work under such conditions. >> >> For example, assumptions behind RTO calculation and fast retransmit >> >> heuristics are violated. However, in this particular case my first >> >> guess is that you are being limited by "cwnd moderation," which was >> >> the topic of recent discussion here. Under persistent reordering, >> >> cwnd moderation can inhibit the ability of cwnd to grow. >> > >> > it's not just cwnd moderation (of which I'm still in favor, even though I lost >> > the argument by inactivity ;-)). >> > >> > Anyway, there are a lot of things in reordering handling that can be improved. >> > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the >> > problem for a long time now. 
This work resulted in an algorithm that is in >> > large parts TCP-NCR (RFC4653), but also utilizes information gathered by >> > reordering detection for determination of a good DupThresh, fixes a few >> > problems in RFC4653 and improves on the reordering detection in Linux when the >> > connection has no timestamps option. We implemented "pure" TCP-NCR and our own >> > variant in Linux using a modular framework similar to the congestion control >> > modules. A lot of measurements and evaluation have gone into the comparison of >> > the three algorithms. We are now very close(TM) to a final patch, that is more >> > suited for publication on this list and integrates our algorithm into tcp*. >> > [hc] without introducing the overhead of that modular framework. >> > >> > Greetings, >> > Carsten >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 33+ messages in thread
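Whether FACK-based marking is still in effect is hard to observe directly, but the sender exposes which detection mechanism fired and the per-socket reordering estimate, which helps when comparing the FACK-on and FACK-off runs:

    # SNMP counters: reordering detected via SACK / timestamps / reno / FACK.
    netstat -s | grep -i reorder

    # Per-connection view: look for the reordering:N field in the output.
    ss -nit dst <client-eth1-address>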
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 19:56 ` Dominik Kaspar @ 2011-04-27 21:41 ` Yuchung Cheng 2011-04-28 6:11 ` Alexander Zimmermann 0 siblings, 1 reply; 33+ messages in thread From: Yuchung Cheng @ 2011-04-27 21:41 UTC (permalink / raw) To: Dominik Kaspar Cc: Carsten Wolff, John Heffner, Eric Dumazet, netdev, Zimmermann Alexander, Lennart Schulte, Arnd Hannemann AFAIK, FACK is disabled throughout the life of the connection after sender detects reordering degree > 3. But Alex said the reordering has some bugs. I suspect these bugs may affect FACK/sack auto-tuning. Maybe Alex could describe the reordering bugs? On Wed, Apr 27, 2011 at 12:56 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: > Hi Yuchung, > > Yes, FACK was enabled (as it is by default), but as Alexander already > pointed out, it should be disabled automatically when TCP detects > reordering. > > However, I am not so sure how well this automatic turning off FACK is > done by Linux... I see a tendency that in situations with persistent > packet reordering, TCP with FACK enabled gets a lower performance than > if FACK is disabled right from the beginning of a connection. > > Greetings, > Dominik > > On Wed, Apr 27, 2011 at 7:39 PM, Yuchung Cheng <ycheng@google.com> wrote: >> Hi Dominik, >> >> On Wed, Apr 27, 2011 at 9:22 AM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote: >>> >>> Hi Carsten, >>> >>> Thanks for your feedback. I made some new tests with the same setup of >>> packet-based forwarding over two emulated paths (600 KB/s, 10 ms) + >>> (400 KB/s, 100 ms). In the first experiments, which showed a step-wise >>> adaptation to reordering, SACK, DSACK, and Timestamps were all >>> enabled. In the experiments, I individually disabled these three >>> mechanisms and saw the following: >>> >>> - Disabling timestamps causes TCP to never adjust to reordering at all. >>> - Disabling SACK allows TCP to adapt very rapidly ("perfect" aggregation!). >> >> Did you enable tcp_fack when sack is enabled? this may make a (big) >> difference. FACK assumes little network reordering and mark packet >> losses more aggressively. >> >>> - Disabling DSACK has no obvious impact (still a step-wise throughput). >>> >>> Is there an explanation for why turning off SACK can be beneficial in >>> the presence of packet reordering? That sounds pretty >>> counter-intuitive to me... I thought SACK=1 always performs better >>> than SACK=0. The results are also illustrated in the following plot. >>> For each setting, there are three runs, which all exhibit a similar >>> behavior: >>> >>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-02-sack.png >>> >>> Greetings, >>> Dominik >>> >>> On Wed, Apr 27, 2011 at 11:57 AM, Carsten Wolff <carsten@wolffcarsten.de> wrote: >>> > Hi all, >>> > >>> > On Tuesday 26 April 2011, John Heffner wrote: >>> >> First, TCP is definitely not designed to work under such conditions. >>> >> For example, assumptions behind RTO calculation and fast retransmit >>> >> heuristics are violated. However, in this particular case my first >>> >> guess is that you are being limited by "cwnd moderation," which was >>> >> the topic of recent discussion here. Under persistent reordering, >>> >> cwnd moderation can inhibit the ability of cwnd to grow. >>> > >>> > it's not just cwnd moderation (of which I'm still in favor, even though I lost >>> > the argument by inactivity ;-)). >>> > >>> > Anyway, there are a lot of things in reordering handling that can be improved. 
>>> > Our group (Alexander, Lennart, Arnd, myself and others) has worked on the >>> > problem for a long time now. This work resulted in an algorithm that is in >>> > large parts TCP-NCR (RFC4653), but also utilizes information gathered by >>> > reordering detection for determination of a good DupThresh, fixes a few >>> > problems in RFC4653 and improves on the reordering detection in Linux when the >>> > connection has no timestamps option. We implemented "pure" TCP-NCR and our own >>> > variant in Linux using a modular framework similar to the congestion control >>> > modules. A lot of measurements and evaluation have gone into the comparison of >>> > the three algorithms. We are now very close(TM) to a final patch, that is more >>> > suited for publication on this list and integrates our algorithm into tcp*. >>> > [hc] without introducing the overhead of that modular framework. >>> > >>> > Greetings, >>> > Carsten >>> > >>> -- >>> To unsubscribe from this list: send the line "unsubscribe netdev" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-27 21:41 ` Yuchung Cheng @ 2011-04-28 6:11 ` Alexander Zimmermann 2011-06-19 15:22 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-04-28 6:11 UTC (permalink / raw) To: Yuchung Cheng Cc: Dominik Kaspar, Carsten Wolff, John Heffner, Eric Dumazet, netdev, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1240 bytes --] Am 27.04.2011 um 23:41 schrieb Yuchung Cheng: > AFAIK, FACK is disabled throughout the life of the connection after > sender detects reordering degree > 3. Right. > > But Alex said the reordering has some bugs. I suspect these bugs may > affect FACK/sack auto-tuning. No. It affects reordering detection only. > Maybe Alex could describe the reordering > bugs? > Yes, I can. With DSACK, you have two cases. DSACK below and above SEG.ACK. DSACK below SEG.ACK is the come case. However, Linux doesn't calculate a reordering extent in this case. We will fix this. In the other case, DSACK above SEG.ACK, Linux quantifies the reordering as the distance between the received DSACK and snd_fack. However, this is only correct if the reordering delay is greater the RTT. We will fix that too. As a result, in the first case, we waste the opportunity to calculate an extent (loosing performance). In the second case we overestimate the reordering. Alex // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-28 6:11 ` Alexander Zimmermann @ 2011-06-19 15:22 ` Dominik Kaspar 2011-06-19 15:38 ` Alexander Zimmermann 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-19 15:22 UTC (permalink / raw) To: netdev Cc: Alexander Zimmermann, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hello again, I have another question about Linux TCP and packet reordering. What exactly happens when a packet is delayed so much (without causing a timeout) that it gets overtaken by a retransmitted version of itself? It seems to me that this results in "SACK reneging", but I don't really understand why... The simplified situation goes like this: - Segment A gets sent and very much delayed (but not causing RTO) - Segments B, C, D cause dupACKs - Segment A_ret is retransmitted and ACKed (sent over new path) - Some more segments E, F, ... are sent and ACKed - Segment A (the delayed one) arrives at the receiver. - Now what exactly happens next...? I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, ...) and the above-described series of events is caused by transparently forwarding IP packets over multiple paths with RTTs of 10 and 100 milliseconds. I'd appreciate your help - best regards, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 15:22 ` Dominik Kaspar @ 2011-06-19 15:38 ` Alexander Zimmermann 2011-06-19 16:25 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Alexander Zimmermann @ 2011-06-19 15:38 UTC (permalink / raw) To: Dominik Kaspar Cc: netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann [-- Attachment #1: Type: text/plain, Size: 1631 bytes --] Hi, Am 19.06.2011 um 17:22 schrieb Dominik Kaspar: > Hello again, > > I have another question to Linux TCP and packet reordering. What > exactly happens, when a packet is so much delayed (but not causing a > timeout), that it gets overtaken by a retransmitted version of itself? > It seems to me that this results in "SACK reneging", but I don't > really understand why... In theory, you can detect this case with a combination of DSACK and timestamps. However, in practice a reordering delay greater than the RTT will likely cause an RTO (see RFC4653). IMO, if you have packet reordering with a delay greater than the RTT, you have much bigger problems than SACK reneging. > > The simplified situation goes this: > - Segment A gets sent and very much delayed (but not causing RTO) > - Segments B, C, D cause dupACKs > - Segment A_ret is retransmitted and ACKed (sent over new path) > - Some more segments E, F, ... are sent and ACKed > - Segment A (the delayed one) arrives at the receiver. > - Now what exactly happens next...? The receiver sends a DSACK. > > I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, > ...) and the above described series of events is cause why > transparently forwarding IP packets over multiple paths with RTTs of > 10 and 100 milliseconds. > > I'd appreciate your help - best regards, > Dominik // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
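[A minimal receiver-side sketch of that last step, as an illustration only and with assumed variable names (see RFC 2883 for the actual D-SACK rules): when the late original finally arrives, its data is already covered because the retransmission filled the hole, so the receiver leaves the cumulative ACK unchanged and reports the duplicate range in a D-SACK block.]

def ack_for_incoming(seg_start, seg_end, rcv_nxt):
    """Return (cumulative_ack, dsack_block) for one incoming segment
    (out-of-order queue handling is omitted to keep the sketch short)."""
    if seg_end <= rcv_nxt:
        # Entirely duplicate data: the ACK stays at rcv_nxt, and the first
        # SACK block is a D-SACK describing the duplicate segment.
        return rcv_nxt, (seg_start, seg_end)
    if seg_start <= rcv_nxt:
        return seg_end, None      # in-order data advances the cumulative ACK
    return rcv_nxt, None          # out-of-order data would be SACKed instead

# Delayed original A carried bytes 10000..11448; the retransmission and the
# following segments already pushed rcv_nxt to 20000.
print(ack_for_incoming(10000, 11448, rcv_nxt=20000))  # (20000, (10000, 11448))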
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 15:38 ` Alexander Zimmermann @ 2011-06-19 16:25 ` Dominik Kaspar 2011-06-20 10:42 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-19 16:25 UTC (permalink / raw) To: Alexander Zimmermann Cc: netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hi Alexander, Ah... the receiver DSACKs the "original" packet. However, the sender already received an ACK for its retransmission and advances SND.UNA. When the DSACK finally arrives, it is actually outside of the SND.UNA - SND.NXT range, which causes the DSACK to trigger "SACK reneging". Did I get that right? :-) Cheers, Dominik On Sun, Jun 19, 2011 at 5:38 PM, Alexander Zimmermann <alexander.zimmermann@comsys.rwth-aachen.de> wrote: > Hi, > > Am 19.06.2011 um 17:22 schrieb Dominik Kaspar: > >> Hello again, >> >> I have another question to Linux TCP and packet reordering. What >> exactly happens, when a packet is so much delayed (but not causing a >> timeout), that it gets overtaken by a retransmitted version of itself? >> It seems to me that this results in "SACK reneging", but I don't >> really understand why... > > in theory, you can detect this case with a combination of DSACK > and timestamps. However, in practice a reordering delay greater than > RTT will likely case an RTO (see RFC4653). IMO, if you have an packet > reordering with an delay greater that the RTT, you have much more problems > that SACK reneging > >> >> The simplified situation goes this: >> - Segment A gets sent and very much delayed (but not causing RTO) >> - Segments B, C, D cause dupACKs >> - Segment A_ret is retransmitted and ACKed (sent over new path) >> - Some more segments E, F, ... are sent and ACKed >> - Segment A (the delayed one) arrives at the receiver. >> - Now what exactly happens next...? > > Receiver sends a DSACK > >> >> I use default Linux TCP (with sack=1, dsack=1, fack=1, timestamps=1, >> ...) and the above described series of events is cause why >> transparently forwarding IP packets over multiple paths with RTTs of >> 10 and 100 milliseconds. >> >> I'd appreciate your help - best regards, >> Dominik > > // > // Dipl.-Inform. Alexander Zimmermann > // Department of Computer Science, Informatik 4 > // RWTH Aachen University > // Ahornstr. 55, 52056 Aachen, Germany > // phone: (49-241) 80-21422, fax: (49-241) 80-22222 > // email: zimmermann@cs.rwth-aachen.de > // web: http://www.umic-mesh.net > // > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-19 16:25 ` Dominik Kaspar @ 2011-06-20 10:42 ` Ilpo Järvinen 2011-06-20 12:52 ` Dominik Kaspar 0 siblings, 1 reply; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-20 10:42 UTC (permalink / raw) To: Dominik Kaspar Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann On Sun, 19 Jun 2011, Dominik Kaspar wrote: > Ah... the receiver DSACKs the "original" packet. However, the sender > already received an ACK for its retransmission and advances SND.UNA. > When the DSACK finally arrives, it is actually outside of the SND.UNA > - SND.NXT range, which causes the DSACK to trigger "SACK reneging". > Did I get that right? :-) Where did you get this idea of reneging?!? Reneging has nothing to do with DSACKs; instead it is only detected if the cumulative ACK stops at such a boundary where the _next_ segment is SACKed (i.e., for some reason the receiver "didn't bother" to cumulatively ACK that too). ...That certainly does not happen (ever) for out-of-window DSACKs. -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
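[A compact way to state the check Ilpo describes, sender side and as an illustration only; in the kernel this roughly corresponds to noticing that the segment at the head of the retransmit queue, i.e. the one starting at SND.UNA, carries a SACKed mark.]

def looks_like_reneging(snd_una, sacked_ranges):
    # Reneging is suspected only when the segment that starts exactly at
    # SND.UNA (the next one the cumulative ACK should have covered) has
    # previously been SACKed by the receiver.
    return any(start == snd_una for start, _end in sacked_ranges)

print(looks_like_reneging(5000, [(5000, 6448), (7896, 9344)]))  # True: suspected
print(looks_like_reneging(5000, [(6448, 7896)]))                # False: ordinary hole
# A DSACK for data below SND.UNA never starts at SND.UNA, so an
# out-of-window DSACK cannot trigger this condition.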
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-20 10:42 ` Ilpo Järvinen @ 2011-06-20 12:52 ` Dominik Kaspar 2011-06-21 11:35 ` Ilpo Järvinen 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-06-20 12:52 UTC (permalink / raw) To: Ilpo Järvinen Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann Hi Ilpo, > Where did you get this idea of reneging?!? I observed that my scenario of a retransmitted packet overtaking the original somehow causes TCP to enter the "Loss" state although no RTO was caused. And since the Loss state seems to be only entered due to RTO timeout or SACK reneging, I got the idea that reneging must be occurring. > Reneging has nothing to do with DSACKs, > instead it is only detected if the cumulative ACK stops to such > boundary where the _next_ segment is SACKed (i.e., some reason > the receiver "didn't bother" to cumulatively ACK for that too). ... > That certainly does not happen (ever) for out of window DSACKs. You are right. If I turn off DSACK, the same thing happens: TCP enters the Loss state without timeouts occurring. Isn't that a sign of reneging happening? What else can it be? Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-06-20 12:52 ` Dominik Kaspar @ 2011-06-21 11:35 ` Ilpo Järvinen 0 siblings, 0 replies; 33+ messages in thread From: Ilpo Järvinen @ 2011-06-21 11:35 UTC (permalink / raw) To: Dominik Kaspar Cc: Alexander Zimmermann, Netdev, Yuchung Cheng, Carsten Wolff, John Heffner, Eric Dumazet, Lennart Schulte, Arnd Hannemann On Mon, 20 Jun 2011, Dominik Kaspar wrote: > > Where did you get this idea of reneging?!? > > I observed that my scenario of a retransmitted packet overtaking the > original somehow causes TCP to enter the "Loss" state although no RTO > was caused. And since the Loss state seems to be only entered due to > RTO timeout or SACK reneging, I got the idea that reneging must be > occurring. > > > Reneging has nothing to do with DSACKs, > > instead it is only detected if the cumulative ACK stops to such > > boundary where the _next_ segment is SACKed (i.e., some reason > > the receiver "didn't bother" to cumulatively ACK for that too). ... > > That certainly does not happen (ever) for out of window DSACKs. > > You are right. If I turn off DSACK, the same thing happens: TCP enters > the Loss state without timeouts occurring. Isn't that a sign of > reneging happening? What else can it be? There's a MIB for reneging from where you should be able to confirm that it did(n't) happen... Please note that tcpprobe is only run per ACK (not on timeouts), and FRTO (enabled by default) doesn't even cause CA_Loss entry immediately but slightly later on once it has figured out that the timeout doesn't seem to be spurious. -- i. ^ permalink raw reply [flat|nested] 33+ messages in thread
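[One way to check the counter Ilpo mentions on the test machines, assuming the TcpExt counter is named TCPSACKReneging (verify the exact name in /proc/net/netstat on the kernel in use):]

def tcpext_counters(path="/proc/net/netstat"):
    # /proc/net/netstat comes in pairs of lines: a header line with counter
    # names followed by a line with the corresponding values.
    counters = {}
    with open(path) as f:
        lines = f.readlines()
    for names_line, values_line in zip(lines[::2], lines[1::2]):
        if names_line.startswith("TcpExt:"):
            names = names_line.split()[1:]
            values = [int(v) for v in values_line.split()[1:]]
            counters.update(zip(names, values))
    return counters

before = tcpext_counters().get("TCPSACKReneging", 0)
# ... run the iperf transfer over the two emulated paths ...
after = tcpext_counters().get("TCPSACKReneging", 0)
print("SACK reneging events during the run:", after - before)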
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 14:35 ` Dominik Kaspar 2011-04-25 15:38 ` Eric Dumazet @ 2011-04-26 20:43 ` Eric Dumazet 2011-04-26 21:04 ` Dominik Kaspar 1 sibling, 1 reply; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 20:43 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > For the experiments, all default TCP options were used, meaning that > SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off > TSO... so that is probably enabled, too. Path emulation is done with > tc/netem at the receiver interfaces (eth1, eth2) with this script: > > http://home.simula.no/~kaspar/static/netem.sh > What are the exact parameters ? (queue size for instance) It would be nice to give detailed stats after one run, on receiver (since you have netem on ingress side) tc -s -d qdisc ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 20:43 ` Eric Dumazet @ 2011-04-26 21:04 ` Dominik Kaspar 2011-04-26 21:08 ` Eric Dumazet 0 siblings, 1 reply; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev On Tue, Apr 26, 2011 at 10:43 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le lundi 25 avril 2011 à 16:35 +0200, Dominik Kaspar a écrit : > >> For the experiments, all default TCP options were used, meaning that >> SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off >> TSO... so that is probably enabled, too. Path emulation is done with >> tc/netem at the receiver interfaces (eth1, eth2) with this script: >> >> http://home.simula.no/~kaspar/static/netem.sh >> > > What are the exact parameters ? (queue size for instance) > > It would be nice to give detailed stats after one run, on receiver > (since you have netem on ingress side) > > tc -s -d qdisc In these experiments, a queue size of 1000 packets was specified. I am aware that this is typically referred to as "buffer bloat" and causes the RTT and the cwnd to grow excessively. The smaller I configure the queues, the more time it takes for TCP to "level up" to the aggregate throughput. By keeping the queues so large, I hope to more quickly identify the reason why TCP is actually able to adjust to the immense multipath reordering. What parameters could be highly relevant, other than the queue size? Thanks for the tip about printing tc/netem statistics after each run, I will use "tc -s -d qdisc" next time. Greetings, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
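[As a rough sanity check of what that 1000-packet limit does to the RTT, using the 600/400 KB/s path rates from the later tests in this thread and assuming roughly 1500-byte packets:]

PKT_BYTES = 1500
LIMIT = 1000                      # netem queue limit in packets

for rate_kB_s in (600, 400):
    rate = rate_kB_s * 1000       # bytes per second
    max_queue_delay = LIMIT * PKT_BYTES / rate
    print(f"{rate_kB_s} KB/s path: up to {max_queue_delay:.2f} s of queueing delay")

# Roughly 2.5 s and 3.75 s respectively, i.e. once the queues fill, they,
# and not the configured 10/100 ms netem delays, dominate the measured RTT.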
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:04 ` Dominik Kaspar @ 2011-04-26 21:08 ` Eric Dumazet 2011-04-26 21:16 ` Dominik Kaspar 2011-04-26 21:17 ` Eric Dumazet 0 siblings, 2 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 21:08 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > In these experiments, a queue size of 1000 packets was specified. I am > aware that this is typically referred to as "buffer bloat" and causes > the RTT and the cwnd to grow excessively. The smaller I configure the > queues, the more time it takes for TCP to "level up" to the aggregate > throughput. By keeping the queues so large, I hope to more quickly > identify the reason why TCP is actually able to adjust to the immense > multipath reordering. What parameters could be highly relevant, other > than the queue size? > losses of course ;) Real internet is full of packet losses, and probability of these losses depends on queue sizes (RED like AQM) > Thanks for the tip about printing tc/netem statistics after each run, > I will use "tc -s -d qdisc" next time. > ^ permalink raw reply [flat|nested] 33+ messages in thread
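[For context on Eric's point, a minimal sketch of the RED-style idea that drop probability grows with the average queue occupancy; this is textbook RED, not any specific qdisc implementation, and the thresholds are arbitrary example values.]

def red_drop_probability(avg_queue, min_th=50, max_th=150, max_p=0.1):
    # Below min_th nothing is dropped; between the thresholds the drop
    # probability ramps up linearly; above max_th everything is dropped.
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (20, 80, 140, 200):
    print(q, red_drop_probability(q))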
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:08 ` Eric Dumazet @ 2011-04-26 21:16 ` Dominik Kaspar 2011-04-26 21:17 ` Eric Dumazet 1 sibling, 0 replies; 33+ messages in thread From: Dominik Kaspar @ 2011-04-26 21:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: Carsten Wolff, netdev On Tue, Apr 26, 2011 at 11:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > >> In these experiments, a queue size of 1000 packets was specified. I am >> aware that this is typically referred to as "buffer bloat" and causes >> the RTT and the cwnd to grow excessively. The smaller I configure the >> queues, the more time it takes for TCP to "level up" to the aggregate >> throughput. By keeping the queues so large, I hope to more quickly >> identify the reason why TCP is actually able to adjust to the immense >> multipath reordering. What parameters could be highly relevant, other >> than the queue size? >> > > losses of course ;) > > Real internet is full of packet losses, and probability of these losses > depends on queue sizes (RED like AQM) > No additional random loss is introduced (yet), so packet loss happens only when the queue size of 1000 packets is hit. Since the queues are configured overly large, packet loss rarely happens at all... of course at the cost of a large RTT. I suspect that artificially bloating the RTT somehow allows TCP to better adjust to multipath reordering... just haven't got a clue why. Cheers, Dominik ^ permalink raw reply [flat|nested] 33+ messages in thread
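[One plausible connection between the bloated queues and the faster adaptation, as an estimate rather than a measurement: Linux can only sample a reordering extent up to roughly what is in flight, and the flight size grows with the queue-inflated RTT. The MSS and the 1000 KB/s aggregate rate below are assumptions based on the emulated paths.]

MSS = 1448                              # payload bytes per segment (assumption)
AGG_RATE = 1000 * 1000                  # ~1000 KB/s aggregate of both paths

for rtt in (0.11, 3.0):                 # base RTT vs. queue-inflated RTT (seconds)
    flight = AGG_RATE * rtt / MSS
    print(f"RTT {rtt:5.2f} s -> ~{flight:.0f} segments in flight")

# ~76 segments at the base RTT versus ~2000 with multi-second queues, so a
# bloated RTT lets the sender observe much larger reordering extents per
# detection event (up to the kernel's internal cap on tp->reordering).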
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-26 21:08 ` Eric Dumazet 2011-04-26 21:16 ` Dominik Kaspar @ 2011-04-26 21:17 ` Eric Dumazet 1 sibling, 0 replies; 33+ messages in thread From: Eric Dumazet @ 2011-04-26 21:17 UTC (permalink / raw) To: Dominik Kaspar; +Cc: Carsten Wolff, netdev Le mardi 26 avril 2011 à 23:08 +0200, Eric Dumazet a écrit : > Le mardi 26 avril 2011 à 23:04 +0200, Dominik Kaspar a écrit : > > > In these experiments, a queue size of 1000 packets was specified. I am > > aware that this is typically referred to as "buffer bloat" and causes > > the RTT and the cwnd to grow excessively. The smaller I configure the > > queues, the more time it takes for TCP to "level up" to the aggregate > > throughput. By keeping the queues so large, I hope to more quickly > > identify the reason why TCP is actually able to adjust to the immense > > multipath reordering. What parameters could be highly relevant, other > > than the queue size? > > > > losses of course ;) > > Real internet is full of packet losses, and probability of these losses > depends on queue sizes (RED like AQM) > > BTW, netem in linux-2.6.39 contains a lot of changes in the netem module: commit 661b79725fea030803a89a16cda (netem: revised correlated loss generator). This is a patch that originated with Stefano Salsano and Fabio Ludovici. It provides several alternative loss models for use with netem. This patch adds two state-machine-based loss models. http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG ^ permalink raw reply [flat|nested] 33+ messages in thread
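[The state-machine loss models that commit adds are in the spirit of a Gilbert-Elliott channel: long mostly-lossless periods interrupted by short bursty ones. A minimal simulation of that idea follows; the transition and loss probabilities are arbitrary examples, not the netem defaults.]

import random

def gilbert_elliott(n, p_good_to_bad=0.01, p_bad_to_good=0.3,
                    loss_in_good=0.0, loss_in_bad=0.5, seed=1):
    # Two-state Markov chain deciding, per packet, whether the channel is
    # in the "good" or "bad" state, and dropping with a state-dependent
    # probability.
    rng = random.Random(seed)
    bad = False
    losses = 0
    for _ in range(n):
        stay_or_enter_bad = (1 - p_bad_to_good) if bad else p_good_to_bad
        bad = rng.random() < stay_or_enter_bad
        if rng.random() < (loss_in_bad if bad else loss_in_good):
            losses += 1
    return losses

print(gilbert_elliott(100000), "losses out of 100000 packets")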
* Re: Linux TCP's Robustness to Multipath Packet Reordering 2011-04-25 10:37 Linux TCP's Robustness to Multipath Packet Reordering Dominik Kaspar 2011-04-25 11:25 ` Eric Dumazet @ 2011-04-25 12:59 ` Carsten Wolff 1 sibling, 0 replies; 33+ messages in thread From: Carsten Wolff @ 2011-04-25 12:59 UTC (permalink / raw) To: Dominik Kaspar; +Cc: netdev Hi Dominik, On Monday 25 April 2011, Dominik Kaspar wrote: > Hello, > > Knowing how critical packet reordering is for standard TCP, I am > currently testing how robust Linux TCP is when packets are forwarded > over multiple paths (with different bandwidth and RTT). Since Linux > TCP adapts its "dupAck threshold" to an estimated level of packet > reordering, I expect it to be much more robust than a standard TCP > that strictly follows the RFCs. Indeed, as you can see in the > following plot, my experiments show a step-wise adaptation of Linux > TCP to heavy reordering. After many minutes, Linux TCP finally reaches > a data throughput close to the perfect aggregated data rate of two > paths (emulated with characteristics similar to IEEE 802.11b (WLAN) > and a 3G link (HSPA)): > > http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-00.png > > Does anyone have clues what's going on here? Why does the aggregated > throughput increase in steps? And what could be the reason it takes > minutes to adapt to the full capacity, when in other cases, Linux TCP > adapts much faster (for example if the bandwidth of both paths are > equal). I would highly appreciate some advice from the netdev > community. the throughput increase in steps is most likely caused by Linux's reordering detection and quantization. The DupThresh (tp->reordering) is only increased when reordering is detected and is then set to a value that depends on current inflight/pipe. This means, on a path with only reordering and no loss, where a very large DupThresh is best, you will see those steps in the throughput everytime when Linux detects reordering during a time where cwnd is large. This on the other hand depends purely on timing/luck. Linux is also only able to quantize reordering during disorder state, which leaves out many possible quantization samples, escpecially the larger ones, which would increase DupThresh to higher values. Also, reordering detection depends very much on TCP options. Which TCP Options were enabled in your test? Timestamps? D-SACK? Carsten > > Implementation details: > This multipath TCP experiment ran between a sending machine with a > single Ethernet interface (eth0) and a client with two Ethernet > interfaces (eth1, eth2). The machines are connected through a switch > and tc/netem is used to emulate the bandwidth and RTT of both paths. > TCP connections are established using iperf between eth0 and eth1 (the > primary path). At the sender, an iptables' NFQUEUE is used to "spoof" > the destination IP address of outgoing packets and force some to > travel to eth2 instead of eth1 (the secondary path). This multipath > scheduling happens in proportion to the emulated bandwidths, so if the > paths are set to 500 and 1000 KB/s, then packets are distributed in a > 1:2 ratio. At the client, iptables' RAWDNAT is used to translate the > spoofed IP addresses back to their original, so that all packets end > up at eth1, although a portion actually travelled to eth2. ACKs are > not scheduled over multiple paths, but always travel back on the > primary path. 
TCP does not notice anything of the multipath > forwarding, except the side-effect of packet reordering, which can be > huge if the path RTTs are set very differently. > > Best regards, > Dominik > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- /\-´-/\ ( @ @ ) ________o0O___^___O0o________ ^ permalink raw reply [flat|nested] 33+ messages in thread
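[A toy model of the ratchet Carsten describes; all constants are arbitrary and purely illustrative, and the cap is an assumption about kernels of that era. The point is only the shape of the curve: each detection event can at most raise the DupThresh to something bounded by the data in flight at that moment, and a larger DupThresh in turn lets the window, and the next observable sample, grow, so throughput climbs in discrete steps. An approach like the TCP-NCR variant mentioned earlier, which ties DupThresh to the current flight size directly, would skip this slow ratchet.]

MAX_REORDERING = 127   # assumed cap on tp->reordering in kernels of that era

dupthresh = 3
for step in range(6):
    # Purely illustrative: assume spurious fast retransmits keep collapsing
    # the window, so the flight size seen at the next detection event is
    # only a small multiple of the current DupThresh.
    flight_at_detection = 4 * dupthresh
    dupthresh = min(MAX_REORDERING, flight_at_detection)
    print(f"detection {step}: dupthresh ratchets up to {dupthresh}")

# -> 12, 48, 127, 127, ... : several lucky detection events are needed before
# the threshold is large enough for the emulated reordering, which matches
# the step-wise throughput plots earlier in the thread.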