* [PATCH] Make CUBIC Hystart more robust to RTT variations
@ 2011-03-08  9:32 Lucas Nussbaum
  2011-03-08 10:21 ` WANG Cong
  2011-03-10 23:28 ` Stephen Hemminger
  0 siblings, 2 replies; 27+ messages in thread

From: Lucas Nussbaum @ 2011-03-08 9:32 UTC (permalink / raw)
To: netdev; +Cc: Sangtae Ha

CUBIC Hystart uses two heuristics to exit slow start earlier, before
losses start to occur. Unfortunately, it tends to exit slow start far too
early, causing poor performance, since convergence to the optimal cwnd is
then very slow. This was reported in
http://permalink.gmane.org/gmane.linux.network/188169 and
https://partner-bugzilla.redhat.com/show_bug.cgi?id=616985

I am using an experimental testbed (http://www.grid5000.fr/) with two
machines connected via Gigabit Ethernet to a dedicated 10-Gb backbone. The
RTT between the two machines is 11.3 ms. Using TCP CUBIC without Hystart,
cwnd grows to ~2200. With Hystart enabled, CUBIC exits slow start with
cwnd lower than 100, and often lower than 20, which leads to the poor
performance I reported.

After instrumenting TCP CUBIC, I found that the segment-to-ACK RTT tends
to vary quite a lot even when the network is not congested, due to several
factors, including the fact that TCP sends packets in bursts (so packets
are queued locally before being sent, increasing their RTT) and delayed
ACKs on the destination host.

The patch below increases the thresholds used by the two Hystart
heuristics. First, the length of an ACK train needs to reach 2*minRTT.
Second, the max RTT of a group of packets also needs to reach 2*minRTT. In
my setup, this causes Hystart to exit slow start with cwnd in the
1900-2000 range using the ACK-train heuristic, and sometimes in the
700-900 range using the delay-increase heuristic, dramatically improving
performance.

I've left a commented-out printk that is useful for debugging Hystart's
exit point. I could also provide access to my testbed if someone wants to
run further experiments.
Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
--
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 71d5f2f..a973a49 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -344,7 +344,7 @@ static void hystart_update(struct sock *sk, u32 delay)
 	/* first detection parameter - ack-train detection */
 	if (curr_jiffies - ca->last_jiffies <= msecs_to_jiffies(2)) {
 		ca->last_jiffies = curr_jiffies;
-		if (curr_jiffies - ca->round_start >= ca->delay_min>>4)
+		if (curr_jiffies - ca->round_start >= ca->delay_min>>2)
 			ca->found |= HYSTART_ACK_TRAIN;
 	}

@@ -355,8 +355,7 @@ static void hystart_update(struct sock *sk, u32 delay)
 			ca->sample_cnt++;
 		} else {
-			if (ca->curr_rtt > ca->delay_min +
-			    HYSTART_DELAY_THRESH(ca->delay_min>>4))
+			if (ca->curr_rtt > ca->delay_min<<1)
 				ca->found |= HYSTART_DELAY;
 		}
 		/*
@@ -364,7 +363,10 @@ static void hystart_update(struct sock *sk, u32 delay)
 		 * we exit from slow start immediately.
 		 */
 		if (ca->found & hystart_detect)
+		{
+//			printk("hystart_update: cwnd=%u found=%d delay_min=%u cur_jif=%u round_start=%u curr_rtt=%u\n", tp->snd_cwnd, ca->found, ca
 			tp->snd_ssthresh = tp->snd_cwnd;
+		}
 	}
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08  9:32 [PATCH] Make CUBIC Hystart more robust to RTT variations Lucas Nussbaum
@ 2011-03-08 10:21 ` WANG Cong
  2011-03-08 11:10   ` Lucas Nussbaum
  2011-03-10 23:28 ` Stephen Hemminger
  1 sibling, 1 reply; 27+ messages in thread

From: WANG Cong @ 2011-03-08 10:21 UTC (permalink / raw)
To: netdev

On Tue, 08 Mar 2011 10:32:15 +0100, Lucas Nussbaum wrote:

> +		{
> +//			printk("hystart_update: cwnd=%u found=%d delay_min=%u cur_jif=%u round_start=%u curr_rtt=%u\n", tp->snd_cwnd, ca->found, ca

Please remove this line from your patch.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 10:21 ` WANG Cong
@ 2011-03-08 11:10   ` Lucas Nussbaum
  2011-03-08 15:26     ` Injong Rhee
       [not found]     ` <AANLkTimdpEKHfVKw+bm6OnymcnUrauU+jGOPeLzy3Q0o@mail.gmail.com>
  0 siblings, 2 replies; 27+ messages in thread

From: Lucas Nussbaum @ 2011-03-08 11:10 UTC (permalink / raw)
To: WANG Cong; +Cc: netdev

CUBIC Hystart uses two heuristics to exit slow start earlier, before
losses start to occur. Unfortunately, it tends to exit slow start far too
early, causing poor performance, since convergence to the optimal cwnd is
then very slow. This was reported in
http://permalink.gmane.org/gmane.linux.network/188169 and
https://partner-bugzilla.redhat.com/show_bug.cgi?id=616985

I am using an experimental testbed (http://www.grid5000.fr/) with two
machines connected via Gigabit Ethernet to a dedicated 10-Gb backbone. The
RTT between the two machines is 11.3 ms. Using TCP CUBIC without Hystart,
cwnd grows to ~2200. With Hystart enabled, CUBIC exits slow start with
cwnd lower than 100, and often lower than 20, which leads to the poor
performance I reported.

After instrumenting TCP CUBIC, I found that the segment-to-ACK RTT tends
to vary quite a lot even when the network is not congested, due to several
factors, including the fact that TCP sends packets in bursts (so packets
are queued locally before being sent, increasing their RTT) and delayed
ACKs on the destination host.

The patch below increases the thresholds used by the two Hystart
heuristics. First, the length of an ACK train needs to reach 2*minRTT.
Second, the max RTT of a group of packets also needs to reach 2*minRTT. In
my setup, this causes Hystart to exit slow start with cwnd in the
1900-2000 range using the ACK-train heuristic, and sometimes in the
700-900 range using the delay-increase heuristic, dramatically improving
performance.

I could provide access to my testbed if someone wants to run further
experiments.
Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
--
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |
---

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 71d5f2f..e404de4 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -344,7 +344,7 @@ static void hystart_update(struct sock *sk, u32 delay)
 	/* first detection parameter - ack-train detection */
 	if (curr_jiffies - ca->last_jiffies <= msecs_to_jiffies(2)) {
 		ca->last_jiffies = curr_jiffies;
-		if (curr_jiffies - ca->round_start >= ca->delay_min>>4)
+		if (curr_jiffies - ca->round_start >= ca->delay_min>>2)
 			ca->found |= HYSTART_ACK_TRAIN;
 	}

@@ -355,8 +355,7 @@ static void hystart_update(struct sock *sk, u32 delay)
 			ca->sample_cnt++;
 		} else {
-			if (ca->curr_rtt > ca->delay_min +
-			    HYSTART_DELAY_THRESH(ca->delay_min>>4))
+			if (ca->curr_rtt > ca->delay_min<<1)
 				ca->found |= HYSTART_DELAY;
 		}
 		/*

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 11:10   ` Lucas Nussbaum
@ 2011-03-08 15:26     ` Injong Rhee
  2011-03-08 19:43       ` David Miller
       [not found]       ` <AANLkTimdpEKHfVKw+bm6OnymcnUrauU+jGOPeLzy3Q0o@mail.gmail.com>
  1 sibling, 1 reply; 27+ messages in thread

From: Injong Rhee @ 2011-03-08 15:26 UTC (permalink / raw)
To: Lucas Nussbaum; +Cc: WANG Cong, netdev

Thanks for updating CUBIC hystart. You might want to test the cases with
more background traffic and verify whether this threshold is too
conservative.

On 3/8/11 6:10 AM, Lucas Nussbaum wrote:
> CUBIC Hystart uses two heuristics to exit slow start earlier, before losses start to occur. Unfortunately, it tends to exit slow start far too early, causing poor performance, since convergence to the optimal cwnd is then very slow. This was reported in http://permalink.gmane.org/gmane.linux.network/188169 and https://partner-bugzilla.redhat.com/show_bug.cgi?id=616985
>
> I am using an experimental testbed (http://www.grid5000.fr/) with two machines connected via Gigabit Ethernet to a dedicated 10-Gb backbone. The RTT between the two machines is 11.3 ms. Using TCP CUBIC without Hystart, cwnd grows to ~2200. With Hystart enabled, CUBIC exits slow start with cwnd lower than 100, and often lower than 20, which leads to the poor performance I reported.
>
> After instrumenting TCP CUBIC, I found that the segment-to-ACK RTT tends to vary quite a lot even when the network is not congested, due to several factors, including the fact that TCP sends packets in bursts (so packets are queued locally before being sent, increasing their RTT) and delayed ACKs on the destination host.
>
> The patch below increases the thresholds used by the two Hystart heuristics. First, the length of an ACK train needs to reach 2*minRTT. Second, the max RTT of a group of packets also needs to reach 2*minRTT. In my setup, this causes Hystart to exit slow start with cwnd in the 1900-2000 range using the ACK-train heuristic, and sometimes in the 700-900 range using the delay-increase heuristic, dramatically improving performance.
>
> I could provide access to my testbed if someone wants to run further experiments.
>
> Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 15:26     ` Injong Rhee
@ 2011-03-08 19:43       ` David Miller
  2011-03-08 23:21         ` Stephen Hemminger
  0 siblings, 1 reply; 27+ messages in thread

From: David Miller @ 2011-03-08 19:43 UTC (permalink / raw)
To: rhee; +Cc: lucas.nussbaum, xiyou.wangcong, netdev

From: Injong Rhee <rhee@ncsu.edu>
Date: Tue, 08 Mar 2011 10:26:36 -0500

> Thanks for updating CUBIC hystart. You might want to test the cases with more background traffic and verify whether this threshold is too conservative.

So let's get down to basics.

What does Hystart do specially that allows it to avoid all of the
problems that TCP Vegas runs into?

Specifically: if you use RTTs to make congestion control decisions, it is
impossible to notice new bandwidth becoming available fast enough.

Again, it's impossible to react fast enough. No matter what you tweak all
of your various settings to, this problem will still exist.

This is a core issue; you cannot get around it.

This is why I feel that Hystart is fundamentally flawed and we should
turn it off by default, if not flat-out remove it.

Distributions are turning it off by default already, therefore it's
stupid for the upstream kernel to behave differently if that's what 99%
of the world is going to end up experiencing.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 19:43       ` David Miller
@ 2011-03-08 23:21         ` Stephen Hemminger
  2011-03-09  1:30           ` Injong Rhee
  2011-03-09  1:33           ` Sangtae Ha
  0 siblings, 2 replies; 27+ messages in thread

From: Stephen Hemminger @ 2011-03-08 23:21 UTC (permalink / raw)
To: David Miller; +Cc: rhee, lucas.nussbaum, xiyou.wangcong, netdev

On Tue, 08 Mar 2011 11:43:46 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Injong Rhee <rhee@ncsu.edu>
> Date: Tue, 08 Mar 2011 10:26:36 -0500
>
> > Thanks for updating CUBIC hystart. You might want to test the cases with more background traffic and verify whether this threshold is too conservative.
>
> So let's get down to basics.
>
> What does Hystart do specially that allows it to avoid all of the problems that TCP Vegas runs into?
>
> Specifically: if you use RTTs to make congestion control decisions, it is impossible to notice new bandwidth becoming available fast enough.
>
> Again, it's impossible to react fast enough. No matter what you tweak all of your various settings to, this problem will still exist.
>
> This is a core issue; you cannot get around it.
>
> This is why I feel that Hystart is fundamentally flawed and we should turn it off by default, if not flat-out remove it.
>
> Distributions are turning it off by default already, therefore it's stupid for the upstream kernel to behave differently if that's what 99% of the world is going to end up experiencing.

The assumption in Hystart that spacing between ACKs is solely due to
congestion is bad. If you read the paper, this is why FreeBSD's
estimation logic is dismissed. The Hystart problem is different from the
Vegas issue.

Algorithms that look at the min RTT are OK, since the lower bound is
fixed; additional queuing and variation in the network only ever
increases the RTT, never reduces it. With a min RTT it is possible to
compute an upper bound on the available bandwidth.
That is, if all packets were as good as the minRTT estimate, then the
available bandwidth is X. But using an individual RTT sample to estimate
unused bandwidth is flawed. To quote the paper:

"Thus, by checking whether ∆(N) is larger than Dmin, we can detect
whether cwnd has reached the available capacity of the path"

So what goes wrong:

1. Dmin can be too large because this connection always sees delays due
to other traffic or hardware, i.e. buffer bloat. This would cause the
bandwidth estimate to be too low, and therefore TCP would leave slow
start too early (and not get up to full bandwidth).

2. Dmin can be smaller than the clock resolution. This would cause either
the sample to be ignored or Dmin to be zero. If Dmin is zero, the
bandwidth estimate would in theory be infinite, which would lead to TCP
not leaving slow start because of Hystart; instead, TCP would leave slow
start at the first loss.

Other possible problems:

3. ACKs could be nudged together by variations in delay. This would cause
Hystart to falsely think it sees an ACK train and exit slow start
prematurely.

Noise in the network is not catastrophic; it just causes TCP to exit slow
start early and go into the normal window-growth phase. The problem is
that the original non-Hystart behavior of CUBIC is unfair: the first flow
dominates the link and other flows are unable to get in. If you run tests
with two flows, one will get a larger share of the bandwidth.

I think Hystart is OK in concept, but there may be issues on low-RTT
links as well as other corner cases that need bug fixing:

1. It needs to use better resolution than HZ, since HZ can be 100.
2. Hardcoding 2 ms as the ACK-train spacing is wrong for local networks.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 23:21         ` Stephen Hemminger
@ 2011-03-09  1:30           ` Injong Rhee
  2011-03-09  6:53             ` Lucas Nussbaum
  2011-03-09  1:33           ` Sangtae Ha
  1 sibling, 1 reply; 27+ messages in thread

From: Injong Rhee @ 2011-03-09 1:30 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Miller, lucas.nussbaum, xiyou.wangcong, netdev, sangtae.ha

HyStart is a slow start algorithm, not a congestion control algorithm, so
the difference between Vegas and HyStart is obvious. Yes, both HyStart
and Vegas use delays as an indication of congestion. But HyStart exits
slow start on detecting congestion and enters normal congestion
avoidance; in some sense it is much safer than Vegas, as it does not
change the regular behavior of congestion control. I think the main
problem arising right now is not that it uses noisy delays as a
congestion indication, but rather some implementation issues, like the
use of HZ, the hardcoded 2 ms, etc.

Then you might ask why HyStart can use delays while Vegas can't. The main
motivation for using delays during slow start is that slow start creates
an environment where delay samples can be trusted more: because of window
doubling, it sends many packets as a burst, and that burst can be used as
a packet train to estimate the available capacity more reliably. (Tool 1)
When many packets are sent in a burst, the spacing of the returning ACKs
can be a good indicator. (Tool 2) HyStart also uses delays as an
estimate: if the estimated average delay increases beyond a certain
threshold, it treats that as possible congestion.

Now, both tools can be wrong. But that is not catastrophic, since
congestion avoidance can kick in to save the day. In a pipe where no
other flows are competing, exiting slow start too early can slow things
down, as the window can still be too small. But that is in fact when
delays are most reliable.
So those tests that report bad performance with hystart are in fact cases
where hystart is supposed to perform well. Then why do we see bad
performance? I think the answer is again the implementation flaws (use of
different hardware, some hardwired constants, etc.), and possibly a few
corner cases like very low-RTT links.

Let us examine Stephen's analysis in more detail:

1. Use of minRTT is OK. I agree.

2. Dmin can be too large at the beginning. But it is just like minRTT: it
cannot be too large. If you trust minRTT, then delay estimation should
say that there is congestion. This is exactly the opposite of the cases
we are seeing: if Dmin were too large, hystart would not exit slow start,
as it would not detect the congestion. That is not what we are seeing
right now.

3. Dmin can be smaller than the clock resolution. That is why we are
using a bunch of ACKs to get better accuracy: with a bunch of ACKs we get
a larger spacing value, so we can take an average.

4. If ACKs are nudged together, hystart does not quit slow start;
instead, it sees that there is no congestion. It is when it sees big
spacing between ACKs that it detects congestion.

On 3/8/11 6:21 PM, Stephen Hemminger wrote:
> On Tue, 08 Mar 2011 11:43:46 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Injong Rhee <rhee@ncsu.edu>
>> Date: Tue, 08 Mar 2011 10:26:36 -0500
>>
>>> Thanks for updating CUBIC hystart. You might want to test the cases with more background traffic and verify whether this threshold is too conservative.
>>
>> So let's get down to basics.
>>
>> What does Hystart do specially that allows it to avoid all of the problems that TCP Vegas runs into?
>>
>> Specifically: if you use RTTs to make congestion control decisions, it is impossible to notice new bandwidth becoming available fast enough.
>>
>> Again, it's impossible to react fast enough. No matter what you tweak all of your various settings to, this problem will still exist.
>>
>> This is a core issue; you cannot get around it.
>>
>> This is why I feel that Hystart is fundamentally flawed and we should turn it off by default, if not flat-out remove it.
>>
>> Distributions are turning it off by default already, therefore it's stupid for the upstream kernel to behave differently if that's what 99% of the world is going to end up experiencing.
>
> The assumption in Hystart that spacing between ACKs is solely due to congestion is bad. If you read the paper, this is why FreeBSD's estimation logic is dismissed. The Hystart problem is different from the Vegas issue.
>
> Algorithms that look at the min RTT are OK, since the lower bound is fixed; additional queuing and variation in the network only ever increases the RTT, never reduces it. With a min RTT it is possible to compute an upper bound on the available bandwidth. That is, if all packets were as good as the minRTT estimate, then the available bandwidth is X. But using an individual RTT sample to estimate unused bandwidth is flawed. To quote the paper:
>
> "Thus, by checking whether ∆(N) is larger than Dmin, we can detect whether cwnd has reached the available capacity of the path"
>
> So what goes wrong:
>
> 1. Dmin can be too large because this connection always sees delays due to other traffic or hardware, i.e. buffer bloat. This would cause the bandwidth estimate to be too low, and therefore TCP would leave slow start too early (and not get up to full bandwidth).
>
> 2. Dmin can be smaller than the clock resolution. This would cause either the sample to be ignored or Dmin to be zero. If Dmin is zero, the bandwidth estimate would in theory be infinite, which would lead to TCP not leaving slow start because of Hystart; instead, TCP would leave slow start at the first loss.
>
> Other possible problems:
>
> 3. ACKs could be nudged together by variations in delay. This would cause Hystart to falsely think it sees an ACK train and exit slow start prematurely.
>
> Noise in the network is not catastrophic; it just causes TCP to exit slow start early and go into the normal window-growth phase. The problem is that the original non-Hystart behavior of CUBIC is unfair: the first flow dominates the link and other flows are unable to get in. If you run tests with two flows, one will get a larger share of the bandwidth.
>
> I think Hystart is OK in concept, but there may be issues on low-RTT links as well as other corner cases that need bug fixing:
>
> 1. It needs to use better resolution than HZ, since HZ can be 100.
> 2. Hardcoding 2 ms as the ACK-train spacing is wrong for local networks.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-09  1:30           ` Injong Rhee
@ 2011-03-09  6:53             ` Lucas Nussbaum
  2011-03-09 17:56               ` Stephen Hemminger
  2011-03-10  5:24               ` Bill Fink
  0 siblings, 2 replies; 27+ messages in thread

From: Lucas Nussbaum @ 2011-03-09 6:53 UTC (permalink / raw)
To: Injong Rhee
Cc: Stephen Hemminger, David Miller, xiyou.wangcong, netdev, sangtae.ha

On 08/03/11 at 20:30 -0500, Injong Rhee wrote:
> Now, both tools can be wrong. But that is not catastrophic, since congestion avoidance can kick in to save the day. In a pipe where no other flows are competing, exiting slow start too early can slow things down, as the window can still be too small. But that is in fact when delays are most reliable. So those tests that report bad performance with hystart are in fact cases where hystart is supposed to perform well.

Hi,

In my setup, there is no congestion at all (except for buffer bloat).
Without Hystart, transferring 8 Gb of data takes 9 s, with CUBIC exiting
slow start at ~2000 packets. With Hystart, transferring 8 Gb of data
takes 19 s, with CUBIC exiting slow start at ~20 packets. I don't think
this is "hystart performing well". We could just as well remove slow
start completely and only do congestion avoidance, then.

While I see the value in Hystart, it's clear that there are some flaws in
the current implementation. It probably makes sense to disable hystart by
default until those problems are fixed.

--
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-09  6:53             ` Lucas Nussbaum
@ 2011-03-09 17:56               ` Stephen Hemminger
  2011-03-09 18:25                 ` Lucas Nussbaum
  1 sibling, 1 reply; 27+ messages in thread

From: Stephen Hemminger @ 2011-03-09 17:56 UTC (permalink / raw)
To: Lucas Nussbaum
Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha

On Wed, 9 Mar 2011 07:53:19 +0100
Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:

> On 08/03/11 at 20:30 -0500, Injong Rhee wrote:
> > Now, both tools can be wrong. But that is not catastrophic, since congestion avoidance can kick in to save the day. In a pipe where no other flows are competing, exiting slow start too early can slow things down, as the window can still be too small. But that is in fact when delays are most reliable. So those tests that report bad performance with hystart are in fact cases where hystart is supposed to perform well.
>
> Hi,
>
> In my setup, there is no congestion at all (except for buffer bloat). Without Hystart, transferring 8 Gb of data takes 9 s, with CUBIC exiting slow start at ~2000 packets. With Hystart, transferring 8 Gb of data takes 19 s, with CUBIC exiting slow start at ~20 packets. I don't think this is "hystart performing well". We could just as well remove slow start completely and only do congestion avoidance, then.
>
> While I see the value in Hystart, it's clear that there are some flaws in the current implementation. It probably makes sense to disable hystart by default until those problems are fixed.

What is the speed and the RTT of your network?
I think you may be blaming hystart for other issues in the network.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-09 17:56               ` Stephen Hemminger
@ 2011-03-09 18:25                 ` Lucas Nussbaum
  2011-03-09 19:56                   ` Stephen Hemminger
  2011-03-09 20:01                   ` Stephen Hemminger
  0 siblings, 2 replies; 27+ messages in thread

From: Lucas Nussbaum @ 2011-03-09 18:25 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha

On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote:
> On Wed, 9 Mar 2011 07:53:19 +0100
> Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:
>
> > In my setup, there is no congestion at all (except for buffer bloat). Without Hystart, transferring 8 Gb of data takes 9 s, with CUBIC exiting slow start at ~2000 packets. With Hystart, transferring 8 Gb of data takes 19 s, with CUBIC exiting slow start at ~20 packets. I don't think this is "hystart performing well". We could just as well remove slow start completely and only do congestion avoidance, then.
> >
> > While I see the value in Hystart, it's clear that there are some flaws in the current implementation. It probably makes sense to disable hystart by default until those problems are fixed.
>
> What is the speed and the RTT of your network?
> I think you may be blaming hystart for other issues in the network.

What kind of issues?

Host1 is connected through a Gigabit Ethernet LAN to Router1.
Host2 is connected through a Gigabit Ethernet LAN to Router2.
Router1 and Router2 are connected through an experimentation network at
10 Gb/s. The RTT between Host1 and Host2 is 11.3 ms. The network is not
congested.

(I can provide access to the testbed if someone wants to do further
testing)

--
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-09 18:25                 ` Lucas Nussbaum
@ 2011-03-09 19:56                   ` Stephen Hemminger
  2011-03-09 21:28                     ` Lucas Nussbaum
  1 sibling, 1 reply; 27+ messages in thread

From: Stephen Hemminger @ 2011-03-09 19:56 UTC (permalink / raw)
To: Lucas Nussbaum
Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha

On Wed, 9 Mar 2011 19:25:05 +0100
Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:

> On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote:
> > What is the speed and the RTT of your network?
> > I think you may be blaming hystart for other issues in the network.
>
> What kind of issues?
>
> Host1 is connected through a Gigabit Ethernet LAN to Router1.
> Host2 is connected through a Gigabit Ethernet LAN to Router2.
> Router1 and Router2 are connected through an experimentation network at 10 Gb/s. The RTT between Host1 and Host2 is 11.3 ms. The network is not congested.
>
> (I can provide access to the testbed if someone wants to do further testing)

Your backbone is faster than the LAN; interesting. Could you check
packet stats to see where packet drops are occurring? It could be that
the routers don't have enough buffering to absorb packet trains from the
10G network and pace them out to the 1G network.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-09 19:56                   ` Stephen Hemminger
@ 2011-03-09 21:28                     ` Lucas Nussbaum
  0 siblings, 0 replies; 27+ messages in thread

From: Lucas Nussbaum @ 2011-03-09 21:28 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha

On 09/03/11 at 11:56 -0800, Stephen Hemminger wrote:
> On Wed, 9 Mar 2011 19:25:05 +0100
> Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:
>
> > Host1 is connected through a Gigabit Ethernet LAN to Router1. Host2 is connected through a Gigabit Ethernet LAN to Router2. Router1 and Router2 are connected through an experimentation network at 10 Gb/s. The RTT between Host1 and Host2 is 11.3 ms. The network is not congested.
> >
> > (I can provide access to the testbed if someone wants to do further testing)
>
> Your backbone is faster than the LAN; interesting. Could you check packet stats to see where packet drops are occurring? It could be that the routers don't have enough buffering to absorb packet trains from the 10G network and pace them out to the 1G network.

I don't have access to the routers to check the packet counts here.
However, according to "netstat -s" on the sender(s), no retransmissions
are occurring, whether hystart is enabled or not: the host can just send
data at the network rate without experiencing congestion anywhere. Also,
it is unlikely that transient congestion in the backbone is an issue,
according to the monitoring tools I have access to.

(Replying to your other mail as well)

> By my calculations (1G * 11.3ms) gives a BDP of 941 packets, which means CUBIC would ideally exit slow start at 900 or so packets. The old CUBIC slow start of 2000 packets means there is a huge overshoot, which means a large packet loss burst, which would cause a large CPU load on the receiver processing SACKs.

Since the network capacity is higher than or equal to the network
capacity on the host, there's no reason why losses would occur if
there's no congestion caused by other traffic, right?

> I assume you haven't done anything that would disable RFC1323 support like turning off window scaling or tcp timestamps.

No, nothing strange that could cause different results.

I've tried to exclude hardware problems by using different parts of the
testbed (see the map at
https://www.grid5000.fr/mediawiki/images/Renater5-g5k.jpg). I used
machines in Rennes, Lille, Lyon and Grenoble today (using different
hardware). My original testing was done between Rennes and Nancy. The
same symptoms appear everywhere, in both directions, and disappear when
hystart is disabled.

--
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 18:25 ` Lucas Nussbaum 2011-03-09 19:56 ` Stephen Hemminger @ 2011-03-09 20:01 ` Stephen Hemminger 2011-03-09 21:12 ` Yuchung Cheng 1 sibling, 1 reply; 27+ messages in thread From: Stephen Hemminger @ 2011-03-09 20:01 UTC (permalink / raw) To: Lucas Nussbaum Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha On Wed, 9 Mar 2011 19:25:05 +0100 Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote: > > On Wed, 9 Mar 2011 07:53:19 +0100 > > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > > > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: > > > > Now, both tools can be wrong. But that is not catastrophic since > > > > congestion avoidance can kick in to save the day. In a pipe where no > > > > other flows are competing, then exiting slow start too early can > > > > slow things down as the window can be still too small. But that is > > > > in fact when delays are most reliable. So those tests that say bad > > > > performance with hystart are in fact, where hystart is supposed to > > > > perform well. > > > > > > Hi, > > > > > > In my setup, there is no congestion at all (except the buffer bloat). > > > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting > > > slow start at ~2000 packets. > > > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting > > > slow start at ~20 packets. > > > I don't think that this is "hystart performing well". We could just as > > > well remove slow start completely, and only do congestion avoidance, > > > then. > > > > > > While I see the value in Hystart, it's clear that there are some flaws > > > in the current implementation. It probably makes sense to disable > > > hystart by default until those problems are fixed. > > > > What is the speed and RTT time of your network? > > I think you maybe blaming hystart for other issues in the network. 
>
> What kind of issues?
>
> Host1 is connected through a gigabit ethernet LAN to Router1
> Host2 is connected through a gigabit ethernet LAN to Router2
> Router1 and Router2 are connected through an experimentation network at
> 10 Gb/s
> RTT between Host1 and Host2 is 11.3ms.
> The network is not congested.

By my calculations (1G * 11.3ms) gives a BDP of 941 packets, which means CUBIC would ideally exit slow start at 900 or so packets. The old CUBIC slow start exiting at 2000 packets means there is a huge overshoot, which means a large packet-loss burst, which would cause a large CPU load on the receiver processing SACKs.

I assume you haven't done anything that would disable RFC1323 support, like turning off window scaling or tcp timestamps.

--

^ permalink raw reply [flat|nested] 27+ messages in thread
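Stephen's back-of-the-envelope figure can be checked with standard bandwidth-delay-product arithmetic. The sketch below is not from the thread; it just reproduces the calculation, assuming full-sized 1500-byte Ethernet frames:

```python
def bdp_packets(rate_bps, rtt_s, mtu_bytes=1500):
    """Bandwidth-delay product of a path, expressed in full-sized packets."""
    bits_in_flight = rate_bps * rtt_s  # bits the pipe holds at steady state
    return int(bits_in_flight / (mtu_bytes * 8))

# 1 Gb/s bottleneck link, 11.3 ms RTT -> about 941 packets in flight,
# matching the "ideally exit slow start at 900 or so packets" estimate.
print(bdp_packets(1e9, 11.3e-3))
```

With Bill's later 9000-byte jumbo frames the same path would hold far fewer packets, which is worth keeping in mind when comparing cwnd figures across the two testbeds.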
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 20:01 ` Stephen Hemminger @ 2011-03-09 21:12 ` Yuchung Cheng 2011-03-09 21:33 ` Lucas Nussbaum 0 siblings, 1 reply; 27+ messages in thread From: Yuchung Cheng @ 2011-03-09 21:12 UTC (permalink / raw) To: Stephen Hemminger Cc: Lucas Nussbaum, Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha On Wed, Mar 9, 2011 at 12:01 PM, Stephen Hemminger <shemminger@vyatta.com> wrote: > On Wed, 9 Mar 2011 19:25:05 +0100 > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > >> On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote: >> > On Wed, 9 Mar 2011 07:53:19 +0100 >> > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: >> > >> > > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: >> > > > Now, both tools can be wrong. But that is not catastrophic since >> > > > congestion avoidance can kick in to save the day. In a pipe where no >> > > > other flows are competing, then exiting slow start too early can >> > > > slow things down as the window can be still too small. But that is >> > > > in fact when delays are most reliable. So those tests that say bad >> > > > performance with hystart are in fact, where hystart is supposed to >> > > > perform well. >> > > >> > > Hi, >> > > >> > > In my setup, there is no congestion at all (except the buffer bloat). >> > > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting >> > > slow start at ~2000 packets. >> > > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting >> > > slow start at ~20 packets. >> > > I don't think that this is "hystart performing well". We could just as >> > > well remove slow start completely, and only do congestion avoidance, >> > > then. >> > > >> > > While I see the value in Hystart, it's clear that there are some flaws >> > > in the current implementation. It probably makes sense to disable >> > > hystart by default until those problems are fixed. 
>> >
>> > What is the speed and RTT time of your network?
>> > I think you maybe blaming hystart for other issues in the network.
>>
>> What kind of issues?
>>
>> Host1 is connected through a gigabit ethernet LAN to Router1
>> Host2 is connected through a gigabit ethernet LAN to Router2
>> Router1 and Router2 are connected through an experimentation network at
>> 10 Gb/s
>> RTT between Host1 and Host2 is 11.3ms.
>> The network is not congested.
>
> By my calculations (1G * 11.3ms) gives BDP of 941 packets which means
> CUBIC would ideally exit slow start at 900 or so packets. Old CUBIC
> slowstrart of 2000 packets means there is huge overshoot which means
> large packet loss burst which would cause a large CPU load on receiver
> processing SACK.

It's not clear from Lucas's report whether hystart is exiting when cwnd=2000 or when the sender has sent 2000 packets. Lucas, could you clarify?

> I assume you haven't done anything that would disable RFC1323
> support like turn off window scaling or tcp timestamps.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 21:12 ` Yuchung Cheng @ 2011-03-09 21:33 ` Lucas Nussbaum 2011-03-09 21:51 ` Stephen Hemminger 0 siblings, 1 reply; 27+ messages in thread From: Lucas Nussbaum @ 2011-03-09 21:33 UTC (permalink / raw) To: Yuchung Cheng Cc: Stephen Hemminger, Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha On 09/03/11 at 13:12 -0800, Yuchung Cheng wrote: > On Wed, Mar 9, 2011 at 12:01 PM, Stephen Hemminger > <shemminger@vyatta.com> wrote: > > On Wed, 9 Mar 2011 19:25:05 +0100 > > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > >> On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote: > >> > On Wed, 9 Mar 2011 07:53:19 +0100 > >> > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > >> > > >> > > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: > >> > > > Now, both tools can be wrong. But that is not catastrophic since > >> > > > congestion avoidance can kick in to save the day. In a pipe where no > >> > > > other flows are competing, then exiting slow start too early can > >> > > > slow things down as the window can be still too small. But that is > >> > > > in fact when delays are most reliable. So those tests that say bad > >> > > > performance with hystart are in fact, where hystart is supposed to > >> > > > perform well. > >> > > > >> > > Hi, > >> > > > >> > > In my setup, there is no congestion at all (except the buffer bloat). > >> > > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting > >> > > slow start at ~2000 packets. > >> > > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting > >> > > slow start at ~20 packets. > >> > > I don't think that this is "hystart performing well". We could just as > >> > > well remove slow start completely, and only do congestion avoidance, > >> > > then. > >> > > > >> > > While I see the value in Hystart, it's clear that there are some flaws > >> > > in the current implementation. 
It probably makes sense to disable > >> > > hystart by default until those problems are fixed. > >> > > >> > What is the speed and RTT time of your network? > >> > I think you maybe blaming hystart for other issues in the network. > >> > >> What kind of issues? > >> > >> Host1 is connected through a gigabit ethernet LAN to Router1 > >> Host2 is connected through a gigabit ethernet LAN to Router2 > >> Router1 and Router2 are connected through an experimentation network at > >> 10 Gb/s > >> RTT between Host1 and Host2 is 11.3ms. > >> The network is not congested. > > > > By my calculations (1G * 11.3ms) gives BDP of 941 packets which means > > CUBIC would ideally exit slow start at 900 or so packets. Old CUBIC > > slowstrart of 2000 packets means there is huge overshoot which means > > large packet loss burst which would cause a large CPU load on receiver > > processing SACK. > It's not clear from Lucas's report that the hystart is exiting when > cwnd=2000 or when sender has sent 2000 packets. > Lucas could you clarify? When cwnd is around 2000. -- | Lucas Nussbaum MCF Université Nancy 2 | | lucas.nussbaum@loria.fr LORIA / AlGorille | | http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 | ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 21:33 ` Lucas Nussbaum @ 2011-03-09 21:51 ` Stephen Hemminger 2011-03-09 22:03 ` Lucas Nussbaum 0 siblings, 1 reply; 27+ messages in thread From: Stephen Hemminger @ 2011-03-09 21:51 UTC (permalink / raw) To: Lucas Nussbaum Cc: Yuchung Cheng, Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha On Wed, 9 Mar 2011 22:33:56 +0100 Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > On 09/03/11 at 13:12 -0800, Yuchung Cheng wrote: > > On Wed, Mar 9, 2011 at 12:01 PM, Stephen Hemminger > > <shemminger@vyatta.com> wrote: > > > On Wed, 9 Mar 2011 19:25:05 +0100 > > > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > > > >> On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote: > > >> > On Wed, 9 Mar 2011 07:53:19 +0100 > > >> > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > >> > > > >> > > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: > > >> > > > Now, both tools can be wrong. But that is not catastrophic since > > >> > > > congestion avoidance can kick in to save the day. In a pipe where no > > >> > > > other flows are competing, then exiting slow start too early can > > >> > > > slow things down as the window can be still too small. But that is > > >> > > > in fact when delays are most reliable. So those tests that say bad > > >> > > > performance with hystart are in fact, where hystart is supposed to > > >> > > > perform well. > > >> > > > > >> > > Hi, > > >> > > > > >> > > In my setup, there is no congestion at all (except the buffer bloat). > > >> > > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting > > >> > > slow start at ~2000 packets. > > >> > > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting > > >> > > slow start at ~20 packets. > > >> > > I don't think that this is "hystart performing well". We could just as > > >> > > well remove slow start completely, and only do congestion avoidance, > > >> > > then. 
> > >> > > > > >> > > While I see the value in Hystart, it's clear that there are some flaws > > >> > > in the current implementation. It probably makes sense to disable > > >> > > hystart by default until those problems are fixed. > > >> > > > >> > What is the speed and RTT time of your network? > > >> > I think you maybe blaming hystart for other issues in the network. > > >> > > >> What kind of issues? > > >> > > >> Host1 is connected through a gigabit ethernet LAN to Router1 > > >> Host2 is connected through a gigabit ethernet LAN to Router2 > > >> Router1 and Router2 are connected through an experimentation network at > > >> 10 Gb/s > > >> RTT between Host1 and Host2 is 11.3ms. > > >> The network is not congested. > > > > > > By my calculations (1G * 11.3ms) gives BDP of 941 packets which means > > > CUBIC would ideally exit slow start at 900 or so packets. Old CUBIC > > > slowstrart of 2000 packets means there is huge overshoot which means > > > large packet loss burst which would cause a large CPU load on receiver > > > processing SACK. > > It's not clear from Lucas's report that the hystart is exiting when > > cwnd=2000 or when sender has sent 2000 packets. > > Lucas could you clarify? > > When cwnd is around 2000. What is HZ on the kernel configuration. Part of the problem is the hystart code was only tested with HZ=1000 and there are some bad assumptions there. -- ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 21:51 ` Stephen Hemminger @ 2011-03-09 22:03 ` Lucas Nussbaum 0 siblings, 0 replies; 27+ messages in thread From: Lucas Nussbaum @ 2011-03-09 22:03 UTC (permalink / raw) To: Stephen Hemminger Cc: Yuchung Cheng, Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha On 09/03/11 at 13:51 -0800, Stephen Hemminger wrote: > On Wed, 9 Mar 2011 22:33:56 +0100 > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > On 09/03/11 at 13:12 -0800, Yuchung Cheng wrote: > > > On Wed, Mar 9, 2011 at 12:01 PM, Stephen Hemminger > > > <shemminger@vyatta.com> wrote: > > > > On Wed, 9 Mar 2011 19:25:05 +0100 > > > > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > > > > > >> On 09/03/11 at 09:56 -0800, Stephen Hemminger wrote: > > > >> > On Wed, 9 Mar 2011 07:53:19 +0100 > > > >> > Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote: > > > >> > > > > >> > > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: > > > >> > > > Now, both tools can be wrong. But that is not catastrophic since > > > >> > > > congestion avoidance can kick in to save the day. In a pipe where no > > > >> > > > other flows are competing, then exiting slow start too early can > > > >> > > > slow things down as the window can be still too small. But that is > > > >> > > > in fact when delays are most reliable. So those tests that say bad > > > >> > > > performance with hystart are in fact, where hystart is supposed to > > > >> > > > perform well. > > > >> > > > > > >> > > Hi, > > > >> > > > > > >> > > In my setup, there is no congestion at all (except the buffer bloat). > > > >> > > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting > > > >> > > slow start at ~2000 packets. > > > >> > > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting > > > >> > > slow start at ~20 packets. > > > >> > > I don't think that this is "hystart performing well". 
We could just as > > > >> > > well remove slow start completely, and only do congestion avoidance, > > > >> > > then. > > > >> > > > > > >> > > While I see the value in Hystart, it's clear that there are some flaws > > > >> > > in the current implementation. It probably makes sense to disable > > > >> > > hystart by default until those problems are fixed. > > > >> > > > > >> > What is the speed and RTT time of your network? > > > >> > I think you maybe blaming hystart for other issues in the network. > > > >> > > > >> What kind of issues? > > > >> > > > >> Host1 is connected through a gigabit ethernet LAN to Router1 > > > >> Host2 is connected through a gigabit ethernet LAN to Router2 > > > >> Router1 and Router2 are connected through an experimentation network at > > > >> 10 Gb/s > > > >> RTT between Host1 and Host2 is 11.3ms. > > > >> The network is not congested. > > > > > > > > By my calculations (1G * 11.3ms) gives BDP of 941 packets which means > > > > CUBIC would ideally exit slow start at 900 or so packets. Old CUBIC > > > > slowstrart of 2000 packets means there is huge overshoot which means > > > > large packet loss burst which would cause a large CPU load on receiver > > > > processing SACK. > > > It's not clear from Lucas's report that the hystart is exiting when > > > cwnd=2000 or when sender has sent 2000 packets. > > > Lucas could you clarify? > > > > When cwnd is around 2000. > > What is HZ on the kernel configuration. Part of the problem is the hystart > code was only tested with HZ=1000 and there are some bad assumptions there. $ grep HZ /boot/config-2.6.32-5-amd64 CONFIG_NO_HZ=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=250 -- | Lucas Nussbaum MCF Université Nancy 2 | | lucas.nussbaum@loria.fr LORIA / AlGorille | | http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 | ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-09 6:53 ` Lucas Nussbaum 2011-03-09 17:56 ` Stephen Hemminger @ 2011-03-10 5:24 ` Bill Fink 2011-03-10 6:17 ` Stephen Hemminger 2011-03-10 14:37 ` Injong Rhee 1 sibling, 2 replies; 27+ messages in thread From: Bill Fink @ 2011-03-10 5:24 UTC (permalink / raw) To: Lucas Nussbaum Cc: Injong Rhee, Stephen Hemminger, David Miller, xiyou.wangcong, netdev, sangtae.ha On Wed, 9 Mar 2011, Lucas Nussbaum wrote: > On 08/03/11 at 20:30 -0500, Injong Rhee wrote: > > Now, both tools can be wrong. But that is not catastrophic since > > congestion avoidance can kick in to save the day. In a pipe where no > > other flows are competing, then exiting slow start too early can > > slow things down as the window can be still too small. But that is > > in fact when delays are most reliable. So those tests that say bad > > performance with hystart are in fact, where hystart is supposed to > > perform well. > > Hi, > > In my setup, there is no congestion at all (except the buffer bloat). > Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting > slow start at ~2000 packets. > With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting > slow start at ~20 packets. > I don't think that this is "hystart performing well". We could just as > well remove slow start completely, and only do congestion avoidance, > then. > > While I see the value in Hystart, it's clear that there are some flaws > in the current implementation. It probably makes sense to disable > hystart by default until those problems are fixed. Here are some tests I performed across real networks, where congestion is generally not an issue, with a 2.6.35 kernel on the transmit side. 
8 GB transfer across an 18 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
 517.9375 MB / 1.00 sec = 4344.6096 Mbps 0 retrans
 688.4375 MB / 1.00 sec = 5775.1998 Mbps 0 retrans
 692.9375 MB / 1.00 sec = 5812.7462 Mbps 0 retrans
 698.0625 MB / 1.00 sec = 5855.8078 Mbps 0 retrans
 699.8750 MB / 1.00 sec = 5871.0123 Mbps 0 retrans
 710.5625 MB / 1.00 sec = 5960.5707 Mbps 0 retrans
 728.8125 MB / 1.00 sec = 6113.7652 Mbps 0 retrans
 751.3750 MB / 1.00 sec = 6302.9210 Mbps 0 retrans
 783.8750 MB / 1.00 sec = 6575.6201 Mbps 0 retrans
 825.1875 MB / 1.00 sec = 6921.8145 Mbps 0 retrans
 875.4375 MB / 1.00 sec = 7343.9811 Mbps 0 retrans
8192.0000 MB / 11.26 sec = 6102.4718 Mbps 11 %TX 28 %RX 0 retrans 18.92 msRTT

Ramps up quickly to a little under 6 Gbps, then increases more slowly to 7+ Gbps, with no TCP retransmissions.

8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and hystart:

i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
 970.0625 MB / 1.00 sec = 8136.8475 Mbps 0 retrans
1181.1875 MB / 1.00 sec = 9909.0045 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9908.6369 Mbps 0 retrans
1181.3125 MB / 1.00 sec = 9909.8747 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9909.0531 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9908.8153 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9909.0729 Mbps 0 retrans
8192.0000 MB / 7.13 sec = 9633.5814 Mbps 17 %TX 42 %RX 0 retrans 18.91 msRTT

Quickly ramps up to full 10-GigE line rate, with no TCP retrans.
8 GB transfer across an 18 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
 845.4375 MB / 1.00 sec = 7091.5828 Mbps 0 retrans
1181.3125 MB / 1.00 sec = 9910.0134 Mbps 0 retrans
1181.0625 MB / 1.00 sec = 9907.1830 Mbps 0 retrans
1181.4375 MB / 1.00 sec = 9910.8936 Mbps 0 retrans
1181.1875 MB / 1.00 sec = 9908.1721 Mbps 0 retrans
1181.3125 MB / 1.00 sec = 9909.5774 Mbps 0 retrans
1181.1875 MB / 1.00 sec = 9908.6874 Mbps 0 retrans
8192.0000 MB / 7.25 sec = 9484.4524 Mbps 18 %TX 41 %RX 0 retrans 18.92 msRTT

Also quickly ramps up to full 10-GigE line rate, with no TCP retrans.

8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
 969.8750 MB / 1.00 sec = 8135.6571 Mbps 0 retrans
1181.3125 MB / 1.00 sec = 9909.3990 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9908.9342 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9909.4098 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9908.8252 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9909.0630 Mbps 0 retrans
1181.2500 MB / 1.00 sec = 9909.3504 Mbps 0 retrans
8192.0000 MB / 7.15 sec = 9611.8053 Mbps 18 %TX 42 %RX 0 retrans 18.95 msRTT

Basically the same as the case with 40 MB socket buffer and hystart enabled.

Now trying the same type of tests across an 80 ms RTT path.
8 GB transfer across an 80 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
  11.3125 MB / 1.00 sec = 94.8954 Mbps 0 retrans
 441.5625 MB / 1.00 sec = 3704.1021 Mbps 0 retrans
 687.3750 MB / 1.00 sec = 5765.8657 Mbps 0 retrans
 715.5625 MB / 1.00 sec = 6002.6273 Mbps 0 retrans
 709.9375 MB / 1.00 sec = 5955.5958 Mbps 0 retrans
 691.3125 MB / 1.00 sec = 5799.0626 Mbps 0 retrans
 718.6250 MB / 1.00 sec = 6028.3538 Mbps 0 retrans
 718.0000 MB / 1.00 sec = 6023.0205 Mbps 0 retrans
 704.0000 MB / 1.00 sec = 5905.5387 Mbps 0 retrans
 733.3125 MB / 1.00 sec = 6151.4096 Mbps 0 retrans
 738.8750 MB / 1.00 sec = 6198.2381 Mbps 0 retrans
 731.8750 MB / 1.00 sec = 6139.3695 Mbps 0 retrans
8192.0000 MB / 12.85 sec = 5348.9677 Mbps 10 %TX 23 %RX 0 retrans 80.81 msRTT

Similar to the 18 ms RTT path, but achieving somewhat lower performance levels, presumably due to the larger RTT. Ramps up fairly quickly to a little under 6 Gbps, then increases more slowly to 6+ Gbps, with no TCP retransmissions.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and hystart:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
 103.9375 MB / 1.00 sec = 871.8378 Mbps 0 retrans
1086.5625 MB / 1.00 sec = 9114.6102 Mbps 0 retrans
1106.6875 MB / 1.00 sec = 9283.5583 Mbps 0 retrans
1109.3125 MB / 1.00 sec = 9305.5226 Mbps 0 retrans
1111.1875 MB / 1.00 sec = 9321.9596 Mbps 0 retrans
1112.8125 MB / 1.00 sec = 9334.8452 Mbps 0 retrans
1113.6875 MB / 1.00 sec = 9341.6620 Mbps 0 retrans
1120.2500 MB / 1.00 sec = 9398.0054 Mbps 0 retrans
8192.0000 MB / 8.37 sec = 8207.2049 Mbps 16 %TX 38 %RX 0 retrans 80.81 msRTT

Quickly ramps up to 9+ Gbps and then slowly increases further, with no TCP retrans.
8 GB transfer across an 80 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
  11.2500 MB / 1.00 sec = 94.3703 Mbps 0 retrans
 519.0625 MB / 1.00 sec = 4354.1596 Mbps 0 retrans
 861.2500 MB / 1.00 sec = 7224.7970 Mbps 0 retrans
 871.0000 MB / 1.00 sec = 7306.4191 Mbps 0 retrans
 860.7500 MB / 1.00 sec = 7220.4438 Mbps 0 retrans
 869.0625 MB / 1.00 sec = 7290.3340 Mbps 0 retrans
 863.4375 MB / 1.00 sec = 7242.7707 Mbps 0 retrans
 860.4375 MB / 1.00 sec = 7218.0606 Mbps 0 retrans
 875.5000 MB / 1.00 sec = 7344.3071 Mbps 0 retrans
 863.1875 MB / 1.00 sec = 7240.8257 Mbps 0 retrans
8192.0000 MB / 10.98 sec = 6259.4379 Mbps 12 %TX 27 %RX 0 retrans 80.81 msRTT

Ramps up quickly to 7+ Gbps, then appears to stabilize at that level, with no TCP retransmissions. Performance is somewhat better than the autotuning case with hystart enabled, but less than using a manually set 100 MB socket buffer.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
 102.8750 MB / 1.00 sec = 862.9487 Mbps 0 retrans
 522.8750 MB / 1.00 sec = 4386.2811 Mbps 414 retrans
 881.5625 MB / 1.00 sec = 7394.6534 Mbps 0 retrans
1164.3125 MB / 1.00 sec = 9766.6682 Mbps 0 retrans
1170.5625 MB / 1.00 sec = 9819.7042 Mbps 0 retrans
1166.8125 MB / 1.00 sec = 9788.2067 Mbps 0 retrans
1159.8750 MB / 1.00 sec = 9729.1530 Mbps 0 retrans
 811.1250 MB / 1.00 sec = 6804.8017 Mbps 21 retrans
  73.2500 MB / 1.00 sec = 614.4674 Mbps 0 retrans
 884.6250 MB / 1.00 sec = 7420.2900 Mbps 0 retrans
8192.0000 MB / 10.34 sec = 6647.9394 Mbps 13 %TX 31 %RX 435 retrans 80.81 msRTT

Disabling hystart on a large-RTT path does not seem to play nice with a manually specified socket buffer, resulting in TCP retransmissions that limit the effective network performance. This is a repeatable but extremely variable phenomenon.
i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
 103.7500 MB / 1.00 sec = 870.3015 Mbps 0 retrans
1146.3750 MB / 1.00 sec = 9616.4520 Mbps 0 retrans
1175.9375 MB / 1.00 sec = 9864.6070 Mbps 0 retrans
 615.6875 MB / 1.00 sec = 5164.7353 Mbps 21 retrans
 139.2500 MB / 1.00 sec = 1168.1253 Mbps 0 retrans
1090.0625 MB / 1.00 sec = 9143.8053 Mbps 0 retrans
1170.4375 MB / 1.00 sec = 9818.6654 Mbps 0 retrans
1174.5625 MB / 1.00 sec = 9852.8754 Mbps 0 retrans
1174.8750 MB / 1.00 sec = 9855.6052 Mbps 0 retrans
8192.0000 MB / 9.42 sec = 7292.9879 Mbps 14 %TX 34 %RX 21 retrans 80.81 msRTT

And:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
 102.8125 MB / 1.00 sec = 862.4227 Mbps 0 retrans
1148.4375 MB / 1.00 sec = 9633.6860 Mbps 0 retrans
1177.4375 MB / 1.00 sec = 9877.3086 Mbps 0 retrans
1168.1250 MB / 1.00 sec = 9798.9133 Mbps 11 retrans
 133.1250 MB / 1.00 sec = 1116.7457 Mbps 0 retrans
 479.8750 MB / 1.00 sec = 4025.4631 Mbps 0 retrans
1150.6875 MB / 1.00 sec = 9652.4830 Mbps 0 retrans
1177.3125 MB / 1.00 sec = 9876.0624 Mbps 0 retrans
1177.3750 MB / 1.00 sec = 9876.0139 Mbps 0 retrans
 320.2500 MB / 1.00 sec = 2686.6452 Mbps 19 retrans
  64.9375 MB / 1.00 sec = 544.7363 Mbps 0 retrans
  73.6250 MB / 1.00 sec = 617.6113 Mbps 0 retrans
8192.0000 MB / 12.39 sec = 5545.7570 Mbps 12 %TX 26 %RX 30 retrans 80.80 msRTT

Re-enabling hystart immediately gives a clean test with no TCP retrans.

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
 103.8750 MB / 1.00 sec = 871.3353 Mbps 0 retrans
1086.7500 MB / 1.00 sec = 9116.4474 Mbps 0 retrans
1105.8125 MB / 1.00 sec = 9276.2276 Mbps 0 retrans
1109.4375 MB / 1.00 sec = 9306.5339 Mbps 0 retrans
1111.3125 MB / 1.00 sec = 9322.5327 Mbps 0 retrans
1111.3750 MB / 1.00 sec = 9322.8053 Mbps 0 retrans
1113.7500 MB / 1.00 sec = 9342.8962 Mbps 0 retrans
1120.3125 MB / 1.00 sec = 9397.5711 Mbps 0 retrans
8192.0000 MB / 8.38 sec = 8204.8394 Mbps 16 %TX 39 %RX 0 retrans 80.80 msRTT

-Bill

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-10 5:24 ` Bill Fink @ 2011-03-10 6:17 ` Stephen Hemminger 2011-03-10 7:17 ` Bill Fink 2011-03-10 14:37 ` Injong Rhee 1 sibling, 1 reply; 27+ messages in thread From: Stephen Hemminger @ 2011-03-10 6:17 UTC (permalink / raw) To: Bill Fink Cc: Injong Rhee, David Miller, xiyou.wangcong, netdev, sangtae.ha, Lucas Nussbaum

Bill, what is the HZ in your kernel config? I am concerned hystart doesn't work well with HZ=100.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-10 6:17 ` Stephen Hemminger @ 2011-03-10 7:17 ` Bill Fink 2011-03-10 8:54 ` Lucas Nussbaum 0 siblings, 1 reply; 27+ messages in thread From: Bill Fink @ 2011-03-10 7:17 UTC (permalink / raw) To: Stephen Hemminger Cc: Injong Rhee, David Miller, xiyou wangcong, netdev, sangtae ha, Lucas Nussbaum On Wed, 9 Mar 2011, Stephen Hemminger wrote: > Bill what is the HZ in your kernel config. > I am concerned hystart doesn't work well with HZ=100 HZ=1000 But I did have tcp_timestamps disabled. Should I re-run the tests with tcp_timestamps enabled? -Bill ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-10 7:17 ` Bill Fink @ 2011-03-10 8:54 ` Lucas Nussbaum 2011-03-11 2:25 ` Bill Fink 0 siblings, 1 reply; 27+ messages in thread From: Lucas Nussbaum @ 2011-03-10 8:54 UTC (permalink / raw) To: Bill Fink Cc: Stephen Hemminger, Injong Rhee, David Miller, xiyou wangcong, netdev, sangtae ha On 10/03/11 at 02:17 -0500, Bill Fink wrote: > On Wed, 9 Mar 2011, Stephen Hemminger wrote: > > > Bill what is the HZ in your kernel config. > > I am concerned hystart doesn't work well with HZ=100 > > HZ=1000 > > But I did have tcp_timestamps disabled. Should I re-run > the tests with tcp_timestamps enabled? I ran my tests with timestamps enabled and HZ=250. If you have the opportunity to run tests in the same config, it would be great. The HZ=250 vs HZ=1000 difference could explain why it's working. However, enabling or disabling timestamps shouldn't make a difference, since the hystart code doesn't use TCP_CONG_RTT_STAMP. -- | Lucas Nussbaum MCF Université Nancy 2 | | lucas.nussbaum@loria.fr LORIA / AlGorille | | http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 | ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations 2011-03-10 8:54 ` Lucas Nussbaum @ 2011-03-11 2:25 ` Bill Fink 0 siblings, 0 replies; 27+ messages in thread From: Bill Fink @ 2011-03-11 2:25 UTC (permalink / raw) To: Lucas Nussbaum Cc: Stephen Hemminger, Injong Rhee, David Miller, xiyou wangcong, netdev, sangtae ha On Thu, 10 Mar 2011, Lucas Nussbaum wrote: > On 10/03/11 at 02:17 -0500, Bill Fink wrote: > > On Wed, 9 Mar 2011, Stephen Hemminger wrote: > > > > > Bill what is the HZ in your kernel config. > > > I am concerned hystart doesn't work well with HZ=100 > > > > HZ=1000 > > > > But I did have tcp_timestamps disabled. Should I re-run > > the tests with tcp_timestamps enabled? > > I ran my tests with timestamps enabled and HZ=250. If you have the > opportunity to run tests in the same config, it would be great. The > HZ=250 vs HZ=1000 difference could explain why it's working. > > However, enabling or disabling timestamps shouldn't make a difference, > since the hystart code doesn't use TCP_CONG_RTT_STAMP. I reran the same tests with HZ=250 and tcp_timestamps enabled. BTW all my tests are with 9000-byte jumbo frames. If you want, I can also try them using standard 1500-byte Ethernet frames. 
First, on the 18 ms RTT path:

8 GB transfer across an 18 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
 614.5625 MB / 1.00 sec = 5155.1383 Mbps 0 retrans
 824.2500 MB / 1.00 sec = 6914.5038 Mbps 0 retrans
 826.6875 MB / 1.00 sec = 6934.5632 Mbps 0 retrans
 831.5625 MB / 1.00 sec = 6975.7146 Mbps 0 retrans
 835.1875 MB / 1.00 sec = 7006.1867 Mbps 0 retrans
 844.8125 MB / 1.00 sec = 7086.7867 Mbps 0 retrans
 862.1250 MB / 1.00 sec = 7231.9274 Mbps 0 retrans
 886.5625 MB / 1.00 sec = 7437.0402 Mbps 0 retrans
 918.6875 MB / 1.00 sec = 7706.5633 Mbps 0 retrans
8192.0000 MB / 9.80 sec = 7009.7460 Mbps 12 %TX 31 %RX 0 retrans 18.91 msRTT

Ramps up quickly to a little under 7 Gbps, then increases more slowly to 7.7 Gbps, with no TCP retransmissions. Actually performed somewhat better than the HZ=1000 case.

8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and hystart:

i7test7% nuttcp -n8g -i1 -w40m 192.168.1.23
 716.0000 MB / 1.00 sec = 6006.0812 Mbps 0 retrans
 864.5000 MB / 1.00 sec = 7251.9589 Mbps 0 retrans
 866.1250 MB / 1.00 sec = 7265.4596 Mbps 0 retrans
 871.1250 MB / 1.00 sec = 7307.7746 Mbps 0 retrans
 875.6250 MB / 1.00 sec = 7345.2308 Mbps 0 retrans
 886.1875 MB / 1.00 sec = 7433.8796 Mbps 0 retrans
 904.1250 MB / 1.00 sec = 7584.3654 Mbps 0 retrans
 929.1875 MB / 1.00 sec = 7794.4728 Mbps 0 retrans
 961.6250 MB / 1.00 sec = 8066.7839 Mbps 0 retrans
8192.0000 MB / 9.34 sec = 7356.7856 Mbps 13 %TX 32 %RX 0 retrans 18.92 msRTT

Ramps up quickly to 7+ Gbps, then increases more slowly to 8+ Gbps, with no TCP retransmissions. Performed significantly worse than the HZ=1000 case.
8 GB transfer across an 18 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
  850.8750 MB /   1.00 sec = 7137.3642 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.3396 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.5486 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.5883 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.0621 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.4396 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.5189 Mbps     0 retrans

 8192.0000 MB /   7.23 sec = 9499.4276 Mbps 17 %TX 40 %RX 0 retrans 18.95 msRTT

Quickly ramps up to full 10-GigE line rate, with no TCP retrans.
Same performance as the HZ=1000 case.

8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -i1 -w40m 192.168.1.23
  969.8125 MB /   1.00 sec = 8135.2793 Mbps     0 retrans
 1181.1250 MB /   1.00 sec = 9908.0541 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.1810 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.9044 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.0729 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.0532 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.9549 Mbps     0 retrans

 8192.0000 MB /   7.15 sec = 9609.9893 Mbps 17 %TX 41 %RX 0 retrans 18.92 msRTT

Also quickly ramps up to full 10-GigE line rate, with no TCP retrans.
Same performance as the HZ=1000 case.

Now trying the same type of tests across an 80 ms RTT path.
8 GB transfer across an 80 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
   10.6250 MB /   1.00 sec =   89.1274 Mbps     0 retrans
  501.7500 MB /   1.00 sec = 4208.6979 Mbps     0 retrans
  872.9375 MB /   1.00 sec = 7323.2651 Mbps     0 retrans
  865.5000 MB /   1.00 sec = 7259.8901 Mbps     0 retrans
  854.9375 MB /   1.00 sec = 7172.0224 Mbps     0 retrans
  872.0000 MB /   1.00 sec = 7314.8735 Mbps     0 retrans
  866.6875 MB /   1.00 sec = 7270.3017 Mbps     0 retrans
  855.1250 MB /   1.00 sec = 7172.9354 Mbps     0 retrans
  868.7500 MB /   1.00 sec = 7288.1352 Mbps     0 retrans
  868.3750 MB /   1.00 sec = 7283.8238 Mbps     0 retrans

 8192.0000 MB /  10.99 sec = 6250.8745 Mbps 11 %TX 25 %RX 0 retrans 80.78 msRTT

Similar to the 18 ms RTT path, but achieving somewhat lower performance
levels, presumably due to the larger RTT.  Ramps up fairly quickly to
7+ Gbps, then appears to stabilize at that level, with no TCP
retransmissions.  Somewhat better performance than the HZ=1000 case.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and hystart:

i7test7% nuttcp -n8g -i1 -w100m 192.168.1.18
  103.8125 MB /   1.00 sec =  870.8197 Mbps     0 retrans
 1071.6875 MB /   1.00 sec = 8989.8315 Mbps     0 retrans
 1089.6250 MB /   1.00 sec = 9140.6929 Mbps     0 retrans
 1093.4375 MB /   1.00 sec = 9172.4186 Mbps     0 retrans
 1095.1875 MB /   1.00 sec = 9187.1262 Mbps     0 retrans
 1094.7500 MB /   1.00 sec = 9183.3460 Mbps     0 retrans
 1097.8750 MB /   1.00 sec = 9208.9431 Mbps     0 retrans
 1103.9375 MB /   1.00 sec = 9261.2584 Mbps     0 retrans

 8192.0000 MB /   8.48 sec = 8102.4984 Mbps 15 %TX 38 %RX 0 retrans 80.81 msRTT

Quickly ramps up to 9 Gbps and then slowly increases further, with no
TCP retrans.  Basically the same performance as the HZ=1000 case.
8 GB transfer across an 80 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
   10.0000 MB /   1.00 sec =   83.8847 Mbps     0 retrans
  482.3125 MB /   1.00 sec = 4045.8172 Mbps     0 retrans
  863.2500 MB /   1.00 sec = 7241.4224 Mbps     0 retrans
  874.3750 MB /   1.00 sec = 7334.7304 Mbps     0 retrans
  855.0000 MB /   1.00 sec = 7172.3889 Mbps     0 retrans
  863.6250 MB /   1.00 sec = 7244.6840 Mbps     0 retrans
  875.0625 MB /   1.00 sec = 7340.5489 Mbps     0 retrans
  855.1875 MB /   1.00 sec = 7173.6390 Mbps     0 retrans
  863.8750 MB /   1.00 sec = 7246.9044 Mbps     0 retrans
  873.3125 MB /   1.00 sec = 7325.9788 Mbps     0 retrans

 8192.0000 MB /  10.99 sec = 6253.7478 Mbps 11 %TX 26 %RX 0 retrans 80.80 msRTT

Ramps up quickly to 7+ Gbps, then appears to stabilize at that level,
with no TCP retransmissions.  Performance is the same as with
autotuning and hystart enabled, but less than using a manually set
100 MB socket buffer.  Same performance as the HZ=1000 case.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -i1 -w100m 192.168.1.18
  103.8125 MB /   1.00 sec =  870.7945 Mbps     0 retrans
 1148.4375 MB /   1.00 sec = 9633.6860 Mbps     0 retrans
 1176.9375 MB /   1.00 sec = 9872.7291 Mbps     0 retrans
 1088.1250 MB /   1.00 sec = 9127.4342 Mbps    39 retrans
  171.0625 MB /   1.00 sec = 1435.1370 Mbps     0 retrans
  901.0625 MB /   1.00 sec = 7558.3275 Mbps     0 retrans
 1160.0625 MB /   1.00 sec = 9731.1831 Mbps     0 retrans
 1172.5625 MB /   1.00 sec = 9836.5508 Mbps     0 retrans
 1085.0625 MB /   1.00 sec = 9101.2174 Mbps    31 retrans
  150.3750 MB /   1.00 sec = 1261.5908 Mbps     2 retrans
   28.1875 MB /   1.00 sec =  236.4544 Mbps     0 retrans

 8192.0000 MB /  11.31 sec = 6077.0651 Mbps 14 %TX 29 %RX 72 retrans 80.82 msRTT

As in the HZ=1000 case, disabling hystart on a large-RTT path does not
seem to play nice with a manually specified socket buffer, resulting in
TCP retransmissions that limit the effective network performance.
Performance seems similar to the HZ=1000 case.
This is a repeatable phenomenon, but it didn't seem quite as variable
as in the HZ=1000 case (though a larger number of repetitions would
probably be needed to draw any firm conclusions about that).

i7test7% nuttcp -n8g -i1 -w100m 192.168.1.18
  103.4375 MB /   1.00 sec =  867.6472 Mbps     0 retrans
 1143.0625 MB /   1.00 sec = 9589.1347 Mbps     0 retrans
  629.4375 MB /   1.00 sec = 5280.0886 Mbps    24 retrans
  164.8750 MB /   1.00 sec = 1383.0759 Mbps     0 retrans
 1121.6250 MB /   1.00 sec = 9408.7878 Mbps     0 retrans
 1168.1250 MB /   1.00 sec = 9799.0309 Mbps     0 retrans
 1167.5000 MB /   1.00 sec = 9793.5725 Mbps     0 retrans
 1165.9375 MB /   1.00 sec = 9780.0841 Mbps     0 retrans
  959.8750 MB /   1.00 sec = 8052.4902 Mbps     9 retrans
  568.1250 MB /   1.00 sec = 4765.8065 Mbps     0 retrans

 8192.0000 MB /  10.03 sec = 6852.2803 Mbps 13 %TX 32 %RX 33 retrans 80.81 msRTT

And:

i7test7% nuttcp -n8g -i1 -w100m 192.168.1.18
  103.8125 MB /   1.00 sec =  870.8241 Mbps     0 retrans
 1148.8125 MB /   1.00 sec = 9636.9570 Mbps     0 retrans
 1177.3750 MB /   1.00 sec = 9876.4287 Mbps     0 retrans
 1177.4375 MB /   1.00 sec = 9877.0024 Mbps     0 retrans
  693.5000 MB /   1.00 sec = 5817.6335 Mbps    36 retrans
  263.4375 MB /   1.00 sec = 2209.7701 Mbps     0 retrans
 1137.3125 MB /   1.00 sec = 9540.7263 Mbps     0 retrans
 1169.9375 MB /   1.00 sec = 9814.2354 Mbps     0 retrans
 1168.6875 MB /   1.00 sec = 9803.7005 Mbps     0 retrans

 8192.0000 MB /   9.21 sec = 7460.8789 Mbps 14 %TX 34 %RX 36 retrans 80.81 msRTT

Re-enabling hystart immediately gives a clean test with no TCP retrans.
i7test7% nuttcp -n8g -i1 -w100m 192.168.1.18
  103.8125 MB /   1.00 sec =  870.8075 Mbps     0 retrans
 1072.3125 MB /   1.00 sec = 8995.0653 Mbps     0 retrans
 1089.4375 MB /   1.00 sec = 9139.0926 Mbps     0 retrans
 1093.1875 MB /   1.00 sec = 9170.0646 Mbps     0 retrans
 1095.5625 MB /   1.00 sec = 9190.3914 Mbps     0 retrans
 1095.5000 MB /   1.00 sec = 9189.8303 Mbps     0 retrans
 1097.6875 MB /   1.00 sec = 9207.8952 Mbps     0 retrans
 1104.1875 MB /   1.00 sec = 9262.5405 Mbps     0 retrans

 8192.0000 MB /   8.48 sec = 8104.4831 Mbps 15 %TX 38 %RX 0 retrans 80.77 msRTT

						-Bill

Previous HZ=1000 tests (with tcp_timestamps disabled):

Here are some tests I performed across real networks, where congestion
is generally not an issue, with a 2.6.35 kernel on the transmit side.

8 GB transfer across an 18 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
  517.9375 MB /   1.00 sec = 4344.6096 Mbps     0 retrans
  688.4375 MB /   1.00 sec = 5775.1998 Mbps     0 retrans
  692.9375 MB /   1.00 sec = 5812.7462 Mbps     0 retrans
  698.0625 MB /   1.00 sec = 5855.8078 Mbps     0 retrans
  699.8750 MB /   1.00 sec = 5871.0123 Mbps     0 retrans
  710.5625 MB /   1.00 sec = 5960.5707 Mbps     0 retrans
  728.8125 MB /   1.00 sec = 6113.7652 Mbps     0 retrans
  751.3750 MB /   1.00 sec = 6302.9210 Mbps     0 retrans
  783.8750 MB /   1.00 sec = 6575.6201 Mbps     0 retrans
  825.1875 MB /   1.00 sec = 6921.8145 Mbps     0 retrans
  875.4375 MB /   1.00 sec = 7343.9811 Mbps     0 retrans

 8192.0000 MB /  11.26 sec = 6102.4718 Mbps 11 %TX 28 %RX 0 retrans 18.92 msRTT

Ramps up quickly to a little under 6 Gbps, then increases more slowly
to 7+ Gbps, with no TCP retransmissions.
8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and hystart:

i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
  970.0625 MB /   1.00 sec = 8136.8475 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9909.0045 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9908.6369 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.8747 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.0531 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9908.8153 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.0729 Mbps     0 retrans

 8192.0000 MB /   7.13 sec = 9633.5814 Mbps 17 %TX 42 %RX 0 retrans 18.91 msRTT

Quickly ramps up to full 10-GigE line rate, with no TCP retrans.

8 GB transfer across an 18 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.23
  845.4375 MB /   1.00 sec = 7091.5828 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9910.0134 Mbps     0 retrans
 1181.0625 MB /   1.00 sec = 9907.1830 Mbps     0 retrans
 1181.4375 MB /   1.00 sec = 9910.8936 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.1721 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.5774 Mbps     0 retrans
 1181.1875 MB /   1.00 sec = 9908.6874 Mbps     0 retrans

 8192.0000 MB /   7.25 sec = 9484.4524 Mbps 18 %TX 41 %RX 0 retrans 18.92 msRTT

Also quickly ramps up to full 10-GigE line rate, with no TCP retrans.

8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
  969.8750 MB /   1.00 sec = 8135.6571 Mbps     0 retrans
 1181.3125 MB /   1.00 sec = 9909.3990 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9908.9342 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.4098 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9908.8252 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.0630 Mbps     0 retrans
 1181.2500 MB /   1.00 sec = 9909.3504 Mbps     0 retrans

 8192.0000 MB /   7.15 sec = 9611.8053 Mbps 18 %TX 42 %RX 0 retrans 18.95 msRTT

Basically the same as the case with 40 MB socket buffer and hystart enabled.

Now trying the same type of tests across an 80 ms RTT path.
8 GB transfer across an 80 ms RTT path with autotuning and hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
   11.3125 MB /   1.00 sec =   94.8954 Mbps     0 retrans
  441.5625 MB /   1.00 sec = 3704.1021 Mbps     0 retrans
  687.3750 MB /   1.00 sec = 5765.8657 Mbps     0 retrans
  715.5625 MB /   1.00 sec = 6002.6273 Mbps     0 retrans
  709.9375 MB /   1.00 sec = 5955.5958 Mbps     0 retrans
  691.3125 MB /   1.00 sec = 5799.0626 Mbps     0 retrans
  718.6250 MB /   1.00 sec = 6028.3538 Mbps     0 retrans
  718.0000 MB /   1.00 sec = 6023.0205 Mbps     0 retrans
  704.0000 MB /   1.00 sec = 5905.5387 Mbps     0 retrans
  733.3125 MB /   1.00 sec = 6151.4096 Mbps     0 retrans
  738.8750 MB /   1.00 sec = 6198.2381 Mbps     0 retrans
  731.8750 MB /   1.00 sec = 6139.3695 Mbps     0 retrans

 8192.0000 MB /  12.85 sec = 5348.9677 Mbps 10 %TX 23 %RX 0 retrans 80.81 msRTT

Similar to the 18 ms RTT path, but achieving somewhat lower performance
levels, presumably due to the larger RTT.  Ramps up fairly quickly to a
little under 6 Gbps, then increases more slowly to 6+ Gbps, with no TCP
retransmissions.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and hystart:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
  103.9375 MB /   1.00 sec =  871.8378 Mbps     0 retrans
 1086.5625 MB /   1.00 sec = 9114.6102 Mbps     0 retrans
 1106.6875 MB /   1.00 sec = 9283.5583 Mbps     0 retrans
 1109.3125 MB /   1.00 sec = 9305.5226 Mbps     0 retrans
 1111.1875 MB /   1.00 sec = 9321.9596 Mbps     0 retrans
 1112.8125 MB /   1.00 sec = 9334.8452 Mbps     0 retrans
 1113.6875 MB /   1.00 sec = 9341.6620 Mbps     0 retrans
 1120.2500 MB /   1.00 sec = 9398.0054 Mbps     0 retrans

 8192.0000 MB /   8.37 sec = 8207.2049 Mbps 16 %TX 38 %RX 0 retrans 80.81 msRTT

Quickly ramps up to 9+ Gbps and then slowly increases further, with no
TCP retrans.
8 GB transfer across an 80 ms RTT path with autotuning and no hystart:

i7test7% nuttcp -n8g -i1 192.168.1.18
   11.2500 MB /   1.00 sec =   94.3703 Mbps     0 retrans
  519.0625 MB /   1.00 sec = 4354.1596 Mbps     0 retrans
  861.2500 MB /   1.00 sec = 7224.7970 Mbps     0 retrans
  871.0000 MB /   1.00 sec = 7306.4191 Mbps     0 retrans
  860.7500 MB /   1.00 sec = 7220.4438 Mbps     0 retrans
  869.0625 MB /   1.00 sec = 7290.3340 Mbps     0 retrans
  863.4375 MB /   1.00 sec = 7242.7707 Mbps     0 retrans
  860.4375 MB /   1.00 sec = 7218.0606 Mbps     0 retrans
  875.5000 MB /   1.00 sec = 7344.3071 Mbps     0 retrans
  863.1875 MB /   1.00 sec = 7240.8257 Mbps     0 retrans

 8192.0000 MB /  10.98 sec = 6259.4379 Mbps 12 %TX 27 %RX 0 retrans 80.81 msRTT

Ramps up quickly to 7+ Gbps, then appears to stabilize at that level,
with no TCP retransmissions.  Performance is somewhat better than with
autotuning enabled, but less than using a manually set 100 MB socket
buffer.

8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and no hystart:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
  102.8750 MB /   1.00 sec =  862.9487 Mbps     0 retrans
  522.8750 MB /   1.00 sec = 4386.2811 Mbps   414 retrans
  881.5625 MB /   1.00 sec = 7394.6534 Mbps     0 retrans
 1164.3125 MB /   1.00 sec = 9766.6682 Mbps     0 retrans
 1170.5625 MB /   1.00 sec = 9819.7042 Mbps     0 retrans
 1166.8125 MB /   1.00 sec = 9788.2067 Mbps     0 retrans
 1159.8750 MB /   1.00 sec = 9729.1530 Mbps     0 retrans
  811.1250 MB /   1.00 sec = 6804.8017 Mbps    21 retrans
   73.2500 MB /   1.00 sec =  614.4674 Mbps     0 retrans
  884.6250 MB /   1.00 sec = 7420.2900 Mbps     0 retrans

 8192.0000 MB /  10.34 sec = 6647.9394 Mbps 13 %TX 31 %RX 435 retrans 80.81 msRTT

Disabling hystart on a large-RTT path does not seem to play nice with a
manually specified socket buffer, resulting in TCP retransmissions that
limit the effective network performance.

This is a repeatable but extremely variable phenomenon.
i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
  103.7500 MB /   1.00 sec =  870.3015 Mbps     0 retrans
 1146.3750 MB /   1.00 sec = 9616.4520 Mbps     0 retrans
 1175.9375 MB /   1.00 sec = 9864.6070 Mbps     0 retrans
  615.6875 MB /   1.00 sec = 5164.7353 Mbps    21 retrans
  139.2500 MB /   1.00 sec = 1168.1253 Mbps     0 retrans
 1090.0625 MB /   1.00 sec = 9143.8053 Mbps     0 retrans
 1170.4375 MB /   1.00 sec = 9818.6654 Mbps     0 retrans
 1174.5625 MB /   1.00 sec = 9852.8754 Mbps     0 retrans
 1174.8750 MB /   1.00 sec = 9855.6052 Mbps     0 retrans

 8192.0000 MB /   9.42 sec = 7292.9879 Mbps 14 %TX 34 %RX 21 retrans 80.81 msRTT

And:

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
  102.8125 MB /   1.00 sec =  862.4227 Mbps     0 retrans
 1148.4375 MB /   1.00 sec = 9633.6860 Mbps     0 retrans
 1177.4375 MB /   1.00 sec = 9877.3086 Mbps     0 retrans
 1168.1250 MB /   1.00 sec = 9798.9133 Mbps    11 retrans
  133.1250 MB /   1.00 sec = 1116.7457 Mbps     0 retrans
  479.8750 MB /   1.00 sec = 4025.4631 Mbps     0 retrans
 1150.6875 MB /   1.00 sec = 9652.4830 Mbps     0 retrans
 1177.3125 MB /   1.00 sec = 9876.0624 Mbps     0 retrans
 1177.3750 MB /   1.00 sec = 9876.0139 Mbps     0 retrans
  320.2500 MB /   1.00 sec = 2686.6452 Mbps    19 retrans
   64.9375 MB /   1.00 sec =  544.7363 Mbps     0 retrans
   73.6250 MB /   1.00 sec =  617.6113 Mbps     0 retrans

 8192.0000 MB /  12.39 sec = 5545.7570 Mbps 12 %TX 26 %RX 30 retrans 80.80 msRTT

Re-enabling hystart immediately gives a clean test with no TCP retrans.

i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
  103.8750 MB /   1.00 sec =  871.3353 Mbps     0 retrans
 1086.7500 MB /   1.00 sec = 9116.4474 Mbps     0 retrans
 1105.8125 MB /   1.00 sec = 9276.2276 Mbps     0 retrans
 1109.4375 MB /   1.00 sec = 9306.5339 Mbps     0 retrans
 1111.3125 MB /   1.00 sec = 9322.5327 Mbps     0 retrans
 1111.3750 MB /   1.00 sec = 9322.8053 Mbps     0 retrans
 1113.7500 MB /   1.00 sec = 9342.8962 Mbps     0 retrans
 1120.3125 MB /   1.00 sec = 9397.5711 Mbps     0 retrans

 8192.0000 MB /   8.38 sec = 8204.8394 Mbps 16 %TX 39 %RX 0 retrans 80.80 msRTT

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-10  5:24       ` Bill Fink
  2011-03-10  6:17         ` Stephen Hemminger
@ 2011-03-10 14:37         ` Injong Rhee
  1 sibling, 0 replies; 27+ messages in thread
From: Injong Rhee @ 2011-03-10 14:37 UTC (permalink / raw)
  To: Bill Fink
  Cc: Lucas Nussbaum, Stephen Hemminger, David Miller, xiyou.wangcong,
	netdev, sangtae.ha

This is a good example of why I think the problem is in the
implementation.  The original idea is sound.  The tests where Lucas
reports problems (fat pipes with only a small number of flows) are
exactly the ones where hystart should perform very well.  If you have
many flows, then leaving slow start early (even if by mistake) is
easily covered by the cubic growth function in congestion avoidance.
We need to look into the issue of the HZ setting and other
implementation issues, and run more extensive tests.

On 3/10/11 12:24 AM, Bill Fink wrote:
> On Wed, 9 Mar 2011, Lucas Nussbaum wrote:
>
>> On 08/03/11 at 20:30 -0500, Injong Rhee wrote:
>>> Now, both tools can be wrong. But that is not catastrophic since
>>> congestion avoidance can kick in to save the day. In a pipe where no
>>> other flows are competing, then exiting slow start too early can
>>> slow things down as the window can be still too small. But that is
>>> in fact when delays are most reliable. So those tests that say bad
>>> performance with hystart are in fact, where hystart is supposed to
>>> perform well.
>> Hi,
>>
>> In my setup, there is no congestion at all (except the buffer bloat).
>> Without Hystart, transferring 8 Gb of data takes 9s, with CUBIC exiting
>> slow start at ~2000 packets.
>> With Hystart, transferring 8 Gb of data takes 19s, with CUBIC exiting
>> slow start at ~20 packets.
>> I don't think that this is "hystart performing well". We could just as
>> well remove slow start completely, and only do congestion avoidance,
>> then.
>>
>> While I see the value in Hystart, it's clear that there are some flaws
>> in the current implementation.
>> It probably makes sense to disable
>> hystart by default until those problems are fixed.

> Here are some tests I performed across real networks, where
> congestion is generally not an issue, with a 2.6.35 kernel on
> the transmit side.
>
> 8 GB transfer across an 18 ms RTT path with autotuning and hystart:
>
> i7test7% nuttcp -n8g -i1 192.168.1.23
>   517.9375 MB /   1.00 sec = 4344.6096 Mbps     0 retrans
>   688.4375 MB /   1.00 sec = 5775.1998 Mbps     0 retrans
>   692.9375 MB /   1.00 sec = 5812.7462 Mbps     0 retrans
>   698.0625 MB /   1.00 sec = 5855.8078 Mbps     0 retrans
>   699.8750 MB /   1.00 sec = 5871.0123 Mbps     0 retrans
>   710.5625 MB /   1.00 sec = 5960.5707 Mbps     0 retrans
>   728.8125 MB /   1.00 sec = 6113.7652 Mbps     0 retrans
>   751.3750 MB /   1.00 sec = 6302.9210 Mbps     0 retrans
>   783.8750 MB /   1.00 sec = 6575.6201 Mbps     0 retrans
>   825.1875 MB /   1.00 sec = 6921.8145 Mbps     0 retrans
>   875.4375 MB /   1.00 sec = 7343.9811 Mbps     0 retrans
>
>  8192.0000 MB /  11.26 sec = 6102.4718 Mbps 11 %TX 28 %RX 0 retrans 18.92 msRTT
>
> Ramps up quickly to a little under 6 Gbps, then increases more
> slowly to 7+ Gbps, with no TCP retransmissions.
>
> 8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and hystart:
>
> i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
>   970.0625 MB /   1.00 sec = 8136.8475 Mbps     0 retrans
>  1181.1875 MB /   1.00 sec = 9909.0045 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9908.6369 Mbps     0 retrans
>  1181.3125 MB /   1.00 sec = 9909.8747 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9909.0531 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9908.8153 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9909.0729 Mbps     0 retrans
>
>  8192.0000 MB /   7.13 sec = 9633.5814 Mbps 17 %TX 42 %RX 0 retrans 18.91 msRTT
>
> Quickly ramps up to full 10-GigE line rate, with no TCP retrans.
>
> 8 GB transfer across an 18 ms RTT path with autotuning and no hystart:
>
> i7test7% nuttcp -n8g -i1 192.168.1.23
>   845.4375 MB /   1.00 sec = 7091.5828 Mbps     0 retrans
>  1181.3125 MB /   1.00 sec = 9910.0134 Mbps     0 retrans
>  1181.0625 MB /   1.00 sec = 9907.1830 Mbps     0 retrans
>  1181.4375 MB /   1.00 sec = 9910.8936 Mbps     0 retrans
>  1181.1875 MB /   1.00 sec = 9908.1721 Mbps     0 retrans
>  1181.3125 MB /   1.00 sec = 9909.5774 Mbps     0 retrans
>  1181.1875 MB /   1.00 sec = 9908.6874 Mbps     0 retrans
>
>  8192.0000 MB /   7.25 sec = 9484.4524 Mbps 18 %TX 41 %RX 0 retrans 18.92 msRTT
>
> Also quickly ramps up to full 10-GigE line rate, with no TCP retrans.
>
> 8 GB transfer across an 18 ms RTT path with 40 MB socket buffer and no hystart:
>
> i7test7% nuttcp -n8g -w40m -i1 192.168.1.23
>   969.8750 MB /   1.00 sec = 8135.6571 Mbps     0 retrans
>  1181.3125 MB /   1.00 sec = 9909.3990 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9908.9342 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9909.4098 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9908.8252 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9909.0630 Mbps     0 retrans
>  1181.2500 MB /   1.00 sec = 9909.3504 Mbps     0 retrans
>
>  8192.0000 MB /   7.15 sec = 9611.8053 Mbps 18 %TX 42 %RX 0 retrans 18.95 msRTT
>
> Basically the same as the case with 40 MB socket buffer and hystart enabled.
>
> Now trying the same type of tests across an 80 ms RTT path.
>
> 8 GB transfer across an 80 ms RTT path with autotuning and hystart:
>
> i7test7% nuttcp -n8g -i1 192.168.1.18
>    11.3125 MB /   1.00 sec =   94.8954 Mbps     0 retrans
>   441.5625 MB /   1.00 sec = 3704.1021 Mbps     0 retrans
>   687.3750 MB /   1.00 sec = 5765.8657 Mbps     0 retrans
>   715.5625 MB /   1.00 sec = 6002.6273 Mbps     0 retrans
>   709.9375 MB /   1.00 sec = 5955.5958 Mbps     0 retrans
>   691.3125 MB /   1.00 sec = 5799.0626 Mbps     0 retrans
>   718.6250 MB /   1.00 sec = 6028.3538 Mbps     0 retrans
>   718.0000 MB /   1.00 sec = 6023.0205 Mbps     0 retrans
>   704.0000 MB /   1.00 sec = 5905.5387 Mbps     0 retrans
>   733.3125 MB /   1.00 sec = 6151.4096 Mbps     0 retrans
>   738.8750 MB /   1.00 sec = 6198.2381 Mbps     0 retrans
>   731.8750 MB /   1.00 sec = 6139.3695 Mbps     0 retrans
>
>  8192.0000 MB /  12.85 sec = 5348.9677 Mbps 10 %TX 23 %RX 0 retrans 80.81 msRTT
>
> Similar to the 20 ms RTT path, but achieving somewhat lower
> performance levels, presumably due to the larger RTT.  Ramps
> up fairly quickly to a little under 6 Gbps, then increases
> more slowly to 6+ Gbps, with no TCP retransmissions.
>
> 8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and hystart:
>
> i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
>   103.9375 MB /   1.00 sec =  871.8378 Mbps     0 retrans
>  1086.5625 MB /   1.00 sec = 9114.6102 Mbps     0 retrans
>  1106.6875 MB /   1.00 sec = 9283.5583 Mbps     0 retrans
>  1109.3125 MB /   1.00 sec = 9305.5226 Mbps     0 retrans
>  1111.1875 MB /   1.00 sec = 9321.9596 Mbps     0 retrans
>  1112.8125 MB /   1.00 sec = 9334.8452 Mbps     0 retrans
>  1113.6875 MB /   1.00 sec = 9341.6620 Mbps     0 retrans
>  1120.2500 MB /   1.00 sec = 9398.0054 Mbps     0 retrans
>
>  8192.0000 MB /   8.37 sec = 8207.2049 Mbps 16 %TX 38 %RX 0 retrans 80.81 msRTT
>
> Quickly ramps up to 9+ Gbps and then slowly increases further,
> with no TCP retrans.
>
> 8 GB transfer across an 80 ms RTT path with autotuning and no hystart:
>
> i7test7% nuttcp -n8g -i1 192.168.1.18
>    11.2500 MB /   1.00 sec =   94.3703 Mbps     0 retrans
>   519.0625 MB /   1.00 sec = 4354.1596 Mbps     0 retrans
>   861.2500 MB /   1.00 sec = 7224.7970 Mbps     0 retrans
>   871.0000 MB /   1.00 sec = 7306.4191 Mbps     0 retrans
>   860.7500 MB /   1.00 sec = 7220.4438 Mbps     0 retrans
>   869.0625 MB /   1.00 sec = 7290.3340 Mbps     0 retrans
>   863.4375 MB /   1.00 sec = 7242.7707 Mbps     0 retrans
>   860.4375 MB /   1.00 sec = 7218.0606 Mbps     0 retrans
>   875.5000 MB /   1.00 sec = 7344.3071 Mbps     0 retrans
>   863.1875 MB /   1.00 sec = 7240.8257 Mbps     0 retrans
>
>  8192.0000 MB /  10.98 sec = 6259.4379 Mbps 12 %TX 27 %RX 0 retrans 80.81 msRTT
>
> Ramps up quickly to 7+ Gbps, then appears to stabilize at that
> level, with no TCP retransmissions.  Performance is somewhat
> better than with autotuning enabled, but less than using a
> manually set 100 MB socket buffer.
>
> 8 GB transfer across an 80 ms RTT path with 100 MB socket buffer and no hystart:
>
> i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
>   102.8750 MB /   1.00 sec =  862.9487 Mbps     0 retrans
>   522.8750 MB /   1.00 sec = 4386.2811 Mbps   414 retrans
>   881.5625 MB /   1.00 sec = 7394.6534 Mbps     0 retrans
>  1164.3125 MB /   1.00 sec = 9766.6682 Mbps     0 retrans
>  1170.5625 MB /   1.00 sec = 9819.7042 Mbps     0 retrans
>  1166.8125 MB /   1.00 sec = 9788.2067 Mbps     0 retrans
>  1159.8750 MB /   1.00 sec = 9729.1530 Mbps     0 retrans
>   811.1250 MB /   1.00 sec = 6804.8017 Mbps    21 retrans
>    73.2500 MB /   1.00 sec =  614.4674 Mbps     0 retrans
>   884.6250 MB /   1.00 sec = 7420.2900 Mbps     0 retrans
>
>  8192.0000 MB /  10.34 sec = 6647.9394 Mbps 13 %TX 31 %RX 435 retrans 80.81 msRTT
>
> Disabling hystart on a large RTT path does not seem to play nice with
> a manually specified socket buffer, resulting in TCP retransmissions
> that limit the effective network performance.
>
> This is a repeatable but extremely variable phenomenon.
>
> i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
>   103.7500 MB /   1.00 sec =  870.3015 Mbps     0 retrans
>  1146.3750 MB /   1.00 sec = 9616.4520 Mbps     0 retrans
>  1175.9375 MB /   1.00 sec = 9864.6070 Mbps     0 retrans
>   615.6875 MB /   1.00 sec = 5164.7353 Mbps    21 retrans
>   139.2500 MB /   1.00 sec = 1168.1253 Mbps     0 retrans
>  1090.0625 MB /   1.00 sec = 9143.8053 Mbps     0 retrans
>  1170.4375 MB /   1.00 sec = 9818.6654 Mbps     0 retrans
>  1174.5625 MB /   1.00 sec = 9852.8754 Mbps     0 retrans
>  1174.8750 MB /   1.00 sec = 9855.6052 Mbps     0 retrans
>
>  8192.0000 MB /   9.42 sec = 7292.9879 Mbps 14 %TX 34 %RX 21 retrans 80.81 msRTT
>
> And:
>
> i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
>   102.8125 MB /   1.00 sec =  862.4227 Mbps     0 retrans
>  1148.4375 MB /   1.00 sec = 9633.6860 Mbps     0 retrans
>  1177.4375 MB /   1.00 sec = 9877.3086 Mbps     0 retrans
>  1168.1250 MB /   1.00 sec = 9798.9133 Mbps    11 retrans
>   133.1250 MB /   1.00 sec = 1116.7457 Mbps     0 retrans
>   479.8750 MB /   1.00 sec = 4025.4631 Mbps     0 retrans
>  1150.6875 MB /   1.00 sec = 9652.4830 Mbps     0 retrans
>  1177.3125 MB /   1.00 sec = 9876.0624 Mbps     0 retrans
>  1177.3750 MB /   1.00 sec = 9876.0139 Mbps     0 retrans
>   320.2500 MB /   1.00 sec = 2686.6452 Mbps    19 retrans
>    64.9375 MB /   1.00 sec =  544.7363 Mbps     0 retrans
>    73.6250 MB /   1.00 sec =  617.6113 Mbps     0 retrans
>
>  8192.0000 MB /  12.39 sec = 5545.7570 Mbps 12 %TX 26 %RX 30 retrans 80.80 msRTT
>
> Re-enabling hystart immediately gives a clean test with no TCP retrans.
>
> i7test7% nuttcp -n8g -w100m -i1 192.168.1.18
>   103.8750 MB /   1.00 sec =  871.3353 Mbps     0 retrans
>  1086.7500 MB /   1.00 sec = 9116.4474 Mbps     0 retrans
>  1105.8125 MB /   1.00 sec = 9276.2276 Mbps     0 retrans
>  1109.4375 MB /   1.00 sec = 9306.5339 Mbps     0 retrans
>  1111.3125 MB /   1.00 sec = 9322.5327 Mbps     0 retrans
>  1111.3750 MB /   1.00 sec = 9322.8053 Mbps     0 retrans
>  1113.7500 MB /   1.00 sec = 9342.8962 Mbps     0 retrans
>  1120.3125 MB /   1.00 sec = 9397.5711 Mbps     0 retrans
>
>  8192.0000 MB /   8.38 sec = 8204.8394 Mbps 16 %TX 39 %RX 0 retrans 80.80 msRTT
>
> 						-Bill

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08 23:21   ` Stephen Hemminger
  2011-03-09  1:30     ` Injong Rhee
@ 2011-03-09  1:33     ` Sangtae Ha
  1 sibling, 0 replies; 27+ messages in thread
From: Sangtae Ha @ 2011-03-09  1:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, rhee, lucas.nussbaum, xiyou.wangcong, netdev

Hi Stephen,

Thank you for your feedback.  Please see my answers below.

On Tue, Mar 8, 2011 at 6:21 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Tue, 08 Mar 2011 11:43:46 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Injong Rhee <rhee@ncsu.edu>
>> Date: Tue, 08 Mar 2011 10:26:36 -0500
>>
>> > Thanks for updating CUBIC hystart. You might want to test the
>> > cases with more background traffic and verify whether this
>> > threshold is too conservative.
>>
>> So let's get down to basics.
>>
>> What does Hystart do specially that allows it to avoid all of the
>> problems that TCP VEGAS runs into.
>>
>> Specifically, that if you use RTTs to make congestion control
>> decisions it is impossible to notice new bandwidth becoming available
>> fast enough.
>>
>> Again, it's impossible to react fast enough. No matter what you tweak
>> all of your various settings to, this problem will still exist.
>>
>> This is a core issue, you cannot get around it.
>>
>> This is why I feel that Hystart is fundamentally flawed and we should
>> turn it off by default if not flat-out remove it.
>>
>> Distributions are turning it off by default already, therefore it's
>> stupid for the upstream kernel to behave differently if that's what
>> 99% of the world is going to end up experiencing.
>
> The assumption in Hystart that spacing between ACKs is solely due to
> congestion is a bad one. If you read the paper, this is why FreeBSD's
> estimation logic is dismissed. The Hystart problem is different
> than the Vegas issue.
>
> Algorithms that look at min RTT are ok, since the lower bound is
> fixed; additional queuing and variation in the network only increases
> RTT, it never reduces it. With a min RTT it is possible to compute an
> upper bound on available bandwidth, i.e. if all packets were as good
> as this minRTT estimate, then the available bandwidth is X. But then
> using an individual RTT sample to estimate unused bandwidth is flawed.
> To quote the paper:
>
> "Thus, by checking whether ∆(N) is larger than Dmin, we
> can detect whether cwnd has reached the available capacity
> of the path"
>
> So what goes wrong:
>
> 1. Dmin can be too large because this connection always sees delays
> due to other traffic or hardware, i.e. buffer bloat. This would cause
> the bandwidth estimate to be too low and therefore TCP would leave
> slow start too early (and not get up to full bandwidth).

This is true.  But the idea is that running the congestion avoidance
algorithm of CUBIC in this case is better than hurting other flows
with abrupt perturbation, since the growth of CUBIC is quite responsive
and grabs the bandwidth quickly in normal network conditions.

> 2. Dmin can be smaller than the clock resolution. This would cause
> either the sample to be ignored, or Dmin to be zero. If Dmin is zero,
> the bandwidth estimate would in theory be infinite, which would
> lead to TCP not leaving slow start because of Hystart. Instead
> TCP would leave slow start at first loss.

True.  But since HyStart doesn't clamp the threshold, ca->delay_min>>4,
it can prematurely leave slow start for a very small Dmin.  I think
this needs to be fixed, along with the hard-coded 2 ms you mentioned
below.

> Other possible problems:
>
> 3. ACKs could be nudged together by variations in delay, causing
> HyStart to falsely think it is seeing an ACK train, and so to
> exit slow start prematurely.
This doesn't happen when the delay is not too small (on typical WAN
paths, including DSL), but it is possible with very small delays, since
the code checking for a valid ACK train uses a fixed 2 ms value, which
is too large for a LAN.

> Noise in the network is not catastrophic, it just
> causes TCP to exit slow start early and have to go into the normal
> window growth phase. The problem is that the original non-Hystart
> behavior of Cubic is unfair; the first flow dominates the link
> and other flows are unable to get in. If you run tests with two
> flows, one will get a larger share of the bandwidth.
>
> I think Hystart is okay in concept but there may be issues
> on low RTT links as well as other corner cases that need bug
> fixing.

We do not use the delay as an indication of congestion, but to improve
stability and overall performance.  Preventing burst losses helps
quite a bit on mid to large BDP paths, and the performance results
with non-TCP-SACK receivers are also encouraging.

I will work on fixes for the issues below.

> 1. Needs to use better resolution than HZ. Since HZ can be 100.
> 2. Hardcoding 2ms as spacing between ACK's as train is wrong
>    for local networks.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread
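For reference, the two exit heuristics debated above can be sketched in user space. This is a simplified model, not the kernel code: jiffies stand in for real time, the 2-jiffy train gap approximates the hard-coded 2 ms, and delay samples use the kernel's 8x fixed-point RTT representation, so `delay_min >> 2` and `delay_min << 1` both correspond to the 2*minRTT thresholds of the proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of CUBIC Hystart's two exit heuristics
 * with the patched thresholds.  In the kernel, delay samples are kept
 * in a fixed-point form that is 8x the RTT, so delay_min >> 2 equals
 * 2*minRTT, and curr_rtt > delay_min << 1 also tests against 2*minRTT
 * in the same 8x units. */

#define HYSTART_ACK_TRAIN   1
#define HYSTART_DELAY       2
#define HYSTART_MIN_SAMPLES 8

struct hystart {
    uint32_t delay_min;    /* min RTT, 8x fixed point */
    uint32_t curr_rtt;     /* max RTT seen this round, 8x fixed point */
    uint32_t round_start;  /* jiffies when the current round began */
    uint32_t last_jiffies; /* jiffies of the previous ACK in a train */
    uint32_t sample_cnt;   /* delay samples taken this round */
    int found;             /* which heuristic(s) fired */
};

/* now: current jiffies; gap_jiffies: time since the previous ACK;
 * delay: this ACK's RTT sample, 8x fixed point */
static void hystart_update(struct hystart *ca, uint32_t now,
                           uint32_t gap_jiffies, uint32_t delay)
{
    /* 1) ACK-train detection: consecutive ACKs closer together than
     * ~2 ms extend a train; exit once the train has spanned 2*minRTT
     * (delay_min >> 2) since the start of the round. */
    if (gap_jiffies <= 2) {            /* 2 jiffies stands in for 2 ms */
        ca->last_jiffies = now;
        if (now - ca->round_start >= (ca->delay_min >> 2))
            ca->found |= HYSTART_ACK_TRAIN;
    }

    /* 2) Delay-increase detection: track the max RTT over the first
     * samples of a round; exit once it doubles versus delay_min. */
    if (ca->sample_cnt < HYSTART_MIN_SAMPLES) {
        if (ca->curr_rtt == 0 || ca->curr_rtt < delay)
            ca->curr_rtt = delay;
        ca->sample_cnt++;
    } else if (ca->curr_rtt > (ca->delay_min << 1)) {
        ca->found |= HYSTART_DELAY;
    }
}
```

With, say, an 11-jiffy minRTT (`delay_min = 88` in 8x units), a back-to-back ACK stream trips the train check once 22 jiffies have elapsed since round start, while RTT samples above 176 (i.e. above 2*minRTT) trip the delay check after the first 8 samples.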
[parent not found: <AANLkTimdpEKHfVKw+bm6OnymcnUrauU+jGOPeLzy3Q0o@mail.gmail.com>]
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
       [not found] ` <AANLkTimdpEKHfVKw+bm6OnymcnUrauU+jGOPeLzy3Q0o@mail.gmail.com>
@ 2011-03-08 18:14   ` Lucas Nussbaum
  0 siblings, 0 replies; 27+ messages in thread
From: Lucas Nussbaum @ 2011-03-08 18:14 UTC (permalink / raw)
  To: Sangtae Ha; +Cc: WANG Cong, Injong Rhee, Netdev

On 08/03/11 at 11:43 -0500, Sangtae Ha wrote:
> Hi Lucas,
>
> The current packet-train threshold and the delay threshold have been
> tested with the bandwidth ranging from 10M to 400M, the RTT from 10ms
> to 320ms, and the buffer size from 10% BDP to 200% BDP and they were
> set conservatively to make it work over the network with very small
> buffer sizes. I will recreate your setup and check whether the current
> thresholds are too conservative and will come up with the patch.

I'm surprised.  It's possible that a seemingly unrelated change broke
it, but it was already broken for me on 2.6.32.

I can provide access to the testbed if you want to run tests on it.

-- 
| Lucas Nussbaum                 MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr        LORIA / AlGorille      |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19      |

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-08  9:32 [PATCH] Make CUBIC Hystart more robust to RTT variations Lucas Nussbaum
  2011-03-08 10:21 ` WANG Cong
@ 2011-03-10 23:28 ` Stephen Hemminger
  2011-03-11  5:59   ` Lucas Nussbaum
  1 sibling, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2011-03-10 23:28 UTC (permalink / raw)
  To: Lucas Nussbaum; +Cc: netdev, Sangtae Ha

On Tue, 8 Mar 2011 10:32:15 +0100
Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:

> CUBIC Hystart uses two heuristics to exit slow start earlier, before
> losses start to occur. Unfortunately, it tends to exit slow start far
> too early, causing poor performance since convergence to the optimal
> cwnd is then very slow. This was reported in
> http://permalink.gmane.org/gmane.linux.network/188169 and
> https://partner-bugzilla.redhat.com/show_bug.cgi?id=616985

Ignore the RHEL bug. RHEL 5 ships with TCP BIC (not CUBIC) by default. There are many research papers which show that BIC is too aggressive and not fair.
* Re: [PATCH] Make CUBIC Hystart more robust to RTT variations
  2011-03-10 23:28 ` Stephen Hemminger
@ 2011-03-11  5:59   ` Lucas Nussbaum
  0 siblings, 0 replies; 27+ messages in thread
From: Lucas Nussbaum @ 2011-03-11 5:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Sangtae Ha

On 10/03/11 at 15:28 -0800, Stephen Hemminger wrote:
> On Tue, 8 Mar 2011 10:32:15 +0100
> Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:
>
> > CUBIC Hystart uses two heuristics to exit slow start earlier, before
> > losses start to occur. Unfortunately, it tends to exit slow start
> > far too early, causing poor performance since convergence to the
> > optimal cwnd is then very slow. This was reported in
> > http://permalink.gmane.org/gmane.linux.network/188169 and
> > https://partner-bugzilla.redhat.com/show_bug.cgi?id=616985
>
> Ignore the RHEL bug. RHEL 5 ships with TCP BIC (not CUBIC) by default.
> There are many research papers which show that BIC is too aggressive,
> and not fair.

According to the bug report, the server is running RHEL6 (with CUBIC and Hystart); it's the client that is running RHEL5.

-- 
| Lucas Nussbaum             MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr         LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/  +33 3 54 95 86 19 |
end of thread, other threads: [~2011-03-11 6:02 UTC | newest]

Thread overview: 27+ messages
2011-03-08  9:32 [PATCH] Make CUBIC Hystart more robust to RTT variations Lucas Nussbaum
2011-03-08 10:21 ` WANG Cong
2011-03-08 11:10   ` Lucas Nussbaum
2011-03-08 15:26     ` Injong Rhee
2011-03-08 19:43       ` David Miller
2011-03-08 23:21         ` Stephen Hemminger
2011-03-09  1:30           ` Injong Rhee
2011-03-09  6:53             ` Lucas Nussbaum
2011-03-09 17:56               ` Stephen Hemminger
2011-03-09 18:25                 ` Lucas Nussbaum
2011-03-09 19:56                   ` Stephen Hemminger
2011-03-09 21:28                     ` Lucas Nussbaum
2011-03-09 20:01                   ` Stephen Hemminger
2011-03-09 21:12                     ` Yuchung Cheng
2011-03-09 21:33                       ` Lucas Nussbaum
2011-03-09 21:51                         ` Stephen Hemminger
2011-03-09 22:03                           ` Lucas Nussbaum
2011-03-10  5:24                             ` Bill Fink
2011-03-10  6:17                               ` Stephen Hemminger
2011-03-10  7:17                                 ` Bill Fink
2011-03-10  8:54                                 ` Lucas Nussbaum
2011-03-11  2:25                                   ` Bill Fink
2011-03-10 14:37                             ` Injong Rhee
2011-03-09  1:33 ` Sangtae Ha
[not found] ` <AANLkTimdpEKHfVKw+bm6OnymcnUrauU+jGOPeLzy3Q0o@mail.gmail.com>
2011-03-08 18:14   ` Lucas Nussbaum
2011-03-10 23:28 ` Stephen Hemminger
2011-03-11  5:59   ` Lucas Nussbaum