* [PATCH 1/6] tcp: fix RTT for quick packets in congestion control
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-10 16:51 ` [PATCH 2/6] tcp: timestamp code clarification Stephen Hemminger
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-input-rtt.patch --]
[-- Type: text/plain, Size: 967 bytes --]
In the congestion control interface, the callback for each ACK
includes an estimated round trip time in microseconds.
Some algorithms need high resolution (Vegas style), but most only
need jiffy resolution. If the RTT is not accurate (for example, after
a retransmission), -1 is used as a flag value.
When using coarse resolution, if the RTT is less than a jiffy,
0 should be returned rather than no estimate. Otherwise algorithms
that expect good ACKs to trigger slow start (like CUBIC Hystart)
will be confused.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/tcp_input.c 2011-03-08 11:11:26.093183654 -0800
+++ b/net/ipv4/tcp_input.c 2011-03-08 11:11:46.641404939 -0800
@@ -3350,7 +3350,7 @@ static int tcp_clean_rtx_queue(struct so
net_invalid_timestamp()))
rtt_us = ktime_us_delta(ktime_get_real(),
last_ackt);
- else if (ca_seq_rtt > 0)
+ else if (ca_seq_rtt >= 0)
rtt_us = jiffies_to_usecs(ca_seq_rtt);
}
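The effect of the one-character change can be sketched in user-space C. This is only an illustrative model, not the kernel code: coarse_rtt_us(), jiffies_to_usecs_sketch() and the fixed HZ value are assumptions made for the example, and the conversion is a simplified stand-in for jiffies_to_usecs().

```c
#define HZ 250                      /* assumed tick rate for this sketch */
#define USEC_PER_SEC 1000000L

/* Simplified stand-in for jiffies_to_usecs(); hypothetical helper. */
static long jiffies_to_usecs_sketch(long j)
{
	return j * (USEC_PER_SEC / HZ);
}

/*
 * Model of the coarse (jiffy-resolution) branch.
 * ca_seq_rtt < 0 means "no valid sample"; 0 means "less than one jiffy".
 * inclusive != 0 selects the patched test (>= 0), 0 the old test (> 0).
 * Returns the RTT in microseconds, or -1 for "no estimate".
 */
static long coarse_rtt_us(long ca_seq_rtt, int inclusive)
{
	long rtt_us = -1;

	if (inclusive ? ca_seq_rtt >= 0 : ca_seq_rtt > 0)
		rtt_us = jiffies_to_usecs_sketch(ca_seq_rtt);
	return rtt_us;
}
```

With the old test, a sub-jiffy sample (ca_seq_rtt == 0) is reported as -1 and looks like a missing estimate; with the patched test it is reported as 0.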
* [PATCH 2/6] tcp: timestamp code clarification
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
2011-03-10 16:51 ` [PATCH 1/6] tcp: fix RTT for quick packets in congestion control Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-10 16:51 ` [PATCH 3/6] tcp_cubic: fix comparison of jiffies Stephen Hemminger
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-input-tstamp-clean.patch --]
[-- Type: text/plain, Size: 1462 bytes --]
Use inline functions to make the checking of ack timestamp clearer.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/tcp_input.c 2011-03-10 07:58:37.715948842 -0800
+++ b/net/ipv4/tcp_input.c 2011-03-10 07:58:38.419937963 -0800
@@ -3263,7 +3263,7 @@ static int tcp_clean_rtx_queue(struct so
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
ca_seq_rtt = now - scb->when;
- last_ackt = skb->tstamp;
+ last_ackt = skb_get_ktime(skb);
if (seq_rtt < 0) {
seq_rtt = ca_seq_rtt;
}
@@ -3345,9 +3345,8 @@ static int tcp_clean_rtx_queue(struct so
/* Is the ACK triggering packet unambiguous? */
if (!(flag & FLAG_RETRANS_DATA_ACKED)) {
/* High resolution needed and available? */
- if (ca_ops->flags & TCP_CONG_RTT_STAMP &&
- !ktime_equal(last_ackt,
- net_invalid_timestamp()))
+ if ((ca_ops->flags & TCP_CONG_RTT_STAMP) &&
+ net_timestamp_isvalid(last_ackt))
rtt_us = ktime_us_delta(ktime_get_real(),
last_ackt);
else if (ca_seq_rtt >= 0)
--- a/include/linux/skbuff.h 2011-03-10 07:55:27.181150325 -0800
+++ b/include/linux/skbuff.h 2011-03-10 07:58:38.419937963 -0800
@@ -1965,6 +1965,11 @@ static inline ktime_t net_invalid_timest
return ktime_set(0, 0);
}
+static inline bool net_timestamp_isvalid(ktime_t t)
+{
+ return !ktime_equal(t, net_invalid_timestamp());
+}
+
extern void skb_timestamping_init(void);
#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
* [PATCH 3/6] tcp_cubic: fix comparison of jiffies
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
2011-03-10 16:51 ` [PATCH 1/6] tcp: fix RTT for quick packets in congestion control Stephen Hemminger
2011-03-10 16:51 ` [PATCH 2/6] tcp: timestamp code clarification Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-11 9:40 ` Lucas Nussbaum
2011-03-10 16:51 ` [PATCH 4/6] tcp_cubic: make ack train delta value a parameter Stephen Hemminger
` (3 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-cubic-jiffies-wrap.patch --]
[-- Type: text/plain, Size: 952 bytes --]
Jiffies wraps around, so the correct way to compare two values is
to cast their difference to a signed value.
Note: cubic does not use the full jiffies value on 64-bit architectures,
because a full unsigned long would make struct bictcp grow too
large for the available ca_priv area.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/tcp_cubic.c 2011-03-10 08:08:32.867492953 -0800
+++ b/net/ipv4/tcp_cubic.c 2011-03-10 08:24:39.658201745 -0800
@@ -342,9 +342,11 @@ static void hystart_update(struct sock *
u32 curr_jiffies = jiffies;
/* first detection parameter - ack-train detection */
- if (curr_jiffies - ca->last_jiffies <= msecs_to_jiffies(2)) {
+ if ((s32)(curr_jiffies - ca->last_jiffies) <=
+ msecs_to_jiffies(2)) {
ca->last_jiffies = curr_jiffies;
- if (curr_jiffies - ca->round_start >= ca->delay_min>>4)
+ if ((s32) (curr_jiffies - ca->round_start) <=
+ ca->delay_min >> 4)
ca->found |= HYSTART_ACK_TRAIN;
}
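The signed-cast idiom described in the commit message can be tried outside the kernel. The following is a minimal sketch, assuming 32-bit time stamps as in struct bictcp; the helper names elapsed_u32() and time_after_eq_u32() are made up for the illustration. It relies on the usual two's-complement conversion from uint32_t to int32_t, which mainstream compilers provide.

```c
#include <stdint.h>

/* Elapsed ticks between two u32 time stamps, as a signed value. */
static int32_t elapsed_u32(uint32_t now, uint32_t then)
{
	/* The unsigned subtraction wraps modulo 2^32; the cast recovers the sign. */
	return (int32_t)(now - then);
}

/* Nonzero if time stamp a is at or after b, tolerating u32 wraparound. */
static int time_after_eq_u32(uint32_t a, uint32_t b)
{
	return elapsed_u32(a, b) >= 0;
}
```

For example, with then = 0xFFFFFFFB and now = 5 (the counter wrapped in between), the elapsed time is 10 ticks, whereas a plain now >= then comparison would claim that no time had passed.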
* Re: [PATCH 3/6] tcp_cubic: fix comparison of jiffies
2011-03-10 16:51 ` [PATCH 3/6] tcp_cubic: fix comparison of jiffies Stephen Hemminger
@ 2011-03-11 9:40 ` Lucas Nussbaum
0 siblings, 0 replies; 13+ messages in thread
From: Lucas Nussbaum @ 2011-03-11 9:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: davem, sangtae.ha, rhee, netdev
On 10/03/11 at 08:51 -0800, Stephen Hemminger wrote:
> --- a/net/ipv4/tcp_cubic.c 2011-03-10 08:08:32.867492953 -0800
> +++ b/net/ipv4/tcp_cubic.c 2011-03-10 08:24:39.658201745 -0800
> @@ -342,9 +342,11 @@ static void hystart_update(struct sock *
> u32 curr_jiffies = jiffies;
>
> /* first detection parameter - ack-train detection */
> - if (curr_jiffies - ca->last_jiffies <= msecs_to_jiffies(2)) {
> + if ((s32)(curr_jiffies - ca->last_jiffies) <=
> + msecs_to_jiffies(2)) {
> ca->last_jiffies = curr_jiffies;
> - if (curr_jiffies - ca->round_start >= ca->delay_min>>4)
> + if ((s32) (curr_jiffies - ca->round_start) <=
> + ca->delay_min >> 4)
>=, not <=
--
| Lucas Nussbaum MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |
* [PATCH 4/6] tcp_cubic: make ack train delta value a parameter
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
` (2 preceding siblings ...)
2011-03-10 16:51 ` [PATCH 3/6] tcp_cubic: fix comparison of jiffies Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-10 16:51 ` [PATCH 5/6] tcp_cubic: fix clock dependency Stephen Hemminger
` (2 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-cubic-ackdelta.patch --]
[-- Type: text/plain, Size: 1433 bytes --]
Make the spacing between ACKs that indicates a train a tunable
value, like the other hystart values.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/tcp_cubic.c 2011-03-10 08:24:39.658201745 -0800
+++ b/net/ipv4/tcp_cubic.c 2011-03-10 08:25:28.078700603 -0800
@@ -52,6 +52,7 @@ static int tcp_friendliness __read_mostl
static int hystart __read_mostly = 1;
static int hystart_detect __read_mostly = HYSTART_ACK_TRAIN | HYSTART_DELAY;
static int hystart_low_window __read_mostly = 16;
+static int hystart_ack_delta __read_mostly = 2;
static u32 cube_rtt_scale __read_mostly;
static u32 beta_scale __read_mostly;
@@ -75,6 +76,8 @@ MODULE_PARM_DESC(hystart_detect, "hyrbri
" 1: packet-train 2: delay 3: both packet-train and delay");
module_param(hystart_low_window, int, 0644);
MODULE_PARM_DESC(hystart_low_window, "lower bound cwnd for hybrid slow start");
+module_param(hystart_ack_delta, int, 0644);
+MODULE_PARM_DESC(hystart_ack_delta, "spacing between ack's indicating train (msecs)");
/* BIC TCP Parameters */
struct bictcp {
@@ -343,7 +346,7 @@ static void hystart_update(struct sock *
/* first detection parameter - ack-train detection */
if ((s32)(curr_jiffies - ca->last_jiffies) <=
- msecs_to_jiffies(2)) {
+ msecs_to_jiffies(hystart_ack_delta)) {
ca->last_jiffies = curr_jiffies;
if ((s32) (curr_jiffies - ca->round_start) <=
ca->delay_min >> 4)
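To see why a millisecond parameter passed through msecs_to_jiffies() behaves differently across tick rates, here is a rough user-space model. It is a sketch under stated assumptions: msecs_to_jiffies_sketch() and effective_window_ms() are hypothetical helpers, and the rounding-up formula only models tick rates that divide 1000 evenly (such as 100, 250 and 1000).

```c
/*
 * Simplified model of msecs_to_jiffies() for tick rates that divide 1000
 * evenly. Rounds up, so a nonzero timeout never becomes zero ticks.
 * Hypothetical helper, not the kernel function.
 */
static unsigned long msecs_to_jiffies_sketch(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_jiffy = 1000U / hz;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Nominal length, in milliseconds, of the window the comparison allows. */
static unsigned int effective_window_ms(unsigned int ms, unsigned int hz)
{
	return (unsigned int)msecs_to_jiffies_sketch(ms, hz) * (1000U / hz);
}
```

Under this model a 2 ms setting stays 2 ms at HZ=1000, but becomes one jiffy at HZ=250 (nominally 4 ms) and one jiffy at HZ=100 (nominally 10 ms).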
* [PATCH 5/6] tcp_cubic: fix clock dependency
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
` (3 preceding siblings ...)
2011-03-10 16:51 ` [PATCH 4/6] tcp_cubic: make ack train delta value a parameter Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-11 16:26 ` Sangtae Ha
2011-03-10 16:51 ` [PATCH 6/6] tcp_cubic: enable high resolution ack time if needed Stephen Hemminger
2011-03-11 10:28 ` [PATCH 0/6] TCP CUBIC and Hystart Lucas Nussbaum
6 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-cubic-minrtt.patch --]
[-- Type: text/plain, Size: 3050 bytes --]
The hystart code was written with the assumption that HZ=1000.
Replace the use of jiffies with bictcp_clock(), a millisecond
real-time clock.
Warning: this is still experimental; there may still be mistakes
in units (ms vs. jiffies).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
P.S.: I tried using ktime_t, but 'struct bictcp' is bumping against the
limit of CA_PRIV_SIZE.
--- a/net/ipv4/tcp_cubic.c 2011-03-10 08:35:45.532695373 -0800
+++ b/net/ipv4/tcp_cubic.c 2011-03-10 08:35:59.968882888 -0800
@@ -88,7 +88,7 @@ struct bictcp {
u32 last_time; /* time when updated last_cwnd */
u32 bic_origin_point;/* origin point of bic function */
u32 bic_K; /* time to origin point from the beginning of the current epoch */
- u32 delay_min; /* min delay */
+ u32 delay_min; /* min delay (msec << 3) */
u32 epoch_start; /* beginning of an epoch */
u32 ack_cnt; /* number of acks */
u32 tcp_cwnd; /* estimated tcp cwnd */
@@ -98,7 +98,7 @@ struct bictcp {
u8 found; /* the exit point is found? */
u32 round_start; /* beginning of each round */
u32 end_seq; /* end_seq of the round */
- u32 last_jiffies; /* last time when the ACK spacing is close */
+ u32 last_ack; /* last time when the ACK spacing is close */
u32 curr_rtt; /* the minimum rtt of current round */
};
@@ -119,12 +119,21 @@ static inline void bictcp_reset(struct b
ca->found = 0;
}
+static inline u32 bictcp_clock(void)
+{
+#if HZ < 1000
+ return ktime_to_ms(ktime_get_real());
+#else
+ return jiffies_to_ms(jiffies);
+#endif
+}
+
static inline void bictcp_hystart_reset(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
struct bictcp *ca = inet_csk_ca(sk);
- ca->round_start = ca->last_jiffies = jiffies;
+ ca->round_start = ca->last_ack = bictcp_clock();
ca->end_seq = tp->snd_nxt;
ca->curr_rtt = 0;
ca->sample_cnt = 0;
@@ -239,7 +248,7 @@ static inline void bictcp_update(struct
*/
/* change the unit from HZ to bictcp_HZ */
- t = ((tcp_time_stamp + (ca->delay_min>>3) - ca->epoch_start)
+ t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) - ca->epoch_start)
<< BICTCP_HZ) / HZ;
if (t < ca->bic_K) /* t - K */
@@ -342,14 +351,12 @@ static void hystart_update(struct sock *
struct bictcp *ca = inet_csk_ca(sk);
if (!(ca->found & hystart_detect)) {
- u32 curr_jiffies = jiffies;
+ u32 now = bictcp_clock();
/* first detection parameter - ack-train detection */
- if ((s32)(curr_jiffies - ca->last_jiffies) <=
- msecs_to_jiffies(hystart_ack_delta)) {
- ca->last_jiffies = curr_jiffies;
- if ((s32) (curr_jiffies - ca->round_start) <=
- ca->delay_min >> 4)
+ if ((s32)(now - ca->last_ack) <= hystart_ack_delta) {
+ ca->last_ack = now;
+ if ((s32)(now - ca->round_start) <= ca->delay_min >> 4)
ca->found |= HYSTART_ACK_TRAIN;
}
@@ -396,7 +403,7 @@ static void bictcp_acked(struct sock *sk
if ((s32)(tcp_time_stamp - ca->epoch_start) < HZ)
return;
- delay = usecs_to_jiffies(rtt_us) << 3;
+ delay = (rtt_us << 3) / USEC_PER_MSEC;
if (delay == 0)
delay = 1;
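The unit change in bictcp_acked() can be illustrated with a small user-space comparison. This is a sketch, not the kernel code: delay_ms_x8() mirrors the patched expression, while delay_jiffies_x8() is a simplified, assumed model of usecs_to_jiffies(rtt_us) << 3 (rounding up, tick rates that divide 1000000 evenly). Unsigned types are used here for simplicity.

```c
#include <stdint.h>

#define USEC_PER_MSEC 1000U

/* Patched conversion: RTT in microseconds -> (milliseconds << 3), minimum 1. */
static uint32_t delay_ms_x8(uint32_t rtt_us)
{
	uint32_t delay = (rtt_us << 3) / USEC_PER_MSEC;

	return delay ? delay : 1;
}

/* Assumed model of the old conversion: (jiffies << 3), minimum 1. */
static uint32_t delay_jiffies_x8(uint32_t rtt_us, uint32_t hz)
{
	uint32_t usec_per_jiffy = 1000000U / hz;
	uint32_t j = (rtt_us + usec_per_jiffy - 1) / usec_per_jiffy; /* round up */
	uint32_t delay = j << 3;

	return delay ? delay : 1;
}
```

For an RTT of 11390 us (the 11.39 ms path measured later in this thread), the patched form gives 91 regardless of HZ, which is consistent with the cur_rtt=91 value in the debug output. The jiffies-based model gives 24 at HZ=250 and 96 at HZ=1000, so the same field would carry different units depending on the kernel configuration.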
* Re: [PATCH 5/6] tcp_cubic: fix clock dependency
2011-03-10 16:51 ` [PATCH 5/6] tcp_cubic: fix clock dependency Stephen Hemminger
@ 2011-03-11 16:26 ` Sangtae Ha
0 siblings, 0 replies; 13+ messages in thread
From: Sangtae Ha @ 2011-03-11 16:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: davem, rhee, netdev
Thanks Stephen.
The patch is useful since I had to increase CA_PRIV_SIZE to use
ktime_t for the testing.
Indeed, CUBIC already used up the limit CA_PRIV_SIZE for its variables.
I got compilation errors because of "jiffies_to_ms", so I corrected
it to "jiffies_to_msecs":
- return jiffies_to_ms(jiffies);
+ return jiffies_to_msecs(jiffies);
Also, the comparison should be >= instead of <=, which Lucas already found and reported:
- if ((s32)(now - ca->round_start) <= ca->delay_min >> 4)
+ if ((s32)(now - ca->round_start) >= ca->delay_min >> 4)
Sangtae
* [PATCH 6/6] tcp_cubic: enable high resolution ack time if needed
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
` (4 preceding siblings ...)
2011-03-10 16:51 ` [PATCH 5/6] tcp_cubic: fix clock dependency Stephen Hemminger
@ 2011-03-10 16:51 ` Stephen Hemminger
2011-03-11 10:28 ` [PATCH 0/6] TCP CUBIC and Hystart Lucas Nussbaum
6 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2011-03-10 16:51 UTC (permalink / raw)
To: davem, sangtae.ha, rhee; +Cc: netdev
[-- Attachment #1: tcp-cubic-rtt-cong.patch --]
[-- Type: text/plain, Size: 673 bytes --]
This is a refined version of an earlier patch by Lucas Nussbaum.
Cubic needs RTT values in milliseconds. If HZ < 1000 then
the values will be too coarse.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/tcp_cubic.c 2011-03-10 08:35:59.968882888 -0800
+++ b/net/ipv4/tcp_cubic.c 2011-03-10 08:36:10.241016524 -0800
@@ -459,6 +459,10 @@ static int __init cubictcp_register(void
/* divide by bic_scale and by constant Srtt (100ms) */
do_div(cube_factor, bic_scale * 10);
+ /* hystart needs ms clock resolution */
+ if (hystart && HZ < 1000)
+ cubictcp.flags |= TCP_CONG_RTT_STAMP;
+
return tcp_register_congestion_control(&cubictcp);
}
* Re: [PATCH 0/6] TCP CUBIC and Hystart
2011-03-10 16:51 [PATCH 0/6] TCP CUBIC and Hystart Stephen Hemminger
` (5 preceding siblings ...)
2011-03-10 16:51 ` [PATCH 6/6] tcp_cubic: enable high resolution ack time if needed Stephen Hemminger
@ 2011-03-11 10:28 ` Lucas Nussbaum
2011-03-11 10:49 ` Injong Rhee
2011-03-11 15:58 ` Sangtae Ha
6 siblings, 2 replies; 13+ messages in thread
From: Lucas Nussbaum @ 2011-03-11 10:28 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: davem, sangtae.ha, rhee, netdev
On 10/03/11 at 08:51 -0800, Stephen Hemminger wrote:
> This patch set is my attempt at addressing the problems discovered
> by Lucas Nussbaum.
With those patches applied (and the fix I mentioned separately), it
works much better (still with HZ=250).
When a delayed ack train is detected, slow start ends with cwnd ~= 580
(sometimes a bit lower).
When no delayed ack train is detected, slow start ends with the detection of the
delay increase at cwnd in the [700:1100] range.
Performance is still not as good as without hystart, but it is more
acceptable:
nuttcp -i1 -n1g graphene-34.nancy.grid5000.fr
94.8125 MB / 1.00 sec = 795.3059 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6325 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6222 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6335 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6354 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6231 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.5883 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6297 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6391 Mbps 0 retrans
1024.0000 MB / 9.29 sec = 924.7155 Mbps 14 %TX 28 %RX 0 retrans 11.39 msRTT
During that run, no ack train was detected, but delay increase was detected when cwnd=1105:
hystart_update: cwnd=1105 ssthresh=1105 fnd=2 hs_det=3 cur_rtt=122 delay_min=90 DELTRE=16
However:
echo 1 > /proc/sys/net/ipv4/route/flush; nuttcp -i1 -n1g graphene-34.nancy.grid5000.fr
49.5000 MB / 1.00 sec = 415.2278 Mbps 0 retrans
59.0000 MB / 1.00 sec = 494.9318 Mbps 0 retrans
62.1875 MB / 1.00 sec = 521.6535 Mbps 0 retrans
64.1250 MB / 1.00 sec = 537.9329 Mbps 0 retrans
67.0625 MB / 1.00 sec = 562.5486 Mbps 0 retrans
69.4375 MB / 1.00 sec = 582.4840 Mbps 0 retrans
72.3750 MB / 1.00 sec = 607.1395 Mbps 0 retrans
75.3125 MB / 1.00 sec = 631.7557 Mbps 0 retrans
83.1250 MB / 1.00 sec = 697.2975 Mbps 0 retrans
94.3125 MB / 1.00 sec = 791.1569 Mbps 0 retrans
107.6250 MB / 1.00 sec = 902.8194 Mbps 0 retrans
112.2500 MB / 1.00 sec = 941.6231 Mbps 0 retrans
1024.0000 MB / 12.97 sec = 662.2669 Mbps 10 %TX 20 %RX 0 retrans 11.39 msRTT
[ 3050.712333] found ACK TRAIN: cwnd=493 now=2757023598 ca->last_ack=2757023598 ca->round_start=2757023593 ca->delay_min=90 delay_min>>4=5
[ 3050.726045] hystart_update: cwnd=493 ssthresh=493 fnd=1 hs_det=3 cur_rtt=91 delay_min=90 DELTRE=16
(delayed ack train detected when cwnd=493 => slower convergence)
It seems that the ack train length detection is still a bit too sensitive.
Changing:
if ((s32)(now - ca->round_start) >= ca->delay_min >> 4)
To:
if ((s32)(now - ca->round_start) > ca->delay_min >> 4)
makes things slightly better, but slow start still exits too early. (optimal cwnd=941).
I'm not sure if we can really do something more about that. The detection by
ack train length is inherently more likely to trigger false positives since all
acks are considered, not just a few acks at the beginning of the train. I'm
tempted to suggest disabling the ack train length detection by default, but
then it probably solves problems for other people, and the decrease in
performance is more acceptable now.
--
| Lucas Nussbaum MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |
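The difference between the two comparisons discussed in this message can be replayed against the values in the "found ACK TRAIN" log line. The sketch below is a user-space model only; ack_train_found() and its strict flag are hypothetical, and both sides are compared as signed values here for clarity.

```c
#include <stdint.h>

/*
 * Model of the ack-train length test. All times are in milliseconds;
 * delay_min is stored as (msec << 3), so delay_min >> 4 is half the
 * minimum RTT. strict != 0 uses '>' instead of '>='.
 */
static int ack_train_found(uint32_t now, uint32_t round_start,
			   uint32_t delay_min, int strict)
{
	int32_t elapsed = (int32_t)(now - round_start);
	int32_t threshold = (int32_t)(delay_min >> 4);

	return strict ? elapsed > threshold : elapsed >= threshold;
}
```

With now=2757023598, round_start=2757023593 and delay_min=90, the elapsed time is 5 ms and the threshold is 5 ms, so the '>=' form fires while the '>' form does not. The strict form only postpones detection until the train is 1 ms longer.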
* Re: [PATCH 0/6] TCP CUBIC and Hystart
2011-03-11 10:28 ` [PATCH 0/6] TCP CUBIC and Hystart Lucas Nussbaum
@ 2011-03-11 10:49 ` Injong Rhee
2011-03-11 15:58 ` Sangtae Ha
1 sibling, 0 replies; 13+ messages in thread
From: Injong Rhee @ 2011-03-11 10:49 UTC (permalink / raw)
To: Lucas Nussbaum; +Cc: Stephen Hemminger, davem, sangtae.ha, netdev
I think the problem is still in the clock resolution (i.e., in the use of
HZ=250). I will look into the issue some more.
* Re: [PATCH 0/6] TCP CUBIC and Hystart
2011-03-11 10:28 ` [PATCH 0/6] TCP CUBIC and Hystart Lucas Nussbaum
2011-03-11 10:49 ` Injong Rhee
@ 2011-03-11 15:58 ` Sangtae Ha
2011-03-11 16:08 ` Lucas Nussbaum
1 sibling, 1 reply; 13+ messages in thread
From: Sangtae Ha @ 2011-03-11 15:58 UTC (permalink / raw)
To: Lucas Nussbaum; +Cc: Stephen Hemminger, davem, rhee, netdev
Hi Lucas,
In your setup, ca->delay_min is 90, which means the one-way delay
is 90 >> 4 (5ms).
Our gap detection threshold is 2ms, which means that if the ACKs are
loosely spread over 5ms with delayed ACKs, slow start can terminate
early. But, given that the optimal cwnd is 941 in your setup, exiting
slow start one RTT before the loss (at half of the optimal cwnd) is what
hystart does.
Since the resolution is now ms, can you change the gap detection to
1ms and run it again?
Also, the following change you made doesn't hurt (it requires a 1ms
longer train before the ACK train is detected).
if ((s32)(now - ca->round_start) > ca->delay_min >> 4)
I am also testing the algorithm with HZ=100 and HZ=1000 in my network
and will share the results soon.
Sangtae
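The arithmetic behind the "90 >> 4 (5ms)" figure can be written out as follows. This is a small illustrative sketch that assumes the (msec << 3) encoding of delay_min introduced in patch 5/6; the helper names min_rtt_ms() and one_way_delay_ms() are made up for the example.

```c
/* delay_min is stored as (milliseconds << 3), i.e. in units of 1/8 ms. */

/* Minimum RTT in whole milliseconds. */
static unsigned int min_rtt_ms(unsigned int delay_min)
{
	return delay_min >> 3;
}

/* Half of the minimum RTT, used as the estimate of the one-way delay. */
static unsigned int one_way_delay_ms(unsigned int delay_min)
{
	return delay_min >> 4;
}
```

For delay_min=90 this gives a minimum RTT of 11 ms, in line with the 11.39 msRTT reported by nuttcp, and a one-way delay of 5 ms.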
* Re: [PATCH 0/6] TCP CUBIC and Hystart
2011-03-11 15:58 ` Sangtae Ha
@ 2011-03-11 16:08 ` Lucas Nussbaum
0 siblings, 0 replies; 13+ messages in thread
From: Lucas Nussbaum @ 2011-03-11 16:08 UTC (permalink / raw)
To: Sangtae Ha; +Cc: Stephen Hemminger, davem, rhee, netdev
On 11/03/11 at 10:58 -0500, Sangtae Ha wrote:
> Hi Lucas,
>
> From your setup, ca->delay_min is 90 and this means the one-way delay
> is 90 >> 4 (5ms).
> And our gap detection threshold is 2ms, which means that if the gap is
> loosely spread over 5ms with delayed ACKs, it can early terminate the
> slow start. But, given the optimal cwnd is 941 in your setup, exiting
> slow start one RTT before the loss (half of the optimal cwnd) is what
> hystart does.
>
> Since the resolution is now ms, can you change the gap detection to
> 1ms and run it again?
> Also, the following change you did doesn't hurt (1ms more train to
> detect the ACK train).
Hi,
Changing it to 1ms only improves the situation marginally.
[25271.861481] found ACK TRAIN: cwnd=835 now=2779244747 ca->last_ack=2779244747 ca->round_start=2779244741 ca->delay_min=90 delay_min>>4=5 nbacks=261
[25291.507340] found ACK TRAIN: cwnd=585 now=2779264393 ca->last_ack=2779264393 ca->round_start=2779264387 ca->delay_min=90 delay_min>>4=5 nbacks=221
[25327.585396] found ACK TRAIN: cwnd=1034 now=2779300471 ca->last_ack=2779300471 ca->round_start=2779300465 ca->delay_min=90 delay_min>>4=5 nbacks=245
[25347.300351] found ACK TRAIN: cwnd=1463 now=2779320186 ca->last_ack=2779320186 ca->round_start=2779320180 ca->delay_min=90 delay_min>>4=5 nbacks=427
[25390.702328] found ACK TRAIN: cwnd=587 now=2779363588 ca->last_ack=2779363588 ca->round_start=2779363582 ca->delay_min=90 delay_min>>4=5 nbacks=211
[25394.775396] found ACK TRAIN: cwnd=588 now=2779367661 ca->last_ack=2779367661 ca->round_start=2779367655 ca->delay_min=90 delay_min>>4=5 nbacks=242
[25402.061328] found ACK TRAIN: cwnd=1282 now=2779374947 ca->last_ack=2779374947 ca->round_start=2779374941 ca->delay_min=90 delay_min>>4=5 nbacks=335
[25404.894336] found ACK TRAIN: cwnd=585 now=2779377780 ca->last_ack=2779377780 ca->round_start=2779377774 ca->delay_min=90 delay_min>>4=5 nbacks=205
[25408.584337] found ACK TRAIN: cwnd=587 now=2779381470 ca->last_ack=2779381470 ca->round_start=2779381464 ca->delay_min=90 delay_min>>4=5 nbacks=209
[25421.699331] found ACK TRAIN: cwnd=856 now=2779394585 ca->last_ack=2779394585 ca->round_start=2779394579 ca->delay_min=90 delay_min>>4=5 nbacks=239
There are still some cases when ack trains are detected too early.
--
| Lucas Nussbaum MCF Université Nancy 2 |
| lucas.nussbaum@loria.fr LORIA / AlGorille |
| http://www.loria.fr/~lnussbau/ +33 3 54 95 86 19 |