* Re: 2.6.10 TCP troubles -- suggested patch
       [not found] <050QTJA12@server5.heliogroup.fr>
@ 2005-02-09 18:59 ` Stephen Hemminger
  2005-02-09 20:25   ` David S. Miller
  2005-02-22 21:50   ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
  0 siblings, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2005-02-09 18:59 UTC (permalink / raw)
To: Hubert Tonneau; +Cc: Francois Romieu, Alexey Kuznetsov, netdev

Please try this patch, based on Alexey's suggestion:

> That's one quick and simple idea: set PSH on each tso segment.
> Seems, it is always good. Hardware will preserve it only on the last skb and
> everyone will be happy.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/02/09 11:00:57-08:00 shemminger@linux.site
#   Always set PUSH on TSO multi-segment frames
#   to workaround bugs in MacOSX
#
# net/ipv4/tcp_output.c
#   2005/02/09 11:00:44-08:00 shemminger@linux.site +8 -0
#   Always set PUSH on TSO multi-segment frames
#   to workaround bugs in MacOSX
#
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c	2005-02-09 11:01:12 -08:00
+++ b/net/ipv4/tcp_output.c	2005-02-09 11:01:12 -08:00
@@ -754,6 +754,14 @@
 			break;
 		}
 
+	/* Force push to be on for any large TSO frames
+	 * to workaround problems with busted implementations
+	 * like MacOSX that hold off delivery of data until
+	 * push.
+	 */
+	if (tcp_skb_pcount(skb) > 1)
+		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH;
+
 		TCP_SKB_CB(skb)->when = tcp_time_stamp;
 		if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))
 			break;

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: 2.6.10 TCP troubles -- suggested patch
  2005-02-09 18:59 ` 2.6.10 TCP troubles -- suggested patch Stephen Hemminger
@ 2005-02-09 20:25   ` David S. Miller
  2005-02-22 21:50   ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
  1 sibling, 0 replies; 17+ messages in thread
From: David S. Miller @ 2005-02-09 20:25 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: hubert.tonneau, romieu, kuznet, netdev

On Wed, 9 Feb 2005 10:59:09 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> Please try this patch, based on Alexey's suggestion:

-EBADINDENTATION :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [RFT] BIC TCP delayed ack compensation
  2005-02-09 18:59 ` 2.6.10 TCP troubles -- suggested patch Stephen Hemminger
  2005-02-09 20:25   ` David S. Miller
@ 2005-02-22 21:50   ` Stephen Hemminger
  2005-02-22 23:30     ` John Heffner
  2005-02-22 23:38     ` Baruch Even
  1 sibling, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2005-02-22 21:50 UTC (permalink / raw)
To: Hubert Tonneau, cliff white
Cc: Alexey Kuznetsov, netdev, Injong Rhee, David S. Miller

This patch, which was extracted from BIC TCP 1.1, compensates
for systems (like MacOSX) that don't ACK every other packet.
It has no impact on normal transfers, but might help with the problems
with Macs that Hubert found.

diff -Nru a/include/linux/tcp.h b/include/linux/tcp.h
--- a/include/linux/tcp.h	2005-02-22 13:44:12 -08:00
+++ b/include/linux/tcp.h	2005-02-22 13:44:12 -08:00
@@ -433,6 +433,7 @@
 		__u32	last_max_cwnd;	/* last maximium snd_cwnd */
 		__u32	last_cwnd;	/* the last snd_cwnd */
 		__u32	last_stamp;	/* time when updated last_cwnd */
+		__u32	delayed_ack;	/* ratio of packets/ACKs */
 	} bictcp;
 };
 
diff -Nru a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h	2005-02-22 13:44:12 -08:00
+++ b/include/net/tcp.h	2005-02-22 13:44:12 -08:00
@@ -508,6 +508,8 @@
 #define BICTCP_BETA_SCALE    1024	/* Scale factor beta calculation
 					 * max_cwnd = snd_cwnd * beta
 					 */
+#define BICTCP_DELAY_SCALE   1024	/* Scale for delayed_ack ratio */
+
 #define BICTCP_MAX_INCREMENT 32		/*
 					 * Limit on the amount of
 					 * increment allowed during
diff -Nru a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c	2005-02-22 13:44:12 -08:00
+++ b/net/ipv4/tcp_input.c	2005-02-22 13:44:12 -08:00
@@ -339,6 +339,7 @@
 	tp->bictcp.last_max_cwnd = 0;
 	tp->bictcp.last_cwnd = 0;
 	tp->bictcp.last_stamp = 0;
+	tp->bictcp.delayed_ack = 2 * BICTCP_DELAY_SCALE;
 }
 
 /* 5. Recalculate window clamp after socket hit its memory bounds. */
@@ -2075,6 +2076,13 @@
 		/* linear increase */
 		tp->bictcp.cnt = tp->snd_cwnd / BICTCP_MAX_INCREMENT;
 	}
+
+	/* compensate for delayed ack's */
+	tp->bictcp.cnt = (tp->bictcp.cnt * BICTCP_DELAY_SCALE)
+		/ tp->bictcp.delayed_ack;
+	if (tp->bictcp.cnt == 0)
+		tp->bictcp.cnt = 1;
+
 	return tp->bictcp.cnt;
 }
 
@@ -2418,6 +2426,7 @@
 	__u32 now = tcp_time_stamp;
 	int acked = 0;
 	__s32 seq_rtt = -1;
+	__u32 cnt = 0;
 
 	while ((skb = skb_peek(&sk->sk_write_queue)) &&
 	       skb != sk->sk_send_head) {
@@ -2472,7 +2481,13 @@
 		tcp_packets_out_dec(tp, skb);
 		__skb_unlink(skb, skb->list);
 		sk_stream_free_skb(sk, skb);
+		++cnt;
 	}
+
+	/* compute average packets per ACK (scaled by 1024) */
+	if (cnt > 0 && tcp_is_bic(tp) && tp->ca_state == TCP_CA_Open)
+		tp->bictcp.delayed_ack = (15 * tp->bictcp.delayed_ack) / 16
+			+ (BICTCP_DELAY_SCALE/16) * cnt;
 
 	if (acked&FLAG_ACKED) {
 		tcp_ack_update_rtt(tp, acked, seq_rtt);

^ permalink raw reply	[flat|nested] 17+ messages in thread
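[Editor's sketch] The arithmetic in the patch above can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: the helper names update_delayed_ack() and compensate_cnt() and the DELAY_SCALE macro are made up for illustration, and only the two formulas are taken from the diff. It shows the 15/16 : 1/16 moving average of packets per ACK (scaled by 1024) and the division that shortens the BIC increment counter.

```c
#include <stdint.h>

/* Mirrors BICTCP_DELAY_SCALE from the patch; the name is illustrative. */
#define DELAY_SCALE 1024u

/*
 * Hypothetical helper: fold one ACK that covered `pkts` packets into the
 * running average. The old value keeps a 15/16 weight and the new sample
 * gets 1/16, all in fixed point scaled by DELAY_SCALE.
 */
static uint32_t update_delayed_ack(uint32_t delayed_ack, uint32_t pkts)
{
	return (15u * delayed_ack) / 16u + (DELAY_SCALE / 16u) * pkts;
}

/*
 * Hypothetical helper: divide the per-increment ACK count by the averaged
 * packets-per-ACK ratio, never letting it drop to zero.
 */
static uint32_t compensate_cnt(uint32_t cnt, uint32_t delayed_ack)
{
	uint32_t scaled = (cnt * DELAY_SCALE) / delayed_ack;

	return scaled ? scaled : 1u;
}
```

With the initial value of 2 * 1024 from the patch, a peer that keeps acknowledging every second packet leaves the average unchanged, while a peer that acknowledges eight packets at once pushes it up, which in turn makes the counter smaller and the window growth faster per ACK.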
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 21:50 ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
@ 2005-02-22 23:30   ` John Heffner
  2005-02-22 23:38   ` Baruch Even
  1 sibling, 0 replies; 17+ messages in thread
From: John Heffner @ 2005-02-22 23:30 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev

Has there been any discussion of implementing ABC (RFC3465) in Linux?

Thanks,
  -John

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 21:50 ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
  2005-02-22 23:30   ` John Heffner
@ 2005-02-22 23:38   ` Baruch Even
  2005-02-23 1:04     ` Yee-Ting Li
  1 sibling, 1 reply; 17+ messages in thread
From: Baruch Even @ 2005-02-22 23:38 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Hubert Tonneau, cliff white, Alexey Kuznetsov, netdev, Injong Rhee,
	David S. Miller, Yee-Ting Li, Doug Leith

Stephen Hemminger wrote:
> This patch which was extracted from BIC TCP 1.1 compensates
> for systems (like MacOSX) that don't ACK every other packet.
> It has no impact for normal transfers, but might help with problems
> with Mac like Hubert found.

We have an implementation of ABC (Appropriate Byte Counting, RFC 3465)
which we hope to submit soon for inclusion in the kernel, and which
should be a more appropriate solution for this. The RFC is a
well-defined standard, whereas this patch has not been reviewed by the
networking community.

This solution is just a band-aid for a single congestion control
algorithm, as opposed to a generic solution. According to our testing,
it also tends to make BIC more aggressive.

I'll try to post our ABC patch tomorrow, time permitting.

One thing to note is that accounting for delayed ACKs is not an overly
important feature: in our testing it only speeds up convergence by a
small factor and doesn't change the correctness of the algorithms.

Baruch

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 23:38   ` Baruch Even
@ 2005-02-23 1:04     ` Yee-Ting Li
  2005-02-23 15:28       ` Yee-Ting Li
  0 siblings, 1 reply; 17+ messages in thread
From: Yee-Ting Li @ 2005-02-23 1:04 UTC (permalink / raw)
To: netdev
Cc: Doug Leith, David S. Miller, Injong Rhee, Yee-Ting Li, Baruch Even,
	Hubert Tonneau, cliff white, Alexey Kuznetsov, Stephen Hemminger

On Feb 22, 2005, at 23:38, Baruch Even wrote:
> We have a version of ABC (Appropriate Byte Counting) implementation of
> RFC 3465, which we hope to submit soon for inclusion in the kernel
> which should be a more appropriate solution for this. The RFC is a
> well defined standard whereas this patch has not received any
> reviewing by the networking community.

Please find enclosed a version of our implementation of RFC3465 ABC for
Linux 2.6.11-rc4.

There is built-in protection, as defined by the RFC, to prevent large
bursts of packets should ACKs arrive acknowledging more than abc_L
packets (sysctl_tcp_abc_L). The entire ABC patch can be switched on or
off using sysctl_tcp_abc={1|0} respectively. As this is also an RFT, it
is switched ON by default with an abc_L value of 2, which MAY be used
(according to the RFC).

Note that an abc_L of 1 will be more conservative than what is available
with normal clocking of delayed ACKs. Note also that there is currently
no built-in mechanism to prevent abc_L being set above 2; the RFC
states that abc_L MUST NOT be greater than 2.

This patch also has the advantage of working for all protocols currently
in the kernel (except vegas, which doesn't require it).

Signed-off-by: Yee-Ting Li <Yee-Ting.Li@may.ie> Index: linux-2.6.11-rc4/include/linux/sysctl.h =================================================================== --- linux-2.6.11-rc4.orig/include/linux/sysctl.h Sun Feb 13 03:06:53 2005 +++ linux-2.6.11-rc4/include/linux/sysctl.h Tue Feb 22 23:48:30 2005 @@ -344,6 +344,8 @@ NET_TCP_DEFAULT_WIN_SCALE=105, NET_TCP_MODERATE_RCVBUF=106, NET_TCP_TSO_WIN_DIVISOR=107, + NET_TCP_ABC=108, + NET_TCP_ABC_L=109, }; enum { Index: linux-2.6.11-rc4/include/linux/tcp.h =================================================================== --- linux-2.6.11-rc4.orig/include/linux/tcp.h Sun Feb 13 03:06:23 2005 +++ linux-2.6.11-rc4/include/linux/tcp.h Tue Feb 22 23:39:41 2005 @@ -366,6 +366,8 @@ __u32 total_retrans; /* Total retransmits for entire connection */ + __u32 bytes_acked; /* Appropiate Byte Counting - RFC3465 */ + /* The syn_wait_lock is necessary only to avoid proc interface having * to grab the main lock sock while browsing the listening hash * (otherwise it's deadlock prone). 
Index: linux-2.6.11-rc4/include/net/tcp.h =================================================================== --- linux-2.6.11-rc4.orig/include/net/tcp.h Sun Feb 13 03:05:28 2005 +++ linux-2.6.11-rc4/include/net/tcp.h Tue Feb 22 23:47:59 2005 @@ -609,6 +609,10 @@ extern int sysctl_tcp_moderate_rcvbuf; extern int sysctl_tcp_tso_win_divisor; +/* RFC3465 - ABC */ +extern int sysctl_tcp_abc; +extern int sysctl_tcp_abc_L; + extern atomic_t tcp_memory_allocated; extern atomic_t tcp_sockets_allocated; extern int tcp_memory_pressure; @@ -1366,6 +1370,7 @@ static inline void tcp_enter_cwr(struct tcp_sock *tp) { tp->prior_ssthresh = 0; + tp->bytes_acked=0; if (tp->ca_state < TCP_CA_CWR) { __tcp_enter_cwr(tp); tcp_set_ca_state(tp, TCP_CA_CWR); Index: linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/sysctl_net_ipv4.c Sun Feb 13 03:07:01 2005 +++ linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c Tue Feb 22 23:46:18 2005 @@ -682,6 +682,22 @@ .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = NET_TCP_ABC, + .procname = "tcp_abc", + .data = &sysctl_tcp_abc, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_TCP_ABC_L, + .procname = "tcp_abc_L", + .data = &sysctl_tcp_abc_L, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; Index: linux-2.6.11-rc4/net/ipv4/tcp.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/tcp.c Sun Feb 13 03:05:50 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp.c Tue Feb 22 23:28:28 2005 @@ -1825,6 +1825,7 @@ tp->packets_out = 0; tp->snd_ssthresh = 0x7fffffff; tp->snd_cwnd_cnt = 0; + tp->bytes_acked = 0; tcp_set_ca_state(tp, TCP_CA_Open); tcp_clear_retrans(tp); tcp_delack_init(tp); Index: linux-2.6.11-rc4/net/ipv4/tcp_input.c =================================================================== --- 
linux-2.6.11-rc4.orig/net/ipv4/tcp_input.c Tue Feb 22 23:27:44 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp_input.c Wed Feb 23 00:25:44 2005 @@ -92,6 +92,11 @@ int sysctl_tcp_moderate_rcvbuf = 1; +/* RFC 3465 - ABC */ +int sysctl_tcp_abc = 1; +int sysctl_tcp_abc_L = 2; /* The RFC definess 1 as being a more conservative value */ + /* that SHOULD be used, however, we use 2 as it MAY be used */ + /* Default values of the Vegas variables, in fixed-point representation * with V_PARAM_SHIFT bits to the right of the binary point. */ @@ -1287,6 +1292,7 @@ tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_time_stamp; + tp->bytes_acked = 0; tcp_clear_retrans(tp); /* Push undo marker, if it was plain RTO and nothing @@ -1945,6 +1951,8 @@ TCP_ECN_queue_cwr(tp); } + tp->bytes_acked = 0; + tp->snd_cwnd_cnt = 0; tcp_set_ca_state(tp, TCP_CA_Recovery); } @@ -2100,6 +2108,24 @@ tp->snd_cwnd_stamp = tcp_time_stamp; } +/* This is a wrapper function to handle RFC3465 - ABC. As per the RFC, the abc_L + * value defines a burst moderation to prevent sending large bursts of packets + * should an ack acknowledge many packets. abc_L MUST NOT be larger than 2. */ +static __inline__ void reno_cong_avoid_abc( struct tcp_sock *tp, int mss_now ) +{ + int incrs_applied = 0; + + if (sysctl_tcp_abc && !tp->nonagle) + { + while (tp->bytes_acked > mss_now && incrs_applied < sysctl_tcp_abc_L) { + tp->bytes_acked -= mss_now; + reno_cong_avoid( tp ); + } + } else + reno_cong_avoid( tp ); +} + + /* This is based on the congestion detection/avoidance scheme described in * Lawrence S. Brakmo and Larry L. Peterson. * "TCP Vegas: End to end congestion avoidance on a global internet." 
@@ -2322,12 +2348,15 @@ tp->snd_cwnd_stamp = tcp_time_stamp; } -static inline void tcp_cong_avoid(struct tcp_sock *tp, u32 ack, u32 seq_rtt) +static inline void tcp_cong_avoid(struct sock *sk, u32 ack, u32 seq_rtt) { + struct tcp_sock *tp = tcp_sk(sk); + int mss_now = tcp_current_mss(sk,1); + if (tcp_vegas_enabled(tp)) vegas_cong_avoid(tp, ack, seq_rtt); else - reno_cong_avoid(tp); + reno_cong_avoid_abc(tp, mss_now); } /* Restart timer after forward progress on connection. @@ -2890,6 +2919,9 @@ if (before(ack, prior_snd_una)) goto old_ack; + if ( sysctl_tcp_abc && tp->ca_state < TCP_CA_CWR ) + tp->bytes_acked += ack - prior_snd_una; + if (!(flag&FLAG_SLOWPATH) && after(ack, prior_snd_una)) { /* Window is constant, pure forward advance. * No more checks are required. @@ -2940,12 +2972,12 @@ if ((flag & FLAG_DATA_ACKED) && (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd) && tcp_may_raise_cwnd(tp, flag)) - tcp_cong_avoid(tp, ack, seq_rtt); + tcp_cong_avoid(sk, ack, seq_rtt); tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag); } else { if ((flag & FLAG_DATA_ACKED) && (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd)) - tcp_cong_avoid(tp, ack, seq_rtt); + tcp_cong_avoid(sk, ack, seq_rtt); } if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP)) Index: linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/tcp_minisocks.c Sun Feb 13 03:07:01 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c Tue Feb 22 23:28:28 2005 @@ -769,6 +769,8 @@ newtp->snd_cwnd = 2; newtp->snd_cwnd_cnt = 0; + newtp->bytes_acked = 0; + newtp->frto_counter = 0; newtp->frto_highmark = 0; ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 1:04     ` Yee-Ting Li
@ 2005-02-23 15:28       ` Yee-Ting Li
  0 siblings, 0 replies; 17+ messages in thread
From: Yee-Ting Li @ 2005-02-23 15:28 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Stephen Hemminger, Yee-Ting Li, Baruch Even, Doug Leith

Oops! Checking through the code, I've realised that I forgot to
increment the incrs_applied counter to account for burst moderation.

Please find enclosed the corrected (full) implementation of RFC3465 (the
only change from the previous version is the addition of incrs_applied++
in the while loop).

From our tests with Linux receivers, this burst moderation will make a
difference at very high speeds (>200Mbit/sec), as they do not always
acknowledge every other packet.

Apologies for any inconvenience.

Yee.

On Feb 23, 2005, at 01:04, Yee-Ting Li wrote:
> On Feb 22, 2005, at 23:38, Baruch Even wrote:
>> We have a version of ABC (Appropriate Byte Counting) implementation
>> of RFC 3465, which we hope to submit soon for inclusion in the kernel
>> which should be a more appropriate solution for this. The RFC is a
>> well defined standard whereas this patch has not received any
>> reviewing by the networking community.
>
> Please find enclosed a version of our implementation of RFC3465 ABC
> for Linux 2.6.11-rc4.
>
> There is in-built protection, as defined by the RFC, to prevent large
> bursts of packets should acks arrive acknowledging more than abc_L
> packets (sysctl_tcp_abc_L). The entire abc patch can be switched on or
> off using sysctl_tcp_abc={1|0} respectively. As this is also a RFT, it
> is switched ON by default and has the abc_L value of 2 which MAY be
> used (according to the RFC).
>
> Note that an abc_L of 1 will be more conservative than what is
> available with normal clocking of delayed acks. Note that there is
> currently no built in mechanism to prevent abc_L being set to over 2;
> the RFC defines that abc_L MUST NOT be greater than 2.
> > This patch also has the advantage of working for all protocols > currently in the kernel (except vegas which doesn't require it). > Signed-off-by: Yee-Ting Li <Yee-Ting.Li@may.ie> Index: linux-2.6.11-rc4/include/linux/sysctl.h =================================================================== --- linux-2.6.11-rc4.orig/include/linux/sysctl.h Sun Feb 13 03:06:53 2005 +++ linux-2.6.11-rc4/include/linux/sysctl.h Tue Feb 22 23:48:30 2005 @@ -344,6 +344,8 @@ NET_TCP_DEFAULT_WIN_SCALE=105, NET_TCP_MODERATE_RCVBUF=106, NET_TCP_TSO_WIN_DIVISOR=107, + NET_TCP_ABC=108, + NET_TCP_ABC_L=109, }; enum { Index: linux-2.6.11-rc4/include/linux/tcp.h =================================================================== --- linux-2.6.11-rc4.orig/include/linux/tcp.h Sun Feb 13 03:06:23 2005 +++ linux-2.6.11-rc4/include/linux/tcp.h Tue Feb 22 23:39:41 2005 @@ -366,6 +366,8 @@ __u32 total_retrans; /* Total retransmits for entire connection */ + __u32 bytes_acked; /* Appropiate Byte Counting - RFC3465 */ + /* The syn_wait_lock is necessary only to avoid proc interface having * to grab the main lock sock while browsing the listening hash * (otherwise it's deadlock prone). 
Index: linux-2.6.11-rc4/include/net/tcp.h =================================================================== --- linux-2.6.11-rc4.orig/include/net/tcp.h Sun Feb 13 03:05:28 2005 +++ linux-2.6.11-rc4/include/net/tcp.h Tue Feb 22 23:47:59 2005 @@ -609,6 +609,10 @@ extern int sysctl_tcp_moderate_rcvbuf; extern int sysctl_tcp_tso_win_divisor; +/* RFC3465 - ABC */ +extern int sysctl_tcp_abc; +extern int sysctl_tcp_abc_L; + extern atomic_t tcp_memory_allocated; extern atomic_t tcp_sockets_allocated; extern int tcp_memory_pressure; @@ -1366,6 +1370,7 @@ static inline void tcp_enter_cwr(struct tcp_sock *tp) { tp->prior_ssthresh = 0; + tp->bytes_acked=0; if (tp->ca_state < TCP_CA_CWR) { __tcp_enter_cwr(tp); tcp_set_ca_state(tp, TCP_CA_CWR); Index: linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/sysctl_net_ipv4.c Sun Feb 13 03:07:01 2005 +++ linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c Tue Feb 22 23:46:18 2005 @@ -682,6 +682,22 @@ .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = NET_TCP_ABC, + .procname = "tcp_abc", + .data = &sysctl_tcp_abc, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_TCP_ABC_L, + .procname = "tcp_abc_L", + .data = &sysctl_tcp_abc_L, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; Index: linux-2.6.11-rc4/net/ipv4/tcp.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/tcp.c Sun Feb 13 03:05:50 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp.c Tue Feb 22 23:28:28 2005 @@ -1825,6 +1825,7 @@ tp->packets_out = 0; tp->snd_ssthresh = 0x7fffffff; tp->snd_cwnd_cnt = 0; + tp->bytes_acked = 0; tcp_set_ca_state(tp, TCP_CA_Open); tcp_clear_retrans(tp); tcp_delack_init(tp); Index: linux-2.6.11-rc4/net/ipv4/tcp_input.c =================================================================== --- 
linux-2.6.11-rc4.orig/net/ipv4/tcp_input.c Tue Feb 22 23:27:44 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp_input.c Wed Feb 23 15:18:57 2005 @@ -92,6 +92,11 @@ int sysctl_tcp_moderate_rcvbuf = 1; +/* RFC 3465 - ABC */ +int sysctl_tcp_abc = 1; +int sysctl_tcp_abc_L = 2; /* The RFC definess 1 as being a more conservative value */ + /* that SHOULD be used, however, we use 2 as it MAY be used */ + /* Default values of the Vegas variables, in fixed-point representation * with V_PARAM_SHIFT bits to the right of the binary point. */ @@ -1287,6 +1292,7 @@ tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_time_stamp; + tp->bytes_acked = 0; tcp_clear_retrans(tp); /* Push undo marker, if it was plain RTO and nothing @@ -1945,6 +1951,8 @@ TCP_ECN_queue_cwr(tp); } + tp->bytes_acked = 0; + tp->snd_cwnd_cnt = 0; tcp_set_ca_state(tp, TCP_CA_Recovery); } @@ -2100,6 +2108,25 @@ tp->snd_cwnd_stamp = tcp_time_stamp; } +/* This is a wrapper function to handle RFC3465 - ABC. As per the RFC, the abc_L + * value defines a burst moderation to prevent sending large bursts of packets + * should an ack acknowledge many packets. abc_L MUST NOT be larger than 2. */ +static __inline__ void reno_cong_avoid_abc( struct tcp_sock *tp, int mss_now ) +{ + int incrs_applied = 0; + + if (sysctl_tcp_abc && !tp->nonagle) + { + while (tp->bytes_acked > mss_now && incrs_applied < sysctl_tcp_abc_L) { + tp->bytes_acked -= mss_now; + reno_cong_avoid( tp ); + incrs_applied++; + } + } else + reno_cong_avoid( tp ); +} + + /* This is based on the congestion detection/avoidance scheme described in * Lawrence S. Brakmo and Larry L. Peterson. * "TCP Vegas: End to end congestion avoidance on a global internet." 
@@ -2322,12 +2349,15 @@ tp->snd_cwnd_stamp = tcp_time_stamp; } -static inline void tcp_cong_avoid(struct tcp_sock *tp, u32 ack, u32 seq_rtt) +static inline void tcp_cong_avoid(struct sock *sk, u32 ack, u32 seq_rtt) { + struct tcp_sock *tp = tcp_sk(sk); + int mss_now = tcp_current_mss(sk,1); + if (tcp_vegas_enabled(tp)) vegas_cong_avoid(tp, ack, seq_rtt); else - reno_cong_avoid(tp); + reno_cong_avoid_abc(tp, mss_now); } /* Restart timer after forward progress on connection. @@ -2890,6 +2920,9 @@ if (before(ack, prior_snd_una)) goto old_ack; + if ( sysctl_tcp_abc && tp->ca_state < TCP_CA_CWR ) + tp->bytes_acked += ack - prior_snd_una; + if (!(flag&FLAG_SLOWPATH) && after(ack, prior_snd_una)) { /* Window is constant, pure forward advance. * No more checks are required. @@ -2940,12 +2973,12 @@ if ((flag & FLAG_DATA_ACKED) && (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd) && tcp_may_raise_cwnd(tp, flag)) - tcp_cong_avoid(tp, ack, seq_rtt); + tcp_cong_avoid(sk, ack, seq_rtt); tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag); } else { if ((flag & FLAG_DATA_ACKED) && (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd)) - tcp_cong_avoid(tp, ack, seq_rtt); + tcp_cong_avoid(sk, ack, seq_rtt); } if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP)) Index: linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c =================================================================== --- linux-2.6.11-rc4.orig/net/ipv4/tcp_minisocks.c Sun Feb 13 03:07:01 2005 +++ linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c Tue Feb 22 23:28:28 2005 @@ -769,6 +769,8 @@ newtp->snd_cwnd = 2; newtp->snd_cwnd_cnt = 0; + newtp->bytes_acked = 0; + newtp->frto_counter = 0; newtp->frto_highmark = 0; ^ permalink raw reply [flat|nested] 17+ messages in thread
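[Editor's sketch] The burst-moderation loop that this corrected patch adds can be modelled in a few lines of user-space C. This is an illustration under stated assumptions, not the kernel function: struct abc_state and abc_apply() are hypothetical names, and the `increments` field merely counts how often the real code would have called reno_cong_avoid(). The loop condition, including the strict `>` comparison against the MSS, is copied from the patch as posted.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant tcp_sock fields. */
struct abc_state {
	uint32_t bytes_acked;	/* bytes acknowledged but not yet credited */
	uint32_t increments;	/* times reno_cong_avoid() would have run */
};

/*
 * Credit at most `limit` window increments for the bytes accumulated so
 * far, one per full MSS, and return how many were applied. Any remaining
 * bytes stay in bytes_acked and are credited on later ACKs.
 */
static int abc_apply(struct abc_state *s, uint32_t mss, int limit)
{
	int applied = 0;

	while (s->bytes_acked > mss && applied < limit) {
		s->bytes_acked -= mss;
		s->increments++;
		applied++;	/* the line missing from the first posting */
	}
	return applied;
}
```

With 10 segments of 1448 bytes outstanding in bytes_acked and a limit of 2, the loop applies two increments and leaves 8 segments' worth of bytes for later ACKs. Without the `applied++` line the loop would keep going until fewer than one MSS remained, which is the unbounded burst the limit is meant to prevent.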
* Re: [RFT] BIC TCP delayed ack compensation
@ 2005-02-22 22:22 Hubert Tonneau
2005-02-23 0:58 ` Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Hubert Tonneau @ 2005-02-22 22:22 UTC (permalink / raw)
To: Stephen Hemminger, cliff white
Cc: Alexey Kuznetsov, netdev, Injong Rhee, David S. Miller
Stephen Hemminger wrote:
>
> This patch which was extracted from BIC TCP 1.1 compensates
> for systems (like MacOSX) that don't ACK every other packet.
> It has no impact for normal transfers, but might help with problems
> with Mac like Hubert found.
No, it's even worse.
2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB of data)
2.6.9 to gigabit connected MacOSX: 5 seconds
2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds
^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 Hubert Tonneau
@ 2005-02-23 0:58 ` Stephen Hemminger
  2005-02-23 18:32 ` Injong Rhee
  2005-02-23 18:37 ` Injong Rhee
  2 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2005-02-23 0:58 UTC (permalink / raw)
To: Hubert Tonneau
Cc: cliff white, Alexey Kuznetsov, netdev, Injong Rhee, David S. Miller

On Tue, 22 Feb 2005 22:22:42 GMT
Hubert Tonneau <hubert.tonneau@fullpliant.org> wrote:

> Stephen Hemminger wrote:
> >
> > This patch which was extracted from BIC TCP 1.1 compensates
> > for systems (like MacOSX) that don't ACK every other packet.
> > It has no impact for normal transfers, but might help with problems
> > with Mac like Hubert found.
>
> No, it's even worse.
>
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds

Thanks, that is really interesting...

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 Hubert Tonneau
  2005-02-23 0:58 ` Stephen Hemminger
@ 2005-02-23 18:32 ` Injong Rhee
  2005-02-23 19:36   ` Stephen Hemminger
  2005-02-23 18:37 ` Injong Rhee
  2 siblings, 1 reply; 17+ messages in thread
From: Injong Rhee @ 2005-02-23 18:32 UTC (permalink / raw)
To: 'Hubert Tonneau', 'Stephen Hemminger', 'cliff white'
Cc: 'Alexey Kuznetsov', netdev, 'David S. Miller'

> -----Original Message-----
> From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> Sent: Tuesday, February 22, 2005 5:23 PM
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds

It seems that there are other problems with this version of Linux. Is
there any way we can find out what the problems are? Is this with BIC?
If not, there are some other parts that are not working. If it is with
BIC, we would like to look into this problem.

> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 18:32 ` Injong Rhee
@ 2005-02-23 19:36   ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2005-02-23 19:36 UTC (permalink / raw)
To: Injong Rhee
Cc: 'Hubert Tonneau', 'cliff white', 'Alexey Kuznetsov', netdev,
	'David S. Miller'

An interesting test would be to repeat the slow case:
	2.6.10-ac11 over 100Mbps

First with TCP Reno (the old default):
	sysctl -w net.ipv4.tcp_bic=0
then with TCP Westwood:
	sysctl -w net.ipv4.tcp_bic=0
	sysctl -w net.ipv4.tcp_westwood=1

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 Hubert Tonneau
  2005-02-23 0:58 ` Stephen Hemminger
  2005-02-23 18:32 ` Injong Rhee
@ 2005-02-23 18:37 ` Injong Rhee
  2005-02-23 19:26   ` David S. Miller
  2 siblings, 1 reply; 17+ messages in thread
From: Injong Rhee @ 2005-02-23 18:37 UTC (permalink / raw)
To: 'Hubert Tonneau', 'Stephen Hemminger', 'cliff white'
Cc: 'Alexey Kuznetsov', netdev, 'David S. Miller'

> -----Original Message-----
> From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> Sent: Tuesday, February 22, 2005 5:23 PM
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds

Another way to test whether this is related to the OS or to the BIC
implementation is to test it with our BIC patch 1.1 on Linux 2.4. That
will tell whether the original implementation of BIC has something to
do with the performance against MacOS.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 18:37 ` Injong Rhee
@ 2005-02-23 19:26   ` David S. Miller
  2005-02-23 22:04     ` John Heffner
  0 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2005-02-23 19:26 UTC (permalink / raw)
To: Injong Rhee; +Cc: hubert.tonneau, shemminger, cliffw, kuznet, netdev

On Wed, 23 Feb 2005 13:37:35 -0500
"Injong Rhee" <rhee@eos.ncsu.edu> wrote:

> > -----Original Message-----
> > From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> > Sent: Tuesday, February 22, 2005 5:23 PM
> > 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> > of data)
> > 2.6.9 to gigabit connected MacOSX: 5 seconds
> > 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> > 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> > 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> > 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds
>
> Another way to test whether this is related to the os or bic
> implementation is to test it with our bic patch 1.1. + Linux 2.4. It
> will tell whether the original implementation of BIC has something to
> do with the performance with respect to MacOS.

I don't think BIC has much to do with this problem. MacOS-X does delayed
ACKs until a PSH is seen and this kills performance if we don't PSH often
enough.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [RFT] BIC TCP delayed ack compensation
From: John Heffner @ 2005-02-23 22:04 UTC
To: David S. Miller; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005, David S. Miller wrote:

> I don't think BIC has much to do with this problem. MacOS-X does delayed
> ACKs until a PSH is seen and this kills performance if we don't PSH often
> enough.

I looked at the trace last night and I wonder if PSH is a red herring.
For example:

16:42:21.837931 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37545601 win 57184 <nop,nop,timestamp 1709240872 641486>
16:42:21.837937 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37545601:37547049(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837940 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37547049:37548497(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837941 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37548497:37549945(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837943 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37549945:37551393(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837945 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37551393:37552841(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837947 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37552841:37554289(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837949 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37554289:37555737(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.838979 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37552841 win 65535 <nop,nop,timestamp 1709240872 641685>
16:42:21.838985 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37555737:37557185(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838987 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37557185:37558633(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838989 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37558633:37560081(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838991 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37560081:37561529(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838992 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37561529:37562977(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838994 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37562977:37564425(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.839172 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: P 122802:122853(51) ack 37554289 win 65128 <nop,nop,timestamp 1709240872 641685> NBT Packet
16:42:21.839178 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: P 37564425:37565873(1448) ack 122853 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:22.037976 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37565873 win 53548 <nop,nop,timestamp 1709240872 641685>

Maybe this has something to do with the bi-directional nature of the flow?
Mac OS delaying ACK to try to piggyback on data or something like that.
One signature I noticed is that it seems the last packet sent by the Mac
before the long delack timeout is always a small data packet. (I didn't
rigorously verify this but it seems true.)

  -John
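The stall in the trace above can be measured directly from the timestamps. The following is a minimal sketch, not a general tcpdump parser: it assumes the text layout shown in the trace, ignores sequence-number wraparound, and uses the hypothetical helper names `parse_line` and `ack_delay_ms`. Only the last two trace lines are embedded, with the TCP options trimmed.

```python
import re
from datetime import datetime

# Last data segment from the Linux host (PSH set) and the ACK that covers it,
# copied from the trace above with the TCP options trimmed.
TRACE = """\
16:42:21.839178 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: P 37564425:37565873(1448) ack 122853 win 1460
16:42:22.037976 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37565873 win 53548
"""

# Assumed line layout: time, "IP", src > dst:, flags, optional seq range, ack.
LINE_RE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?:(?P<start>\d+):(?P<end>\d+)\(\d+\) )?ack (?P<ack>\d+)"
)


def parse_line(line):
    """Return a dict for one tcpdump line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%H:%M:%S.%f"),
        "src": m["src"],
        "flags": m["flags"],
        "end": int(m["end"]) if m["end"] else None,
        "ack": int(m["ack"]),
    }


def ack_delay_ms(trace, sender):
    """Milliseconds from the sender's last data segment to the first ACK
    from the peer that covers it, or None if no such ACK is in the trace."""
    packets = [p for p in map(parse_line, trace.splitlines()) if p]
    data = [p for p in packets if p["src"] == sender and p["end"] is not None]
    if not data:
        return None
    last = data[-1]
    for p in packets:
        if p["src"] != sender and p["time"] > last["time"] and p["ack"] >= last["end"]:
            return (p["time"] - last["time"]).total_seconds() * 1000.0
    return None


print(ack_delay_ms(TRACE, "10.107.96.7.32801"))
```

For these two lines the gap comes out at just under 199 ms, even though the data segment carries the PSH flag, which is the observation behind the "red herring" remark.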
* Re: [RFT] BIC TCP delayed ack compensation
From: David S. Miller @ 2005-02-23 22:10 UTC
To: John Heffner; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005 17:04:08 -0500 (EST) John Heffner <jheffner@psc.edu> wrote:

> Maybe this has something to do with the bi-directional nature of the flow?
> Mac OS delaying ACK to try to piggyback on data or something like that.
> One signature I noticed is that it seems the last packet sent by the Mac
> before the long delack timeout is always a small data packet. (I didn't
> rigorously verify this but it seems true.)

I should be more specific when I say "PSH". Mac OS-X's algorithm is basically
that it always delays ACKs to the delayed ACK timeout when the header prediction
fast path is hit. One way to "miss" the header prediction fast path is to
set PSH (this is actually a bug, Linux fixed this long ago, PSH should be ignored
for header prediction fast path checking). When the fast path is missed, it
does the usual "every 2 full sized frames" ACK'ing.

Out of order data can cause the missing of the fast path as well. That can
only be determined if we had dumps from the Mac's perspective however.

Anyways, this Mac OS-X behavior has pretty much been universally agreed
to as a severe bug, at least on this list :-)
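The receiver behaviour described in the message above can be written down as a toy model. This is a sketch of the description only, not of actual Mac OS X code: the class and field names are invented, the 1448-byte segment size is borrowed from the trace earlier in the thread, and how the real stack counts segments across fast-path and slow-path arrivals is an assumption.

```python
from dataclasses import dataclass

MSS = 1448  # full-sized segment length, taken from the trace in this thread


@dataclass
class Segment:
    length: int
    psh: bool = False
    in_order: bool = True


class DelackModel:
    """Toy receiver: delay every ACK on the header-prediction fast path,
    ACK every second full-sized frame when the fast path is missed."""

    def __init__(self):
        # Assumption: full-sized frames are counted on both paths.
        self.pending_full = 0

    def on_segment(self, seg):
        """Return "ack_now" or "delay" (wait for the delayed-ACK timer)."""
        if seg.length >= MSS:
            self.pending_full += 1
        # Per the description, PSH or out-of-order data misses the fast path.
        fast_path = seg.in_order and not seg.psh
        if fast_path:
            return "delay"
        if self.pending_full >= 2:
            self.pending_full = 0
            return "ack_now"
        return "delay"


# A burst without PSH never triggers an immediate ACK in this model...
rx = DelackModel()
print([rx.on_segment(Segment(MSS)) for _ in range(6)])
# ...while a trailing PSH segment does.
print(rx.on_segment(Segment(MSS, psh=True)))
```

In this model a sender that never sets PSH only ever gets timer-driven ACKs, which is why forcing PSH on multi-segment TSO frames was proposed as a workaround. The model says nothing about whether the real receiver behaves this way when PSH is set.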
* Re: [RFT] BIC TCP delayed ack compensation
From: John Heffner @ 2005-02-23 22:19 UTC
To: David S. Miller; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005, David S. Miller wrote:

> On Wed, 23 Feb 2005 17:04:08 -0500 (EST)
> John Heffner <jheffner@psc.edu> wrote:
>
> > Maybe this has something to do with the bi-directional nature of the flow?
> > Mac OS delaying ACK to try to piggyback on data or something like that.
> > One signature I noticed is that it seems the last packet sent by the Mac
> > before the long delack timeout is always a small data packet. (I didn't
> > rigorously verify this but it seems true.)
>
> I should be more specific when I say "PSH". Mac OS-X's algorithm is basically
> that it always delays ACKs to the delayed ACK timeout when the header prediction
> fast path is hit. One way to "miss" the header prediction fast path is to
> set PSH (this is actually a bug, Linux fixed this long ago, PSH should be ignored
> for header prediction fast path checking).

The point is it appears to be delaying ack even when PSH is set.

> Anyways, this Mac OS-X behavior has pretty much been universally agreed
> to as a severe bug, at least on this list :-)

Yep. The Mac behavior is clearly bizarre. :)

  -John
* Re: [RFT] BIC TCP delayed ack compensation
From: Hubert Tonneau @ 2005-02-23 21:54 UTC
To: Stephen Hemminger, Injong Rhee
Cc: 'cliff white', 'Alexey Kuznetsov', netdev, 'David S. Miller'

Stephen Hemminger wrote:
>
> An interesting test would be to repeat the slow case: 2.6.10-ac11 over 100Mbps
>
> With first TCP Reno (old default).
> sysctl -w net.ipv4.tcp_bic=0

No change.

> then TCP Westwood.
> sysctl -w net.ipv4.tcp_bic=0
> sysctl -w net.ipv4.tcp_westwood=1

No change.

Now Linux 2.6.11-rc4 with Injong Rhee abc patch:
No change.

Looks like David S. Miller is right.

Now, what I still don't understand is, if it's PSH/ACK related, why does the
gigabit connected Mac work nicely whereas the 100 Mbps connected one does not?
Thread overview: 17+ messages
[not found] <050QTJA12@server5.heliogroup.fr>
2005-02-09 18:59 ` 2.6.10 TCP troubles -- suggested patch Stephen Hemminger
2005-02-09 20:25 ` David S. Miller
2005-02-22 21:50 ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
2005-02-22 23:30 ` John Heffner
2005-02-22 23:38 ` Baruch Even
2005-02-23 1:04 ` Yee-Ting Li
2005-02-23 15:28 ` Yee-Ting Li
2005-02-22 22:22 Hubert Tonneau
2005-02-23 0:58 ` Stephen Hemminger
2005-02-23 18:32 ` Injong Rhee
2005-02-23 19:36 ` Stephen Hemminger
2005-02-23 18:37 ` Injong Rhee
2005-02-23 19:26 ` David S. Miller
2005-02-23 22:04 ` John Heffner
2005-02-23 22:10 ` David S. Miller
2005-02-23 22:19 ` John Heffner
-- strict thread matches above, loose matches on Subject: below --
2005-02-23 21:54 Hubert Tonneau