* [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert()
@ 2011-11-16 18:58 Neal Cardwell
2011-11-16 18:58 ` [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 Neal Cardwell
` (4 more replies)
0 siblings, 5 replies; 30+ messages in thread
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
To: David Miller
Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
Tom Herbert, Neal Cardwell
Allow callers to decide whether an ACK is a duplicate ACK. This is a
prerequisite to allowing fastretrans_alert to be called from new
contexts, such as the no_queue and old_ack code paths, from which we
have extra info that tells us whether an ACK is a dupack.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
net/ipv4/tcp_input.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 52b5c2d..f772aaa 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3009,11 +3009,11 @@ static void tcp_update_cwnd_in_recovery(struct sock *sk, int newly_acked_sacked,
* tcp_xmit_retransmit_queue().
*/
static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
- int newly_acked_sacked, int flag)
+ int newly_acked_sacked, bool is_dupack,
+ int flag)
{
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
- int is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
int do_lost = is_dupack || ((flag & FLAG_DATA_SACKED) &&
(tcp_fackets_out(tp) > tp->reordering));
int fast_rexmit = 0, mib_idx;
@@ -3681,10 +3681,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
u32 prior_snd_una = tp->snd_una;
u32 ack_seq = TCP_SKB_CB(skb)->seq;
u32 ack = TCP_SKB_CB(skb)->ack_seq;
+ bool is_dupack = false;
u32 prior_in_flight;
u32 prior_fackets;
int prior_packets;
int prior_sacked = tp->sacked_out;
+ int pkts_acked = 0;
int newly_acked_sacked = 0;
int frto_cwnd = 0;
@@ -3757,6 +3759,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
/* See if we can take anything off of the retransmit queue. */
flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una);
+ pkts_acked = prior_packets - tp->packets_out;
newly_acked_sacked = (prior_packets - prior_sacked) -
(tp->packets_out - tp->sacked_out);
@@ -3771,8 +3774,9 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
if ((flag & FLAG_DATA_ACKED) && !frto_cwnd &&
tcp_may_raise_cwnd(sk, flag))
tcp_cong_avoid(sk, ack, prior_in_flight);
- tcp_fastretrans_alert(sk, prior_packets - tp->packets_out,
- newly_acked_sacked, flag);
+ is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
+ tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked,
+ is_dupack, flag);
} else {
if ((flag & FLAG_DATA_ACKED) && !frto_cwnd)
tcp_cong_avoid(sk, ack, prior_in_flight);
--
1.7.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell @ 2011-11-16 18:58 ` Neal Cardwell 2011-11-17 1:34 ` Ilpo Järvinen 2011-11-16 18:58 ` [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una Neal Cardwell ` (3 subsequent siblings) 4 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw) To: David Miller Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng, Tom Herbert, Neal Cardwell The bug: Senders ignored DSACKs after recovery when there were no outstanding packets (a common scenario for HTTP servers). The change: when there are no outstanding packets (the "no_queue" goto label), call tcp_fastretrans_alert() in order to use DSACKs to undo congestion window reductions. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> --- net/ipv4/tcp_input.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f772aaa..b49e418 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3788,6 +3788,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) return 1; no_queue: + /* If data was DSACKed, see if we can undo a cwnd reduction. */ + if (flag & FLAG_DSACKING_ACK) + tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked, + is_dupack, flag); /* If this ack opens up a zero window, clear backoff. It was * being used to time the probes, and is probably far higher than * it needs to be for normal retransmission. -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 2011-11-16 18:58 ` [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 Neal Cardwell @ 2011-11-17 1:34 ` Ilpo Järvinen 2011-11-27 23:55 ` David Miller 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-17 1:34 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 1345 bytes --] On Wed, 16 Nov 2011, Neal Cardwell wrote: > The bug: Senders ignored DSACKs after recovery when there were no > outstanding packets (a common scenario for HTTP servers). > > The change: when there are no outstanding packets (the "no_queue" goto > label), call tcp_fastretrans_alert() in order to use DSACKs to undo > congestion window reductions. > > Other patches in this series will provide other changes that are > necessary to fully fix this problem. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> > --- > net/ipv4/tcp_input.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index f772aaa..b49e418 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -3788,6 +3788,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) > return 1; > > no_queue: > + /* If data was DSACKed, see if we can undo a cwnd reduction. */ > + if (flag & FLAG_DSACKING_ACK) > + tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked, > + is_dupack, flag); Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> ...however, there could be other undo related gotchas now added by this patch due to complexity of tcp_fastretrans_alert. It would probably be safer to just add the dsack undo related code here? -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 2011-11-17 1:34 ` Ilpo Järvinen @ 2011-11-27 23:55 ` David Miller 0 siblings, 0 replies; 30+ messages in thread From: David Miller @ 2011-11-27 23:55 UTC (permalink / raw) To: ilpo.jarvinen; +Cc: ncardwell, netdev, nanditad, ycheng, therbert From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 17 Nov 2011 03:34:17 +0200 (EET) > On Wed, 16 Nov 2011, Neal Cardwell wrote: > >> The bug: Senders ignored DSACKs after recovery when there were no >> outstanding packets (a common scenario for HTTP servers). >> >> The change: when there are no outstanding packets (the "no_queue" goto >> label), call tcp_fastretrans_alert() in order to use DSACKs to undo >> congestion window reductions. >> >> Other patches in this series will provide other changes that are >> necessary to fully fix this problem. >> >> Signed-off-by: Neal Cardwell <ncardwell@google.com> ... > > Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Applied to net-next. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell 2011-11-16 18:58 ` [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 Neal Cardwell @ 2011-11-16 18:58 ` Neal Cardwell 2011-11-17 1:52 ` Ilpo Järvinen 2011-11-16 18:58 ` [PATCH 4/5] tcp: allow undo from reordered DSACKs Neal Cardwell ` (2 subsequent siblings) 4 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw) To: David Miller Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng, Tom Herbert, Neal Cardwell The bug: When the ACK field is below snd_una (which can happen when ACKs are reordered), senders ignored DSACKs (preventing undo) and did not call tcp_fastretrans_alert, so they did not increment prr_delivered to reflect newly-SACKed sequence ranges, and did not call tcp_xmit_retransmit_queue, thus passing up chances to send out more retransmitted and new packets based on any newly-SACKed packets. The change: When the ACK field is below snd_una (the "old_ack" goto label), call tcp_fastretrans_alert to allow undo based on any newly-arrived DSACKs and try to send out more packets based on newly-SACKed packets. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> --- net/ipv4/tcp_input.c | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b49e418..751d390 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3805,10 +3805,14 @@ invalid_ack: return -1; old_ack: + /* If data was SACKed, tag it and see if we should send more data. + * If data was DSACKed, see if we can undo a cwnd reduction. + */ if (TCP_SKB_CB(skb)->sacked) { - tcp_sacktag_write_queue(sk, skb, prior_snd_una); - if (icsk->icsk_ca_state == TCP_CA_Open) - tcp_try_keep_open(sk); + flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una); + newly_acked_sacked = tp->sacked_out - prior_sacked; + tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked, + is_dupack, flag); } SOCK_DEBUG(sk, "Ack %u before %u:%u\n", ack, tp->snd_una, tp->snd_nxt); -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una 2011-11-16 18:58 ` [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una Neal Cardwell @ 2011-11-17 1:52 ` Ilpo Järvinen 2011-11-27 23:57 ` David Miller 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-17 1:52 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 2105 bytes --] On Wed, 16 Nov 2011, Neal Cardwell wrote: > The bug: When the ACK field is below snd_una (which can happen when > ACKs are reordered), senders ignored DSACKs (preventing undo) and did > not call tcp_fastretrans_alert, so they did not increment > prr_delivered to reflect newly-SACKed sequence ranges, and did not > call tcp_xmit_retransmit_queue, thus passing up chances to send out > more retransmitted and new packets based on any newly-SACKed packets. > > The change: When the ACK field is below snd_una (the "old_ack" goto > label), call tcp_fastretrans_alert to allow undo based on any > newly-arrived DSACKs and try to send out more packets based on > newly-SACKed packets. > > Other patches in this series will provide other changes that are > necessary to fully fix this problem. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> > --- > net/ipv4/tcp_input.c | 10 +++++++--- > 1 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index b49e418..751d390 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -3805,10 +3805,14 @@ invalid_ack: > return -1; > > old_ack: > + /* If data was SACKed, tag it and see if we should send more data. > + * If data was DSACKed, see if we can undo a cwnd reduction. > + */ > if (TCP_SKB_CB(skb)->sacked) { > - tcp_sacktag_write_queue(sk, skb, prior_snd_una); > - if (icsk->icsk_ca_state == TCP_CA_Open) > - tcp_try_keep_open(sk); > + flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una); > + newly_acked_sacked = tp->sacked_out - prior_sacked; > + tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked, > + is_dupack, flag); I gave FLAG_* some thought and couldn't find something that would go wrong here (due to goto not all flag enablers are checked for). Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> ...unrelated to the fix, I realized that FRTO is not fully thought through in this old ACK case, also its RFC seems to lack considerations on what to do in such case. ...I'll need to think the FRTO stuff a bit more. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una 2011-11-17 1:52 ` Ilpo Järvinen @ 2011-11-27 23:57 ` David Miller 0 siblings, 0 replies; 30+ messages in thread From: David Miller @ 2011-11-27 23:57 UTC (permalink / raw) To: ilpo.jarvinen; +Cc: ncardwell, netdev, nanditad, ycheng, therbert From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 17 Nov 2011 03:52:37 +0200 (EET) > On Wed, 16 Nov 2011, Neal Cardwell wrote: > >> The bug: When the ACK field is below snd_una (which can happen when >> ACKs are reordered), senders ignored DSACKs (preventing undo) and did >> not call tcp_fastretrans_alert, so they did not increment >> prr_delivered to reflect newly-SACKed sequence ranges, and did not >> call tcp_xmit_retransmit_queue, thus passing up chances to send out >> more retransmitted and new packets based on any newly-SACKed packets. >> >> The change: When the ACK field is below snd_una (the "old_ack" goto >> label), call tcp_fastretrans_alert to allow undo based on any >> newly-arrived DSACKs and try to send out more packets based on >> newly-SACKed packets. >> >> Other patches in this series will provide other changes that are >> necessary to fully fix this problem. >> >> Signed-off-by: Neal Cardwell <ncardwell@google.com> ... > Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Applied to net-next. > ...unrelated to the fix, I realized that FRTO is not fully thought through > in this old ACK case, also its RFC seems to lack considerations on what to > do in such case. ...I'll need to think the FRTO stuff a bit more. Every feature added to TCP beginning with timestamps has not been well thought out wrt. reordering. :-) ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell 2011-11-16 18:58 ` [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 Neal Cardwell 2011-11-16 18:58 ` [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una Neal Cardwell @ 2011-11-16 18:58 ` Neal Cardwell 2011-11-17 5:18 ` Ilpo Järvinen 2011-11-16 18:58 ` [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open Neal Cardwell 2011-11-17 1:23 ` [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Ilpo Järvinen 4 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw) To: David Miller Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng, Tom Herbert, Neal Cardwell Previously, SACK-enabled connections hung around in TCP_CA_Disorder state while snd_una==high_seq, just waiting to accumulate DSACKs and hopefully undo a cwnd reduction. This could and did lead to the following unfortunate scenario: if some incoming ACKs advance snd_una beyond high_seq then we were setting undo_marker to 0 and moving to TCP_CA_Open, so if (due to reordering in the ACK return path) we shortly thereafter received a DSACK then we were no longer able to undo the cwnd reduction. The change: Simplify the congestion avoidance state machine by removing the behavior where SACK-enabled connections hung around in the TCP_CA_Disorder state just waiting for DSACKs. Instead, when snd_una advances to high_seq or beyond we typically move to TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or TCP_CA_Disorder if we later receive enough DSACKs. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> --- net/ipv4/tcp_input.c | 15 ++------------- 1 files changed, 2 insertions(+), 13 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 751d390..a4efdd7 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); int state = TCP_CA_Open; - if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker) + if (tcp_left_out(tp) || tcp_any_retrans_done(sk)) state = TCP_CA_Disorder; if (inet_csk(sk)->icsk_ca_state != state) { @@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, } break; - case TCP_CA_Disorder: - tcp_try_undo_dsack(sk); - if (!tp->undo_marker || - /* For SACK case do not Open to allow to undo - * catching for all duplicate ACKs. */ - tcp_is_reno(tp) || tp->snd_una != tp->high_seq) { - tp->undo_marker = 0; - tcp_set_ca_state(sk, TCP_CA_Open); - } - break; - case TCP_CA_Recovery: if (tcp_is_reno(tp)) tcp_reset_reno_sack(tp); @@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, tcp_add_reno_sack(sk); } - if (icsk->icsk_ca_state == TCP_CA_Disorder) + if (icsk->icsk_ca_state <= TCP_CA_Disorder) tcp_try_undo_dsack(sk); if (!tcp_time_to_recover(sk)) { -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-16 18:58 ` [PATCH 4/5] tcp: allow undo from reordered DSACKs Neal Cardwell @ 2011-11-17 5:18 ` Ilpo Järvinen 2011-11-17 20:49 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-17 5:18 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert On Wed, 16 Nov 2011, Neal Cardwell wrote: > Previously, SACK-enabled connections hung around in TCP_CA_Disorder > state while snd_una==high_seq, just waiting to accumulate DSACKs and > hopefully undo a cwnd reduction. This could and did lead to the > following unfortunate scenario: if some incoming ACKs advance snd_una > beyond high_seq then we were setting undo_marker to 0 and moving to > TCP_CA_Open, so if (due to reordering in the ACK return path) we > shortly thereafter received a DSACK then we were no longer able to > undo the cwnd reduction. > > The change: Simplify the congestion avoidance state machine by > removing the behavior where SACK-enabled connections hung around in > the TCP_CA_Disorder state just waiting for DSACKs. Instead, when > snd_una advances to high_seq or beyond we typically move to > TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or > TCP_CA_Disorder if we later receive enough DSACKs. > > Other patches in this series will provide other changes that are > necessary to fully fix this problem. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> > --- > net/ipv4/tcp_input.c | 15 ++------------- > 1 files changed, 2 insertions(+), 13 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 751d390..a4efdd7 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk) > struct tcp_sock *tp = tcp_sk(sk); > int state = TCP_CA_Open; > > - if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker) > + if (tcp_left_out(tp) || tcp_any_retrans_done(sk)) > state = TCP_CA_Disorder; > > if (inet_csk(sk)->icsk_ca_state != state) { > @@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, > } > break; > > - case TCP_CA_Disorder: > - tcp_try_undo_dsack(sk); > - if (!tp->undo_marker || > - /* For SACK case do not Open to allow to undo > - * catching for all duplicate ACKs. */ > - tcp_is_reno(tp) || tp->snd_una != tp->high_seq) { > - tp->undo_marker = 0; > - tcp_set_ca_state(sk, TCP_CA_Open); > - } > - break; > - > case TCP_CA_Recovery: > if (tcp_is_reno(tp)) > tcp_reset_reno_sack(tp); > @@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, > tcp_add_reno_sack(sk); > } > > - if (icsk->icsk_ca_state == TCP_CA_Disorder) > + if (icsk->icsk_ca_state <= TCP_CA_Disorder) > tcp_try_undo_dsack(sk); > > if (!tcp_time_to_recover(sk)) { How about extending Disorder state until second cumulative ACK that is acking >= high_seq? -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-17 5:18 ` Ilpo Järvinen @ 2011-11-17 20:49 ` Neal Cardwell 2011-11-22 12:44 ` Ilpo Järvinen 0 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-17 20:49 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert On Thu, Nov 17, 2011 at 12:18 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > On Wed, 16 Nov 2011, Neal Cardwell wrote: > >> Previously, SACK-enabled connections hung around in TCP_CA_Disorder >> state while snd_una==high_seq, just waiting to accumulate DSACKs and >> hopefully undo a cwnd reduction. This could and did lead to the >> following unfortunate scenario: if some incoming ACKs advance snd_una >> beyond high_seq then we were setting undo_marker to 0 and moving to >> TCP_CA_Open, so if (due to reordering in the ACK return path) we >> shortly thereafter received a DSACK then we were no longer able to >> undo the cwnd reduction. >> >> The change: Simplify the congestion avoidance state machine by >> removing the behavior where SACK-enabled connections hung around in >> the TCP_CA_Disorder state just waiting for DSACKs. Instead, when >> snd_una advances to high_seq or beyond we typically move to >> TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or >> TCP_CA_Disorder if we later receive enough DSACKs. >> >> Other patches in this series will provide other changes that are >> necessary to fully fix this problem. >> >> Signed-off-by: Neal Cardwell <ncardwell@google.com> >> --- >> net/ipv4/tcp_input.c | 15 ++------------- >> 1 files changed, 2 insertions(+), 13 deletions(-) >> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c >> index 751d390..a4efdd7 100644 >> --- a/net/ipv4/tcp_input.c >> +++ b/net/ipv4/tcp_input.c >> @@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk) >> struct tcp_sock *tp = tcp_sk(sk); >> int state = TCP_CA_Open; >> >> - if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker) >> + if (tcp_left_out(tp) || tcp_any_retrans_done(sk)) >> state = TCP_CA_Disorder; >> >> if (inet_csk(sk)->icsk_ca_state != state) { >> @@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, >> } >> break; >> >> - case TCP_CA_Disorder: >> - tcp_try_undo_dsack(sk); >> - if (!tp->undo_marker || >> - /* For SACK case do not Open to allow to undo >> - * catching for all duplicate ACKs. */ >> - tcp_is_reno(tp) || tp->snd_una != tp->high_seq) { >> - tp->undo_marker = 0; >> - tcp_set_ca_state(sk, TCP_CA_Open); >> - } >> - break; >> - >> case TCP_CA_Recovery: >> if (tcp_is_reno(tp)) >> tcp_reset_reno_sack(tp); >> @@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, >> tcp_add_reno_sack(sk); >> } >> >> - if (icsk->icsk_ca_state == TCP_CA_Disorder) >> + if (icsk->icsk_ca_state <= TCP_CA_Disorder) >> tcp_try_undo_dsack(sk); >> >> if (!tcp_time_to_recover(sk)) { > > How about extending Disorder state until second cumulative ACK that is > acking >= high_seq? That would seem to add complexity but only provide a partial solution. This proposed patch has the virtue of providing a general solution while simplifying the code a little. What are your concerns with this patch? Thanks, Ilpo, for taking a look at this patch sequence, neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-17 20:49 ` Neal Cardwell @ 2011-11-22 12:44 ` Ilpo Järvinen 2011-11-27 23:58 ` David Miller 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-22 12:44 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 4409 bytes --] On Thu, 17 Nov 2011, Neal Cardwell wrote: > On Thu, Nov 17, 2011 at 12:18 AM, Ilpo Järvinen > <ilpo.jarvinen@helsinki.fi> wrote: > > On Wed, 16 Nov 2011, Neal Cardwell wrote: > > > >> Previously, SACK-enabled connections hung around in TCP_CA_Disorder > >> state while snd_una==high_seq, just waiting to accumulate DSACKs and > >> hopefully undo a cwnd reduction. This could and did lead to the > >> following unfortunate scenario: if some incoming ACKs advance snd_una > >> beyond high_seq then we were setting undo_marker to 0 and moving to > >> TCP_CA_Open, so if (due to reordering in the ACK return path) we > >> shortly thereafter received a DSACK then we were no longer able to > >> undo the cwnd reduction. > >> > >> The change: Simplify the congestion avoidance state machine by > >> removing the behavior where SACK-enabled connections hung around in > >> the TCP_CA_Disorder state just waiting for DSACKs. Instead, when > >> snd_una advances to high_seq or beyond we typically move to > >> TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or > >> TCP_CA_Disorder if we later receive enough DSACKs. > >> > >> Other patches in this series will provide other changes that are > >> necessary to fully fix this problem. > >> > >> Signed-off-by: Neal Cardwell <ncardwell@google.com> > >> --- > >> net/ipv4/tcp_input.c | 15 ++------------- > >> 1 files changed, 2 insertions(+), 13 deletions(-) > >> > >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > >> index 751d390..a4efdd7 100644 > >> --- a/net/ipv4/tcp_input.c > >> +++ b/net/ipv4/tcp_input.c > >> @@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk) > >> struct tcp_sock *tp = tcp_sk(sk); > >> int state = TCP_CA_Open; > >> > >> - if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker) > >> + if (tcp_left_out(tp) || tcp_any_retrans_done(sk)) > >> state = TCP_CA_Disorder; > >> > >> if (inet_csk(sk)->icsk_ca_state != state) { > >> @@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, > >> } > >> break; > >> > >> - case TCP_CA_Disorder: > >> - tcp_try_undo_dsack(sk); > >> - if (!tp->undo_marker || > >> - /* For SACK case do not Open to allow to undo > >> - * catching for all duplicate ACKs. */ > >> - tcp_is_reno(tp) || tp->snd_una != tp->high_seq) { > >> - tp->undo_marker = 0; > >> - tcp_set_ca_state(sk, TCP_CA_Open); > >> - } > >> - break; > >> - > >> case TCP_CA_Recovery: > >> if (tcp_is_reno(tp)) > >> tcp_reset_reno_sack(tp); > >> @@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, > >> tcp_add_reno_sack(sk); > >> } > >> > >> - if (icsk->icsk_ca_state == TCP_CA_Disorder) > >> + if (icsk->icsk_ca_state <= TCP_CA_Disorder) > >> tcp_try_undo_dsack(sk); > >> > >> if (!tcp_time_to_recover(sk)) { > > > > How about extending Disorder state until second cumulative ACK that is > > acking >= high_seq? > > That would seem to add complexity but only provide a partial solution. Right, I forgot the reordering. > This proposed patch has the virtue of providing a general solution > while simplifying the code a little. > > What are your concerns with this patch? ...I was thinking that if we go to the direction of removal of more and more CA_Disorder stuff we could just as well remove the whole state. But I've since that time convinced myself that the state itself is still necessary even if less action takes place in it after this patch. Also, one (serious) word of caution! This change, by its extending of CA_Open state, is somewhat similar to what I made FRTO years ago, and managed to introduces subtle breaking to the state machine. Please check that the problem similar to what was fixed by a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in some other form causing spurious undos). ...To me it seems that tcp_packet_delayed might be similarly confused after the given patch. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-22 12:44 ` Ilpo Järvinen @ 2011-11-27 23:58 ` David Miller 2011-11-28 17:31 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: David Miller @ 2011-11-27 23:58 UTC (permalink / raw) To: ilpo.jarvinen; +Cc: ncardwell, netdev, nanditad, ycheng, therbert From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Tue, 22 Nov 2011 14:44:32 +0200 (EET) > On Thu, 17 Nov 2011, Neal Cardwell wrote: > >> On Thu, Nov 17, 2011 at 12:18 AM, Ilpo Järvinen >> <ilpo.jarvinen@helsinki.fi> wrote: >> > On Wed, 16 Nov 2011, Neal Cardwell wrote: >> > >> >> Previously, SACK-enabled connections hung around in TCP_CA_Disorder >> >> state while snd_una==high_seq, just waiting to accumulate DSACKs and >> >> hopefully undo a cwnd reduction. This could and did lead to the >> >> following unfortunate scenario: if some incoming ACKs advance snd_una >> >> beyond high_seq then we were setting undo_marker to 0 and moving to >> >> TCP_CA_Open, so if (due to reordering in the ACK return path) we >> >> shortly thereafter received a DSACK then we were no longer able to >> >> undo the cwnd reduction. >> >> >> >> The change: Simplify the congestion avoidance state machine by >> >> removing the behavior where SACK-enabled connections hung around in >> >> the TCP_CA_Disorder state just waiting for DSACKs. Instead, when >> >> snd_una advances to high_seq or beyond we typically move to >> >> TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or >> >> TCP_CA_Disorder if we later receive enough DSACKs. >> >> >> >> Other patches in this series will provide other changes that are >> >> necessary to fully fix this problem. >> >> >> >> Signed-off-by: Neal Cardwell <ncardwell@google.com> ... >> > How about extending Disorder state until second cumulative ACK that is >> > acking >= high_seq? >> >> That would seem to add complexity but only provide a partial solution. > > Right, I forgot the reordering. In order to make forward progress I've added Neil's patch to net-next. > Also, one (serious) word of caution! This change, by its extending of > CA_Open state, is somewhat similar to what I made FRTO years ago, and > managed to introduces subtle breaking to the state machine. Please check > that the problem similar to what was fixed by > a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in > some other form causing spurious undos). ...To me it seems that > tcp_packet_delayed might be similarly confused after the given patch. Neil, please look into this so we can have any such issues fixed in time for the next merge window. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs 2011-11-27 23:58 ` David Miller @ 2011-11-28 17:31 ` Neal Cardwell 2011-12-12 12:05 ` [PATCH 1/1] tcp: fix spurious undos in CA_Open Ilpo Järvinen 0 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-28 17:31 UTC (permalink / raw) To: David Miller; +Cc: ilpo.jarvinen, netdev, nanditad, ycheng, therbert On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@davemloft.net> wrote: > In order to make forward progress I've added Neil's patch to net-next. Thanks! >> Also, one (serious) word of caution! This change, by its extending of >> CA_Open state, is somewhat similar to what I made FRTO years ago, and >> managed to introduces subtle breaking to the state machine. Please check >> that the problem similar to what was fixed by >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in >> some other form causing spurious undos). ...To me it seems that >> tcp_packet_delayed might be similarly confused after the given patch. > > Neil, please look into this so we can have any such issues fixed > in time for the next merge window. Absolutely. I will look into the areas that Ilpo mentioned. neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/1] tcp: fix spurious undos in CA_Open 2011-11-28 17:31 ` Neal Cardwell @ 2011-12-12 12:05 ` Ilpo Järvinen 2011-12-12 19:40 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-12-12 12:05 UTC (permalink / raw) To: Neal Cardwell, David Miller; +Cc: Netdev, nanditad, Yuchung Cheng, therbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 3838 bytes --] On Mon, 28 Nov 2011, Neal Cardwell wrote: > On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@davemloft.net> wrote: > > >> Also, one (serious) word of caution! This change, by its extending of > >> CA_Open state, is somewhat similar to what I made FRTO years ago, and > >> managed to introduces subtle breaking to the state machine. Please check > >> that the problem similar to what was fixed by > >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in > >> some other form causing spurious undos). ...To me it seems that > >> tcp_packet_delayed might be similarly confused after the given patch. > > > > Neil, please look into this so we can have any such issues fixed > > in time for the next merge window. > > Absolutely. I will look into the areas that Ilpo mentioned. Unfortunately nothing has happened? So I finally had to come up something myself. ...Compile tested only. -- [PATCH 1/1] tcp: fix spurious undos in CA_Open There's a landmine in tcp_packet_delayed. In CA_Open the retrans_stamp is very eagerly cleared which falsely triggers packet-delayed detection. Therefore this code cannot be used in CA_Open if the retrans_stamp was already cleared. This became issue once DSACK detection was extended to happen in CA_Open state too (f698204bd0b, tcp: allow undo from reordered DSACKs). Essentially the whole congestion control is disabled because the undos now always restore the previous window. I wonder if this was already in production... ...at least the Internet didn't melt ;-). Compile tested only. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 20 ++++++++++++++------ 1 files changed, 14 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 52b5c2d..7d8934d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2640,8 +2640,14 @@ static void tcp_cwnd_down(struct sock *sk, int flag) /* Nothing was retransmitted or returned timestamp is less * than timestamp of the first retransmission. */ -static inline int tcp_packet_delayed(const struct tcp_sock *tp) +static inline int tcp_packet_delayed(const struct sock *sk) { + struct tcp_sock *tp = tcp_sk(sk); + const struct inet_connection_sock *icsk = inet_csk(sk); + + if (icsk->icsk_ca_state == TCP_CA_Open && !tp->retrans_stamp) + return 0; + return !tp->retrans_stamp || (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr && before(tp->rx_opt.rcv_tsecr, tp->retrans_stamp)); @@ -2701,9 +2707,11 @@ static void tcp_undo_cwr(struct sock *sk, const bool undo_ssthresh) tp->snd_cwnd_stamp = tcp_time_stamp; } -static inline int tcp_may_undo(const struct tcp_sock *tp) +static inline int tcp_may_undo(const struct sock *sk) { - return tp->undo_marker && (!tp->undo_retrans || tcp_packet_delayed(tp)); + struct tcp_sock *tp = tcp_sk(sk); + + return tp->undo_marker && (!tp->undo_retrans || tcp_packet_delayed(sk)); } /* People celebrate: "We love our President!" */ @@ -2711,7 +2719,7 @@ static int tcp_try_undo_recovery(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); - if (tcp_may_undo(tp)) { + if (tcp_may_undo(sk)) { int mib_idx; /* Happy end! We did not retransmit anything @@ -2788,7 +2796,7 @@ static int tcp_try_undo_partial(struct sock *sk, int acked) /* Partial ACK arrived. Force Hoe's retransmit. */ int failed = tcp_is_reno(tp) || (tcp_fackets_out(tp) > tp->reordering); - if (tcp_may_undo(tp)) { + if (tcp_may_undo(sk)) { /* Plain luck! Hole if filled with delayed * packet, rather than with a retransmit. */ @@ -2815,7 +2823,7 @@ static int tcp_try_undo_loss(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); - if (tcp_may_undo(tp)) { + if (tcp_may_undo(sk)) { struct sk_buff *skb; tcp_for_write_queue(skb, sk) { if (skb == tcp_send_head(sk)) -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] tcp: fix spurious undos in CA_Open 2011-12-12 12:05 ` [PATCH 1/1] tcp: fix spurious undos in CA_Open Ilpo Järvinen @ 2011-12-12 19:40 ` Neal Cardwell 2011-12-14 8:57 ` Ilpo Järvinen 0 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-12-12 19:40 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Netdev, nanditad, Yuchung Cheng, therbert On Mon, Dec 12, 2011 at 7:05 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > On Mon, 28 Nov 2011, Neal Cardwell wrote: > >> On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@davemloft.net> wrote: >> >> >> Also, one (serious) word of caution! This change, by its extending of >> >> CA_Open state, is somewhat similar to what I made FRTO years ago, and >> >> managed to introduces subtle breaking to the state machine. Please check >> >> that the problem similar to what was fixed by >> >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in >> >> some other form causing spurious undos). ...To me it seems that >> >> tcp_packet_delayed might be similarly confused after the given patch. >> > >> > Neil, please look into this so we can have any such issues fixed >> > in time for the next merge window. >> >> Absolutely. I will look into the areas that Ilpo mentioned. > > Unfortunately nothing has happened? So I finally had to come up something > myself. ...Compile tested only. Sorry for the radio silence, but I have been looking at this. So far it's been a time-consuming process, as we've uncovered a number of pre-existing issues in tcp_packet_delayed that seem like quite serious bugs: 1) tcp_packet_delayed implicitly assumes, but does not verify, that the ACK in question (whose ECR field precedes retrans_stamp) covers all retransmitted segments. So if we retransmit segments 1 and 2, and then get an ACK for 1 that has an ECR indicating it was for the original copy of segment 1, then we undo, even in cases where segment 2 really was lost and needed to be retransmitted. 2) tcp_packet_delayed implicitly assumes that if we retransmit a packet that was truly lost then any ACK for the retransmitted instance of that packet will echo the timestamp field in the retransmitted packet. But if the receiver is implementing delayed ACKs and correctly implements timestamps per RFC 1323, then the receiver will be obeying this clause: "Thus, when delayed ACKs are in use, the receiver should reply with the TSval field from the earliest unacknowledged segment." Thus tcp_packet_delayed can cause spurious undos in the following circumstances: - send segment 1 with tsval 10 - receiver is doing delayed ACKs and does not ACK 1 but sets TS.Recent to 10 - send segment 2 with tsval 20 - segment 2 is lost in transit - RTO, retransmit segment 2 with tsval 30 - receiver sends delayed ACK for packets 1 and 2 with TS.Recent of 10 - sender receives ACK for packet 2 with tsecho of 10, lower than 30, and thinks RTO was spurious For better or worse, both of these bugs appear to be inherited from the Eifel Detection Algorithm RFC (RFC 3522). In any case, we've been working on coming up with a solution for these issues. > -- > [PATCH 1/1] tcp: fix spurious undos in CA_Open > > There's a landmine in tcp_packet_delayed. In CA_Open the > retrans_stamp is very eagerly cleared which falsely triggers > packet-delayed detection. Therefore this code cannot be used > in CA_Open if the retrans_stamp was already cleared. > > This became issue once DSACK detection was extended to happen > in CA_Open state too (f698204bd0b, tcp: allow undo from > reordered DSACKs). Essentially the whole congestion control > is disabled because the undos now always restore the previous > window. I wonder if this was already in production... ...at > least the Internet didn't melt ;-). I'm pretty sure this commit is unnecessary. It seems like a NOP. Note that tcp_may_undo and tcp_packet_delayed are never called in state TCP_CA_Open. tcp_may_undo is only called from tcp_try_undo_recovery, tcp_try_undo_partial, and tcp_try_undo_loss, and to call any of these functions you must be in either TCP_CA_Loss or TCP_CA_Recovery. Only tcp_try_undo_dsack is called from TCP_CA_Open, but it makes no reference to retrans_stamp. Please let me know if I'm missing something, or if there's a specific scenario you're concerned about. thanks, neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] tcp: fix spurious undos in CA_Open 2011-12-12 19:40 ` Neal Cardwell @ 2011-12-14 8:57 ` Ilpo Järvinen 2011-12-18 18:23 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-12-14 8:57 UTC (permalink / raw) To: Neal Cardwell; +Cc: David Miller, Netdev, nanditad, Yuchung Cheng, therbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 5753 bytes --] On Mon, 12 Dec 2011, Neal Cardwell wrote: > On Mon, Dec 12, 2011 at 7:05 AM, Ilpo Järvinen > <ilpo.jarvinen@helsinki.fi> wrote: > > On Mon, 28 Nov 2011, Neal Cardwell wrote: > > > >> On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@davemloft.net> wrote: > >> > >> >> Also, one (serious) word of caution! This change, by its extending of > >> >> CA_Open state, is somewhat similar to what I made FRTO years ago, and > >> >> managed to introduces subtle breaking to the state machine. Please check > >> >> that the problem similar to what was fixed by > >> >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in > >> >> some other form causing spurious undos). ...To me it seems that > >> >> tcp_packet_delayed might be similarly confused after the given patch. > >> > > >> > Neil, please look into this so we can have any such issues fixed > >> > in time for the next merge window. > >> > >> Absolutely. I will look into the areas that Ilpo mentioned. > > > > Unfortunately nothing has happened? So I finally had to come up something > > myself. ...Compile tested only. > > Sorry for the radio silence, but I have been looking at this. So far > it's been a time-consuming process, as we've uncovered a number of > pre-existing issues in tcp_packet_delayed that seem like quite serious > bugs: > > 1) tcp_packet_delayed implicitly assumes, but does not verify, that > the ACK in question (whose ECR field precedes retrans_stamp) covers > all retransmitted segments. So if we retransmit segments 1 and 2, and > then get an ACK for 1 that has an ECR indicating it was for the > original copy of segment 1, then we undo, even in cases where segment > 2 really was lost and needed to be retransmitted. I think this might be an mis-implementation in tcp_try_undo_partial. > 2) tcp_packet_delayed implicitly assumes that if we retransmit a > packet that was truly lost then any ACK for the retransmitted instance > of that packet will echo the timestamp field in the retransmitted > packet. But if the receiver is implementing delayed ACKs and correctly > implements timestamps per RFC 1323, then the receiver will be obeying > this clause: "Thus, when delayed ACKs are in use, the receiver should > reply with the TSval field from the earliest unacknowledged segment." > Thus tcp_packet_delayed can cause spurious undos in the following > circumstances: > > - send segment 1 with tsval 10 > - receiver is doing delayed ACKs and does not ACK 1 but sets TS.Recent to 10 > - send segment 2 with tsval 20 > - segment 2 is lost in transit > - RTO, retransmit segment 2 with tsval 30 > - receiver sends delayed ACK for packets 1 and 2 with TS.Recent of 10 > - sender receives ACK for packet 2 with tsecho of 10, lower than 30, > and thinks RTO was spurious About this I'm actually aware of, I have even some long email postponed related to this to discuss in context of 1323bis (as 1323bis seems pretty much standstill I'm not in a big hurry :-)). I think 1323 algo is flawed anyway and it was fixed/clarified/better in the bis (which doesn't address this well enough though). I discovered this while I setup a network with ACK losses to figure out other issue. I discovered that the RTT measurements occassionally took "the wrong TS" bloating the RTT estimate. As RTT samples included RTO delays, with enough ACK losses the RTO eventually hits the max value. ...As a result, not very nice performance was acquired for that such a flow. I think there are two possible solutions: 1) Echo latest timestamp for below window packets (I'm not sure if there's can of worms w.r.t. security in this approach). A good thing in this would be that it would really solve the rexmit ambiguity problem which the current approach does not do. 2) Ignore packets with DSACK in RTT measurement. 1323bis is not of much help in this which TS to echo (which is why I intented to mention it on tcpm). IIRC, it depended on which takes precedence: RFC793 in-window check or the given TS algorithm, which is not discussed anywhere. > For better or worse, both of these bugs appear to be inherited from > the Eifel Detection Algorithm RFC (RFC 3522). RFC3522 does not have the issue 1) as it's working after RTO+first ACK only. The second issue is valid for it though. > In any case, we've been working on coming up with a solution for these issues. > > > -- > > [PATCH 1/1] tcp: fix spurious undos in CA_Open > > > > There's a landmine in tcp_packet_delayed. In CA_Open the > > retrans_stamp is very eagerly cleared which falsely triggers > > packet-delayed detection. Therefore this code cannot be used > > in CA_Open if the retrans_stamp was already cleared. > > > > This became issue once DSACK detection was extended to happen > > in CA_Open state too (f698204bd0b, tcp: allow undo from > > reordered DSACKs). Essentially the whole congestion control > > is disabled because the undos now always restore the previous > > window. I wonder if this was already in production... ...at > > least the Internet didn't melt ;-). > > I'm pretty sure this commit is unnecessary. It seems like a NOP. Note > that tcp_may_undo and tcp_packet_delayed are never called in state > TCP_CA_Open. tcp_may_undo is only called from tcp_try_undo_recovery, > tcp_try_undo_partial, and tcp_try_undo_loss, and to call any of these > functions you must be in either TCP_CA_Loss or TCP_CA_Recovery. Only > tcp_try_undo_dsack is called from TCP_CA_Open, but it makes no > reference to retrans_stamp. It seems that I've missed this difference in tcp_try_undo_dsack (well, seen it but never thought it that much). I think you're right. Thanks for taking a look. Some WARN_ON to enforce this !CA_Open condition there might not hurt though. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] tcp: fix spurious undos in CA_Open 2011-12-14 8:57 ` Ilpo Järvinen @ 2011-12-18 18:23 ` Neal Cardwell 0 siblings, 0 replies; 30+ messages in thread From: Neal Cardwell @ 2011-12-18 18:23 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Netdev, nanditad, Yuchung Cheng, therbert On Wed, Dec 14, 2011 at 3:57 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > On Mon, 12 Dec 2011, Neal Cardwell wrote: > >> On Mon, Dec 12, 2011 at 7:05 AM, Ilpo Järvinen >> <ilpo.jarvinen@helsinki.fi> wrote: >> > On Mon, 28 Nov 2011, Neal Cardwell wrote: >> > >> >> On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@davemloft.net> wrote: >> >> >> >> >> Also, one (serious) word of caution! This change, by its extending of >> >> >> CA_Open state, is somewhat similar to what I made FRTO years ago, and >> >> >> managed to introduces subtle breaking to the state machine. Please check >> >> >> that the problem similar to what was fixed by >> >> >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in >> >> >> some other form causing spurious undos). ...To me it seems that >> >> >> tcp_packet_delayed might be similarly confused after the given patch. >> >> > >> >> > Neil, please look into this so we can have any such issues fixed >> >> > in time for the next merge window. >> >> >> >> Absolutely. I will look into the areas that Ilpo mentioned. >> > >> > Unfortunately nothing has happened? So I finally had to come up something >> > myself. ...Compile tested only. >> >> Sorry for the radio silence, but I have been looking at this. So far >> it's been a time-consuming process, as we've uncovered a number of >> pre-existing issues in tcp_packet_delayed that seem like quite serious >> bugs: >> >> 1) tcp_packet_delayed implicitly assumes, but does not verify, that >> the ACK in question (whose ECR field precedes retrans_stamp) covers >> all retransmitted segments. So if we retransmit segments 1 and 2, and >> then get an ACK for 1 that has an ECR indicating it was for the >> original copy of segment 1, then we undo, even in cases where segment >> 2 really was lost and needed to be retransmitted. > > I think this might be an mis-implementation in tcp_try_undo_partial. Yeah, I'm pretty sure this is a bug. There needs to be some additional logic, whether in tcp_try_undo_partial or tcp_packet_delayed, or somewhere else, to fix this bug. >> 2) tcp_packet_delayed implicitly assumes that if we retransmit a >> packet that was truly lost then any ACK for the retransmitted instance >> of that packet will echo the timestamp field in the retransmitted >> packet. But if the receiver is implementing delayed ACKs and correctly >> implements timestamps per RFC 1323, then the receiver will be obeying >> this clause: "Thus, when delayed ACKs are in use, the receiver should >> reply with the TSval field from the earliest unacknowledged segment." >> Thus tcp_packet_delayed can cause spurious undos in the following >> circumstances: >> >> - send segment 1 with tsval 10 >> - receiver is doing delayed ACKs and does not ACK 1 but sets TS.Recent to 10 >> - send segment 2 with tsval 20 >> - segment 2 is lost in transit >> - RTO, retransmit segment 2 with tsval 30 >> - receiver sends delayed ACK for packets 1 and 2 with TS.Recent of 10 >> - sender receives ACK for packet 2 with tsecho of 10, lower than 30, >> and thinks RTO was spurious > > About this I'm actually aware of, I have even some long email postponed > related to this to discuss in context of 1323bis (as 1323bis seems pretty > much standstill I'm not in a big hurry :-)). I think 1323 algo is flawed > anyway and it was fixed/clarified/better in the bis (which doesn't > address this well enough though). > > I discovered this while I setup a network with ACK losses to figure out > other issue. I discovered that the RTT measurements occassionally took > "the wrong TS" bloating the RTT estimate. As RTT samples included RTO > delays, with enough ACK losses the RTO eventually hits the max value. > ...As a result, not very nice performance was acquired for that such a > flow. > > I think there are two possible solutions: > 1) Echo latest timestamp for below window packets (I'm not sure if > there's can of worms w.r.t. security in this approach). A good thing > in this would be that it would really solve the rexmit ambiguity problem > which the current approach does not do. > 2) Ignore packets with DSACK in RTT measurement. > > 1323bis is not of much help in this which TS to echo (which is why I > intented to mention it on tcpm). IIRC, it depended on which takes > precedence: RFC793 in-window check or the given TS algorithm, which is > not discussed anywhere. Interesting. I hope the standards get improved in this area. >> For better or worse, both of these bugs appear to be inherited from >> the Eifel Detection Algorithm RFC (RFC 3522). > > RFC3522 does not have the issue 1) as it's working after RTO+first ACK > only. The second issue is valid for it though. Actually RFC 3522 does have issue 1) because it is also in effect for fast retransmit / fast recovery; RFC 3522 says "If the Eifel detection algorithm is used, the following steps MUST be taken by a TCP sender, but only upon initiation of loss recovery, i.e., when either the timeout-based retransmit or the fast retransmit is sent." In addition, RFC 3517 allows senders to send out multiple retransmissions upon the first ACK that initiates fast recovery: Section 5., Algorithm Details, says "(5) In order to take advantage of potential additional available cwnd, proceed to step (C) below. ... (C) If cwnd - pipe >= 1 SMSS the sender SHOULD transmit one or more segments." As a consequence, senders can have multiple retransmissions in flight at the point they get the first ACK during fast recovery, at which point they run the Eifel detection algorithm. So I'm pretty sure 1) is an issue both in the RFCs and in Linux. >> In any case, we've been working on coming up with a solution for these issues. >> >> > -- >> > [PATCH 1/1] tcp: fix spurious undos in CA_Open >> > >> > There's a landmine in tcp_packet_delayed. In CA_Open the >> > retrans_stamp is very eagerly cleared which falsely triggers >> > packet-delayed detection. Therefore this code cannot be used >> > in CA_Open if the retrans_stamp was already cleared. >> > >> > This became issue once DSACK detection was extended to happen >> > in CA_Open state too (f698204bd0b, tcp: allow undo from >> > reordered DSACKs). Essentially the whole congestion control >> > is disabled because the undos now always restore the previous >> > window. I wonder if this was already in production... ...at >> > least the Internet didn't melt ;-). >> >> I'm pretty sure this commit is unnecessary. It seems like a NOP. Note >> that tcp_may_undo and tcp_packet_delayed are never called in state >> TCP_CA_Open. tcp_may_undo is only called from tcp_try_undo_recovery, >> tcp_try_undo_partial, and tcp_try_undo_loss, and to call any of these >> functions you must be in either TCP_CA_Loss or TCP_CA_Recovery. Only >> tcp_try_undo_dsack is called from TCP_CA_Open, but it makes no >> reference to retrans_stamp. > > It seems that I've missed this difference in tcp_try_undo_dsack (well, > seen it but never thought it that much). I think you're right. Thanks for > taking a look. > > Some WARN_ON to enforce this !CA_Open condition there might not hurt > though. Sounds reasonable. I'll think about this as I work on coming up with a fix for the 1) and 2) bugs discussed above. neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell ` (2 preceding siblings ...) 2011-11-16 18:58 ` [PATCH 4/5] tcp: allow undo from reordered DSACKs Neal Cardwell @ 2011-11-16 18:58 ` Neal Cardwell 2011-11-17 5:14 ` Ilpo Järvinen 2011-11-17 1:23 ` [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Ilpo Järvinen 4 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw) To: David Miller Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng, Tom Herbert, Neal Cardwell The problem: Senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd() in tcp_try_to_open(). The fix: Don't moderate cwnd in tcp_try_to_open() if we're in TCP_CA_Open, since doing so is generally unnecessary and specifically would override a DSACK-based undo of a cwnd reduction made in fast recovery. Signed-off-by: Neal Cardwell <ncardwell@google.com> --- net/ipv4/tcp_input.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a4efdd7..78dd38c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2881,7 +2881,8 @@ static void tcp_try_to_open(struct sock *sk, int flag) if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) { tcp_try_keep_open(sk); - tcp_moderate_cwnd(tp); + if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open) + tcp_moderate_cwnd(tp); } else { tcp_cwnd_down(sk, flag); } -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-16 18:58 ` [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open Neal Cardwell @ 2011-11-17 5:14 ` Ilpo Järvinen 2011-11-19 21:08 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-17 5:14 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert On Wed, 16 Nov 2011, Neal Cardwell wrote: > The problem: Senders were overriding cwnd values picked during an undo > by calling tcp_moderate_cwnd() in tcp_try_to_open(). I think it's intentional. Because of receiver lying bandwidth cheats all unlimited undos are bit dangerous. > The fix: Don't moderate cwnd in tcp_try_to_open() if we're in > TCP_CA_Open, since doing so is generally unnecessary and specifically > would override a DSACK-based undo of a cwnd reduction made in fast > recovery. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> > --- > net/ipv4/tcp_input.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index a4efdd7..78dd38c 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -2881,7 +2881,8 @@ static void tcp_try_to_open(struct sock *sk, int flag) > > if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) { > tcp_try_keep_open(sk); > - tcp_moderate_cwnd(tp); > + if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open) > + tcp_moderate_cwnd(tp); > } else { > tcp_cwnd_down(sk, flag); > } Wouldn't it be enough if tcp max burst is increased to match IW (iirc we had 3 still there as a magic number)? -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-17 5:14 ` Ilpo Järvinen @ 2011-11-19 21:08 ` Neal Cardwell 2011-11-20 8:38 ` Ilpo Järvinen 2011-11-20 10:19 ` Alexander Zimmermann 0 siblings, 2 replies; 30+ messages in thread From: Neal Cardwell @ 2011-11-19 21:08 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > I think it's intentional. Because of receiver lying bandwidth cheats all > unlimited undos are bit dangerous. If your concern is receivers lying, then there are other easy ways that a misbehaving receiver can get a sender to send too fast (e.g. cumulatively ACKing the highest sequence number seen, irrespective of holes). IMHO it would be a shame to penalize the vast majority of well-behaved users to plug one potential attack vector when there are other easy holes that an attacker would use. > Wouldn't it be enough if tcp max burst is increased to match IW (iirc we > had 3 still there as a magic number)? Yes, tcp_max_burst() returns tp->reordering now. Changing it to return max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's an excellent idea in any case, regardless of the outcome of this undo discussion. However, I don't think this is sufficient for request-response protocols (e.g. RPC) running over long-lived connections over a path with a large bandwidth-delay product. In such cases, the cwnd will grow quite large (hundreds of packets). The DSACKs will often arrive after the entire response is sent, so that cwnd moderation will typically pull the cwnd down to 3 (or, with your proposal, 10) packets. IMHO that seems like an unnecessarily steep price to pay. neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-19 21:08 ` Neal Cardwell @ 2011-11-20 8:38 ` Ilpo Järvinen 2011-11-21 22:57 ` Yuchung Cheng 2011-11-20 10:19 ` Alexander Zimmermann 1 sibling, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-20 8:38 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 2230 bytes --] On Sat, 19 Nov 2011, Neal Cardwell wrote: > On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen > <ilpo.jarvinen@helsinki.fi> wrote: > > > I think it's intentional. Because of receiver lying bandwidth cheats all > > unlimited undos are bit dangerous. > > If your concern is receivers lying, then there are other easy ways > that a misbehaving receiver can get a sender to send too fast > (e.g. cumulatively ACKing the highest sequence number seen, > irrespective of holes). IMHO it would be a shame to penalize the vast > majority of well-behaved users to plug one potential attack vector > when there are other easy holes that an attacker would use. I disagree... there is huge difference as the receiver then has to gamble with the reliability of the flow. DSACK attack is nasty in that that it allows maintaining reliability while cheating bw. > > Wouldn't it be enough if tcp max burst is increased to match IW (iirc we > > had 3 still there as a magic number)? > > Yes, tcp_max_burst() returns tp->reordering now. Changing it to return > max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's > an excellent idea in any case, regardless of the outcome of this undo > discussion. Feel free to provide a patch then as I'm likely to forget :). > However, I don't think this is sufficient for request-response > protocols (e.g. RPC) running over long-lived connections over a path > with a large bandwidth-delay product. In such cases, the cwnd will > grow quite large (hundreds of packets). The DSACKs will often arrive > after the entire response is sent, so that cwnd moderation will > typically pull the cwnd down to 3 (or, with your proposal, 10) > packets. IMHO that seems like an unnecessarily steep price to pay. Sounds rather hypotetical scenario that you'd happen to get those DSACKs in such scenario... DSACK would only be sent when receiver and sender were out of sync due to heavy ACK loss that exceeds SACK redundancy or heavy reordering... the later prevents large window in the first place (or reordering has already adapted). Honestly I don't believe this is likely scenario. ...And FRTO handles single unnecesary rexmit in case of spurious RTO even before DSACK could be seen. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-20 8:38 ` Ilpo Järvinen @ 2011-11-21 22:57 ` Yuchung Cheng 2011-11-22 11:47 ` Ilpo Järvinen 0 siblings, 1 reply; 30+ messages in thread From: Yuchung Cheng @ 2011-11-21 22:57 UTC (permalink / raw) To: Ilpo Järvinen Cc: Neal Cardwell, David Miller, Netdev, Nandita Dukkipati, Tom Herbert On Sun, Nov 20, 2011 at 12:38 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > On Sat, 19 Nov 2011, Neal Cardwell wrote: >> If your concern is receivers lying, then there are other easy ways >> that a misbehaving receiver can get a sender to send too fast >> (e.g. cumulatively ACKing the highest sequence number seen, >> irrespective of holes). IMHO it would be a shame to penalize the vast >> majority of well-behaved users to plug one potential attack vector >> when there are other easy holes that an attacker would use. > > I disagree... there is huge difference as the receiver then has to gamble > with the reliability of the flow. DSACK attack is nasty in that that it > allows maintaining reliability while cheating bw. > I feel the original change Neal is proposing is getting lost. This patch proposes when a. ack is dubious b. but it's not time to recover yet c. and other checks in tcp_try_keep_open affirms the network is in good condition, i.e, stat == Open 3. do not moderate cwnd (not 3, not 10 or any magic number). Your concern is valid, but if that's true for majority then I argue all undo logic on dsack or TS are causing major threats since they all require some trusts of the remote end. Why even bother processing DSACK if we can't trust them? AFAIK cwnd moderation is to lower bursty drops not to discourage (dsack) cheating. I believe the reason is the same in tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this logic hurts performance. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-21 22:57 ` Yuchung Cheng @ 2011-11-22 11:47 ` Ilpo Järvinen 2011-11-28 0:01 ` David Miller 0 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-22 11:47 UTC (permalink / raw) To: Yuchung Cheng Cc: Neal Cardwell, David Miller, Netdev, Nandita Dukkipati, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 2970 bytes --] On Mon, 21 Nov 2011, Yuchung Cheng wrote: > On Sun, Nov 20, 2011 at 12:38 AM, Ilpo Järvinen > <ilpo.jarvinen@helsinki.fi> wrote: > > On Sat, 19 Nov 2011, Neal Cardwell wrote: > >> If your concern is receivers lying, then there are other easy ways > >> that a misbehaving receiver can get a sender to send too fast > >> (e.g. cumulatively ACKing the highest sequence number seen, > >> irrespective of holes). IMHO it would be a shame to penalize the vast > >> majority of well-behaved users to plug one potential attack vector > >> when there are other easy holes that an attacker would use. > > > > I disagree... there is huge difference as the receiver then has to gamble > > with the reliability of the flow. DSACK attack is nasty in that that it > > allows maintaining reliability while cheating bw. > > I feel the original change Neal is proposing is getting lost. This patch > proposes when > > a. ack is dubious > b. but it's not time to recover yet > c. and other checks in tcp_try_keep_open affirms the network is in > good condition, i.e, stat == Open Presence of DSACK certainly does disagree with your condition c (which was the purpose of this change? Though it could be present in a later segment). Network was in such a bad condition that DSACKs were needed in the first place. > 3. do not moderate cwnd (not 3, not 10 or any magic number). > > Your concern is valid, but if that's true for majority then I argue > all undo logic on dsack or TS are causing major threats since they all > require some trusts of the remote end. Why even bother processing > DSACK if we can't trust them? Right, it seems that DSACK RFCs (RFC 3708 mainly) don't care too much about the cheating possibility, just states that it is possible and that a non-available nonce solution would help. So yes, I myself find DSACK quite useless, albeit fancy, mechanism (there are some other uses not related to undo mentioned though in the RFC but I've not spent too much thinking the details). As regards TS, we don't trust them fully either (see e.g., some CUBIC related change from Stephen Hemminger couple of years ago). > AFAIK cwnd moderation is to lower bursty drops not to discourage > (dsack) cheating. I believe the reason is the same in > tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this > logic hurts performance. Quote from the patch description: "Senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd()" ...so after all it has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same reason (it could do the +IW too but IIRC it is only /2 currently). What would be the safeguard there after this one is removed? I kind of see your point about this particular line being related to burst mitigation but on the same time the end result of removal is that undo becomes potentially much more aggressive. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-22 11:47 ` Ilpo Järvinen @ 2011-11-28 0:01 ` David Miller 2011-11-28 17:35 ` Neal Cardwell 0 siblings, 1 reply; 30+ messages in thread From: David Miller @ 2011-11-28 0:01 UTC (permalink / raw) To: ilpo.jarvinen; +Cc: ycheng, ncardwell, netdev, nanditad, therbert From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Tue, 22 Nov 2011 13:47:13 +0200 (EET) > On Mon, 21 Nov 2011, Yuchung Cheng wrote: > >> AFAIK cwnd moderation is to lower bursty drops not to discourage >> (dsack) cheating. I believe the reason is the same in >> tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this >> logic hurts performance. > > Quote from the patch description: "Senders were overriding cwnd values > picked during an undo by calling tcp_moderate_cwnd()" ...so after all it > has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe > due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same > reason (it could do the +IW too but IIRC it is only /2 currently). What > would be the safeguard there after this one is removed? I kind of see your > point about this particular line being related to burst mitigation but on > the same time the end result of removal is that undo becomes potentially > much more aggressive. I'm apply this patch to net-next now, but Neil and Yucheng are on the hook to fully look into the issues Ilpo has raised. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-28 0:01 ` David Miller @ 2011-11-28 17:35 ` Neal Cardwell 2011-11-28 20:33 ` Yuchung Cheng 0 siblings, 1 reply; 30+ messages in thread From: Neal Cardwell @ 2011-11-28 17:35 UTC (permalink / raw) To: David Miller; +Cc: ilpo.jarvinen, ycheng, netdev, nanditad, therbert On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote: > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> > Date: Tue, 22 Nov 2011 13:47:13 +0200 (EET) > >> On Mon, 21 Nov 2011, Yuchung Cheng wrote: >> >>> AFAIK cwnd moderation is to lower bursty drops not to discourage >>> (dsack) cheating. I believe the reason is the same in >>> tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this >>> logic hurts performance. >> >> Quote from the patch description: "Senders were overriding cwnd values >> picked during an undo by calling tcp_moderate_cwnd()" ...so after all it >> has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe >> due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same >> reason (it could do the +IW too but IIRC it is only /2 currently). What >> would be the safeguard there after this one is removed? I kind of see your >> point about this particular line being related to burst mitigation but on >> the same time the end result of removal is that undo becomes potentially >> much more aggressive. > > I'm apply this patch to net-next now, but Neil and Yucheng are on the hook > to fully look into the issues Ilpo has raised. Thanks! We will spend some time looking into these issues. neal ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-28 17:35 ` Neal Cardwell @ 2011-11-28 20:33 ` Yuchung Cheng 2011-11-29 10:33 ` Ilpo Järvinen 0 siblings, 1 reply; 30+ messages in thread From: Yuchung Cheng @ 2011-11-28 20:33 UTC (permalink / raw) To: Neal Cardwell; +Cc: David Miller, ilpo.jarvinen, netdev, nanditad, therbert On Mon, Nov 28, 2011 at 9:35 AM, Neal Cardwell <ncardwell@google.com> wrote: > On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote: >> I'm apply this patch to net-next now, but Neil and Yucheng are on the hook >> to fully look into the issues Ilpo has raised. > > Thanks! We will spend some time looking into these issues. > Ilpo do you know any report on these types of cheating? I did some literature search but didn't find any. If the receiver simply responds DSACK on good (non-spurious) retransmits, the sender may increase dupthresh (tp->reordering) and slows down triggering fast-recovery? given the connection has real losses, the receiver may end up shooting his feet. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-28 20:33 ` Yuchung Cheng @ 2011-11-29 10:33 ` Ilpo Järvinen 0 siblings, 0 replies; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-29 10:33 UTC (permalink / raw) To: Yuchung Cheng; +Cc: Neal Cardwell, David Miller, Netdev, nanditad, therbert On Mon, 28 Nov 2011, Yuchung Cheng wrote: > On Mon, Nov 28, 2011 at 9:35 AM, Neal Cardwell <ncardwell@google.com> wrote: > > On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote: > >> I'm apply this patch to net-next now, but Neil and Yucheng are on the hook > >> to fully look into the issues Ilpo has raised. > > > > Thanks! We will spend some time looking into these issues. > > > Ilpo do you know any report on these types of cheating? I did some > literature search but didn't find any. No I know of, however, I'm not too surprised as so far such attacks have lacked a target at least among Linux endhosts. It is only possible to gain something now after you have one by one removed all safeguards which prevent such misuse :-). This possibility is certainly known to exists (even RFC3708 mentions it). Besides, DSACK adoption, not to speak of adoption of any DSACK undo algorithm, is not that high as with SACK alone iirc. > If the receiver simply responds DSACK on good (non-spurious) > retransmits, the sender may increase dupthresh (tp->reordering) and > slows down triggering fast-recovery? given the connection has real > losses, the receiver may end up shooting his feet. For some attack variants, yes. However, it is always "safe" to attack the current Linux sender with something that is past the acknowledgment number, there's only dummy counting done for that (trivial to arrange). If the attacker plays safe, no tp->reordering effect is ever imposed but attack eventually succeeds to hit the right number of undo_retrans no matter what if one keeps trying. This is what you just asked us to enable, unlimited attack window! Perhaps we could add a penalty of misguessing the number of rexmits but knowing the right number is not that hard anyway (and some caveats exist in such approach though to those who are hit by HW/path duping). Btw, another thing that should be auditted now that this was applied: check that undo_marker checks do not behave bad due to 32-bit seqno wrap-arounds since its needs-to-be-valid period got just extented way beyond a single loss event. -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open 2011-11-19 21:08 ` Neal Cardwell 2011-11-20 8:38 ` Ilpo Järvinen @ 2011-11-20 10:19 ` Alexander Zimmermann 1 sibling, 0 replies; 30+ messages in thread From: Alexander Zimmermann @ 2011-11-20 10:19 UTC (permalink / raw) To: Neal Cardwell Cc: Ilpo Järvinen, David Miller, netdev, Nandita Dukkipati, Yuchung Cheng, Tom Herbert, Carsten Wolff [-- Attachment #1: Type: text/plain, Size: 2637 bytes --] Hi Neal, Am 19.11.2011 um 22:08 schrieb Neal Cardwell: > On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen > <ilpo.jarvinen@helsinki.fi> wrote: > >> I think it's intentional. Because of receiver lying bandwidth cheats all >> unlimited undos are bit dangerous. > > If your concern is receivers lying, then there are other easy ways > that a misbehaving receiver can get a sender to send too fast > (e.g. cumulatively ACKing the highest sequence number seen, > irrespective of holes). IMHO it would be a shame to penalize the vast > majority of well-behaved users to plug one potential attack vector > when there are other easy holes that an attacker would use. > >> Wouldn't it be enough if tcp max burst is increased to match IW (iirc we >> had 3 still there as a magic number)? > > Yes, tcp_max_burst() returns tp->reordering now. Changing it to return > max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's > an excellent idea in any case, regardless of the outcome of this undo > discussion. For my Phd thesis - based on the Linux reordering detection we implemented an adaptive Version of TCP-NCR (RFC 4653) - we change tcp_max_burst() to return TCP_INIT_CWND. If the reordering delay is high, tp->reordering is high as well, allowing to send a huge burst. In 2008 John Heffner* changed tcp_max_burst() to return tp->reordering according to problem for reordering on the reverse path. However, I never was be able to reproduce old problem... Due to my experience I recommend to use TCP_INIT_CWND directly. (*)http://git.kernel.org/linus/dd9e0dda66ba38a2ddd140 5ac279894260dc5c36. > > However, I don't think this is sufficient for request-response > protocols (e.g. RPC) running over long-lived connections over a path > with a large bandwidth-delay product. In such cases, the cwnd will > grow quite large (hundreds of packets). The DSACKs will often arrive > after the entire response is sent, so that cwnd moderation will > typically pull the cwnd down to 3 (or, with your proposal, 10) > packets. IMHO that seems like an unnecessarily steep price to pay. > > neal > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 163 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell ` (3 preceding siblings ...) 2011-11-16 18:58 ` [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open Neal Cardwell @ 2011-11-17 1:23 ` Ilpo Järvinen 2011-11-27 23:55 ` David Miller 4 siblings, 1 reply; 30+ messages in thread From: Ilpo Järvinen @ 2011-11-17 1:23 UTC (permalink / raw) To: Neal Cardwell Cc: David Miller, Netdev, Ilpo Järvinen, Nandita Dukkipati, Yuchung Cheng, Tom Herbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 430 bytes --] On Wed, 16 Nov 2011, Neal Cardwell wrote: > Allow callers to decide whether an ACK is a duplicate ACK. This is a > prerequisite to allowing fastretrans_alert to be called from new > contexts, such as the no_queue and old_ack code paths, from which we > have extra info that tells us whether an ACK is a dupack. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> -- i. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() 2011-11-17 1:23 ` [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Ilpo Järvinen @ 2011-11-27 23:55 ` David Miller 0 siblings, 0 replies; 30+ messages in thread From: David Miller @ 2011-11-27 23:55 UTC (permalink / raw) To: ilpo.jarvinen; +Cc: ncardwell, netdev, nanditad, ycheng, therbert From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Thu, 17 Nov 2011 03:23:53 +0200 (EET) > On Wed, 16 Nov 2011, Neal Cardwell wrote: > >> Allow callers to decide whether an ACK is a duplicate ACK. This is a >> prerequisite to allowing fastretrans_alert to be called from new >> contexts, such as the no_queue and old_ack code paths, from which we >> have extra info that tells us whether an ACK is a dupack. >> >> Signed-off-by: Neal Cardwell <ncardwell@google.com> > > Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Applied to net-next. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2011-12-18 18:23 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-16 18:58 [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Neal Cardwell 2011-11-16 18:58 ` [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0 Neal Cardwell 2011-11-17 1:34 ` Ilpo Järvinen 2011-11-27 23:55 ` David Miller 2011-11-16 18:58 ` [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una Neal Cardwell 2011-11-17 1:52 ` Ilpo Järvinen 2011-11-27 23:57 ` David Miller 2011-11-16 18:58 ` [PATCH 4/5] tcp: allow undo from reordered DSACKs Neal Cardwell 2011-11-17 5:18 ` Ilpo Järvinen 2011-11-17 20:49 ` Neal Cardwell 2011-11-22 12:44 ` Ilpo Järvinen 2011-11-27 23:58 ` David Miller 2011-11-28 17:31 ` Neal Cardwell 2011-12-12 12:05 ` [PATCH 1/1] tcp: fix spurious undos in CA_Open Ilpo Järvinen 2011-12-12 19:40 ` Neal Cardwell 2011-12-14 8:57 ` Ilpo Järvinen 2011-12-18 18:23 ` Neal Cardwell 2011-11-16 18:58 ` [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open Neal Cardwell 2011-11-17 5:14 ` Ilpo Järvinen 2011-11-19 21:08 ` Neal Cardwell 2011-11-20 8:38 ` Ilpo Järvinen 2011-11-21 22:57 ` Yuchung Cheng 2011-11-22 11:47 ` Ilpo Järvinen 2011-11-28 0:01 ` David Miller 2011-11-28 17:35 ` Neal Cardwell 2011-11-28 20:33 ` Yuchung Cheng 2011-11-29 10:33 ` Ilpo Järvinen 2011-11-20 10:19 ` Alexander Zimmermann 2011-11-17 1:23 ` [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert() Ilpo Järvinen 2011-11-27 23:55 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).