* [PATCH 0/2 net-next] tcp: reducing lost retransmits in recovery
@ 2015-07-01 21:11 Yuchung Cheng
From: Yuchung Cheng @ 2015-07-01 21:11 UTC
To: davem; +Cc: netdev, Yuchung Cheng
This patch series reduces lost retransmits in recovery, in particular
when dealing with traffic policers. The main problem is that slow start
in recovery under policing can cause massive loss and retransmit storms:
any excess sending rate turns into drops. The solution is to avoid slow
start when a lost retransmit is detected and use packet conservation
instead.

On networks with traffic policers, the patches have lowered TCP loss
rates from Google servers by ~20%, without latency regressions.
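To make the policing problem concrete, here is a toy standalone
token-bucket model (not part of the patches; the rate, burst and send
numbers are made up) showing how any sending rate above the policed
rate turns directly into drops:

#include <stdio.h>

int main(void)
{
	double rate_pps = 1000.0;            /* policed rate, packets per interval */
	double burst = 100.0;                /* extra bucket depth beyond one interval */
	double bucket = 0.0;
	int sent[] = { 1200, 2400, 4800 };   /* sender ramping up each interval */

	for (int i = 0; i < 3; i++) {
		bucket += rate_pps;              /* refill one interval's tokens */
		if (bucket > rate_pps + burst)
			bucket = rate_pps + burst;   /* bucket is finite */
		double passed = sent[i] < bucket ? sent[i] : bucket;
		bucket -= passed;
		printf("sent %d, delivered %.0f, dropped %.0f\n",
		       sent[i], passed, sent[i] - passed);
	}
	return 0;
}

Slow starting into such a policer keeps raising the send rate each
round while the delivered rate stays flat, so the drop count grows
with it; packet conservation caps sending at the delivery rate instead.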
Yuchung Cheng (2):
tcp: reduce cwnd if retransmit is lost in CA_Loss
tcp: PRR uses CRB mode by default and SS mode conditionally
net/ipv4/tcp_input.c | 43 +++++++++++++++++++++++--------------------
1 file changed, 23 insertions(+), 20 deletions(-)
--
2.4.3.573.g4eafbef
* [PATCH 1/2 net-next] tcp: reduce cwnd if retransmit is lost in CA_Loss
From: Yuchung Cheng @ 2015-07-01 21:11 UTC
To: davem; +Cc: netdev, Yuchung Cheng, Nandita Dukkipati, Neal Cardwell
If the retransmission in CA_Loss is lost again, we should not
continue to slow start or raise cwnd in congestion avoidance mode.
Instead we should enter fast recovery and use PRR to reduce cwnd,
following the principle in RFC5681:
"... or the loss of a retransmission, should be taken as two
indications of congestion and, therefore, cwnd (and ssthresh) MUST
be lowered twice in this case."
This is especially important to reduce loss when the CA_Loss
state was caused by a traffic policer dropping the entire inflight.
The CA_Loss state has a problem where a loss of L packets causes the
sender to send a burst of L packets. So a policer that's dropping
most packets in a given RTT can cause a huge retransmit storm. By
contrast, PRR includes logic to bound the number of outbound packets
that result from a given ACK. So switching to CA_Recovery on lost
retransmits detected in CA_Loss avoids this retransmit storm problem.
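As a rough, standalone illustration of the state-machine change (the
enum, flag value and helper below are simplified stand-ins, not the
kernel's definitions, and only the CA_Loss branch is modeled):

#include <stdio.h>

#define FLAG_LOST_RETRANS 0x80   /* stand-in for the new flag bit */

enum ca_state { CA_OPEN, CA_LOSS, CA_RECOVERY };

/* Before the patch the sender stayed in CA_Loss (and kept slow starting)
 * until it was back to Open; with the patch, an ACK that marks a
 * retransmission lost falls through so the normal path can enter
 * CA_Recovery, where PRR bounds how much each ACK releases.
 */
static enum ca_state loss_branch_next_state(enum ca_state state, int flag)
{
	if (state != CA_OPEN && !(flag & FLAG_LOST_RETRANS))
		return state;                 /* keep recovering in CA_Loss */
	return CA_RECOVERY;                   /* fall through, switch to PRR */
}

int main(void)
{
	printf("%d\n", loss_branch_next_state(CA_LOSS, 0));                 /* 1: stays in CA_Loss */
	printf("%d\n", loss_branch_next_state(CA_LOSS, FLAG_LOST_RETRANS)); /* 2: enters CA_Recovery */
	return 0;
}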
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
net/ipv4/tcp_input.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 684f095..923e0e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -109,6 +109,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
#define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN. */
#define FLAG_DATA_SACKED 0x20 /* New SACK. */
#define FLAG_ECE 0x40 /* ECE in this ACK */
+#define FLAG_LOST_RETRANS 0x80 /* This ACK marks some retransmission lost */
#define FLAG_SLOWPATH 0x100 /* Do not skip RFC checks for window update.*/
#define FLAG_ORIG_SACK_ACKED 0x200 /* Never retransmitted data are (s)acked */
#define FLAG_SND_UNA_ADVANCED 0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
@@ -1037,7 +1038,7 @@ static bool tcp_is_sackblock_valid(struct tcp_sock *tp, bool is_dsack,
* highest SACK block). Also calculate the lowest snd_nxt among the remaining
* retransmitted skbs to avoid some costly processing per ACKs.
*/
-static void tcp_mark_lost_retrans(struct sock *sk)
+static void tcp_mark_lost_retrans(struct sock *sk, int *flag)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
@@ -1078,7 +1079,7 @@ static void tcp_mark_lost_retrans(struct sock *sk)
if (after(received_upto, ack_seq)) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
tp->retrans_out -= tcp_skb_pcount(skb);
-
+ *flag |= FLAG_LOST_RETRANS;
tcp_skb_mark_lost_uncond_verify(tp, skb);
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT);
} else {
@@ -1818,7 +1819,7 @@ advance_sp:
((inet_csk(sk)->icsk_ca_state != TCP_CA_Loss) || tp->undo_marker))
tcp_update_reordering(sk, tp->fackets_out - state->reord, 0);
- tcp_mark_lost_retrans(sk);
+ tcp_mark_lost_retrans(sk, &state->flag);
tcp_verify_left_out(tp);
out:
@@ -2676,7 +2677,7 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack)
tp->prior_ssthresh = 0;
tcp_init_undo(tp);
- if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
+ if (!tcp_in_cwnd_reduction(sk)) {
if (!ece_ack)
tp->prior_ssthresh = tcp_current_ssthresh(sk);
tcp_init_cwnd_reduction(sk);
@@ -2852,9 +2853,10 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
break;
case TCP_CA_Loss:
tcp_process_loss(sk, flag, is_dupack);
- if (icsk->icsk_ca_state != TCP_CA_Open)
+ if (icsk->icsk_ca_state != TCP_CA_Open &&
+ !(flag & FLAG_LOST_RETRANS))
return;
- /* Fall through to processing in Open state. */
+ /* Change state if cwnd is undone or retransmits are lost */
default:
if (tcp_is_reno(tp)) {
if (flag & FLAG_SND_UNA_ADVANCED)
--
2.4.3.573.g4eafbef
* [PATCH 2/2 net-next] tcp: PRR uses CRB mode by default and SS mode conditionally
From: Yuchung Cheng @ 2015-07-01 21:11 UTC
To: davem; +Cc: netdev, Yuchung Cheng, Nandita Dukkipati, Neal Cardwell
PRR slow start is often too aggressive, especially when drops are
caused by traffic policers. Policers mainly use a token bucket to
enforce the rate, so sending (twice) faster than the delivery rate
causes excessive drops.
This patch makes the conservative reduction bound (CRB) mode of
RFC 6937 the default in PRR. CRB follows the packet conservation
rule and sends at most the delivery rate. But if many packets are
lost and the pipe is empty, CRB may take N round trips to repair N
losses. To speed up the recovery, we conditionally turn on slow start
mode when all of these conditions are met:
1) on the second round or later in recovery
2) retransmission sent in the previous round is delivered on this ACK
3) no retransmission is marked lost on this ACK
By using packet conservation by default, this change significantly
reduces lost retransmits on networks that deploy traffic policers,
with up to a 20% reduction in the overall loss rate.
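For illustration only, the standalone sketch below follows the same
three-branch structure as the tcp_cwnd_reduction() hunk further down;
the parameter names are simplified stand-ins for the kernel's fields
(retrans_acked/retrans_lost stand in for FLAG_RETRANS_DATA_ACKED and
FLAG_LOST_RETRANS):

#include <stdio.h>

static int prr_sndcnt(int in_flight, int ssthresh, int prior_cwnd,
		      int prr_delivered, int prr_out, int newly_acked_sacked,
		      int retrans_acked, int retrans_lost, int fast_rexmit)
{
	int delta = ssthresh - in_flight;
	int sndcnt;

	if (delta < 0) {
		/* Proportional mode: spread the cwnd reduction over one RTT. */
		sndcnt = (ssthresh * prr_delivered + prior_cwnd - 1) / prior_cwnd
			 - prr_out;
	} else if (retrans_acked && !retrans_lost) {
		/* Slow-start mode: retransmits delivered and none marked lost,
		 * so grow toward ssthresh to repair losses faster. */
		int catchup = prr_delivered - prr_out;
		if (catchup < newly_acked_sacked)
			catchup = newly_acked_sacked;
		sndcnt = delta < catchup + 1 ? delta : catchup + 1;
	} else {
		/* CRB (default): packet conservation, send at most what was
		 * just delivered. */
		sndcnt = delta < newly_acked_sacked ? delta : newly_acked_sacked;
	}
	if (sndcnt < (fast_rexmit ? 1 : 0))
		sndcnt = fast_rexmit ? 1 : 0;
	return sndcnt;                        /* new cwnd = in_flight + sndcnt */
}

int main(void)
{
	/* Pipe below ssthresh, one packet newly delivered by this ACK. */
	printf("CRB: %d\n", prr_sndcnt(5, 10, 20, 8, 8, 1, 1, 1, 0));
	printf("SS:  %d\n", prr_sndcnt(5, 10, 20, 8, 8, 1, 1, 0, 0));
	return 0;
}

With the pipe below ssthresh and one packet newly delivered, CRB
releases exactly one packet while slow-start mode releases two, which
is the growth the three conditions above are meant to gate.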
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
net/ipv4/tcp_input.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 923e0e5..ad1482d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2476,15 +2476,14 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo)
return false;
}
-/* The cwnd reduction in CWR and Recovery use the PRR algorithm
- * https://datatracker.ietf.org/doc/draft-ietf-tcpm-proportional-rate-reduction/
+/* The cwnd reduction in CWR and Recovery uses the PRR algorithm in RFC 6937.
* It computes the number of packets to send (sndcnt) based on packets newly
* delivered:
* 1) If the packets in flight is larger than ssthresh, PRR spreads the
* cwnd reductions across a full RTT.
- * 2) If packets in flight is lower than ssthresh (such as due to excess
- * losses and/or application stalls), do not perform any further cwnd
- * reductions, but instead slow start up to ssthresh.
+ * 2) Otherwise PRR uses packet conservation to send as much as delivered.
+ * But when the retransmits are acked without further losses, PRR
+ * slow starts cwnd up to ssthresh to speed up the recovery.
*/
static void tcp_init_cwnd_reduction(struct sock *sk)
{
@@ -2501,7 +2500,7 @@ static void tcp_init_cwnd_reduction(struct sock *sk)
}
static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked,
- int fast_rexmit)
+ int fast_rexmit, int flag)
{
struct tcp_sock *tp = tcp_sk(sk);
int sndcnt = 0;
@@ -2510,16 +2509,18 @@ static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked,
(tp->packets_out - tp->sacked_out);
tp->prr_delivered += newly_acked_sacked;
- if (tcp_packets_in_flight(tp) > tp->snd_ssthresh) {
+ if (delta < 0) {
u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
tp->prior_cwnd - 1;
sndcnt = div_u64(dividend, tp->prior_cwnd) - tp->prr_out;
- } else {
+ } else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
+ !(flag & FLAG_LOST_RETRANS)) {
sndcnt = min_t(int, delta,
max_t(int, tp->prr_delivered - tp->prr_out,
newly_acked_sacked) + 1);
+ } else {
+ sndcnt = min(delta, newly_acked_sacked);
}
-
sndcnt = max(sndcnt, (fast_rexmit ? 1 : 0));
tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
}
@@ -2580,7 +2581,7 @@ static void tcp_try_to_open(struct sock *sk, int flag, const int prior_unsacked)
if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
tcp_try_keep_open(sk);
} else {
- tcp_cwnd_reduction(sk, prior_unsacked, 0);
+ tcp_cwnd_reduction(sk, prior_unsacked, 0, flag);
}
}
@@ -2737,7 +2738,7 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
/* Undo during fast recovery after partial ACK. */
static bool tcp_try_undo_partial(struct sock *sk, const int acked,
- const int prior_unsacked)
+ const int prior_unsacked, int flag)
{
struct tcp_sock *tp = tcp_sk(sk);
@@ -2753,7 +2754,7 @@ static bool tcp_try_undo_partial(struct sock *sk, const int acked,
* mark more packets lost or retransmit more.
*/
if (tp->retrans_out) {
- tcp_cwnd_reduction(sk, prior_unsacked, 0);
+ tcp_cwnd_reduction(sk, prior_unsacked, 0, flag);
return true;
}
@@ -2840,7 +2841,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
if (tcp_is_reno(tp) && is_dupack)
tcp_add_reno_sack(sk);
} else {
- if (tcp_try_undo_partial(sk, acked, prior_unsacked))
+ if (tcp_try_undo_partial(sk, acked, prior_unsacked, flag))
return;
/* Partial ACK arrived. Force fast retransmit. */
do_lost = tcp_is_reno(tp) ||
@@ -2891,7 +2892,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
if (do_lost)
tcp_update_scoreboard(sk, fast_rexmit);
- tcp_cwnd_reduction(sk, prior_unsacked, fast_rexmit);
+ tcp_cwnd_reduction(sk, prior_unsacked, fast_rexmit, flag);
tcp_xmit_retransmit_queue(sk);
}
--
2.4.3.573.g4eafbef
* Re: [PATCH 0/2 net-next] tcp: reducing lost retransmits in recovery
From: David Miller @ 2015-07-08 20:30 UTC
To: ycheng; +Cc: netdev
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 1 Jul 2015 14:11:13 -0700
> This patch series reduces lost retransmits in recovery, in particular
> when dealing with traffic policers. The main problem is that slow start
> in recovery under policing can cause massive loss and retransmit storms:
> any excess sending rate turns into drops. The solution is to avoid slow
> start when a lost retransmit is detected and use packet conservation
> instead.
>
> On networks with traffic policers, the patches have lowered TCP loss
> rates from Google servers by ~20%, without latency regressions.
Looks great, series applied to net-next.