netdev.vger.kernel.org archive mirror
* [PATCH net-next] tcp: increase throughput when reordering is high
@ 2013-08-22  0:29 Yuchung Cheng
  2013-08-22  2:18 ` Neal Cardwell
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Yuchung Cheng @ 2013-08-22  0:29 UTC (permalink / raw)
  To: davem, ncardwell, edumazet; +Cc: netdev, Yuchung Cheng

The stack currently detects reordering and avoids spurious
retransmissions very well. However, throughput is sub-optimal under
high reordering because cwnd is increased only if the data is delivered
in order, i.e., the FLAG_DATA_ACKED check in tcp_ack(). The more packets
are reordered, the worse the throughput is.

Therefore when reordering is proven high, cwnd should advance whenever
the data is delivered regardless of its ordering. If reordering is low,
conservatively advance cwnd only on ordered deliveries in Open state,
and retain cwnd in Disordered state (RFC5681).
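
For illustration, the new policy boils down to the check sketched below. This
is a standalone, simplified sketch rather than the kernel code: the struct,
enum and flag bit values are stand-ins, and FLAG_FORWARD_PROGRESS is
approximated here as "new data cumulatively acked or SACKed". See
tcp_may_raise_cwnd() in the diff for the real version.

#include <stdbool.h>

/* Illustrative stand-ins for the ACK flags in tcp_input.c (bit values are
 * arbitrary here; only their meaning matters).
 */
#define FLAG_DATA_ACKED         0x01    /* cumulative ACK advanced snd_una  */
#define FLAG_DATA_SACKED        0x02    /* ACK carried new SACK information */
#define FLAG_FORWARD_PROGRESS   (FLAG_DATA_ACKED | FLAG_DATA_SACKED)

enum ca_state { CA_Open, CA_Disorder, CA_CWR, CA_Recovery, CA_Loss };

struct conn {
        enum ca_state ca_state;         /* congestion control state    */
        unsigned int  reordering;       /* current reordering estimate */
        bool          in_cwnd_reduction;/* in CWR or Recovery          */
};

static const unsigned int reordering_threshold = 3;  /* sysctl_tcp_reordering default */

/* Mirrors the decision made by the new tcp_may_raise_cwnd(). */
static bool may_raise_cwnd(const struct conn *c, int flag)
{
        if (c->in_cwnd_reduction)
                return false;           /* never grow while reducing cwnd */

        /* High reordering: any delivered data, in order or not, lets
         * cwnd grow.
         */
        if (c->reordering > reordering_threshold)
                return flag & FLAG_FORWARD_PROGRESS;

        /* Low reordering: only in-order delivery in Open state. */
        return c->ca_state == CA_Open && (flag & FLAG_DATA_ACKED);
}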

Tested with netperf on a qdisc setup with 20Mbps bottleneck bandwidth
and RTT randomized between 45ms and 55ms (to induce reordering). This
change increases TCP throughput by 20-25%, to near the bottleneck
bandwidth.

A special case is a stretched ACK that carries a new SACK and/or ECE mark.
For example, a receiver may receive an out-of-order or ECN-marked packet
while unacked data is still buffered because of LRO or delayed ACK. The
principle for such an ACK is to advance cwnd on the cumulatively acked
part first, then reduce cwnd in tcp_fastretrans_alert().
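
In terms of call order in tcp_ack(), that principle means: raise cwnd first on
whatever was delivered, then run the recovery logic, which may cut cwnd back.
The outline below is not a standalone snippet; it just restates the last hunk
of the diff using the kernel's function names.

        /* tcp_ack(), after this patch: */
        if (tcp_may_raise_cwnd(sk, flag))       /* now checked on every ACK */
                tcp_cong_avoid(sk, ack, prior_in_flight);  /* grow on delivered data */

        if (tcp_ack_is_dubious(sk, flag))       /* SACK, ECE, dupack, or not Open */
                tcp_fastretrans_alert(sk, acked, prior_unsacked,
                                      is_dupack, flag);    /* may reduce cwnd here */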

Signed-off-by: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_input.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e965cc7..ec492ea 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2485,8 +2485,6 @@ static void tcp_try_to_open(struct sock *sk, int flag, const int prior_unsacked)
 
 	if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
 		tcp_try_keep_open(sk);
-		if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open)
-			tcp_moderate_cwnd(tp);
 	} else {
 		tcp_cwnd_reduction(sk, prior_unsacked, 0);
 	}
@@ -3128,11 +3126,24 @@ static inline bool tcp_ack_is_dubious(const struct sock *sk, const int flag)
 		inet_csk(sk)->icsk_ca_state != TCP_CA_Open;
 }
 
 +/* Decide whether to run the increase function of congestion control. */
 static inline bool tcp_may_raise_cwnd(const struct sock *sk, const int flag)
 {
-	const struct tcp_sock *tp = tcp_sk(sk);
-	return (!(flag & FLAG_ECE) || tp->snd_cwnd < tp->snd_ssthresh) &&
-		!tcp_in_cwnd_reduction(sk);
+	if (tcp_in_cwnd_reduction(sk))
+		return false;
+
+	/* If reordering is high then always grow cwnd whenever data is
+	 * delivered regardless of its ordering. Otherwise stay conservative
+	 * and only grow cwnd on in-order delivery in Open state, and retain
+	 * cwnd in Disordered state (RFC5681). A stretched ACK with
+	 * new SACK or ECE mark may first advance cwnd here and later reduce
+	 * cwnd in tcp_fastretrans_alert() based on more states.
+	 */
+	if (tcp_sk(sk)->reordering > sysctl_tcp_reordering)
+		return flag & FLAG_FORWARD_PROGRESS;
+
+	return inet_csk(sk)->icsk_ca_state == TCP_CA_Open &&
+	       flag & FLAG_DATA_ACKED;
 }
 
 /* Check that window update is acceptable.
@@ -3352,18 +3363,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, sack_rtt);
 	acked -= tp->packets_out;
 
+	/* Advance cwnd if state allows */
+	if (tcp_may_raise_cwnd(sk, flag))
+		tcp_cong_avoid(sk, ack, prior_in_flight);
+
 	if (tcp_ack_is_dubious(sk, flag)) {
-		/* Advance CWND, if state allows this. */
-		if ((flag & FLAG_DATA_ACKED) && tcp_may_raise_cwnd(sk, flag))
-			tcp_cong_avoid(sk, ack, prior_in_flight);
 		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
 		tcp_fastretrans_alert(sk, acked, prior_unsacked,
 				      is_dupack, flag);
-	} else {
-		if (flag & FLAG_DATA_ACKED)
-			tcp_cong_avoid(sk, ack, prior_in_flight);
 	}
-
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-- 
1.8.3

* Re: [PATCH net-next] tcp: increase throughput when reordering is high
  2013-08-22  0:29 [PATCH net-next] tcp: increase throughput when reordering is high Yuchung Cheng
@ 2013-08-22  2:18 ` Neal Cardwell
  2013-08-22  3:52 ` Eric Dumazet
  2013-08-22 21:40 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Neal Cardwell @ 2013-08-22  2:18 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: David Miller, Eric Dumazet, Netdev

On Wed, Aug 21, 2013 at 8:29 PM, Yuchung Cheng <ycheng@google.com> wrote:
> The stack currently detects reordering and avoids spurious
> retransmissions very well. However, throughput is sub-optimal under
> high reordering because cwnd is increased only if the data is delivered
> in order, i.e., the FLAG_DATA_ACKED check in tcp_ack(). The more packets
> are reordered, the worse the throughput is.
>
> Therefore when reordering is proven high, cwnd should advance whenever
> the data is delivered regardless of its ordering. If reordering is low,
> conservatively advance cwnd only on ordered deliveries in Open state,
> and retain cwnd in Disordered state (RFC5681).
>
> Tested with netperf on a qdisc setup with 20Mbps bottleneck bandwidth
> and RTT randomized between 45ms and 55ms (to induce reordering). This
> change increases TCP throughput by 20-25%, to near the bottleneck
> bandwidth.
>
> A special case is a stretched ACK that carries a new SACK and/or ECE mark.
> For example, a receiver may receive an out-of-order or ECN-marked packet
> while unacked data is still buffered because of LRO or delayed ACK. The
> principle for such an ACK is to advance cwnd on the cumulatively acked
> part first, then reduce cwnd in tcp_fastretrans_alert().
>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> ---
>  net/ipv4/tcp_input.c | 32 ++++++++++++++++++++------------
>  1 file changed, 20 insertions(+), 12 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

* Re: [PATCH net-next] tcp: increase throughput when reordering is high
  2013-08-22  0:29 [PATCH net-next] tcp: increase throughput when reordering is high Yuchung Cheng
  2013-08-22  2:18 ` Neal Cardwell
@ 2013-08-22  3:52 ` Eric Dumazet
  2013-08-22 21:40 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2013-08-22  3:52 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: davem, ncardwell, edumazet, netdev

On Wed, 2013-08-21 at 17:29 -0700, Yuchung Cheng wrote:
> The stack currently detects reordering and avoids spurious
> retransmissions very well. However, throughput is sub-optimal under
> high reordering because cwnd is increased only if the data is delivered
> in order, i.e., the FLAG_DATA_ACKED check in tcp_ack(). The more packets
> are reordered, the worse the throughput is.
> 
> Therefore when reordering is proven high, cwnd should advance whenever
> the data is delivered regardless of its ordering. If reordering is low,
> conservatively advance cwnd only on ordered deliveries in Open state,
> and retain cwnd in Disordered state (RFC5681).
> 
> Tested with netperf on a qdisc setup with 20Mbps bottleneck bandwidth
> and RTT randomized between 45ms and 55ms (to induce reordering). This
> change increases TCP throughput by 20-25%, to near the bottleneck
> bandwidth.
> 
> A special case is a stretched ACK that carries a new SACK and/or ECE mark.
> For example, a receiver may receive an out-of-order or ECN-marked packet
> while unacked data is still buffered because of LRO or delayed ACK. The
> principle for such an ACK is to advance cwnd on the cumulatively acked
> part first, then reduce cwnd in tcp_fastretrans_alert().
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> ---
>  net/ipv4/tcp_input.c | 32 ++++++++++++++++++++------------
>  1 file changed, 20 insertions(+), 12 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

* Re: [PATCH net-next] tcp: increase throughput when reordering is high
  2013-08-22  0:29 [PATCH net-next] tcp: increase throughput when reordering is high Yuchung Cheng
  2013-08-22  2:18 ` Neal Cardwell
  2013-08-22  3:52 ` Eric Dumazet
@ 2013-08-22 21:40 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2013-08-22 21:40 UTC (permalink / raw)
  To: ycheng; +Cc: ncardwell, edumazet, netdev

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 21 Aug 2013 17:29:23 -0700

> The stack currently detects reordering and avoids spurious
> retransmissions very well. However, throughput is sub-optimal under
> high reordering because cwnd is increased only if the data is delivered
> in order, i.e., the FLAG_DATA_ACKED check in tcp_ack(). The more packets
> are reordered, the worse the throughput is.
> 
> Therefore when reordering is proven high, cwnd should advance whenever
> the data is delivered regardless of its ordering. If reordering is low,
> conservatively advance cwnd only on ordered deliveries in Open state,
> and retain cwnd in Disordered state (RFC5681).
> 
> Tested with netperf on a qdisc setup with 20Mbps bottleneck bandwidth
> and RTT randomized between 45ms and 55ms (to induce reordering). This
> change increases TCP throughput by 20-25%, to near the bottleneck
> bandwidth.
> 
> A special case is a stretched ACK that carries a new SACK and/or ECE mark.
> For example, a receiver may receive an out-of-order or ECN-marked packet
> while unacked data is still buffered because of LRO or delayed ACK. The
> principle for such an ACK is to advance cwnd on the cumulatively acked
> part first, then reduce cwnd in tcp_fastretrans_alert().
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>

Applied, your team is doing great work on the TCP stack!
