netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT
@ 2018-01-17 20:10 Yuchung Cheng
  2018-01-17 20:11 ` [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs Yuchung Cheng
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Yuchung Cheng @ 2018-01-17 20:10 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, Yuchung Cheng

This patch set prevents TCP sender from using RTT samples from
(suspected) delayed ACKs as the minimum RTT, to avoid unbounded
over-estimation of the network path delay. This issue is common
when a connection has extended periods of one packet chit-chat
beyond the min RTT filter window. The first patch does that for TCP
general min RTT estimation. The second patch addresses specifically
the BBR congestion control's min RTT filter.

Yuchung Cheng (2):
  tcp: avoid min-RTT overestimation from delayed ACKs
  tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR

 include/net/tcp.h    |  1 +
 net/ipv4/tcp_bbr.c   |  3 ++-
 net/ipv4/tcp_input.c | 24 ++++++++++++++++++++++--
 3 files changed, 25 insertions(+), 3 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs
  2018-01-17 20:10 [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT Yuchung Cheng
@ 2018-01-17 20:11 ` Yuchung Cheng
  2018-01-17 20:11 ` [PATCH 2/2 net-next] tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR Yuchung Cheng
  2018-01-19 20:39 ` [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Yuchung Cheng @ 2018-01-17 20:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, Yuchung Cheng

This patch avoids having TCP sender or congestion control
overestimate the min RTT by orders of magnitude. This happens when
all the samples in the windowed filter are one-packet transfer
like small request and health-check like chit-chat, which is farily
common for applications using persistent connections. This patch
tries to conservatively labels and skip RTT samples obtained from
this type of workload.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff71b18d9682..2c6797134553 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -97,6 +97,7 @@ int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call tcp_send_challenge_ack()	*/
+#define FLAG_ACK_MAYBE_DELAYED	0x10000 /* Likely a delayed ACK */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -2857,11 +2858,18 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 	*rexmit = REXMIT_LOST;
 }
 
-static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us)
+static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag)
 {
 	u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ;
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
+		/* If the remote keeps returning delayed ACKs, eventually
+		 * the min filter would pick it up and overestimate the
+		 * prop. delay when it expires. Skip suspected delayed ACKs.
+		 */
+		return;
+	}
 	minmax_running_min(&tp->rtt_min, wlen, tcp_jiffies32,
 			   rtt_us ? : jiffies_to_usecs(1));
 }
@@ -2901,7 +2909,7 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag,
 	 * always taken together with ACK, SACK, or TS-opts. Any negative
 	 * values will be skipped with the seq_rtt_us < 0 check above.
 	 */
-	tcp_update_rtt_min(sk, ca_rtt_us);
+	tcp_update_rtt_min(sk, ca_rtt_us, flag);
 	tcp_rtt_estimator(sk, seq_rtt_us);
 	tcp_set_rto(sk);
 
@@ -3125,6 +3133,17 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
 	if (likely(first_ackt) && !(flag & FLAG_RETRANS_DATA_ACKED)) {
 		seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);
 		ca_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, last_ackt);
+
+		if (pkts_acked == 1 && last_in_flight < tp->mss_cache &&
+		    last_in_flight && !prior_sacked && fully_acked &&
+		    sack->rate->prior_delivered + 1 == tp->delivered &&
+		    !(flag & (FLAG_CA_ALERT | FLAG_SYN_ACKED))) {
+			/* Conservatively mark a delayed ACK. It's typically
+			 * from a lone runt packet over the round trip to
+			 * a receiver w/o out-of-order or CE events.
+			 */
+			flag |= FLAG_ACK_MAYBE_DELAYED;
+		}
 	}
 	if (sack->first_sackt) {
 		sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2 net-next] tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR
  2018-01-17 20:10 [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT Yuchung Cheng
  2018-01-17 20:11 ` [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs Yuchung Cheng
@ 2018-01-17 20:11 ` Yuchung Cheng
  2018-01-19 20:39 ` [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Yuchung Cheng @ 2018-01-17 20:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, Yuchung Cheng

A persistent connection may send tiny amount of data (e.g. health-check)
for a long period of time. BBR's windowed min RTT filter may only see
RTT samples from delayed ACKs causing BBR to grossly over-estimate
the path delay depending how much the ACK was delayed at the receiver.

This patch skips RTT samples that are likely coming from delayed ACKs. Note
that it is possible the sender never obtains a valid measure to set the
min RTT. In this case BBR will continue to set cwnd to initial window
which seems fine because the connection is thin stream.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
---
 include/net/tcp.h    | 1 +
 net/ipv4/tcp_bbr.c   | 3 ++-
 net/ipv4/tcp_input.c | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6939e69d3c37..5a1d26a18599 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -953,6 +953,7 @@ struct rate_sample {
 	u32  prior_in_flight;	/* in flight before this ACK */
 	bool is_app_limited;	/* is sample from packet with bubble in pipe? */
 	bool is_retrans;	/* is sample from retransmission? */
+	bool is_ack_delayed;	/* is this (likely) a delayed ACK? */
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 8322f26e770e..785712be5b0d 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -766,7 +766,8 @@ static void bbr_update_min_rtt(struct sock *sk, const struct rate_sample *rs)
 	filter_expired = after(tcp_jiffies32,
 			       bbr->min_rtt_stamp + bbr_min_rtt_win_sec * HZ);
 	if (rs->rtt_us >= 0 &&
-	    (rs->rtt_us <= bbr->min_rtt_us || filter_expired)) {
+	    (rs->rtt_us <= bbr->min_rtt_us ||
+	     (filter_expired && !rs->is_ack_delayed))) {
 		bbr->min_rtt_us = rs->rtt_us;
 		bbr->min_rtt_stamp = tcp_jiffies32;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c6797134553..cfa51cfd2d99 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3633,6 +3633,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
 	lost = tp->lost - lost;			/* freshly marked lost */
+	rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
 	tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
 	tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
 	tcp_xmit_recovery(sk, rexmit);
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT
  2018-01-17 20:10 [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT Yuchung Cheng
  2018-01-17 20:11 ` [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs Yuchung Cheng
  2018-01-17 20:11 ` [PATCH 2/2 net-next] tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR Yuchung Cheng
@ 2018-01-19 20:39 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2018-01-19 20:39 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, edumazet, ncardwell, soheil

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 17 Jan 2018 12:10:59 -0800

> This patch set prevents TCP sender from using RTT samples from
> (suspected) delayed ACKs as the minimum RTT, to avoid unbounded
> over-estimation of the network path delay. This issue is common
> when a connection has extended periods of one packet chit-chat
> beyond the min RTT filter window. The first patch does that for TCP
> general min RTT estimation. The second patch addresses specifically
> the BBR congestion control's min RTT filter.

Series applied, thank you.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-01-19 20:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-17 20:10 [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT Yuchung Cheng
2018-01-17 20:11 ` [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs Yuchung Cheng
2018-01-17 20:11 ` [PATCH 2/2 net-next] tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR Yuchung Cheng
2018-01-19 20:39 ` [PATCH 0/2 net-next] tcp: do not use RTT from delayed ACKs for min-RTT David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).