From: Yuchung Cheng <ycheng@google.com>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com,
ncardwell@google.com, nanditad@google.com,
Yuchung Cheng <ycheng@google.com>
Subject: [net-next 06/13] tcp: check undo conditions before detecting losses
Date: Thu, 12 Jan 2017 22:03:47 -0800 [thread overview]
Message-ID: <20170113060354.85234-7-ycheng@google.com> (raw)
In-Reply-To: <20170113060354.85234-1-ycheng@google.com>
Currently RACK would mark loss before the undo operations in TCP
loss recovery. This could incorrectly identify real losses as
spurious. For example a sender first experiences a delay spike and
then eventually some packets were lost due to buffer overrun.
In this case, the sender should perform fast recovery b/c not all
the packets were lost.
But the sender may first trigger a (spurious) RTO and reset
cwnd to 1. The following ACKs may used to mark real losses by
tcp_rack_mark_lost. Then in tcp_process_loss this ACK could trigger
F-RTO undo condition and unmark real losses and revert the cwnd
reduction. If there are no more ACKs coming back, eventually the
sender would timeout again instead of performing fast recovery.
The patch fixes this incorrect process by always performing
the undo checks before detecting losses.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_input.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e42ca11c0326..9c98dc874825 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2801,6 +2801,21 @@ static bool tcp_try_undo_partial(struct sock *sk, const int acked)
return false;
}
+static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag,
+ const struct skb_mstamp *ack_time)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ /* Use RACK to detect loss */
+ if (sysctl_tcp_recovery & TCP_RACK_LOST_RETRANS) {
+ u32 prior_retrans = tp->retrans_out;
+
+ tcp_rack_mark_lost(sk, ack_time);
+ if (prior_retrans > tp->retrans_out)
+ *ack_flag |= FLAG_LOST_RETRANS;
+ }
+}
+
/* Process an event, which can update packets-in-flight not trivially.
* Main goal of this function is to calculate new estimate for left_out,
* taking into account both packets sitting in receiver's buffer and
@@ -2866,17 +2881,6 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
}
}
- /* Use RACK to detect loss */
- if (sysctl_tcp_recovery & TCP_RACK_LOST_RETRANS) {
- u32 prior_retrans = tp->retrans_out;
-
- tcp_rack_mark_lost(sk, ack_time);
- if (prior_retrans > tp->retrans_out) {
- flag |= FLAG_LOST_RETRANS;
- *ack_flag |= FLAG_LOST_RETRANS;
- }
- }
-
/* E. Process state. */
switch (icsk->icsk_ca_state) {
case TCP_CA_Recovery:
@@ -2894,11 +2898,13 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
tcp_try_keep_open(sk);
return;
}
+ tcp_rack_identify_loss(sk, ack_flag, ack_time);
break;
case TCP_CA_Loss:
tcp_process_loss(sk, flag, is_dupack, rexmit);
- if (icsk->icsk_ca_state != TCP_CA_Open &&
- !(flag & FLAG_LOST_RETRANS))
+ tcp_rack_identify_loss(sk, ack_flag, ack_time);
+ if (!(icsk->icsk_ca_state == TCP_CA_Open ||
+ (*ack_flag & FLAG_LOST_RETRANS)))
return;
/* Change state if cwnd is undone or retransmits are lost */
default:
@@ -2912,6 +2918,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
if (icsk->icsk_ca_state <= TCP_CA_Disorder)
tcp_try_undo_dsack(sk);
+ tcp_rack_identify_loss(sk, ack_flag, ack_time);
if (!tcp_time_to_recover(sk, flag)) {
tcp_try_to_open(sk, flag);
return;
--
2.11.0.483.g087da7b7c-goog
next prev parent reply other threads:[~2017-01-13 6:04 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-13 6:03 [net-next 00/13] RACK fast recovery Yuchung Cheng
2017-01-13 6:03 ` [net-next 01/13] tcp: new helper function for RACK loss detection Yuchung Cheng
2017-01-13 6:07 ` Yuchung Cheng
2017-01-13 6:03 ` [net-next 02/13] tcp: new helper for RACK to detect loss Yuchung Cheng
2017-01-13 6:03 ` [net-next 03/13] tcp: record most recent RTT in RACK loss detection Yuchung Cheng
2017-01-13 6:03 ` [net-next 04/13] tcp: add reordering timer " Yuchung Cheng
2017-01-13 6:03 ` [net-next 05/13] tcp: use sequence to break TS ties for " Yuchung Cheng
2017-01-13 6:03 ` Yuchung Cheng [this message]
2017-01-13 6:03 ` [net-next 07/13] tcp: enable RACK loss detection to trigger recovery Yuchung Cheng
2017-01-13 6:03 ` [net-next 08/13] tcp: extend F-RTO to catch more spurious timeouts Yuchung Cheng
2017-01-13 6:03 ` [net-next 09/13] tcp: remove forward retransmit feature Yuchung Cheng
2017-01-13 6:03 ` [net-next 10/13] tcp: remove early retransmit Yuchung Cheng
2017-01-13 6:03 ` [net-next 11/13] tcp: remove RFC4653 NCR Yuchung Cheng
2017-01-13 6:03 ` [net-next 12/13] tcp: remove thin_dupack feature Yuchung Cheng
2017-01-13 6:03 ` [net-next 13/13] tcp: disable fack by default Yuchung Cheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170113060354.85234-7-ycheng@google.com \
--to=ycheng@google.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=nanditad@google.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).