[PATCH net] tcp: defer SACK compression after DupThresh

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] tcp: defer SACK compression after DupThresh
@ 2018-11-20 13:53 Eric Dumazet
  2018-11-21 23:50 ` David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Dumazet @ 2018-11-20 13:53 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Jean-Louis Dupond, Neal Cardwell, Yuchung Cheng,
	Eric Dumazet, Eric Dumazet

Jean-Louis reported a TCP regression and bisected to recent SACK
compression.

After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).

Sender waits a full RTO timer before recovering losses.

While RFC 6675 says in section 5, "Algorithm Details",

   (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
       indicating at least three segments have arrived above the current
       cumulative acknowledgment point, which is taken to indicate loss
       -- go to step (4).
...
   (4) Invoke fast retransmit and enter loss recovery as follows:

there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.

While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.

This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.

Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.

Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.

Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
---
 include/linux/tcp.h   |  1 +
 net/ipv4/tcp_input.c  | 14 ++++++++++++--
 net/ipv4/tcp_output.c |  6 +++---
 net/ipv4/tcp_timer.c  |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 8ed77bb4ed8636e9294389a011529fd9a667dce4..a9b0280687d52797972506a8bac13ed0747e2182 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -196,6 +196,7 @@ struct tcp_sock {
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
 	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
 	u32	last_oow_ack_time;  /* timestamp of last out-of-window ACK */
+	u32	compressed_ack_rcv_nxt;
 
 	u32	tsoffset;	/* timestamp offset */
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2868ef28ce52179b3c5874e749b680ffbdc0521a..81e4264676b404fdfa23c2aeb9fe50459deee120 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4268,7 +4268,7 @@ static void tcp_sack_new_ofo_skb(struct sock *sk, u32 seq, u32 end_seq)
 	 * If the sack array is full, forget about the last one.
 	 */
 	if (this_sack >= TCP_NUM_SACKS) {
-		if (tp->compressed_ack)
+		if (tp->compressed_ack > TCP_FASTRETRANS_THRESH)
 			tcp_send_ack(sk);
 		this_sack--;
 		tp->rx_opt.num_sacks--;
@@ -5188,7 +5188,17 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
 	if (!tcp_is_sack(tp) ||
 	    tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)
 		goto send_now;
-	tp->compressed_ack++;
+
+	if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) {
+		tp->compressed_ack_rcv_nxt = tp->rcv_nxt;
+		if (tp->compressed_ack > TCP_FASTRETRANS_THRESH)
+			NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
+				      tp->compressed_ack - TCP_FASTRETRANS_THRESH);
+		tp->compressed_ack = 0;
+	}
+
+	if (++tp->compressed_ack <= TCP_FASTRETRANS_THRESH)
+		goto send_now;
 
 	if (hrtimer_is_queued(&tp->compressed_ack_timer))
 		return;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9c34b97d365d719ff76250bc9fe7fa20495a3ed2..3f510cad0b3ec884aeb23f58aaa597ec98c82c88 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -180,10 +180,10 @@ static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (unlikely(tp->compressed_ack)) {
+	if (unlikely(tp->compressed_ack > TCP_FASTRETRANS_THRESH)) {
 		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
-			      tp->compressed_ack);
-		tp->compressed_ack = 0;
+			      tp->compressed_ack - TCP_FASTRETRANS_THRESH);
+		tp->compressed_ack = TCP_FASTRETRANS_THRESH;
 		if (hrtimer_try_to_cancel(&tp->compressed_ack_timer) == 1)
 			__sock_put(sk);
 	}
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 676020663ce80a79341ad1a05352742cc8dd5850..5f8b6d3cd855dc639409e69d84ade5bb2be51626 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -740,7 +740,7 @@ static enum hrtimer_restart tcp_compressed_ack_kick(struct hrtimer *timer)
 
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk)) {
-		if (tp->compressed_ack)
+		if (tp->compressed_ack > TCP_FASTRETRANS_THRESH)
 			tcp_send_ack(sk);
 	} else {
 		if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
-- 
2.19.1.1215.g8438c0b245-goog

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net] tcp: defer SACK compression after DupThresh
  2018-11-20 13:53 [PATCH net] tcp: defer SACK compression after DupThresh Eric Dumazet
@ 2018-11-21 23:50 ` David Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David Miller @ 2018-11-21 23:50 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, jean-louis, ncardwell, ycheng, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 20 Nov 2018 05:53:59 -0800

> Jean-Louis reported a TCP regression and bisected to recent SACK
> compression.
> 
> After a loss episode (receiver not able to keep up and dropping
> packets because its backlog is full), linux TCP stack is sending
> a single SACK (DUPACK).
> 
> Sender waits a full RTO timer before recovering losses.
> 
> While RFC 6675 says in section 5, "Algorithm Details",
> 
>    (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
>        indicating at least three segments have arrived above the current
>        cumulative acknowledgment point, which is taken to indicate loss
>        -- go to step (4).
> ...
>    (4) Invoke fast retransmit and enter loss recovery as follows:
> 
> there are old TCP stacks not implementing this strategy, and
> still counting the dupacks before starting fast retransmit.
> 
> While these stacks probably perform poorly when receivers implement
> LRO/GRO, we should be a little more gentle to them.
> 
> This patch makes sure we do not enable SACK compression unless
> 3 dupacks have been sent since last rcv_nxt update.
> 
> Ideally we should even rearm the timer to send one or two
> more DUPACK if no more packets are coming, but that will
> be work aiming for linux-4.21.
> 
> Many thanks to Jean-Louis for bisecting the issue, providing
> packet captures and testing this patch.
> 
> Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
> Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
> Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>

Applied and queued up for -stable.

Thanks Eric.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-11-22 10:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-20 13:53 [PATCH net] tcp: defer SACK compression after DupThresh Eric Dumazet
2018-11-21 23:50 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).