netdev.vger.kernel.org archive mirror
* [PATCH net] tcp: fix tcp_tso_should_defer() vs large RTT
@ 2025-10-11 11:57 Eric Dumazet
  2025-10-11 17:58 ` Neal Cardwell
From: Eric Dumazet @ 2025-10-11 11:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Neal Cardwell, Willem de Bruijn, Kuniyuki Iwashima,
	netdev, eric.dumazet, Eric Dumazet

Neal reported that using neper tcp_stream with TCP_TX_DELAY
set to 50ms would often lead to flows stuck in a small cwnd mode,
regardless of the congestion control.

While tcp_stream sets TCP_TX_DELAY too late after the connect(),
it highlighted two kernel bugs.
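
For readers unfamiliar with the option, here is a minimal sketch of
setting TCP_TX_DELAY before connect() rather than after it. The helper
names are hypothetical and this is not neper's code; the fallback
value 37 and the usec unit are taken from the Linux uapi header
linux/tcp.h, for libc headers that do not define the option.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_TX_DELAY
#define TCP_TX_DELAY 37	/* from linux/tcp.h: delay outgoing packets by XX usec */
#endif

/* Hypothetical helper: request a per-socket transmit delay, in usec.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_tx_delay_usec(int fd, int usec)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_TX_DELAY, &usec, sizeof(usec));
}

/* Hypothetical helper: create the socket, set the delay first, then
 * connect. Returns the connected fd, or -1 with errno set.
 */
int connect_with_tx_delay(const struct sockaddr *addr, socklen_t len, int usec)
{
	int fd = socket(addr->sa_family, SOCK_STREAM, IPPROTO_TCP);

	if (fd < 0)
		return -1;
	if (set_tx_delay_usec(fd, usec) < 0 || connect(fd, addr, len) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;	/* keep the original error for the caller */
		return -1;
	}
	return fd;
}
```

A 50ms delay as in the report would be requested with usec = 50000.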

The following heuristic in tcp_tso_should_defer() seems wrong
for large RTT:

delta = tp->tcp_clock_cache - head->tstamp;
/* If next ACK is likely to come too late (half srtt), do not defer */
if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
      goto send_now;

If the next ACK is expected more than 1 ms from now, we should
not defer, because we prefer smooth ACK clocking.
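
To make the difference concrete, here is a small userspace sketch that
models only the timing check, before and after this patch. It is an
illustration under stated assumptions, not kernel code: the function
names old_send_now(), new_send_now() and worked_example() are
hypothetical, the macros are redefined locally, and srtt_us follows
the kernel convention of storing the smoothed RTT in usec shifted left
by 3. A return value of true stands for "goto send_now"; false means
the rest of the deferral logic would still run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_MSEC 1000000ULL

/* Old check: send now only while less than half an srtt has elapsed
 * since the head of the rtx queue was sent.
 */
bool old_send_now(uint64_t now, uint64_t head_tstamp, uint32_t srtt_us)
{
	uint64_t delta = now - head_tstamp;

	return (int64_t)(delta - NSEC_PER_USEC * (srtt_us >> 4)) < 0;
}

/* New check: send now if the ACK is expected more than
 * min(1ms, half srtt) from now.
 */
bool new_send_now(uint64_t now, uint64_t head_tstamp, uint32_t srtt_us)
{
	uint64_t srtt_in_ns = (NSEC_PER_USEC >> 3) * srtt_us;
	uint64_t expected_ack = head_tstamp + srtt_in_ns;
	uint64_t how_far_is_the_ack = expected_ack - now;
	uint64_t half_srtt = srtt_in_ns >> 1;
	uint64_t threshold = half_srtt < NSEC_PER_MSEC ? half_srtt : NSEC_PER_MSEC;

	return (int64_t)(how_far_is_the_ack - threshold) > 0;
}

/* Worked example with a 50ms srtt; t0 is when the head skb was sent. */
void worked_example(void)
{
	const uint64_t t0 = 1000000000ULL;
	const uint32_t srtt_50ms = 50000u << 3;

	/* 10ms after t0: both checks send immediately. */
	assert(old_send_now(t0 + 10 * NSEC_PER_MSEC, t0, srtt_50ms));
	assert(new_send_now(t0 + 10 * NSEC_PER_MSEC, t0, srtt_50ms));

	/* 30ms after t0: the ACK is still about 20ms away. The old check
	 * no longer forces a send, the new one does.
	 */
	assert(!old_send_now(t0 + 30 * NSEC_PER_MSEC, t0, srtt_50ms));
	assert(new_send_now(t0 + 30 * NSEC_PER_MSEC, t0, srtt_50ms));

	/* 49.5ms after t0: the ACK is about 0.5ms away, deferring is allowed. */
	assert(!new_send_now(t0 + 49500 * NSEC_PER_USEC, t0, srtt_50ms));
}
```

Calling worked_example() from a main() exercises the three cases. For
an srtt below 2ms the threshold is half the srtt, so in this model the
two checks differ only when the elapsed time is exactly half the srtt.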

While the blamed commit was a step in the right direction, it was not
generic enough.

Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.

Fixes: 50c8339e9299 ("tcp: tso: restore IW10 after TSO autosizing")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bb3576ac0ad7d7330ef272e1d9dc1f19bb8f86bb..bbeed379a3c5342c7de0d2416f97ad944e3e35b0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2369,7 +2369,8 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 				 u32 max_segs)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	u32 send_win, cong_win, limit, in_flight;
+	u32 send_win, cong_win, limit, in_flight, threshold;
+	u64 srtt_in_ns, expected_ack, how_far_is_the_ack;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *head;
 	int win_divisor;
@@ -2431,10 +2432,20 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 	head = tcp_rtx_queue_head(sk);
 	if (!head)
 		goto send_now;
-	delta = tp->tcp_clock_cache - head->tstamp;
-	/* If next ACK is likely to come too late (half srtt), do not defer */
-	if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
-		goto send_now;
+
+	srtt_in_ns = (u64)(NSEC_PER_USEC >> 3) * tp->srtt_us;
+	/* When is the ACK expected ? */
+	expected_ack = head->tstamp + srtt_in_ns;
+	/* How far from now is the ACK expected ? */
+	how_far_is_the_ack = expected_ack - tp->tcp_clock_cache;
+
+	/* If next ACK is likely to come too late,
+	 * ie in more than min(1ms, half srtt), do not defer.
+	 */
+	threshold = min(srtt_in_ns >> 1, NSEC_PER_MSEC);
+
+	if ((s64)(how_far_is_the_ack - threshold) > 0)
+		goto send_now;
 
 	/* Ok, it looks like it is advisable to defer.
 	 * Three cases are tracked :
-- 
2.51.0.740.g6adb054d12-goog



* Re: [PATCH net] tcp: fix tcp_tso_should_defer() vs large RTT
  2025-10-11 11:57 [PATCH net] tcp: fix tcp_tso_should_defer() vs large RTT Eric Dumazet
@ 2025-10-11 17:58 ` Neal Cardwell
From: Neal Cardwell @ 2025-10-11 17:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Willem de Bruijn, Kuniyuki Iwashima, netdev, eric.dumazet

On Sat, Oct 11, 2025 at 7:57 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Neal reported that using neper tcp_stream with TCP_TX_DELAY
> set to 50ms would often lead to flows stuck in a small cwnd mode,
> regardless of the congestion control.
>
> While tcp_stream sets TCP_TX_DELAY too late after the connect(),
> it highlighted two kernel bugs.
>
> The following heuristic in tcp_tso_should_defer() seems wrong
> for large RTT:
>
> delta = tp->tcp_clock_cache - head->tstamp;
> /* If next ACK is likely to come too late (half srtt), do not defer */
> if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
>       goto send_now;
>
> If the next ACK is expected more than 1 ms from now, we should
> not defer, because we prefer smooth ACK clocking.
>
> While the blamed commit was a step in the right direction, it was not
> generic enough.
>
> Another patch fixing TCP_TX_DELAY for established flows
> will be proposed when net-next reopens.
>
> Fixes: 50c8339e9299 ("tcp: tso: restore IW10 after TSO autosizing")
> Reported-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Thanks, Eric! Great catch! The patch looks great to me, and I tested
that it fixes the issue I was seeing with neper tcp_stream with
TCP_TX_DELAY.

Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>

neal
