netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now
@ 2024-11-24  2:21 Jakub Kicinski
  2024-11-24 17:48 ` Eric Dumazet
  2024-11-28  9:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Jakub Kicinski @ 2024-11-24  2:21 UTC (permalink / raw)
  To: edumazet
  Cc: netdev, davem, pabeni, Jakub Kicinski, stable, jhs,
	xiyou.wangcong, jiri

Recent kernels cause a lot of TCP retransmissions

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
[  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
                                                      ^^^^

Replacing the qdisc with pfifo makes retransmissions go away.

It appears that a flow may have a delayed packet with a very near
Tx time. Later, we may get busy processing Rx and the target Tx time
will pass, but we won't service Tx since the CPU is busy with Rx.
If Rx sees an ACK and we try to push more data for the delayed flow
we may fastpath the skb, not realizing that there are already "ready
to send" packets for this flow sitting in the qdisc.

Don't trust the fastpath if we are "behind" according to the projected
Tx time for next flow waiting in the Qdisc. Because we consider anything
within the offload window to be okay for fastpath we must consider
the entire offload window as "now".

Qdisc config:

qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
  buckets 32768 orphan_mask 1023 bands 3 \
  priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
  weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
  low_rate_threshold 550Kbit \
  refill_delay 40ms timer_slack 10us horizon 10s horizon_drop

For iperf this change seems to do fine, the reordering is gone.
The fastpath still gets used most of the time:

  gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
   xx_behind 2731

where "xx_behind" counts how many times we hit the new "return false".

CC: stable@vger.kernel.org
Fixes: 076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - use Eric's condition (fix offload, don't care about throttled)
 - throttled -> delayed
 - explicitly CC stable, it won't build on 6.12 because of the offload
   horizon, so make sure they don't just drop this
v1: https://lore.kernel.org/20241122162108.2697803-1-kuba@kernel.org

CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
---
 net/sched/sch_fq.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index a97638bef6da..a5e87f9ea986 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -332,6 +332,12 @@ static bool fq_fastpath_check(const struct Qdisc *sch, struct sk_buff *skb,
 		 */
 		if (q->internal.qlen >= 8)
 			return false;
+
+		/* Ordering invariants fall apart if some delayed flows
+		 * are ready but we haven't serviced them, yet.
+		 */
+		if (q->time_next_delayed_flow <= now + q->offload_horizon)
+			return false;
 	}
 
 	sk = skb->sk;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now
  2024-11-24  2:21 [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now Jakub Kicinski
@ 2024-11-24 17:48 ` Eric Dumazet
  2024-11-28  9:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2024-11-24 17:48 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, pabeni, stable, jhs, xiyou.wangcong, jiri

On Sun, Nov 24, 2024 at 3:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Recent kernels cause a lot of TCP retransmissions
>
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
> [  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
>                                                       ^^^^
>
> Replacing the qdisc with pfifo makes retransmissions go away.
>
> It appears that a flow may have a delayed packet with a very near
> Tx time. Later, we may get busy processing Rx and the target Tx time
> will pass, but we won't service Tx since the CPU is busy with Rx.
> If Rx sees an ACK and we try to push more data for the delayed flow
> we may fastpath the skb, not realizing that there are already "ready
> to send" packets for this flow sitting in the qdisc.
>
> Don't trust the fastpath if we are "behind" according to the projected
> Tx time for next flow waiting in the Qdisc. Because we consider anything
> within the offload window to be okay for fastpath we must consider
> the entire offload window as "now".
>
> Qdisc config:
>
> qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
>   buckets 32768 orphan_mask 1023 bands 3 \
>   priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
>   weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
>   low_rate_threshold 550Kbit \
>   refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
>
> For iperf this change seems to do fine, the reordering is gone.
> The fastpath still gets used most of the time:
>
>   gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
>    xx_behind 2731
>
> where "xx_behind" counts how many times we hit the new "return false".
>
> CC: stable@vger.kernel.org
> Fixes: 076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now
  2024-11-24  2:21 [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now Jakub Kicinski
  2024-11-24 17:48 ` Eric Dumazet
@ 2024-11-28  9:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-11-28  9:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: edumazet, netdev, davem, pabeni, stable, jhs, xiyou.wangcong,
	jiri

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sat, 23 Nov 2024 18:21:48 -0800 you wrote:
> Recent kernels cause a lot of TCP retransmissions
> 
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
> [  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
>                                                       ^^^^
> 
> [...]

Here is the summary with links:
  - [net,v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now
    https://git.kernel.org/netdev/net/c/122aba8c8061

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-11-28  9:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-24  2:21 [PATCH net v2] net_sched: sch_fq: don't follow the fast path if Tx is behind now Jakub Kicinski
2024-11-24 17:48 ` Eric Dumazet
2024-11-28  9:20 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).