[PATCH net] tcp: reduce tcp_comp_sack_slack

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
@ 2025-11-14 13:51 Eric Dumazet
  2025-11-14 14:08 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eric Dumazet @ 2025-11-14 13:51 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Neal Cardwell, Kuniyuki Iwashima, netdev,
	eric.dumazet, Eric Dumazet

net.ipv4.tcp_comp_sack_slack_ns current default value is too high.

When a flow has many drops (1 % or more), and small RTT, adding 100 usec
before sending SACK stalls the sender relying on getting SACK
fast enough to keep the pipe busy.

Decrease the default to 10 usec.

This is orthogonal to Congestion Control heuristics to determine
if drops are caused by congestion or not.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 Documentation/networking/ip-sysctl.rst | 3 ++-
 net/ipv4/tcp_ipv4.c                    | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 2bae61be18593a8111a83d9f034517e4646eb653..f4ad739a6b532914e4091c425828b329ee342bc6 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -875,8 +875,9 @@ tcp_comp_sack_slack_ns - LONG INTEGER
 	timer used by SACK compression. This gives extra time
 	for small RTT flows, and reduces system overhead by allowing
 	opportunistic reduction of timer interrupts.
+	Too big values might reduce goodput.
 
-	Default : 100,000 ns (100 us)
+	Default : 10,000 ns (10 us)
 
 tcp_comp_sack_nr - INTEGER
 	Max number of SACK that can be compressed.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a7d9fec2950b915e24f0586b2cb964e0e68866ed..6fcaecb67284ecade97b623d955dbbe2cd02a831 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3593,7 +3593,7 @@ static int __net_init tcp_sk_init(struct net *net)
 		       sizeof(init_net.ipv4.sysctl_tcp_wmem));
 	}
 	net->ipv4.sysctl_tcp_comp_sack_delay_ns = NSEC_PER_MSEC;
-	net->ipv4.sysctl_tcp_comp_sack_slack_ns = 100 * NSEC_PER_USEC;
+	net->ipv4.sysctl_tcp_comp_sack_slack_ns = 10 * NSEC_PER_USEC;
 	net->ipv4.sysctl_tcp_comp_sack_nr = 44;
 	net->ipv4.sysctl_tcp_comp_sack_rtt_percent = 33;
 	net->ipv4.sysctl_tcp_backlog_ack_defer = 1;
-- 
2.52.0.rc1.455.g30608eb744-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 13:51 [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec Eric Dumazet
@ 2025-11-14 14:08 ` Eric Dumazet
  2025-11-14 16:03   ` Jakub Kicinski
  2025-11-14 14:22 ` Neal Cardwell
  2025-11-18  1:10 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-11-14 14:08 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Neal Cardwell, Kuniyuki Iwashima, netdev,
	eric.dumazet

On Fri, Nov 14, 2025 at 5:51 AM Eric Dumazet <edumazet@google.com> wrote:
>
> net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
>
> When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> before sending SACK stalls the sender relying on getting SACK
> fast enough to keep the pipe busy.
>
> Decrease the default to 10 usec.
>
> This is orthogonal to Congestion Control heuristics to determine
> if drops are caused by congestion or not.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This was meant for net-next, but applying this to net tree should be
fine as well.

No need for backports though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 13:51 [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec Eric Dumazet
  2025-11-14 14:08 ` Eric Dumazet
@ 2025-11-14 14:22 ` Neal Cardwell
  2025-11-18  1:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2025-11-14 14:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Fri, Nov 14, 2025 at 8:51 AM Eric Dumazet <edumazet@google.com> wrote:
>
> net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
>
> When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> before sending SACK stalls the sender relying on getting SACK
> fast enough to keep the pipe busy.
>
> Decrease the default to 10 usec.
>
> This is orthogonal to Congestion Control heuristics to determine
> if drops are caused by congestion or not.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 14:08 ` Eric Dumazet
@ 2025-11-14 16:03   ` Jakub Kicinski
  2025-11-14 16:17     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-11-14 16:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Neal Cardwell,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Fri, 14 Nov 2025 06:08:58 -0800 Eric Dumazet wrote:
> On Fri, Nov 14, 2025 at 5:51 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
> >
> > When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> > before sending SACK stalls the sender relying on getting SACK
> > fast enough to keep the pipe busy.
> >
> > Decrease the default to 10 usec.
> >
> > This is orthogonal to Congestion Control heuristics to determine
> > if drops are caused by congestion or not.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>  
> 
> This was meant for net-next, but applying this to net tree should be
> fine as well.
> 
> No need for backports though.

Sorry to piggy back on a random post but looks like the changes from
a ~week ago made ncdevmem flaky: 

https://netdev.bots.linux.dev/contest.html?executor=vmksft-fbnic-qemu&test=devmem-py

Specifically it says:

using ifindex=3
using queues 2..3
got tx dmabuf id=5
Connect to 2001:db8:1::2 37943 (via enp1s0)
sendmsg_ret=6
ncdevmem: did not receive tx completion

This is what was in the branch that made the test fail:

[+] tcp: add net.ipv4.tcp_comp_sack_rtt_percent
[+] net: increase skb_defer_max default to 128
[+] net: fix napi_consume_skb() with alien skbs
[+] net: allow skb_release_head_state() to be called multiple times

https://netdev.bots.linux.dev/static/nipa/branch_deltas/net-next-hw-2025-11-08--00-00.html

I'm guessing we need to take care of the uarg if we defer freeing 
of Tx skbs..

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 16:03   ` Jakub Kicinski
@ 2025-11-14 16:17     ` Eric Dumazet
  2025-11-14 19:35       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-11-14 16:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Neal Cardwell,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Fri, Nov 14, 2025 at 8:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 14 Nov 2025 06:08:58 -0800 Eric Dumazet wrote:
> > On Fri, Nov 14, 2025 at 5:51 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
> > >
> > > When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> > > before sending SACK stalls the sender relying on getting SACK
> > > fast enough to keep the pipe busy.
> > >
> > > Decrease the default to 10 usec.
> > >
> > > This is orthogonal to Congestion Control heuristics to determine
> > > if drops are caused by congestion or not.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > This was meant for net-next, but applying this to net tree should be
> > fine as well.
> >
> > No need for backports though.
>
> Sorry to piggy back on a random post but looks like the changes from
> a ~week ago made ncdevmem flaky:
>
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-fbnic-qemu&test=devmem-py
>
> Specifically it says:
>
> using ifindex=3
> using queues 2..3
> got tx dmabuf id=5
> Connect to 2001:db8:1::2 37943 (via enp1s0)
> sendmsg_ret=6
> ncdevmem: did not receive tx completion
>
> This is what was in the branch that made the test fail:
>
> [+] tcp: add net.ipv4.tcp_comp_sack_rtt_percent
> [+] net: increase skb_defer_max default to 128
> [+] net: fix napi_consume_skb() with alien skbs
> [+] net: allow skb_release_head_state() to be called multiple times
>
> https://netdev.bots.linux.dev/static/nipa/branch_deltas/net-next-hw-2025-11-08--00-00.html
>
> I'm guessing we need to take care of the uarg if we defer freeing
> of Tx skbs..

Makes sense, or expedite/force the IPI if these skbs are 'deferred'

I did not complete the series to call skb_data_unref() from
skb_attempt_defer_free().
I hope to finish this soon.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 16:17     ` Eric Dumazet
@ 2025-11-14 19:35       ` Eric Dumazet
  2025-11-15  1:43         ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-11-14 19:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Neal Cardwell,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Fri, Nov 14, 2025 at 8:17 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Nov 14, 2025 at 8:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 14 Nov 2025 06:08:58 -0800 Eric Dumazet wrote:
> > > On Fri, Nov 14, 2025 at 5:51 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
> > > >
> > > > When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> > > > before sending SACK stalls the sender relying on getting SACK
> > > > fast enough to keep the pipe busy.
> > > >
> > > > Decrease the default to 10 usec.
> > > >
> > > > This is orthogonal to Congestion Control heuristics to determine
> > > > if drops are caused by congestion or not.
> > > >
> > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > >
> > > This was meant for net-next, but applying this to net tree should be
> > > fine as well.
> > >
> > > No need for backports though.
> >
> > Sorry to piggy back on a random post but looks like the changes from
> > a ~week ago made ncdevmem flaky:
> >
> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-fbnic-qemu&test=devmem-py
> >
> > Specifically it says:
> >
> > using ifindex=3
> > using queues 2..3
> > got tx dmabuf id=5
> > Connect to 2001:db8:1::2 37943 (via enp1s0)
> > sendmsg_ret=6
> > ncdevmem: did not receive tx completion
> >
> > This is what was in the branch that made the test fail:
> >
> > [+] tcp: add net.ipv4.tcp_comp_sack_rtt_percent
> > [+] net: increase skb_defer_max default to 128
> > [+] net: fix napi_consume_skb() with alien skbs
> > [+] net: allow skb_release_head_state() to be called multiple times
> >
> > https://netdev.bots.linux.dev/static/nipa/branch_deltas/net-next-hw-2025-11-08--00-00.html
> >
> > I'm guessing we need to take care of the uarg if we defer freeing
> > of Tx skbs..
>
> Makes sense, or expedite/force the IPI if these skbs are 'deferred'
>
> I did not complete the series to call skb_data_unref() from
> skb_attempt_defer_free().
> I hope to finish this soon.
>

In the meantime we could add this  fix:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f34372666e67cee5329d3ba1d3c86f8622facac3..12d65357fc7f83cfa9f8714227c7b69035441644
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1480,6 +1480,9 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
        DEBUG_NET_WARN_ON_ONCE(!in_softirq());

        if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+               if (skb_zcopy(skb))
+                       return consume_skb(skb);
+
                skb_release_head_state(skb);
                return skb_attempt_defer_free(skb);
        }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 19:35       ` Eric Dumazet
@ 2025-11-15  1:43         ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-11-15  1:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Neal Cardwell,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Fri, 14 Nov 2025 11:35:12 -0800 Eric Dumazet wrote:
> > Makes sense, or expedite/force the IPI if these skbs are 'deferred'
> >
> > I did not complete the series to call skb_data_unref() from
> > skb_attempt_defer_free().
> > I hope to finish this soon.
> 
> In the meantime we could add this  fix:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f34372666e67cee5329d3ba1d3c86f8622facac3..12d65357fc7f83cfa9f8714227c7b69035441644
> 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1480,6 +1480,9 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
>         DEBUG_NET_WARN_ON_ONCE(!in_softirq());
> 
>         if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
> +               if (skb_zcopy(skb))
> +                       return consume_skb(skb);
> +
>                 skb_release_head_state(skb);
>                 return skb_attempt_defer_free(skb);
>         }

Queued locally in the CI for now.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
  2025-11-14 13:51 [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec Eric Dumazet
  2025-11-14 14:08 ` Eric Dumazet
  2025-11-14 14:22 ` Neal Cardwell
@ 2025-11-18  1:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-11-18  1:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, ncardwell, kuniyu, netdev,
	eric.dumazet

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 14 Nov 2025 13:51:41 +0000 you wrote:
> net.ipv4.tcp_comp_sack_slack_ns current default value is too high.
> 
> When a flow has many drops (1 % or more), and small RTT, adding 100 usec
> before sending SACK stalls the sender relying on getting SACK
> fast enough to keep the pipe busy.
> 
> Decrease the default to 10 usec.
> 
> [...]

Here is the summary with links:
  - [net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
    https://git.kernel.org/netdev/net-next/c/ca412f25d6b2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-18  1:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-14 13:51 [PATCH net] tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec Eric Dumazet
2025-11-14 14:08 ` Eric Dumazet
2025-11-14 16:03   ` Jakub Kicinski
2025-11-14 16:17     ` Eric Dumazet
2025-11-14 19:35       ` Eric Dumazet
2025-11-15  1:43         ` Jakub Kicinski
2025-11-14 14:22 ` Neal Cardwell
2025-11-18  1:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).