netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
@ 2024-07-26 20:41 Subash Abhinov Kasiviswanathan
  2024-07-29  6:59 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2024-07-26 20:41 UTC (permalink / raw)
  To: edumazet, soheil, ncardwell, yyd, ycheng, davem, kuba, netdev,
	dsahern, pabeni
  Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti

tp->scaling_ratio is not updated based on skb->len/skb->truesize once
SO_RCVBUF is set leading to the maximum window scaling to be 25% of
rcvbuf after
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
and 50% of rcvbuf after
commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
50% tries to emulate the behavior of older kernels using
sysctl_tcp_adv_win_scale with default value.

Systems which were using a different values of sysctl_tcp_adv_win_scale
in older kernels ended up seeing reduced download speeds in certain
cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
While the sysctl scheme is no longer acceptable, the value of 50% is
a bit conservative when the skb->len/skb->truesize ratio is later
determined to be ~0.66.

Applications not specifying SO_RCVBUF update the window scaling and
the receiver buffer every time data is copied to userspace. This
computation is now used for applications setting SO_RCVBUF to update
the maximum window scaling while ensuring that the receive buffer
is within the application specified limit.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
v1 -> v2
  Update the condition for SO_RCVBUF window_clamp updates to always
  monitor the current rcvbuf value as suggested by Eric.

 net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 454362e359da..e2b9583ed96a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
 	 * <prev RTT . ><current RTT .. ><next RTT .... >
 	 */
 
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
-	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
 		u64 rcvwin, grow;
 		int rcvbuf;
 
@@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
 
 		rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
 			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-		if (rcvbuf > sk->sk_rcvbuf) {
-			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+		if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+			if (rcvbuf > sk->sk_rcvbuf) {
+				WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
 
-			/* Make the window clamp follow along.  */
-			WRITE_ONCE(tp->window_clamp,
-				   tcp_win_from_space(sk, rcvbuf));
+				/* Make the window clamp follow along.  */
+				WRITE_ONCE(tp->window_clamp,
+					   tcp_win_from_space(sk, rcvbuf));
+			}
+		} else {
+			/* Make the window clamp follow along while being bounded
+			 * by SO_RCVBUF.
+			 */
+			int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
+
+			if (clamp > tp->window_clamp)
+				WRITE_ONCE(tp->window_clamp, clamp);
 		}
 	}
 	tp->rcvq_space.space = copied;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan
@ 2024-07-29  6:59 ` Eric Dumazet
  2024-07-29 11:24 ` patchwork-bot+netdevbpf
  2024-07-29 14:51 ` Neal Cardwell
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2024-07-29  6:59 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: soheil, ncardwell, yyd, ycheng, davem, kuba, netdev, dsahern,
	pabeni, Sean Tranchetti

On Fri, Jul 26, 2024 at 10:41 PM Subash Abhinov Kasiviswanathan
<quic_subashab@quicinc.com> wrote:
>
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
>
> Systems which were using a different values of sysctl_tcp_adv_win_scale
> in older kernels ended up seeing reduced download speeds in certain
> cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> While the sysctl scheme is no longer acceptable, the value of 50% is
> a bit conservative when the skb->len/skb->truesize ratio is later
> determined to be ~0.66.
>
> Applications not specifying SO_RCVBUF update the window scaling and
> the receiver buffer every time data is copied to userspace. This
> computation is now used for applications setting SO_RCVBUF to update
> the maximum window scaling while ensuring that the receive buffer
> is within the application specified limit.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan
  2024-07-29  6:59 ` Eric Dumazet
@ 2024-07-29 11:24 ` patchwork-bot+netdevbpf
  2024-07-29 14:51 ` Neal Cardwell
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-07-29 11:24 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: edumazet, soheil, ncardwell, yyd, ycheng, davem, kuba, netdev,
	dsahern, pabeni, quic_stranche

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 26 Jul 2024 13:41:05 -0700 you wrote:
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
> 
> [...]

Here is the summary with links:
  - [net,v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
    https://git.kernel.org/netdev/net/c/05f76b2d634e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan
  2024-07-29  6:59 ` Eric Dumazet
  2024-07-29 11:24 ` patchwork-bot+netdevbpf
@ 2024-07-29 14:51 ` Neal Cardwell
  2024-07-29 15:19   ` Eric Dumazet
  2 siblings, 1 reply; 8+ messages in thread
From: Neal Cardwell @ 2024-07-29 14:51 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: edumazet, soheil, yyd, ycheng, davem, kuba, netdev, dsahern,
	pabeni, Sean Tranchetti

On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan
<quic_subashab@quicinc.com> wrote:
>
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
>
> Systems which were using a different values of sysctl_tcp_adv_win_scale
> in older kernels ended up seeing reduced download speeds in certain
> cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> While the sysctl scheme is no longer acceptable, the value of 50% is
> a bit conservative when the skb->len/skb->truesize ratio is later
> determined to be ~0.66.
>
> Applications not specifying SO_RCVBUF update the window scaling and
> the receiver buffer every time data is copied to userspace. This
> computation is now used for applications setting SO_RCVBUF to update
> the maximum window scaling while ensuring that the receive buffer
> is within the application specified limit.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> ---
> v1 -> v2
>   Update the condition for SO_RCVBUF window_clamp updates to always
>   monitor the current rcvbuf value as suggested by Eric.
>
>  net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 454362e359da..e2b9583ed96a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
>          * <prev RTT . ><current RTT .. ><next RTT .... >
>          */
>
> -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
>                 u64 rcvwin, grow;
>                 int rcvbuf;
>
> @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
>
>                 rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
>                                READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> -               if (rcvbuf > sk->sk_rcvbuf) {
> -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +                       if (rcvbuf > sk->sk_rcvbuf) {
> +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>
> -                       /* Make the window clamp follow along.  */
> -                       WRITE_ONCE(tp->window_clamp,
> -                                  tcp_win_from_space(sk, rcvbuf));
> +                               /* Make the window clamp follow along.  */
> +                               WRITE_ONCE(tp->window_clamp,
> +                                          tcp_win_from_space(sk, rcvbuf));
> +                       }
> +               } else {
> +                       /* Make the window clamp follow along while being bounded
> +                        * by SO_RCVBUF.
> +                        */
> +                       int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
> +
> +                       if (clamp > tp->window_clamp)
> +                               WRITE_ONCE(tp->window_clamp, clamp);
>                 }
>         }
>         tp->rcvq_space.space = copied;
> --

Is this the correct place to put this new code to update
tp->window_clamp? AFAICT it's not the correct place.

If a system administrator has disabled receive buffer autotuning by
setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <=
tp->rcvq_space.space), then TCP connections will not reach this new
code, and the window_clamp will not be adjusted, and the receive
window will still be too low.

Even if a system administrator has disabled receive buffer autotuning
by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied
<= tp->rcvq_space.space), AFAICT we still want the correct receive
window value for whatever sk->sk_rcvbuf we have, based on the correct
tp->scaling_ratio.

So AFAICT the correct place to put this kind of logic is in
tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's
different than tp->scaling_ratio, then it seems we should compute a
new window_clamp value using sk->sk_rcvbuf, and if the new
window_clamp value is different then we should WRITE_ONCE that value
into tp->window_clamp.

That way we can have the correct tp->window_clamp, no matter the value
of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <=
tp->rcvq_space.space).

How does that sound?

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-29 14:51 ` Neal Cardwell
@ 2024-07-29 15:19   ` Eric Dumazet
  2024-07-29 15:32     ` Neal Cardwell
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2024-07-29 15:19 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba,
	netdev, dsahern, pabeni, Sean Tranchetti

On Mon, Jul 29, 2024 at 4:51 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan
> <quic_subashab@quicinc.com> wrote:
> >
> > tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> > SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> > rcvbuf after
> > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> > and 50% of rcvbuf after
> > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> > 50% tries to emulate the behavior of older kernels using
> > sysctl_tcp_adv_win_scale with default value.
> >
> > Systems which were using a different values of sysctl_tcp_adv_win_scale
> > in older kernels ended up seeing reduced download speeds in certain
> > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> > While the sysctl scheme is no longer acceptable, the value of 50% is
> > a bit conservative when the skb->len/skb->truesize ratio is later
> > determined to be ~0.66.
> >
> > Applications not specifying SO_RCVBUF update the window scaling and
> > the receiver buffer every time data is copied to userspace. This
> > computation is now used for applications setting SO_RCVBUF to update
> > the maximum window scaling while ensuring that the receive buffer
> > is within the application specified limit.
> >
> > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> > ---
> > v1 -> v2
> >   Update the condition for SO_RCVBUF window_clamp updates to always
> >   monitor the current rcvbuf value as suggested by Eric.
> >
> >  net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
> >  1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 454362e359da..e2b9583ed96a 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
> >          * <prev RTT . ><current RTT .. ><next RTT .... >
> >          */
> >
> > -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> > -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> > +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
> >                 u64 rcvwin, grow;
> >                 int rcvbuf;
> >
> > @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
> >
> >                 rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
> >                                READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> > -               if (rcvbuf > sk->sk_rcvbuf) {
> > -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> > +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> > +                       if (rcvbuf > sk->sk_rcvbuf) {
> > +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> >
> > -                       /* Make the window clamp follow along.  */
> > -                       WRITE_ONCE(tp->window_clamp,
> > -                                  tcp_win_from_space(sk, rcvbuf));
> > +                               /* Make the window clamp follow along.  */
> > +                               WRITE_ONCE(tp->window_clamp,
> > +                                          tcp_win_from_space(sk, rcvbuf));
> > +                       }
> > +               } else {
> > +                       /* Make the window clamp follow along while being bounded
> > +                        * by SO_RCVBUF.
> > +                        */
> > +                       int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
> > +
> > +                       if (clamp > tp->window_clamp)
> > +                               WRITE_ONCE(tp->window_clamp, clamp);
> >                 }
> >         }
> >         tp->rcvq_space.space = copied;
> > --
>
> Is this the correct place to put this new code to update
> tp->window_clamp? AFAICT it's not the correct place.
>
> If a system administrator has disabled receive buffer autotuning by
> setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <=
> tp->rcvq_space.space), then TCP connections will not reach this new
> code, and the window_clamp will not be adjusted, and the receive
> window will still be too low.
>
> Even if a system administrator has disabled receive buffer autotuning
> by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied
> <= tp->rcvq_space.space), AFAICT we still want the correct receive
> window value for whatever sk->sk_rcvbuf we have, based on the correct
> tp->scaling_ratio.
>
> So AFAICT the correct place to put this kind of logic is in
> tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's
> different than tp->scaling_ratio, then it seems we should compute a
> new window_clamp value using sk->sk_rcvbuf, and if the new
> window_clamp value is different then we should WRITE_ONCE that value
> into tp->window_clamp.
>
> That way we can have the correct tp->window_clamp, no matter the value
> of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <=
> tp->rcvq_space.space).
>
> How does that sound?

Can this be done without adding new code in the fast path ?

Otherwise, I feel that we send a wrong signal to 'administrators' :
"We will maintain code to make sure that wrong sysctls settings were
not so wrong."

Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any
valid reason ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-29 15:19   ` Eric Dumazet
@ 2024-07-29 15:32     ` Neal Cardwell
  2024-07-29 16:07       ` Eric Dumazet
  2024-08-14  2:28       ` Jason Xing
  0 siblings, 2 replies; 8+ messages in thread
From: Neal Cardwell @ 2024-07-29 15:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba,
	netdev, dsahern, pabeni, Sean Tranchetti

On Mon, Jul 29, 2024 at 11:19 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Jul 29, 2024 at 4:51 PM Neal Cardwell <ncardwell@google.com> wrote:
> >
> > On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan
> > <quic_subashab@quicinc.com> wrote:
> > >
> > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> > > SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> > > rcvbuf after
> > > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> > > and 50% of rcvbuf after
> > > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> > > 50% tries to emulate the behavior of older kernels using
> > > sysctl_tcp_adv_win_scale with default value.
> > >
> > > Systems which were using a different values of sysctl_tcp_adv_win_scale
> > > in older kernels ended up seeing reduced download speeds in certain
> > > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> > > While the sysctl scheme is no longer acceptable, the value of 50% is
> > > a bit conservative when the skb->len/skb->truesize ratio is later
> > > determined to be ~0.66.
> > >
> > > Applications not specifying SO_RCVBUF update the window scaling and
> > > the receiver buffer every time data is copied to userspace. This
> > > computation is now used for applications setting SO_RCVBUF to update
> > > the maximum window scaling while ensuring that the receive buffer
> > > is within the application specified limit.
> > >
> > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> > > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> > > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> > > ---
> > > v1 -> v2
> > >   Update the condition for SO_RCVBUF window_clamp updates to always
> > >   monitor the current rcvbuf value as suggested by Eric.
> > >
> > >  net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
> > >  1 file changed, 16 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 454362e359da..e2b9583ed96a 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
> > >          * <prev RTT . ><current RTT .. ><next RTT .... >
> > >          */
> > >
> > > -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> > > -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> > > +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
> > >                 u64 rcvwin, grow;
> > >                 int rcvbuf;
> > >
> > > @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
> > >
> > >                 rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
> > >                                READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> > > -               if (rcvbuf > sk->sk_rcvbuf) {
> > > -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> > > +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> > > +                       if (rcvbuf > sk->sk_rcvbuf) {
> > > +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> > >
> > > -                       /* Make the window clamp follow along.  */
> > > -                       WRITE_ONCE(tp->window_clamp,
> > > -                                  tcp_win_from_space(sk, rcvbuf));
> > > +                               /* Make the window clamp follow along.  */
> > > +                               WRITE_ONCE(tp->window_clamp,
> > > +                                          tcp_win_from_space(sk, rcvbuf));
> > > +                       }
> > > +               } else {
> > > +                       /* Make the window clamp follow along while being bounded
> > > +                        * by SO_RCVBUF.
> > > +                        */
> > > +                       int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
> > > +
> > > +                       if (clamp > tp->window_clamp)
> > > +                               WRITE_ONCE(tp->window_clamp, clamp);
> > >                 }
> > >         }
> > >         tp->rcvq_space.space = copied;
> > > --
> >
> > Is this the correct place to put this new code to update
> > tp->window_clamp? AFAICT it's not the correct place.
> >
> > If a system administrator has disabled receive buffer autotuning by
> > setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <=
> > tp->rcvq_space.space), then TCP connections will not reach this new
> > code, and the window_clamp will not be adjusted, and the receive
> > window will still be too low.
> >
> > Even if a system administrator has disabled receive buffer autotuning
> > by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied
> > <= tp->rcvq_space.space), AFAICT we still want the correct receive
> > window value for whatever sk->sk_rcvbuf we have, based on the correct
> > tp->scaling_ratio.
> >
> > So AFAICT the correct place to put this kind of logic is in
> > tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's
> > different than tp->scaling_ratio, then it seems we should compute a
> > new window_clamp value using sk->sk_rcvbuf, and if the new
> > window_clamp value is different then we should WRITE_ONCE that value
> > into tp->window_clamp.
> >
> > That way we can have the correct tp->window_clamp, no matter the value
> > of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <=
> > tp->rcvq_space.space).
> >
> > How does that sound?
>
> Can this be done without adding new code in the fast path ?

I was imagining that the code would not really be in the fast path,
because it would move to the spot in tcp_measure_rcv_mss() where the
segment length "len" is greater than icsk->icsk_ack.rcv_mss. I imagine
that should be rare, and we are already doing a somewhat expensive
do_div() call there, so I was imagining that the additional cost would
be relatively rare and small?

> Otherwise, I feel that we send a wrong signal to 'administrators' :
> "We will maintain code to make sure that wrong sysctls settings were
> not so wrong."
>
> Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any
> valid reason ?

No, I'm not aware of any valid reason to disable
net.ipv4.tcp_moderate_rcvbuf. :-)

Even if tcp_moderate_rcvbuf is enabled, AFAICT there is still the
issue that if (copied <= tp->rcvq_space.space) we will not get to this
code...

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-29 15:32     ` Neal Cardwell
@ 2024-07-29 16:07       ` Eric Dumazet
  2024-08-14  2:28       ` Jason Xing
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2024-07-29 16:07 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba,
	netdev, dsahern, pabeni, Sean Tranchetti

On Mon, Jul 29, 2024 at 5:33 PM Neal Cardwell <ncardwell@google.com> wrote:

> I was imagining that the code would not really be in the fast path,
> because it would move to the spot in tcp_measure_rcv_mss() where the
> segment length "len" is greater than icsk->icsk_ack.rcv_mss. I imagine
> that should be rare, and we are already doing a somewhat expensive
> do_div() call there, so I was imagining that the additional cost would
> be relatively rare and small?

Let's see your patch :)

Thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
  2024-07-29 15:32     ` Neal Cardwell
  2024-07-29 16:07       ` Eric Dumazet
@ 2024-08-14  2:28       ` Jason Xing
  1 sibling, 0 replies; 8+ messages in thread
From: Jason Xing @ 2024-08-14  2:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng,
	davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti

Hello Neal,

> > Otherwise, I feel that we send a wrong signal to 'administrators' :
> > "We will maintain code to make sure that wrong sysctls settings were
> > not so wrong."
> >
> > Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any
> > valid reason ?
>
> No, I'm not aware of any valid reason to disable
> net.ipv4.tcp_moderate_rcvbuf. :-)

I was also curious about why this sysctl knob was useful a long time
ago? I don't see any good in it (for many years, we haven't touched
it, setting it to 1 as default). Since you maintainers don't think
it's a good one, could we mark it as deprecated and remove this
sysctl?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-08-14  2:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan
2024-07-29  6:59 ` Eric Dumazet
2024-07-29 11:24 ` patchwork-bot+netdevbpf
2024-07-29 14:51 ` Neal Cardwell
2024-07-29 15:19   ` Eric Dumazet
2024-07-29 15:32     ` Neal Cardwell
2024-07-29 16:07       ` Eric Dumazet
2024-08-14  2:28       ` Jason Xing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).