* [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
@ 2024-07-26 20:41 Subash Abhinov Kasiviswanathan
2024-07-29 6:59 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2024-07-26 20:41 UTC (permalink / raw)
To: edumazet, soheil, ncardwell, yyd, ycheng, davem, kuba, netdev,
dsahern, pabeni
Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti
tp->scaling_ratio is not updated based on skb->len/skb->truesize once
SO_RCVBUF is set leading to the maximum window scaling to be 25% of
rcvbuf after
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
and 50% of rcvbuf after
commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
50% tries to emulate the behavior of older kernels using
sysctl_tcp_adv_win_scale with default value.
Systems which were using a different values of sysctl_tcp_adv_win_scale
in older kernels ended up seeing reduced download speeds in certain
cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
While the sysctl scheme is no longer acceptable, the value of 50% is
a bit conservative when the skb->len/skb->truesize ratio is later
determined to be ~0.66.
Applications not specifying SO_RCVBUF update the window scaling and
the receiver buffer every time data is copied to userspace. This
computation is now used for applications setting SO_RCVBUF to update
the maximum window scaling while ensuring that the receive buffer
is within the application specified limit.
Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
v1 -> v2
Update the condition for SO_RCVBUF window_clamp updates to always
monitor the current rcvbuf value as suggested by Eric.
net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 454362e359da..e2b9583ed96a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
* <prev RTT . ><current RTT .. ><next RTT .... >
*/
- if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
- !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
u64 rcvwin, grow;
int rcvbuf;
@@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
- if (rcvbuf > sk->sk_rcvbuf) {
- WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+ if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+ if (rcvbuf > sk->sk_rcvbuf) {
+ WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
- /* Make the window clamp follow along. */
- WRITE_ONCE(tp->window_clamp,
- tcp_win_from_space(sk, rcvbuf));
+ /* Make the window clamp follow along. */
+ WRITE_ONCE(tp->window_clamp,
+ tcp_win_from_space(sk, rcvbuf));
+ }
+ } else {
+ /* Make the window clamp follow along while being bounded
+ * by SO_RCVBUF.
+ */
+ int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
+
+ if (clamp > tp->window_clamp)
+ WRITE_ONCE(tp->window_clamp, clamp);
}
}
tp->rcvq_space.space = copied;
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan @ 2024-07-29 6:59 ` Eric Dumazet 2024-07-29 11:24 ` patchwork-bot+netdevbpf 2024-07-29 14:51 ` Neal Cardwell 2 siblings, 0 replies; 8+ messages in thread From: Eric Dumazet @ 2024-07-29 6:59 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan Cc: soheil, ncardwell, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti On Fri, Jul 26, 2024 at 10:41 PM Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> wrote: > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > rcvbuf after > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > and 50% of rcvbuf after > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > 50% tries to emulate the behavior of older kernels using > sysctl_tcp_adv_win_scale with default value. > > Systems which were using a different values of sysctl_tcp_adv_win_scale > in older kernels ended up seeing reduced download speeds in certain > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 > While the sysctl scheme is no longer acceptable, the value of 50% is > a bit conservative when the skb->len/skb->truesize ratio is later > determined to be ~0.66. > > Applications not specifying SO_RCVBUF update the window scaling and > the receiver buffer every time data is copied to userspace. This > computation is now used for applications setting SO_RCVBUF to update > the maximum window scaling while ensuring that the receive buffer > is within the application specified limit. > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan 2024-07-29 6:59 ` Eric Dumazet @ 2024-07-29 11:24 ` patchwork-bot+netdevbpf 2024-07-29 14:51 ` Neal Cardwell 2 siblings, 0 replies; 8+ messages in thread From: patchwork-bot+netdevbpf @ 2024-07-29 11:24 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan Cc: edumazet, soheil, ncardwell, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, quic_stranche Hello: This patch was applied to netdev/net.git (main) by David S. Miller <davem@davemloft.net>: On Fri, 26 Jul 2024 13:41:05 -0700 you wrote: > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > rcvbuf after > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > and 50% of rcvbuf after > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > 50% tries to emulate the behavior of older kernels using > sysctl_tcp_adv_win_scale with default value. > > [...] Here is the summary with links: - [net,v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF https://git.kernel.org/netdev/net/c/05f76b2d634e You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan 2024-07-29 6:59 ` Eric Dumazet 2024-07-29 11:24 ` patchwork-bot+netdevbpf @ 2024-07-29 14:51 ` Neal Cardwell 2024-07-29 15:19 ` Eric Dumazet 2 siblings, 1 reply; 8+ messages in thread From: Neal Cardwell @ 2024-07-29 14:51 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan Cc: edumazet, soheil, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> wrote: > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > rcvbuf after > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > and 50% of rcvbuf after > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > 50% tries to emulate the behavior of older kernels using > sysctl_tcp_adv_win_scale with default value. > > Systems which were using a different values of sysctl_tcp_adv_win_scale > in older kernels ended up seeing reduced download speeds in certain > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 > While the sysctl scheme is no longer acceptable, the value of 50% is > a bit conservative when the skb->len/skb->truesize ratio is later > determined to be ~0.66. > > Applications not specifying SO_RCVBUF update the window scaling and > the receiver buffer every time data is copied to userspace. This > computation is now used for applications setting SO_RCVBUF to update > the maximum window scaling while ensuring that the receive buffer > is within the application specified limit. > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > --- > v1 -> v2 > Update the condition for SO_RCVBUF window_clamp updates to always > monitor the current rcvbuf value as suggested by Eric. > > net/ipv4/tcp_input.c | 23 ++++++++++++++++------- > 1 file changed, 16 insertions(+), 7 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 454362e359da..e2b9583ed96a 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk) > * <prev RTT . ><current RTT .. ><next RTT .... > > */ > > - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && > - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { > u64 rcvwin, grow; > int rcvbuf; > > @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk) > > rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), > READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); > - if (rcvbuf > sk->sk_rcvbuf) { > - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > + if (rcvbuf > sk->sk_rcvbuf) { > + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > - /* Make the window clamp follow along. */ > - WRITE_ONCE(tp->window_clamp, > - tcp_win_from_space(sk, rcvbuf)); > + /* Make the window clamp follow along. */ > + WRITE_ONCE(tp->window_clamp, > + tcp_win_from_space(sk, rcvbuf)); > + } > + } else { > + /* Make the window clamp follow along while being bounded > + * by SO_RCVBUF. > + */ > + int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); > + > + if (clamp > tp->window_clamp) > + WRITE_ONCE(tp->window_clamp, clamp); > } > } > tp->rcvq_space.space = copied; > -- Is this the correct place to put this new code to update tp->window_clamp? AFAICT it's not the correct place. If a system administrator has disabled receive buffer autotuning by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <= tp->rcvq_space.space), then TCP connections will not reach this new code, and the window_clamp will not be adjusted, and the receive window will still be too low. Even if a system administrator has disabled receive buffer autotuning by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied <= tp->rcvq_space.space), AFAICT we still want the correct receive window value for whatever sk->sk_rcvbuf we have, based on the correct tp->scaling_ratio. So AFAICT the correct place to put this kind of logic is in tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's different than tp->scaling_ratio, then it seems we should compute a new window_clamp value using sk->sk_rcvbuf, and if the new window_clamp value is different then we should WRITE_ONCE that value into tp->window_clamp. That way we can have the correct tp->window_clamp, no matter the value of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <= tp->rcvq_space.space). How does that sound? neal ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-29 14:51 ` Neal Cardwell @ 2024-07-29 15:19 ` Eric Dumazet 2024-07-29 15:32 ` Neal Cardwell 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2024-07-29 15:19 UTC (permalink / raw) To: Neal Cardwell Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti On Mon, Jul 29, 2024 at 4:51 PM Neal Cardwell <ncardwell@google.com> wrote: > > On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan > <quic_subashab@quicinc.com> wrote: > > > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > > rcvbuf after > > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > > and 50% of rcvbuf after > > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > > 50% tries to emulate the behavior of older kernels using > > sysctl_tcp_adv_win_scale with default value. > > > > Systems which were using a different values of sysctl_tcp_adv_win_scale > > in older kernels ended up seeing reduced download speeds in certain > > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 > > While the sysctl scheme is no longer acceptable, the value of 50% is > > a bit conservative when the skb->len/skb->truesize ratio is later > > determined to be ~0.66. > > > > Applications not specifying SO_RCVBUF update the window scaling and > > the receiver buffer every time data is copied to userspace. This > > computation is now used for applications setting SO_RCVBUF to update > > the maximum window scaling while ensuring that the receive buffer > > is within the application specified limit. > > > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > > --- > > v1 -> v2 > > Update the condition for SO_RCVBUF window_clamp updates to always > > monitor the current rcvbuf value as suggested by Eric. > > > > net/ipv4/tcp_input.c | 23 ++++++++++++++++------- > > 1 file changed, 16 insertions(+), 7 deletions(-) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 454362e359da..e2b9583ed96a 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk) > > * <prev RTT . ><current RTT .. ><next RTT .... > > > */ > > > > - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && > > - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > > + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { > > u64 rcvwin, grow; > > int rcvbuf; > > > > @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk) > > > > rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), > > READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); > > - if (rcvbuf > sk->sk_rcvbuf) { > > - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > > + if (rcvbuf > sk->sk_rcvbuf) { > > + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > > > - /* Make the window clamp follow along. */ > > - WRITE_ONCE(tp->window_clamp, > > - tcp_win_from_space(sk, rcvbuf)); > > + /* Make the window clamp follow along. */ > > + WRITE_ONCE(tp->window_clamp, > > + tcp_win_from_space(sk, rcvbuf)); > > + } > > + } else { > > + /* Make the window clamp follow along while being bounded > > + * by SO_RCVBUF. > > + */ > > + int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); > > + > > + if (clamp > tp->window_clamp) > > + WRITE_ONCE(tp->window_clamp, clamp); > > } > > } > > tp->rcvq_space.space = copied; > > -- > > Is this the correct place to put this new code to update > tp->window_clamp? AFAICT it's not the correct place. > > If a system administrator has disabled receive buffer autotuning by > setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <= > tp->rcvq_space.space), then TCP connections will not reach this new > code, and the window_clamp will not be adjusted, and the receive > window will still be too low. > > Even if a system administrator has disabled receive buffer autotuning > by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied > <= tp->rcvq_space.space), AFAICT we still want the correct receive > window value for whatever sk->sk_rcvbuf we have, based on the correct > tp->scaling_ratio. > > So AFAICT the correct place to put this kind of logic is in > tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's > different than tp->scaling_ratio, then it seems we should compute a > new window_clamp value using sk->sk_rcvbuf, and if the new > window_clamp value is different then we should WRITE_ONCE that value > into tp->window_clamp. > > That way we can have the correct tp->window_clamp, no matter the value > of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <= > tp->rcvq_space.space). > > How does that sound? Can this be done without adding new code in the fast path ? Otherwise, I feel that we send a wrong signal to 'administrators' : "We will maintain code to make sure that wrong sysctls settings were not so wrong." Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any valid reason ? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-29 15:19 ` Eric Dumazet @ 2024-07-29 15:32 ` Neal Cardwell 2024-07-29 16:07 ` Eric Dumazet 2024-08-14 2:28 ` Jason Xing 0 siblings, 2 replies; 8+ messages in thread From: Neal Cardwell @ 2024-07-29 15:32 UTC (permalink / raw) To: Eric Dumazet Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti On Mon, Jul 29, 2024 at 11:19 AM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Jul 29, 2024 at 4:51 PM Neal Cardwell <ncardwell@google.com> wrote: > > > > On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan > > <quic_subashab@quicinc.com> wrote: > > > > > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > > > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > > > rcvbuf after > > > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > > > and 50% of rcvbuf after > > > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > > > 50% tries to emulate the behavior of older kernels using > > > sysctl_tcp_adv_win_scale with default value. > > > > > > Systems which were using a different values of sysctl_tcp_adv_win_scale > > > in older kernels ended up seeing reduced download speeds in certain > > > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 > > > While the sysctl scheme is no longer acceptable, the value of 50% is > > > a bit conservative when the skb->len/skb->truesize ratio is later > > > determined to be ~0.66. > > > > > > Applications not specifying SO_RCVBUF update the window scaling and > > > the receiver buffer every time data is copied to userspace. This > > > computation is now used for applications setting SO_RCVBUF to update > > > the maximum window scaling while ensuring that the receive buffer > > > is within the application specified limit. > > > > > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > > > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > > > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > > > --- > > > v1 -> v2 > > > Update the condition for SO_RCVBUF window_clamp updates to always > > > monitor the current rcvbuf value as suggested by Eric. > > > > > > net/ipv4/tcp_input.c | 23 ++++++++++++++++------- > > > 1 file changed, 16 insertions(+), 7 deletions(-) > > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > > index 454362e359da..e2b9583ed96a 100644 > > > --- a/net/ipv4/tcp_input.c > > > +++ b/net/ipv4/tcp_input.c > > > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk) > > > * <prev RTT . ><current RTT .. ><next RTT .... > > > > */ > > > > > > - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && > > > - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > > > + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { > > > u64 rcvwin, grow; > > > int rcvbuf; > > > > > > @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk) > > > > > > rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), > > > READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); > > > - if (rcvbuf > sk->sk_rcvbuf) { > > > - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > > + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > > > + if (rcvbuf > sk->sk_rcvbuf) { > > > + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > > > > > - /* Make the window clamp follow along. */ > > > - WRITE_ONCE(tp->window_clamp, > > > - tcp_win_from_space(sk, rcvbuf)); > > > + /* Make the window clamp follow along. */ > > > + WRITE_ONCE(tp->window_clamp, > > > + tcp_win_from_space(sk, rcvbuf)); > > > + } > > > + } else { > > > + /* Make the window clamp follow along while being bounded > > > + * by SO_RCVBUF. > > > + */ > > > + int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); > > > + > > > + if (clamp > tp->window_clamp) > > > + WRITE_ONCE(tp->window_clamp, clamp); > > > } > > > } > > > tp->rcvq_space.space = copied; > > > -- > > > > Is this the correct place to put this new code to update > > tp->window_clamp? AFAICT it's not the correct place. > > > > If a system administrator has disabled receive buffer autotuning by > > setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <= > > tp->rcvq_space.space), then TCP connections will not reach this new > > code, and the window_clamp will not be adjusted, and the receive > > window will still be too low. > > > > Even if a system administrator has disabled receive buffer autotuning > > by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied > > <= tp->rcvq_space.space), AFAICT we still want the correct receive > > window value for whatever sk->sk_rcvbuf we have, based on the correct > > tp->scaling_ratio. > > > > So AFAICT the correct place to put this kind of logic is in > > tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's > > different than tp->scaling_ratio, then it seems we should compute a > > new window_clamp value using sk->sk_rcvbuf, and if the new > > window_clamp value is different then we should WRITE_ONCE that value > > into tp->window_clamp. > > > > That way we can have the correct tp->window_clamp, no matter the value > > of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <= > > tp->rcvq_space.space). > > > > How does that sound? > > Can this be done without adding new code in the fast path ? I was imagining that the code would not really be in the fast path, because it would move to the spot in tcp_measure_rcv_mss() where the segment length "len" is greater than icsk->icsk_ack.rcv_mss. I imagine that should be rare, and we are already doing a somewhat expensive do_div() call there, so I was imagining that the additional cost would be relatively rare and small? > Otherwise, I feel that we send a wrong signal to 'administrators' : > "We will maintain code to make sure that wrong sysctls settings were > not so wrong." > > Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any > valid reason ? No, I'm not aware of any valid reason to disable net.ipv4.tcp_moderate_rcvbuf. :-) Even if tcp_moderate_rcvbuf is enabled, AFAICT there is still the issue that if (copied <= tp->rcvq_space.space) we will not get to this code... neal ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-29 15:32 ` Neal Cardwell @ 2024-07-29 16:07 ` Eric Dumazet 2024-08-14 2:28 ` Jason Xing 1 sibling, 0 replies; 8+ messages in thread From: Eric Dumazet @ 2024-07-29 16:07 UTC (permalink / raw) To: Neal Cardwell Cc: Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti On Mon, Jul 29, 2024 at 5:33 PM Neal Cardwell <ncardwell@google.com> wrote: > I was imagining that the code would not really be in the fast path, > because it would move to the spot in tcp_measure_rcv_mss() where the > segment length "len" is greater than icsk->icsk_ack.rcv_mss. I imagine > that should be rare, and we are already doing a somewhat expensive > do_div() call there, so I was imagining that the additional cost would > be relatively rare and small? Let's see your patch :) Thanks ! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF 2024-07-29 15:32 ` Neal Cardwell 2024-07-29 16:07 ` Eric Dumazet @ 2024-08-14 2:28 ` Jason Xing 1 sibling, 0 replies; 8+ messages in thread From: Jason Xing @ 2024-08-14 2:28 UTC (permalink / raw) To: Neal Cardwell Cc: Eric Dumazet, Subash Abhinov Kasiviswanathan, soheil, yyd, ycheng, davem, kuba, netdev, dsahern, pabeni, Sean Tranchetti Hello Neal, > > Otherwise, I feel that we send a wrong signal to 'administrators' : > > "We will maintain code to make sure that wrong sysctls settings were > > not so wrong." > > > > Are you aware of anyone changing net.ipv4.tcp_moderate_rcvbuf for any > > valid reason ? > > No, I'm not aware of any valid reason to disable > net.ipv4.tcp_moderate_rcvbuf. :-) I was also curious about why this sysctl knob was useful a long time ago? I don't see any good in it (for many years, we haven't touched it, setting it to 1 as default). Since you maintainers don't think it's a good one, could we mark it as deprecated and remove this sysctl? Thanks, Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-08-14 2:28 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-07-26 20:41 [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF Subash Abhinov Kasiviswanathan 2024-07-29 6:59 ` Eric Dumazet 2024-07-29 11:24 ` patchwork-bot+netdevbpf 2024-07-29 14:51 ` Neal Cardwell 2024-07-29 15:19 ` Eric Dumazet 2024-07-29 15:32 ` Neal Cardwell 2024-07-29 16:07 ` Eric Dumazet 2024-08-14 2:28 ` Jason Xing
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).