From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Neal Cardwell <ncardwell@google.com>
Cc: Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Rick Jones <jonesrick@google.com>, Wei Wang <weiwan@google.com>,
netdev@vger.kernel.org, eric.dumazet@gmail.com,
Eric Dumazet <edumazet@google.com>
Subject: [PATCH net-next 02/11] tcp: fix sk_rcvbuf overshoot
Date: Tue, 13 May 2025 19:39:10 +0000
Message-ID: <20250513193919.1089692-3-edumazet@google.com>
In-Reply-To: <20250513193919.1089692-1-edumazet@google.com>

Current autosizing in tcp_rcv_space_adjust() is too aggressive.

Instead of betting on possible losses and overestimating the BDP,
it is better to only account for slow start.

The following patch then adds more precise tuning
in the event of packet losses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_input.c | 59 +++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 34 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 88beb6d0f7b5981e65937a6727a1111fd341335b..89e886bb0fa11666ca4b51b032d536f233078dca 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -747,6 +747,29 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 	}
 }
 
+static void tcp_rcvbuf_grow(struct sock *sk)
+{
+	const struct net *net = sock_net(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
+	int rcvwin, rcvbuf, cap;
+
+	if (!READ_ONCE(net->ipv4.sysctl_tcp_moderate_rcvbuf) ||
+	    (sk->sk_userlocks & SOCK_RCVBUF_LOCK))
+		return;
+
+	/* slow start: allow the sender to double its rate. */
+	rcvwin = tp->rcvq_space.space << 1;
+
+	cap = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
+
+	rcvbuf = min_t(u32, tcp_space_from_win(sk, rcvwin), cap);
+	if (rcvbuf > sk->sk_rcvbuf) {
+		WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+		/* Make the window clamp follow along. */
+		WRITE_ONCE(tp->window_clamp,
+			   tcp_win_from_space(sk, rcvbuf));
+	}
+}
 /*
  * This function should be called every time data is copied to user space.
  * It calculates the appropriate TCP receive buffer space.
@@ -771,42 +794,10 @@ void tcp_rcv_space_adjust(struct sock *sk)
 
 	trace_tcp_rcvbuf_grow(sk, time);
 
-	/* A bit of theory :
-	 * copied = bytes received in previous RTT, our base window
-	 * To cope with packet losses, we need a 2x factor
-	 * To cope with slow start, and sender growing its cwin by 100 %
-	 * every RTT, we need a 4x factor, because the ACK we are sending
-	 * now is for the next RTT, not the current one :
-	 * <prev RTT . ><current RTT .. ><next RTT .... >
-	 */
-
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
-	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-		u64 rcvwin, grow;
-		int rcvbuf;
-
-		/* minimal window to cope with packet losses, assuming
-		 * steady state. Add some cushion because of small variations.
-		 */
-		rcvwin = ((u64)copied << 1) + 16 * tp->advmss;
-
-		/* Accommodate for sender rate increase (eg. slow start) */
-		grow = rcvwin * (copied - tp->rcvq_space.space);
-		do_div(grow, tp->rcvq_space.space);
-		rcvwin += (grow << 1);
-
-		rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
-			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-		if (rcvbuf > sk->sk_rcvbuf) {
-			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
-
-			/* Make the window clamp follow along. */
-			WRITE_ONCE(tp->window_clamp,
-				   tcp_win_from_space(sk, rcvbuf));
-		}
-	}
 	tp->rcvq_space.space = copied;
 
+	tcp_rcvbuf_grow(sk);
+
 new_measure:
 	tp->rcvq_space.seq = tp->copied_seq;
 	tp->rcvq_space.time = tp->tcp_mstamp;
--
2.49.0.1045.g170613ef41-goog
Thread overview: 25+ messages
2025-05-13 19:39 [PATCH net-next 00/11] tcp: receive side improvements Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 01/11] tcp: add tcp_rcvbuf_grow() tracepoint Eric Dumazet
2025-05-14 15:30 ` David Ahern
2025-05-14 15:38 ` Eric Dumazet
2025-05-14 15:46 ` David Ahern
2025-05-14 16:33 ` Eric Dumazet
2025-05-13 19:39 ` Eric Dumazet [this message]
2025-05-13 19:39 ` [PATCH net-next 03/11] tcp: adjust rcvbuf in presence of reorders Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 04/11] tcp: add receive queue awareness in tcp_rcv_space_adjust() Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 05/11] tcp: remove zero TCP TS samples for autotuning Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 06/11] tcp: fix initial tp->rcvq_space.space value for passive TS enabled flows Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 07/11] tcp: always seek for minimal rtt in tcp_rcv_rtt_update() Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 08/11] tcp: skip big rtt sample if receive queue is not empty Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 09/11] tcp: increase tcp_limit_output_bytes default value to 4MB Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 10/11] tcp: always use tcp_limit_output_bytes limitation Eric Dumazet
2025-05-13 19:39 ` [PATCH net-next 11/11] tcp: increase tcp_rmem[2] to 32 MB Eric Dumazet
2025-05-14 20:24 ` Jakub Kicinski
2025-05-14 20:53 ` Kuniyuki Iwashima
2025-05-14 21:20 ` Kuniyuki Iwashima
2025-05-14 21:26 ` Jakub Kicinski
2025-05-14 21:28 ` Kuniyuki Iwashima
2025-05-14 20:26 ` [PATCH net-next 00/11] tcp: receive side improvements Jakub Kicinski
2025-05-15 18:50 ` patchwork-bot+netdevbpf
2025-05-22 14:03 ` Daniel Borkmann
2025-05-22 14:11 ` Eric Dumazet