public inbox for mptcp@lists.linux.dev
* [PATCH RFC mptcp-next] mptcp: support net.ipv4.tcp_rcvbuf_low_rtt
@ 2025-11-27 15:58 Matthieu Baerts (NGI0)
  2025-11-27 17:20 ` Paolo Abeni
  2025-11-27 17:24 ` MPTCP CI
  0 siblings, 2 replies; 4+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-27 15:58 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Paolo Abeni, Matthieu Baerts (NGI0)

This is a follow-up to commit ecfea98b7d0d ("tcp: add
net.ipv4.tcp_rcvbuf_low_rtt"), adapted to MPTCP.

MPTCP has mptcp_rcvbuf_grow(), which is similar to tcp_rcvbuf_grow() but
adapted for the MPTCP-level socket.

The idea here is similar to what has been done on the TCP side: do not
let mptcp_rcvbuf_grow() grow sk->sk_rcvbuf too fast for small RTT flows.
Quoting Eric: If sk->sk_rcvbuf is too big, this can force NIC driver to
not recycle pages from their page pool, and also can cause cache
evictions for DDIO enabled cpus/NIC, as receivers are usually slower
than senders.

If the RTT is smaller than the new net.ipv4.tcp_rcvbuf_low_rtt sysctl
value, use the RTT / tcp_rcvbuf_low_rtt ratio to control sk_rcvbuf
inflation.
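
To make the arithmetic concrete, here is a minimal userspace sketch of
the growth step. It is not the kernel code: the helper name
rcvwin_after_grow() is hypothetical, plain integer division stands in
for div_u64(), and it assumes newval > oldval > 0 (which the caller is
expected to guarantee). The values in the note below are illustrative
only.

```c
#include <stdint.h>

/* Hypothetical userspace model of the rcvwin growth step.
 * Assumes newval > oldval > 0; rtt_us and rtt_threshold are in usec.
 */
static uint32_t rcvwin_after_grow(uint32_t oldval, uint32_t newval,
				  uint32_t rtt_us, uint32_t rtt_threshold)
{
	/* DRS is always one RTT late. */
	uint32_t rcvwin = newval << 1;
	uint64_t grow;

	if (rtt_us < rtt_threshold) {
		/* Small RTT: scale the growth by rtt_us / rtt_threshold.
		 * rtt_threshold cannot be 0 here, since rtt_us < rtt_threshold.
		 */
		grow = (uint64_t)rcvwin * rtt_us / rtt_threshold;
	} else {
		/* Slow start: allow the sender to double its rate. */
		grow = ((uint64_t)rcvwin << 1) * (newval - oldval) / oldval;
	}

	return rcvwin + (uint32_t)grow;
}
```

For instance, with oldval = 100000, newval = 150000 and a threshold of
1000 usec, an RTT of 100 usec gives 300000 + 30000 = 330000, while an
RTT of 2000 usec takes the slow-start branch and gives
300000 + 300000 = 600000.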

Tested: NO :)

This is why it is still an RFC: my perf test env is currently broken.
I'm sharing this patch in case it is easy for someone else to validate
it. Ideally, such tests should be done on top of the "trace: mptcp: add
mptcp_rcvbuf_grow tracepoint" patch from Paolo (and probably on top of
the related series), following tests similar to the ones done by Eric,
and making sure the receiver is slower than the sender. Feel free to
take the patch and send new versions, changing the author, etc., if
needed.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e484c6391b48..715a9a072c6a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -208,6 +208,7 @@ static bool mptcp_rcvbuf_grow(struct sock *sk, u32 newval)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	const struct net *net = sock_net(sk);
 	u32 rcvwin, rcvbuf, cap, oldval;
+	u32 rtt_threshold, rtt_us;
 	u64 grow;
 
 	oldval = msk->rcvq_space.space;
@@ -219,10 +220,19 @@ static bool mptcp_rcvbuf_grow(struct sock *sk, u32 newval)
 	/* DRS is always one RTT late. */
 	rcvwin = newval << 1;
 
-	/* slow start: allow the sender to double its rate. */
-	grow = (u64)rcvwin * (newval - oldval);
-	do_div(grow, oldval);
-	rcvwin += grow << 1;
+	rtt_us = msk->rcvq_space.rtt_us >> 3;
+	rtt_threshold = READ_ONCE(net->ipv4.sysctl_tcp_rcvbuf_low_rtt);
+	if (rtt_us < rtt_threshold) {
+		/* For small RTT, we set @grow to rcvwin * rtt_us/rtt_threshold.
+		 * It might take few additional ms to reach 'line rate',
+		 * but will avoid sk_rcvbuf inflation and poor cache use.
+		 */
+		grow = div_u64((u64)rcvwin * rtt_us, rtt_threshold);
+	} else {
+		/* slow start: allow the sender to double its rate. */
+		grow = div_u64(((u64)rcvwin << 1) * (newval - oldval), oldval);
+	}
+	rcvwin += grow;
 
 	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue))
 		rcvwin += MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq;

---
base-commit: 1fea9a6bd10f5c5494b7973141083ec56ecffd74
change-id: 20251127-mptcp-tcp_rcvbuf_low_rtt-fc64120b153a

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>


Thread overview: 4+ messages
2025-11-27 15:58 [PATCH RFC mptcp-next] mptcp: support net.ipv4.tcp_rcvbuf_low_rtt Matthieu Baerts (NGI0)
2025-11-27 17:20 ` Paolo Abeni
2025-11-27 17:45   ` Matthieu Baerts
2025-11-27 17:24 ` MPTCP CI