* [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement
@ 2025-11-19 21:54 Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 1/4] trace: mptcp: add mptcp_rcvbuf_grow tracepoint Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-11-19 21:54 UTC (permalink / raw)
To: mptcp; +Cc: martineau
This series collects a few follow-ups for the backlog refactor, with some
of them posing as fixes just to confuse the enemy.
Targeting net-next, as the issues addressed are very old, the change
is quite invasive and, as mentioned, it is based on the BL refactor.
Overall this:
- introduces a tracepoint, similar to the TCP one, useful to track down
the issues addressed here
- avoids rx drops in corner-case scenarios, improving tput stability and
peak tput over fast links
- refactors the rcv space and RTT estimators, overall making DRS more
correct and avoiding the rcv buffer drifting towards tcp_rmem[2], which
in turn makes the tput more stable and less bursty
Note that after the following upstream series is merged:
https://patchwork.kernel.org/user/todo/netdevbpf/?series=1025209
we want to introduce similar changes in mptcp, too.
---
v5 -> v6:
- dropped the buggy patch 3/5
v4 -> v5:
- dropped patch 4/6; it proved to be useless together with new version
of patch 4/5
- refactor rtt estimator: use sliding window instead of dropping "too
high" values
v3 -> v4:
- dropped already merged patch 1
- update comment in patch 6/6
v2 -> v3:
- use __entry->family in patch 1/7
- more verbose commit message for patch 6/7
v1 -> v2:
- new patch 1/7
- old patch 2/4 has been split in 3: 3/7, 4/7 and 5/7
- refactor both the rcv space init and rtt estimator
Paolo Abeni (4):
trace: mptcp: add mptcp_rcvbuf_grow tracepoint
mptcp: fix receive space timestamp initialization.
mptcp: better mptcp-level RTT estimator
mptcp: add receive queue awareness in tcp_rcv_space_adjust()
include/trace/events/mptcp.h | 80 ++++++++++++++++++++++++++++++++++
net/mptcp/protocol.c | 83 ++++++++++++++++++++----------------
net/mptcp/protocol.h | 43 ++++++++++++++++++-
3 files changed, 169 insertions(+), 37 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH mptcp-next v6 1/4] trace: mptcp: add mptcp_rcvbuf_grow tracepoint
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
@ 2025-11-19 21:54 ` Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 2/4] mptcp: fix receive space timestamp initialization Paolo Abeni
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-11-19 21:54 UTC (permalink / raw)
To: mptcp; +Cc: martineau
Similar to TCP, provide a new tracepoint to better understand
mptcp_rcv_space_adjust() behavior, which presents many artifacts.
Note that the format string used is so long that I preferred to
wrap it, contrary to the guidance for quoted strings.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v4 -> v5:
- fixed a couple of checkpatch issues.
v2 -> v3:
- use __entry->family; note that its value is shown as a raw number
instead of a string, as show_family_name() is available only to
net/core/ traces and I preferred not to move the mptcp traces
there, as we need mptcp-specific helpers, too.
---
include/trace/events/mptcp.h | 80 ++++++++++++++++++++++++++++++++++++
net/mptcp/protocol.c | 3 ++
2 files changed, 83 insertions(+)
diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 085b749cdd97..269d949b2025 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -5,7 +5,13 @@
#if !defined(_TRACE_MPTCP_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_MPTCP_H
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
#include <linux/tracepoint.h>
+#include <net/ipv6.h>
+#include <net/tcp.h>
+#include <linux/sock_diag.h>
+#include <net/rstreason.h>
#define show_mapping_status(status) \
__print_symbolic(status, \
@@ -178,6 +184,80 @@ TRACE_EVENT(subflow_check_data_avail,
__entry->skb)
);
+#include <trace/events/net_probe_common.h>
+
+TRACE_EVENT(mptcp_rcvbuf_grow,
+
+ TP_PROTO(struct sock *sk, int time),
+
+ TP_ARGS(sk, time),
+
+ TP_STRUCT__entry(
+ __field(int, time)
+ __field(__u32, rtt_us)
+ __field(__u32, copied)
+ __field(__u32, inq)
+ __field(__u32, space)
+ __field(__u32, ooo_space)
+ __field(__u32, rcvbuf)
+ __field(__u32, rcv_wnd)
+ __field(__u8, scaling_ratio)
+ __field(__u16, sport)
+ __field(__u16, dport)
+ __field(__u16, family)
+ __array(__u8, saddr, 4)
+ __array(__u8, daddr, 4)
+ __array(__u8, saddr_v6, 16)
+ __array(__u8, daddr_v6, 16)
+ __field(const void *, skaddr)
+ ),
+
+ TP_fast_assign(
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct inet_sock *inet = inet_sk(sk);
+ bool ofo_empty;
+ __be32 *p32;
+
+ __entry->time = time;
+ __entry->rtt_us = msk->rcvq_space.rtt_us >> 3;
+ __entry->copied = msk->rcvq_space.copied;
+ __entry->inq = mptcp_inq_hint(sk);
+ __entry->space = msk->rcvq_space.space;
+ ofo_empty = RB_EMPTY_ROOT(&msk->out_of_order_queue);
+ __entry->ooo_space = ofo_empty ? 0 :
+ MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq -
+ msk->ack_seq;
+
+ __entry->rcvbuf = sk->sk_rcvbuf;
+ __entry->rcv_wnd = atomic64_read(&msk->rcv_wnd_sent) -
+ msk->ack_seq;
+ __entry->scaling_ratio = msk->scaling_ratio;
+ __entry->sport = ntohs(inet->inet_sport);
+ __entry->dport = ntohs(inet->inet_dport);
+ __entry->family = sk->sk_family;
+
+ p32 = (__be32 *)__entry->saddr;
+ *p32 = inet->inet_saddr;
+
+ p32 = (__be32 *)__entry->daddr;
+ *p32 = inet->inet_daddr;
+
+ TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+ sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
+
+ __entry->skaddr = sk;
+ ),
+
+ TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u "
+ "rcvbuf=%u rcv_wnd=%u family=%d sport=%hu dport=%hu saddr=%pI4 "
+ "daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c skaddr=%p",
+ __entry->time, __entry->rtt_us, __entry->copied,
+ __entry->inq, __entry->space, __entry->ooo_space,
+ __entry->scaling_ratio, __entry->rcvbuf, __entry->rcv_wnd,
+ __entry->family, __entry->sport, __entry->dport,
+ __entry->saddr, __entry->daddr, __entry->saddr_v6,
+ __entry->daddr_v6, __entry->skaddr)
+);
#endif /* _TRACE_MPTCP_H */
/* This part must be outside protection */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f8756521b2e2..64f592a7897c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -28,6 +28,8 @@
#include "protocol.h"
#include "mib.h"
+static unsigned int mptcp_inq_hint(const struct sock *sk);
+
#define CREATE_TRACE_POINTS
#include <trace/events/mptcp.h>
@@ -2118,6 +2120,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
if (msk->rcvq_space.copied <= msk->rcvq_space.space)
goto new_measure;
+ trace_mptcp_rcvbuf_grow(sk, time);
if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) {
/* Make subflows follow along. If we do not do this, we
* get drops at subflow level if skbs can't be moved to
--
2.51.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH mptcp-next v6 2/4] mptcp: fix receive space timestamp initialization.
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 1/4] trace: mptcp: add mptcp_rcvbuf_grow tracepoint Paolo Abeni
@ 2025-11-19 21:54 ` Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 3/4] mptcp: better mptcp-level RTT estimator Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-11-19 21:54 UTC (permalink / raw)
To: mptcp; +Cc: martineau
MPTCP initializes the receive buffer stamp in mptcp_rcv_space_init(),
using the provided subflow stamp. Such helper is invoked in several
places; for passive sockets, space init happens at clone time.
In such a scenario, MPTCP ends up accessing the subflow stamp before
its initialization, leading to quite random timing for the first
receive buffer auto-tune event, as the timestamp for newly created
subflows is not refreshed there.
Fix the issue by moving the stamp initialization out of the mentioned
helper, to the data transfer start, and always using a fresh timestamp.
Fixes: 013e3179dbd2 ("mptcp: fix rcv space initialization")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v5 -> v6:
- really remove the stamp init from mptcp_rcv_space_init()
v1 -> v2:
- factor out only the tstamp change for better reviewability
---
net/mptcp/protocol.c | 9 ++++-----
net/mptcp/protocol.h | 5 +++++
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 64f592a7897c..d6d2d9d1849a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2089,8 +2089,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
msk->rcvq_space.copied += copied;
- mstamp = div_u64(tcp_clock_ns(), NSEC_PER_USEC);
- time = tcp_stamp_us_delta(mstamp, msk->rcvq_space.time);
+ mstamp = mptcp_stamp();
+ time = tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time));
rtt_us = msk->rcvq_space.rtt_us;
if (rtt_us && time < (rtt_us >> 3))
@@ -3530,7 +3530,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
mptcp_copy_inaddrs(nsk, ssk);
__mptcp_propagate_sndbuf(nsk, ssk);
- mptcp_rcv_space_init(msk, ssk);
+ msk->rcvq_space.time = mptcp_stamp();
if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK)
__mptcp_subflow_fully_established(msk, subflow, mp_opt);
@@ -3548,8 +3548,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
msk->rcvq_space.copied = 0;
msk->rcvq_space.rtt_us = 0;
- msk->rcvq_space.time = tp->tcp_mstamp;
-
/* initial rcv_space offering made to peer */
msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
TCP_INIT_CWND * tp->advmss);
@@ -3745,6 +3743,7 @@ void mptcp_finish_connect(struct sock *ssk)
* accessing the field below
*/
WRITE_ONCE(msk->local_key, subflow->local_key);
+ WRITE_ONCE(msk->rcvq_space.time, mptcp_stamp());
mptcp_pm_new_connection(msk, ssk, 0);
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 199f28f3dd5e..95c62f2ac705 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -917,6 +917,11 @@ static inline bool mptcp_is_fully_established(struct sock *sk)
READ_ONCE(mptcp_sk(sk)->fully_established);
}
+static inline u64 mptcp_stamp(void)
+{
+ return div_u64(tcp_clock_ns(), NSEC_PER_USEC);
+}
+
void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk);
void mptcp_data_ready(struct sock *sk, struct sock *ssk);
bool mptcp_finish_join(struct sock *sk);
--
2.51.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH mptcp-next v6 3/4] mptcp: better mptcp-level RTT estimator
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 1/4] trace: mptcp: add mptcp_rcvbuf_grow tracepoint Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 2/4] mptcp: fix receive space timestamp initialization Paolo Abeni
@ 2025-11-19 21:54 ` Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 4/4] mptcp: add receive queue awareness in tcp_rcv_space_adjust() Paolo Abeni
2025-11-19 23:29 ` [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement MPTCP CI
4 siblings, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-11-19 21:54 UTC (permalink / raw)
To: mptcp; +Cc: martineau
The current MPTCP-level RTT estimator has several issues. On high-speed
links, the MPTCP-level receive buffer auto-tuning happens at a frequency
well above the TCP-level one. That in turn can cause excessive/unneeded
receive buffer increases.
On such links, the initial rtt_us value is considerably higher
than the actual delay, and the current mptcp_rcv_space_adjust() updates
msk->rcvq_space.rtt_us with a period equal to that field's previous
value. If the initial rtt_us is 40ms, its first update will happen after
40ms, even if the subflows see actual RTTs orders of magnitude lower.
Additionally:
- setting the msk rtt to the maximum among all the subflows' RTTs makes
DRS constantly overshoot the rcvbuf size when a subflow has considerably
higher latency than the other(s).
- during unidirectional bulk transfers with multiple active subflows, the
TCP-level RTT estimator occasionally sees considerably higher values than
the real link delay, i.e. when the packet scheduler reacts to an incoming
ack on a given subflow by pushing data on a different subflow.
- currently inactive but still open subflows (i.e. switched to backup mode)
are always considered when computing the msk-level rtt.
Address all the issues above with a more accurate RTT estimation
strategy: the MPTCP-level RTT is set to the minimum of all the subflows
actually feeding data into the MPTCP receive buffer, using a small sliding
window.
While at it, also use an EWMA to compute the msk-level scaling_ratio, so
that mptcp can avoid traversing the subflow list in
mptcp_rcv_space_adjust().
Use some care to avoid updating msk- and ssk-level fields too often.
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v4 -> v5:
- avoid filtering out too high value, use sliding window instead
v3 -> v4:
- really refresh msk rtt after a full win per subflow (off-by-one in prev
revision)
- sync mptcp_rcv_space_adjust() comment with the new code
v1 -> v2:
- do not use explicit reset flags - do rcv win based decision instead
- discard 0 rtt_us samples from subflows
- discard samples on non empty rx queue
- discard "too high" samples, see the code comments WRT the whys
---
include/trace/events/mptcp.h | 2 +-
net/mptcp/protocol.c | 63 ++++++++++++++++++++----------------
net/mptcp/protocol.h | 38 +++++++++++++++++++++-
3 files changed, 73 insertions(+), 30 deletions(-)
diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 269d949b2025..04521acba483 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -219,7 +219,7 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
__be32 *p32;
__entry->time = time;
- __entry->rtt_us = msk->rcvq_space.rtt_us >> 3;
+ __entry->rtt_us = mptcp_rtt_us_est(msk) >> 3;
__entry->copied = msk->rcvq_space.copied;
__entry->inq = mptcp_inq_hint(sk);
__entry->space = msk->rcvq_space.space;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d6d2d9d1849a..7af4c13b8f60 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -886,6 +886,32 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
return moved;
}
+static void mptcp_rcv_rtt_update(struct mptcp_sock *msk,
+ struct mptcp_subflow_context *subflow)
+{
+ const struct tcp_sock *tp = tcp_sk(subflow->tcp_sock);
+ u32 rtt_us = tp->rcv_rtt_est.rtt_us;
+ int id;
+
+ /* Update once per subflow per rcvwnd to avoid touching the msk
+ * too often.
+ */
+ if (!rtt_us || tp->rcv_rtt_est.seq == subflow->prev_rtt_seq)
+ return;
+
+ subflow->prev_rtt_seq = tp->rcv_rtt_est.seq;
+
+ /* Pairs with READ_ONCE() in mptcp_rtt_us_est(). */
+ id = msk->rcv_rtt_est.next_sample;
+ WRITE_ONCE(msk->rcv_rtt_est.samples[id], rtt_us);
+ if (++msk->rcv_rtt_est.next_sample == MPTCP_RTT_SAMPLES)
+ msk->rcv_rtt_est.next_sample = 0;
+
+ /* EWMA among the incoming subflows */
+ msk->scaling_ratio = ((msk->scaling_ratio << 3) - msk->scaling_ratio +
+ tp->scaling_ratio) >> 3;
+}
+
void mptcp_data_ready(struct sock *sk, struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -899,6 +925,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
return;
mptcp_data_lock(sk);
+ mptcp_rcv_rtt_update(msk, subflow);
if (!sock_owned_by_user(sk)) {
/* Wake-up the reader only for in-sequence data */
if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
@@ -2069,15 +2096,15 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
/* receive buffer autotuning. See tcp_rcv_space_adjust for more information.
*
- * Only difference: Use highest rtt estimate of the subflows in use.
+ * Only difference: Use lowest rtt estimate of the subflows in use, see
+ * mptcp_rcv_rtt_update() and mptcp_rtt_us_est().
*/
static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
{
struct mptcp_subflow_context *subflow;
struct sock *sk = (struct sock *)msk;
- u8 scaling_ratio = U8_MAX;
- u32 time, advmss = 1;
- u64 rtt_us, mstamp;
+ u32 time, rtt_us;
+ u64 mstamp;
msk_owned_by_me(msk);
@@ -2092,29 +2119,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
mstamp = mptcp_stamp();
time = tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time));
- rtt_us = msk->rcvq_space.rtt_us;
- if (rtt_us && time < (rtt_us >> 3))
- return;
-
- rtt_us = 0;
- mptcp_for_each_subflow(msk, subflow) {
- const struct tcp_sock *tp;
- u64 sf_rtt_us;
- u32 sf_advmss;
-
- tp = tcp_sk(mptcp_subflow_tcp_sock(subflow));
-
- sf_rtt_us = READ_ONCE(tp->rcv_rtt_est.rtt_us);
- sf_advmss = READ_ONCE(tp->advmss);
-
- rtt_us = max(sf_rtt_us, rtt_us);
- advmss = max(sf_advmss, advmss);
- scaling_ratio = min(tp->scaling_ratio, scaling_ratio);
- }
-
- msk->rcvq_space.rtt_us = rtt_us;
- msk->scaling_ratio = scaling_ratio;
- if (time < (rtt_us >> 3) || rtt_us == 0)
+ rtt_us = mptcp_rtt_us_est(msk);
+ if (rtt_us == U32_MAX || time < (rtt_us >> 3))
return;
if (msk->rcvq_space.copied <= msk->rcvq_space.space)
@@ -2960,6 +2966,7 @@ static void __mptcp_init_sock(struct sock *sk)
msk->timer_ival = TCP_RTO_MIN;
msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
msk->backlog_len = 0;
+ mptcp_init_rtt_est(msk);
WRITE_ONCE(msk->first, NULL);
inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -3403,6 +3410,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
msk->bytes_sent = 0;
msk->bytes_retrans = 0;
msk->rcvspace_init = 0;
+ mptcp_init_rtt_est(msk);
/* for fallback's sake */
WRITE_ONCE(msk->ack_seq, 0);
@@ -3546,7 +3554,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
msk->rcvspace_init = 1;
msk->rcvq_space.copied = 0;
- msk->rcvq_space.rtt_us = 0;
/* initial rcv_space offering made to peer */
msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 95c62f2ac705..df33b30043b6 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -269,6 +269,13 @@ struct mptcp_data_frag {
struct page *page;
};
+/* Arbitrary compromise between as low as possible to react timely to subflow
+ * close event and as big as possible to avoid being fouled by biased large
+ * samples due to peer sending data on a different subflow WRT to the incoming
+ * ack.
+ */
+#define MPTCP_RTT_SAMPLES 5
+
/* MPTCP connection sock */
struct mptcp_sock {
/* inet_connection_sock must be the first member */
@@ -340,11 +347,17 @@ struct mptcp_sock {
*/
struct mptcp_pm_data pm;
struct mptcp_sched_ops *sched;
+
+ /* Most recent rtt_us observed by in use incoming subflows. */
+ struct {
+ u32 samples[MPTCP_RTT_SAMPLES];
+ u32 next_sample;
+ } rcv_rtt_est;
+
struct {
int space; /* bytes copied in last measurement window */
int copied; /* bytes copied in this measurement window */
u64 time; /* start time of measurement window */
- u64 rtt_us; /* last maximum rtt of subflows */
} rcvq_space;
u8 scaling_ratio;
bool allow_subflows;
@@ -422,6 +435,27 @@ static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
return msk->first_pending;
}
+static inline void mptcp_init_rtt_est(struct mptcp_sock *msk)
+{
+ int i;
+
+ for (i = 0; i < MPTCP_RTT_SAMPLES; ++i)
+ msk->rcv_rtt_est.samples[i] = U32_MAX;
+ msk->rcv_rtt_est.next_sample = 0;
+ msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+}
+
+static inline u32 mptcp_rtt_us_est(const struct mptcp_sock *msk)
+{
+ u32 rtt_us = msk->rcv_rtt_est.samples[0];
+ int i;
+
+ /* Lockless access of collected samples. */
+ for (i = 1; i < MPTCP_RTT_SAMPLES; ++i)
+ rtt_us = min(rtt_us, READ_ONCE(msk->rcv_rtt_est.samples[i]));
+ return rtt_us;
+}
+
static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
@@ -523,6 +557,8 @@ struct mptcp_subflow_context {
u32 map_data_len;
__wsum map_data_csum;
u32 map_csum_len;
+ u32 prev_rtt_us;
+ u32 prev_rtt_seq;
u32 request_mptcp : 1, /* send MP_CAPABLE */
request_join : 1, /* send MP_JOIN */
request_bkup : 1,
--
2.51.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH mptcp-next v6 4/4] mptcp: add receive queue awareness in tcp_rcv_space_adjust()
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
` (2 preceding siblings ...)
2025-11-19 21:54 ` [PATCH mptcp-next v6 3/4] mptcp: better mptcp-level RTT estimator Paolo Abeni
@ 2025-11-19 21:54 ` Paolo Abeni
2025-11-19 23:29 ` [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement MPTCP CI
4 siblings, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-11-19 21:54 UTC (permalink / raw)
To: mptcp; +Cc: martineau
This is the mptcp counter-part of commit ea33537d8292 ("tcp: add receive
queue awareness in tcp_rcv_space_adjust()").
Prior to this commit:
ESTAB 33165568 0 192.168.255.2:5201 192.168.255.1:53380 \
skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0)
After:
ESTAB 3279168 0 192.168.255.2:5201 192.168.255.1:53042 \
skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0)
(same tput)
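The behavioral change is small: bytes still pending in the receive queue
have not really been consumed by the application, so they are subtracted
from the copied counter before deciding whether the buffer should grow.
A minimal model of the adjusted check (the helper name is made up for
illustration; the kernel compares against msk->rcvq_space.space):

```c
/* Decide whether the receive buffer should grow: only bytes actually
 * consumed by the application (copied minus in-queue) count towards
 * exceeding the previous measurement window.
 */
static int rcvbuf_should_grow(int copied, int inq, int space)
{
	copied -= inq;
	return copied > space;
}
```

With a mostly-drained queue the check behaves as before; with a large
standing queue (as in the "Prior to this commit" ss output above) the
growth is suppressed, keeping rb close to what the receiver really needs.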
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7af4c13b8f60..8d43ad716209 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2123,11 +2123,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
if (rtt_us == U32_MAX || time < (rtt_us >> 3))
return;
- if (msk->rcvq_space.copied <= msk->rcvq_space.space)
+ copied = msk->rcvq_space.copied;
+ copied -= mptcp_inq_hint(sk);
+ if (copied <= msk->rcvq_space.space)
goto new_measure;
trace_mptcp_rcvbuf_grow(sk, time);
- if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) {
+ if (mptcp_rcvbuf_grow(sk, copied)) {
/* Make subflows follow along. If we do not do this, we
* get drops at subflow level if skbs can't be moved to
* the mptcp rx queue fast enough (announced rcv_win can
@@ -2141,7 +2143,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
slow = lock_sock_fast(ssk);
/* subflows can be added before tcp_init_transfer() */
if (tcp_sk(ssk)->rcvq_space.space)
- tcp_rcvbuf_grow(ssk, msk->rcvq_space.copied);
+ tcp_rcvbuf_grow(ssk, copied);
unlock_sock_fast(ssk, slow);
}
}
--
2.51.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
` (3 preceding siblings ...)
2025-11-19 21:54 ` [PATCH mptcp-next v6 4/4] mptcp: add receive queue awareness in tcp_rcv_space_adjust() Paolo Abeni
@ 2025-11-19 23:29 ` MPTCP CI
4 siblings, 0 replies; 6+ messages in thread
From: MPTCP CI @ 2025-11-19 23:29 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Critical: 1 Call Trace(s) - Critical: Unexpected stop of the VM ❌
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Critical: 1 Call Trace(s) - Critical: Unexpected stop of the VM ❌
- KVM Validation: debug (only selftest_mptcp_join): Critical: 1 Call Trace(s) - Critical: Unexpected stop of the VM ❌
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19518367759
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1e2aec772c74
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1025526
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-19 23:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-19 21:54 [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 1/4] trace: mptcp: add mptcp_rcvbuf_grow tracepoint Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 2/4] mptcp: fix receive space timestamp initialization Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 3/4] mptcp: better mptcp-level RTT estimator Paolo Abeni
2025-11-19 21:54 ` [PATCH mptcp-next v6 4/4] mptcp: add receive queue awareness in tcp_rcv_space_adjust() Paolo Abeni
2025-11-19 23:29 ` [PATCH mptcp-next v6 0/4] mptcp: autotune related improvement MPTCP CI