From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>, martineau@kernel.org
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH v7 mptcp-next 5/6] mptcp: better mptcp-level RTT estimator
Date: Fri, 28 Nov 2025 10:51:28 +0100
Message-ID: <cdd003da-c4e4-4b37-b27d-254633e2b2ec@kernel.org>
In-Reply-To: <7e85ed1a-3072-4230-a14f-b7eecdf2d4d2@redhat.com>
Hi Paolo,
On 28/11/2025 09:47, Paolo Abeni wrote:
> On 11/27/25 7:13 PM, Matthieu Baerts wrote:
>> On 20/11/2025 09:39, Paolo Abeni wrote:
>>> The current MPTCP-level RTT estimator has several issues. On high speed
>>> links, the MPTCP-level receive buffer auto-tuning happens with a frequency
>>> well above the TCP-level's one. That in turn can cause excessive/unneeded
>>> receive buffer increase.
>>>
>>> On such links, the initial rtt_us value is considerably higher
>>> than the actual delay, and the current mptcp_rcv_space_adjust() updates
>>> msk->rcvq_space.rtt_us with a period equal to that field's previous
>>> value. If the initial rtt_us is 40ms, its first update will happen after
>>> 40ms, even if the subflows see actual RTT orders of magnitude lower.
>>>
>>> Additionally:
>>> - setting the msk rtt to the maximum among all the subflows' RTTs makes DRS
>>> constantly overshoot the rcvbuf size when a subflow has considerably
>>> higher latency than the other(s).
>>>
>>> - during unidirectional bulk transfers with multiple active subflows, the
>>> TCP-level RTT estimator occasionally sees considerably higher values than
>>> the real link delay, i.e. when the packet scheduler reacts to an incoming
>>> ack on a given subflow by pushing data on a different subflow.
>>>
>>> - currently inactive but still open subflows (i.e. switched to backup mode)
>>> are always considered when computing the msk-level rtt.
>>>
>>> Address all the issues above with a more accurate RTT estimation
>>> strategy: the MPTCP-level RTT is set to the minimum of all the subflows
>>> actually feeding data into the MPTCP receive buffer, using a small sliding
>>> window.
>>
>> (...)
>>
>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>> index ee0dbd6dbacf..b392d7855928 100644
>>> --- a/net/mptcp/protocol.h
>>> +++ b/net/mptcp/protocol.h
>>> @@ -269,6 +269,13 @@ struct mptcp_data_frag {
>>> struct page *page;
>>> };
>>>
>>> +/* Arbitrary compromise between as low as possible to react timely to subflow
>>> + * close event and as big as possible to avoid being fooled by biased large
>>> + * samples due to peer sending data on a different subflow WRT the incoming
>>> + * ack.
>>> + */
>>> +#define MPTCP_RTT_SAMPLES 5
>>> +
>>> /* MPTCP connection sock */
>>> struct mptcp_sock {
>>> /* inet_connection_sock must be the first member */
>>> @@ -340,11 +347,17 @@ struct mptcp_sock {
>>> */
>>> struct mptcp_pm_data pm;
>>> struct mptcp_sched_ops *sched;
>>> +
>>> + /* Most recent rtt_us observed by in use incoming subflows. */
>>> + struct {
>>> + u32 samples[MPTCP_RTT_SAMPLES];
>>> + u32 next_sample;
>>> + } rcv_rtt_est;
>>
>> I'm sorry to react only now, I didn't manage to follow this in detail,
>> but I have one question: why not use a smoothed RTT [1]? Is it because
>> the goal is to mix data from the active/recently used subflows and only
>> to take the minimum, and not "combining" RTT from different subflows?
>>
>> [1] https://datatracker.ietf.org/doc/rfc6298/
>> srtt = old * (1-alpha) + new * alpha # alpha is 1/8 in RFC6298
>
> TCP already uses EWMA for rtt; the values seen by MPTCP on each subflow
> have already gone through such processing.
>
> If there is a single subflow, applying EWMA again should only cause slower
> reactions.
>
> If there are multiple subflows, we really want the min() not the
> smoothed average, because:
>
> - there are high spikes caused by the mptcp packet scheduler: we want to
> entirely filter them out, while EWMA would let them contribute to the
> estimate, and the results are very visible (negatively) in my experiments.
>
> - different subflows can have very different rtt (say 1ms vs 100ms). We
> really want the minimum, otherwise DRS will be fooled and the rcvbuf will
> "explode".
Thank you for taking the time to explain all this to me. That's much
clearer, and I just realised I was mixing up acronyms! :)
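
To make the difference discussed above concrete, here is a minimal
userspace sketch, not the code from the patch. It only mirrors the shape of
the quoted rcv_rtt_est struct (a small ring of samples plus a write index).
The helper names (rtt_est_init(), rtt_est_add(), rtt_est_min(),
srtt_update()) and the UINT32_MAX "empty slot" sentinel are assumptions
made for illustration. The EWMA helper uses alpha = 1/8, as in the RFC 6298
formula quoted above.

```c
#include <stdint.h>

#define RTT_SAMPLES 5	/* same size as MPTCP_RTT_SAMPLES in the patch */

/* Hypothetical userspace stand-in for the quoted rcv_rtt_est struct. */
struct rtt_est {
	uint32_t samples[RTT_SAMPLES];
	uint32_t next_sample;
};

/* Assumption: empty slots hold UINT32_MAX so they never win the min. */
static void rtt_est_init(struct rtt_est *est)
{
	for (int i = 0; i < RTT_SAMPLES; i++)
		est->samples[i] = UINT32_MAX;
	est->next_sample = 0;
}

/* Store a new sample, overwriting the oldest one (ring buffer). */
static void rtt_est_add(struct rtt_est *est, uint32_t rtt_us)
{
	est->samples[est->next_sample] = rtt_us;
	est->next_sample = (est->next_sample + 1) % RTT_SAMPLES;
}

/* Sliding-window minimum: a single large spike never raises the result. */
static uint32_t rtt_est_min(const struct rtt_est *est)
{
	uint32_t min = UINT32_MAX;

	for (int i = 0; i < RTT_SAMPLES; i++)
		if (est->samples[i] < min)
			min = est->samples[i];
	return min;
}

/* RFC 6298 style EWMA: srtt = srtt * 7/8 + sample * 1/8. */
static uint32_t srtt_update(uint32_t srtt, uint32_t rtt_us)
{
	return srtt - (srtt >> 3) + (rtt_us >> 3);
}
```

With samples of 1000us, 100000us (a scheduler-induced spike) and 1200us,
rtt_est_min() still reports 1000us, while a single srtt_update(1000, 100000)
step already moves the smoothed value to 13375us. The minimum only rises
once the spike-free samples have aged out of the window, which is the
trade-off the MPTCP_RTT_SAMPLES comment describes.
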
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.