From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>, martineau@kernel.org
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH v7 mptcp-next 5/6] mptcp: better mptcp-level RTT estimator
Date: Fri, 28 Nov 2025 10:51:28 +0100
Message-ID: <cdd003da-c4e4-4b37-b27d-254633e2b2ec@kernel.org>
In-Reply-To: <7e85ed1a-3072-4230-a14f-b7eecdf2d4d2@redhat.com>

Hi Paolo,

On 28/11/2025 09:47, Paolo Abeni wrote:
> On 11/27/25 7:13 PM, Matthieu Baerts wrote:
>> On 20/11/2025 09:39, Paolo Abeni wrote:
>>> The current MPTCP-level RTT estimator has several issues. On high speed
>>> links, the MPTCP-level receive buffer auto-tuning happens with a frequency
>>> well above the TCP-level's one. That in turn can cause excessive/unneeded
>>> receive buffer increase.
>>>
>>> On such links, the initial rtt_us value is considerably higher
>>> than the actual delay, and the current mptcp_rcv_space_adjust() updates
>>> msk->rcvq_space.rtt_us with a period equal to that field's previous
>>> value. If the initial rtt_us is 40ms, its first update will happen after
>>> 40ms, even if the subflows see actual RTT orders of magnitude lower.
>>>
>>> Additionally:
>>> - setting the msk rtt to the maximum among all the subflows RTTs makes DRS
>>> constantly overshoot the rcvbuf size when a subflow has considerably
>>> higher latency than the other(s).
>>>
>>> - during unidirectional bulk transfers with multiple active subflows, the
>>> TCP-level RTT estimator occasionally sees a considerably higher value than
>>> the real link delay, i.e. when the packet scheduler reacts to an incoming
>>> ack on a given subflow by pushing data on a different subflow.
>>>
>>> - currently inactive but still open subflows (i.e. switched to backup mode)
>>> are always considered when computing the msk-level rtt.
>>>
>>> Address all the issues above with a more accurate RTT estimation
>>> strategy: the MPTCP-level RTT is set to the minimum of all the subflows
>>> actually feeding data into the MPTCP receive buffer, using a small sliding
>>> window.
>>
>> (...)
>>
>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>> index ee0dbd6dbacf..b392d7855928 100644
>>> --- a/net/mptcp/protocol.h
>>> +++ b/net/mptcp/protocol.h
>>> @@ -269,6 +269,13 @@ struct mptcp_data_frag {
>>>  	struct page *page;
>>>  };
>>>  
>>> +/* Arbitrary compromise between as low as possible to react timely to subflow
>>> + * close event and as big as possible to avoid being fooled by biased large
>>> + * samples due to peer sending data on a different subflow WRT the incoming
>>> + * ack.
>>> + */
>>> +#define MPTCP_RTT_SAMPLES	5
>>> +
>>>  /* MPTCP connection sock */
>>>  struct mptcp_sock {
>>>  	/* inet_connection_sock must be the first member */
>>> @@ -340,11 +347,17 @@ struct mptcp_sock {
>>>  				 */
>>>  	struct mptcp_pm_data	pm;
>>>  	struct mptcp_sched_ops	*sched;
>>> +
>>> +	/* Most recent rtt_us observed by in use incoming subflows. */
>>> +	struct {
>>> +		u32	samples[MPTCP_RTT_SAMPLES];
>>> +		u32	next_sample;
>>> +	} rcv_rtt_est;
>>
>> I'm sorry to react only now, I didn't manage to follow this in detail,
>> but I have one question: why not use a smoothed RTT [1]? Is it because
>> the goal is to mix data from the active/recently used subflows and only
>> to take the minimum, and not "combining" RTT from different subflows?
>>
>> [1] https://datatracker.ietf.org/doc/rfc6298/
>>     srtt = old * (1-alpha) + new * alpha   # alpha is 1/8 in RFC6298
> 
> TCP already uses EWMA for the RTT; the values seen by MPTCP on each
> subflow have already gone through such processing.
> 
> If there is a single subflow, applying EWMA again would only cause slower
> reactions.
> 
> If there are multiple subflows, we really want the min() not the
> smoothed average, because:
> 
> - there are high spikes caused by the mptcp packet scheduler: we want to
> entirely filter them out, while EWMA would let them contribute to the
> estimate, and the results are very visible (negatively) in my experiments.
> 
> - different subflows can have very different rtt (say 1ms vs 100ms). We
> really want the minimum, otherwise DRS will be fooled/the rcvbuf will
> "explode"
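
For anyone following along, here is a minimal user-space sketch of the
difference between the two approaches. It is not the code from the patch:
the helper names (rtt_window_init(), rtt_window_add(), rtt_window_min(),
ewma_update()) are hypothetical, filling unused slots with UINT32_MAX is an
assumption, and the per-subflow bookkeeping is left out entirely. Only
MPTCP_RTT_SAMPLES and the samples/next_sample fields mirror the quoted diff.

```c
#include <stdint.h>

#define MPTCP_RTT_SAMPLES	5

/* Hypothetical stand-in for the rcv_rtt_est fields in the quoted diff. */
struct rtt_window {
	uint32_t samples[MPTCP_RTT_SAMPLES];
	uint32_t next_sample;
};

/* Assumption: empty slots hold UINT32_MAX so they never win the min(). */
static void rtt_window_init(struct rtt_window *w)
{
	for (int i = 0; i < MPTCP_RTT_SAMPLES; i++)
		w->samples[i] = UINT32_MAX;
	w->next_sample = 0;
}

/* Overwrite the oldest slot: the window only keeps the last N samples. */
static void rtt_window_add(struct rtt_window *w, uint32_t rtt_us)
{
	w->samples[w->next_sample] = rtt_us;
	w->next_sample = (w->next_sample + 1) % MPTCP_RTT_SAMPLES;
}

/* The estimate is the smallest sample currently in the window. */
static uint32_t rtt_window_min(const struct rtt_window *w)
{
	uint32_t min = UINT32_MAX;

	for (int i = 0; i < MPTCP_RTT_SAMPLES; i++)
		if (w->samples[i] < min)
			min = w->samples[i];
	return min;
}

/* RFC 6298 style smoothing with alpha = 1/8, for comparison only. */
static uint32_t ewma_update(uint32_t srtt, uint32_t rtt_us)
{
	return (uint32_t)((7ULL * srtt + rtt_us) / 8);
}
```

Feeding the samples 1000, 1000, 100000, 1000, 1000 (in us) into both, with
the EWMA seeded from the first sample, the window minimum stays at 1000
while the smoothed value is still at 10474 after the last sample: a single
scheduler-induced spike keeps inflating the EWMA for several updates, but
it never affects the minimum as long as one unbiased sample is in the
window.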

Thank you for taking the time to explain all this to me. That's much
clearer, and I just realised I was mixing up acronyms! :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.