From: Paolo Abeni <pabeni@redhat.com>
To: gang.yan@linux.dev
Cc: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
Geliang Tang <geliang@kernel.org>,
MPTCP Linux <mptcp@lists.linux.dev>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Wed, 27 May 2026 12:01:47 +0200 [thread overview]
Message-ID: <76116a98-e6c3-4991-b4cd-d52c354e01ca@redhat.com> (raw)
In-Reply-To: <6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev>
On 5/27/26 7:30 AM, gang.yan@linux.dev wrote:
> May 26, 2026 at 2:50 PM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>> On 5/26/26 5:13 AM, gang.yan@linux.dev wrote:
>>> May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>>>> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
>>>> index ef65e2df709f..d9bd4f4afcc0 100644
>>>> --- a/net/mptcp/mib.c
>>>> +++ b/net/mptcp/mib.c
>>>> @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
>>>> SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
>>>> SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
>>>> SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
>>>> + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
>>>> };
>>>>
>>>> /* mptcp_mib_alloc - allocate percpu mib counters
>>>> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
>>>> index c84eb853d499..18f35f7e0a2d 100644
>>>> --- a/net/mptcp/mib.h
>>>> +++ b/net/mptcp/mib.h
>>>> @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
>>>> MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
>>>> MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
>>>> MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
>>>> + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
>>>> __MPTCP_MIB_MAX
>>>> };
>>>>
>>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>> index 03d6f8658467..f446e22148b9 100644
>>>> --- a/net/mptcp/protocol.c
>>>> +++ b/net/mptcp/protocol.c
>>>> @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
>>>> skb_dst_drop(skb);
>>>> }
>>>>
>>>> +/* "Inspired" from the TCP version */
>>>> +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
>>>> +{
>>>> + struct mptcp_sock *msk = mptcp_sk(sk);
>>>> + struct rb_node *node, *prev;
>>>> + bool pruned = false;
>>>> + u64 mem;
>>>> +
>>>> + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
>>>> + return;
>>>> +
>>>> + node = &msk->ooo_last_skb->rbnode;
>>>> +
>>>> + do {
>>>> + struct sk_buff *skb = rb_to_skb(node);
>>>> +
>>>> + /* Stop pruning if the incoming skb would land in OoO tail. */
>>>> + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
>>>> + break;
>>>> +
>>>> + pruned = true;
>>>> + prev = rb_prev(node);
>>>> + rb_erase(node, &msk->out_of_order_queue);
>>>> + mptcp_drop(sk, skb);
>>>> + msk->ooo_last_skb = rb_to_skb(prev);
>>>> +
>>>> + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
>>>> + if (mem < sk->sk_rcvbuf)
>>>> + break;
>>>>
>>> Hi Paolo,
>>>
>>> Thanks for the v8. While going through the code, I ran into a
>>> point that I'm not entirely sure about.
>>>
>>> TCP‘s design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
>>> as a goal. It we use sk->sk_rcvbuf here, the loop may break after
>>> deleting just one packet, right? This may fail to free enough space
>>> for the incoming out‑of‑order packet, leading to repeated pruning
>>> calls and potential packet drops.
>>>
>> Thank you for your review.
>>
>> Yes, here there is some difference vs plain TCP and I think it's for the
>> better.
>>
>> TCP tries to make a "significant" amount of space in the receiver buffer
>> while MPTCP tries to make room for a single packet.
>>
>> Both strategies make sense in their respective context: TCP invokes
>> tcp_prune_ofo_queue() only after collapsing, and the latter is very CPU
>> intensive; if tcp_prune_ofo_queue() would make room for a single packet,
>> the next incoming one could still trigger collapsing and burn a lot of
>> CPU cycles (note that this bad chain of events could still happen if
>> sk_rcvbuf / 8 is not bigger than 2 packets).
>>
>> MPTCP (intentionally, as per discussion in the last iterations here)
>> does not perform collapsing, and directly does pruning when over limit.
>> Pruning is relatively cheap - computational complexity wise we could do
>> it for each incoming packet with "no issues", at least compared to
>> collapsing.
>>
>> Dropping the minimal amount of packets to fit the incoming one, allows
>> MPTCP to minimize the need for reinjections (for packets already acked
>> at the TCP-level, which we really should avoid). I think overall this
>> compromise is a better fit for MPTCP.
>>
>> BTW did you have the chance to perform testing on top of this revision?
>>
>
> Hi Paolo,
>
> Yes, I've tested v8 many times — hundreds of runs. And the tls.c part works
> fine too.
>
> I also spotted some issues with MPTCP TLS adaptation, but those are unrelated
> to v8. I'll reply in Geliang's thread about them soon.
Thank you for testing. I'll send a v9 to try to get sashiko comments on
remaining patches, but no real changes in it. Feel free to add your
formal Tested-by tag, if you want :)
Thanks,
Paolo
next prev parent reply other threads:[~2026-05-27 10:01 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26 7:47 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:10 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26 6:10 ` Geliang Tang
2026-05-26 6:34 ` Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24 8:34 ` Paolo Abeni
2026-05-26 7:02 ` Matthieu Baerts
2026-05-26 7:49 ` Matthieu Baerts
2026-05-26 15:17 ` Paolo Abeni
2026-05-27 0:51 ` Matthieu Baerts
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:16 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26 3:13 ` gang.yan
2026-05-26 6:50 ` Paolo Abeni
2026-05-27 5:30 ` gang.yan
2026-05-27 10:01 ` Paolo Abeni [this message]
2026-05-28 1:18 ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27 5:46 ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=76116a98-e6c3-4991-b4cd-d52c354e01ca@redhat.com \
--to=pabeni@redhat.com \
--cc=gang.yan@linux.dev \
--cc=geliang@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox