MPTCP Linux Development
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: gang.yan@linux.dev
Cc: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	MPTCP Linux <mptcp@lists.linux.dev>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Wed, 27 May 2026 12:01:47 +0200	[thread overview]
Message-ID: <76116a98-e6c3-4991-b4cd-d52c354e01ca@redhat.com> (raw)
In-Reply-To: <6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev>

On 5/27/26 7:30 AM, gang.yan@linux.dev wrote:
> May 26, 2026 at 2:50 PM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>> On 5/26/26 5:13 AM, gang.yan@linux.dev wrote:
>>> May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>>>> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
>>>>  index ef65e2df709f..d9bd4f4afcc0 100644
>>>>  --- a/net/mptcp/mib.c
>>>>  +++ b/net/mptcp/mib.c
>>>>  @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
>>>>  SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
>>>>  SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
>>>>  SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
>>>>  + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
>>>>  };
>>>>  
>>>>  /* mptcp_mib_alloc - allocate percpu mib counters
>>>>  diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
>>>>  index c84eb853d499..18f35f7e0a2d 100644
>>>>  --- a/net/mptcp/mib.h
>>>>  +++ b/net/mptcp/mib.h
>>>>  @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
>>>>  MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
>>>>  MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
>>>>  MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
>>>>  + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
>>>>  __MPTCP_MIB_MAX
>>>>  };
>>>>  
>>>>  diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>>  index 03d6f8658467..f446e22148b9 100644
>>>>  --- a/net/mptcp/protocol.c
>>>>  +++ b/net/mptcp/protocol.c
>>>>  @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
>>>>  skb_dst_drop(skb);
>>>>  }
>>>>  
>>>>  +/* "Inspired" from the TCP version */
>>>>  +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
>>>>  +{
>>>>  + struct mptcp_sock *msk = mptcp_sk(sk);
>>>>  + struct rb_node *node, *prev;
>>>>  + bool pruned = false;
>>>>  + u64 mem;
>>>>  +
>>>>  + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
>>>>  + return;
>>>>  +
>>>>  + node = &msk->ooo_last_skb->rbnode;
>>>>  +
>>>>  + do {
>>>>  + struct sk_buff *skb = rb_to_skb(node);
>>>>  +
>>>>  + /* Stop pruning if the incoming skb would land in OoO tail. */
>>>>  + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
>>>>  + break;
>>>>  +
>>>>  + pruned = true;
>>>>  + prev = rb_prev(node);
>>>>  + rb_erase(node, &msk->out_of_order_queue);
>>>>  + mptcp_drop(sk, skb);
>>>>  + msk->ooo_last_skb = rb_to_skb(prev);
>>>>  +
>>>>  + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
>>>>  + if (mem < sk->sk_rcvbuf)
>>>>  + break;
>>>>
>>>  Hi Paolo,
>>>  
>>>  Thanks for the v8. While going through the code, I ran into a
>>>  point that I'm not entirely sure about.
>>>  
>>>  TCP‘s design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
>>>  as a goal. It we use sk->sk_rcvbuf here, the loop may break after
>>>  deleting just one packet, right? This may fail to free enough space
>>>  for the incoming out‑of‑order packet, leading to repeated pruning
>>>  calls and potential packet drops.
>>>
>> Thank you for your review.
>>
>> Yes, here there is some difference vs plain TCP and I think it's for the
>> better.
>>
>> TCP tries to make a "significant" amount of space in the receiver buffer
>> while MPTCP tries to make room for a single packet.
>>
>> Both strategies make sense in their respective context: TCP invokes
>> tcp_prune_ofo_queue() only after collapsing, and the latter is very CPU
>> intensive; if tcp_prune_ofo_queue() would make room for a single packet,
>> the next incoming one could still trigger collapsing and burn a lot of
>> CPU cycles (note that this bad chain of events could still happen if
>> sk_rcvbuf / 8 is not bigger than 2 packets).
>>
>> MPTCP (intentionally, as per discussion in the last iterations here)
>> does not perform collapsing, and directly does pruning when over limit.
>> Pruning is relatively cheap - computational complexity wise we could do
>> it for each incoming packet with "no issues", at least compared to
>> collapsing.
>>
>> Dropping the minimal amount of packets to fit the incoming one, allows
>> MPTCP to minimize the need for reinjections (for packets already acked
>> at the TCP-level, which we really should avoid). I think overall this
>> compromise is a better fit for MPTCP.
>>
>> BTW did you have the chance to perform testing on top of this revision?
>>
> 
> Hi Paolo,
> 
> Yes, I've tested v8 many times — hundreds of runs. And the tls.c part works
> fine too.
> 
> I also spotted some issues with MPTCP TLS adaptation, but those are unrelated
> to v8. I'll reply in Geliang's thread about them soon.

Thank you for testing. I'll send a v9 to try to get sashiko comments on
remaining patches, but no real changes in it. Feel free to add your
formal Tested-by tag, if you want :)

Thanks,

Paolo


  reply	other threads:[~2026-05-27 10:01 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26  7:47   ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26  7:48   ` Matthieu Baerts
2026-05-26 15:10     ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26  6:10   ` Geliang Tang
2026-05-26  6:34     ` Paolo Abeni
2026-05-26  7:48       ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24  8:34   ` Paolo Abeni
2026-05-26  7:02     ` Matthieu Baerts
2026-05-26  7:49       ` Matthieu Baerts
2026-05-26 15:17         ` Paolo Abeni
2026-05-27  0:51           ` Matthieu Baerts
2026-05-26  7:48   ` Matthieu Baerts
2026-05-26 15:16     ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26  3:13   ` gang.yan
2026-05-26  6:50     ` Paolo Abeni
2026-05-27  5:30       ` gang.yan
2026-05-27 10:01         ` Paolo Abeni [this message]
2026-05-28  1:18           ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27  5:46   ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=76116a98-e6c3-4991-b4cd-d52c354e01ca@redhat.com \
    --to=pabeni@redhat.com \
    --cc=gang.yan@linux.dev \
    --cc=geliang@kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mptcp@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox