From: gang.yan@linux.dev
To: "Paolo Abeni" <pabeni@redhat.com>, mptcp@lists.linux.dev
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Geliang Tang" <geliang@kernel.org>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Wed, 27 May 2026 05:30:09 +0000 [thread overview]
Message-ID: <6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev> (raw)
In-Reply-To: <1445abca-9a84-46df-be7d-8f7b656772a1@redhat.com>
May 26, 2026 at 2:50 PM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>
> On 5/26/26 5:13 AM, gang.yan@linux.dev wrote:
>
> >
> > May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
> >
> > >
> > > diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> > > index ef65e2df709f..d9bd4f4afcc0 100644
> > > --- a/net/mptcp/mib.c
> > > +++ b/net/mptcp/mib.c
> > > @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> > > SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
> > > SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
> > > SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
> > > + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
> > > };
> > >
> > > /* mptcp_mib_alloc - allocate percpu mib counters
> > > diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> > > index c84eb853d499..18f35f7e0a2d 100644
> > > --- a/net/mptcp/mib.h
> > > +++ b/net/mptcp/mib.h
> > > @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
> > > MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
> > > MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
> > > MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
> > > + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
> > > __MPTCP_MIB_MAX
> > > };
> > >
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index 03d6f8658467..f446e22148b9 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
> > > skb_dst_drop(skb);
> > > }
> > >
> > > +/* "Inspired" from the TCP version */
> > > +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > + struct rb_node *node, *prev;
> > > + bool pruned = false;
> > > + u64 mem;
> > > +
> > > + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
> > > + return;
> > > +
> > > + node = &msk->ooo_last_skb->rbnode;
> > > +
> > > + do {
> > > + struct sk_buff *skb = rb_to_skb(node);
> > > +
> > > + /* Stop pruning if the incoming skb would land in OoO tail. */
> > > + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
> > > + break;
> > > +
> > > + pruned = true;
> > > + prev = rb_prev(node);
> > > + rb_erase(node, &msk->out_of_order_queue);
> > > + mptcp_drop(sk, skb);
> > > + msk->ooo_last_skb = rb_to_skb(prev);
> > > +
> > > + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
> > > + if (mem < sk->sk_rcvbuf)
> > > + break;
> > >
> > Hi Paolo,
> >
> > Thanks for the v8. While going through the code, I ran into a
> > point that I'm not entirely sure about.
> >
> > TCP‘s design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
> > as a goal. It we use sk->sk_rcvbuf here, the loop may break after
> > deleting just one packet, right? This may fail to free enough space
> > for the incoming out‑of‑order packet, leading to repeated pruning
> > calls and potential packet drops.
> >
> Thank you for your review.
>
> Yes, here there is some difference vs plain TCP and I think it's for the
> better.
>
> TCP tries to make a "significant" amount of space in the receiver buffer
> while MPTCP tries to make room for a single packet.
>
> Both strategies make sense in their respective context: TCP invokes
> tcp_prune_ofo_queue() only after collapsing, and the latter is very CPU
> intensive; if tcp_prune_ofo_queue() would make room for a single packet,
> the next incoming one could still trigger collapsing and burn a lot of
> CPU cycles (note that this bad chain of events could still happen if
> sk_rcvbuf / 8 is not bigger than 2 packets).
>
> MPTCP (intentionally, as per discussion in the last iterations here)
> does not perform collapsing, and directly does pruning when over limit.
> Pruning is relatively cheap - computational complexity wise we could do
> it for each incoming packet with "no issues", at least compared to
> collapsing.
>
> Dropping the minimal amount of packets to fit the incoming one, allows
> MPTCP to minimize the need for reinjections (for packets already acked
> at the TCP-level, which we really should avoid). I think overall this
> compromise is a better fit for MPTCP.
>
> BTW did you have the chance to perform testing on top of this revision?
>
Hi Paolo,
Yes, I've tested v8 many times — hundreds of runs. And the tls.c part works
fine too.
I also spotted some issues with MPTCP TLS adaptation, but those are unrelated
to v8. I'll reply in Geliang's thread about them soon.
Thanks for your help.
Cherrs
Gang
> Thanks,
>
> Paolo
>
next prev parent reply other threads:[~2026-05-27 5:30 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26 7:47 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:10 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26 6:10 ` Geliang Tang
2026-05-26 6:34 ` Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24 8:34 ` Paolo Abeni
2026-05-26 7:02 ` Matthieu Baerts
2026-05-26 7:49 ` Matthieu Baerts
2026-05-26 15:17 ` Paolo Abeni
2026-05-27 0:51 ` Matthieu Baerts
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:16 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26 3:13 ` gang.yan
2026-05-26 6:50 ` Paolo Abeni
2026-05-27 5:30 ` gang.yan [this message]
2026-05-27 10:01 ` Paolo Abeni
2026-05-28 1:18 ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27 5:46 ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev \
--to=gang.yan@linux.dev \
--cc=geliang@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox