From: gang.yan@linux.dev
To: "Paolo Abeni" <pabeni@redhat.com>, mptcp@lists.linux.dev
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Geliang Tang" <geliang@kernel.org>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Wed, 27 May 2026 05:30:09 +0000 [thread overview]
Message-ID: <6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev> (raw)
In-Reply-To: <1445abca-9a84-46df-be7d-8f7b656772a1@redhat.com>
May 26, 2026 at 2:50 PM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
>
> On 5/26/26 5:13 AM, gang.yan@linux.dev wrote:
>
> >
> > May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
> >
> > >
> > > diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> > > index ef65e2df709f..d9bd4f4afcc0 100644
> > > --- a/net/mptcp/mib.c
> > > +++ b/net/mptcp/mib.c
> > > @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> > > SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
> > > SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
> > > SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
> > > + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
> > > };
> > >
> > > /* mptcp_mib_alloc - allocate percpu mib counters
> > > diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> > > index c84eb853d499..18f35f7e0a2d 100644
> > > --- a/net/mptcp/mib.h
> > > +++ b/net/mptcp/mib.h
> > > @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
> > > MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
> > > MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
> > > MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
> > > + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
> > > __MPTCP_MIB_MAX
> > > };
> > >
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index 03d6f8658467..f446e22148b9 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
> > > skb_dst_drop(skb);
> > > }
> > >
> > > +/* "Inspired" from the TCP version */
> > > +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > + struct rb_node *node, *prev;
> > > + bool pruned = false;
> > > + u64 mem;
> > > +
> > > + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
> > > + return;
> > > +
> > > + node = &msk->ooo_last_skb->rbnode;
> > > +
> > > + do {
> > > + struct sk_buff *skb = rb_to_skb(node);
> > > +
> > > + /* Stop pruning if the incoming skb would land in OoO tail. */
> > > + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
> > > + break;
> > > +
> > > + pruned = true;
> > > + prev = rb_prev(node);
> > > + rb_erase(node, &msk->out_of_order_queue);
> > > + mptcp_drop(sk, skb);
> > > + msk->ooo_last_skb = rb_to_skb(prev);
> > > +
> > > + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
> > > + if (mem < sk->sk_rcvbuf)
> > > + break;
> > >
> > Hi Paolo,
> >
> > Thanks for the v8. While going through the code, I ran into a
> > point that I'm not entirely sure about.
> >
> > TCP‘s design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
> > as a goal. It we use sk->sk_rcvbuf here, the loop may break after
> > deleting just one packet, right? This may fail to free enough space
> > for the incoming out‑of‑order packet, leading to repeated pruning
> > calls and potential packet drops.
> >
> Thank you for your review.
>
> Yes, here there is some difference vs plain TCP and I think it's for the
> better.
>
> TCP tries to make a "significant" amount of space in the receiver buffer
> while MPTCP tries to make room for a single packet.
>
> Both strategies make sense in their respective context: TCP invokes
> tcp_prune_ofo_queue() only after collapsing, and the latter is very CPU
> intensive; if tcp_prune_ofo_queue() would make room for a single packet,
> the next incoming one could still trigger collapsing and burn a lot of
> CPU cycles (note that this bad chain of events could still happen if
> sk_rcvbuf / 8 is not bigger than 2 packets).
>
> MPTCP (intentionally, as per discussion in the last iterations here)
> does not perform collapsing, and directly does pruning when over limit.
> Pruning is relatively cheap - computational complexity wise we could do
> it for each incoming packet with "no issues", at least compared to
> collapsing.
>
> Dropping the minimal amount of packets to fit the incoming one, allows
> MPTCP to minimize the need for reinjections (for packets already acked
> at the TCP-level, which we really should avoid). I think overall this
> compromise is a better fit for MPTCP.
>
> BTW did you have the chance to perform testing on top of this revision?
>
Hi Paolo,
Yes, I've tested v8 many times — hundreds of runs. And the tls.c part works
fine too.
I also spotted some issues with MPTCP TLS adaptation, but those are unrelated
to v8. I'll reply in Geliang's thread about them soon.
Thanks for your help.
Cherrs
Gang
> Thanks,
>
> Paolo
>
next prev parent reply other threads:[~2026-05-27 5:30 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26 7:47 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:10 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26 6:10 ` Geliang Tang
2026-05-26 6:34 ` Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24 8:34 ` Paolo Abeni
2026-05-26 7:02 ` Matthieu Baerts
2026-05-26 7:49 ` Matthieu Baerts
2026-05-26 15:17 ` Paolo Abeni
2026-05-27 0:51 ` Matthieu Baerts
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:16 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26 3:13 ` gang.yan
2026-05-26 6:50 ` Paolo Abeni
2026-05-27 5:30 ` gang.yan [this message]
2026-05-27 10:01 ` Paolo Abeni
2026-05-28 1:18 ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27 5:46 ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev \
--to=gang.yan@linux.dev \
--cc=geliang@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.