MPTCP Linux Development
 help / color / mirror / Atom feed
From: gang.yan@linux.dev
To: "Paolo Abeni" <pabeni@redhat.com>, mptcp@lists.linux.dev
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
	"Geliang Tang" <geliang@kernel.org>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Wed, 27 May 2026 05:30:09 +0000	[thread overview]
Message-ID: <6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev> (raw)
In-Reply-To: <1445abca-9a84-46df-be7d-8f7b656772a1@redhat.com>

May 26, 2026 at 2:50 PM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:


> 
> On 5/26/26 5:13 AM, gang.yan@linux.dev wrote:
> 
> > 
> > May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:
> > 
> > > 
> > > diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> > >  index ef65e2df709f..d9bd4f4afcc0 100644
> > >  --- a/net/mptcp/mib.c
> > >  +++ b/net/mptcp/mib.c
> > >  @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> > >  SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
> > >  SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
> > >  SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
> > >  + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
> > >  };
> > >  
> > >  /* mptcp_mib_alloc - allocate percpu mib counters
> > >  diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> > >  index c84eb853d499..18f35f7e0a2d 100644
> > >  --- a/net/mptcp/mib.h
> > >  +++ b/net/mptcp/mib.h
> > >  @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
> > >  MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
> > >  MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
> > >  MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
> > >  + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
> > >  __MPTCP_MIB_MAX
> > >  };
> > >  
> > >  diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > >  index 03d6f8658467..f446e22148b9 100644
> > >  --- a/net/mptcp/protocol.c
> > >  +++ b/net/mptcp/protocol.c
> > >  @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
> > >  skb_dst_drop(skb);
> > >  }
> > >  
> > >  +/* "Inspired" from the TCP version */
> > >  +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
> > >  +{
> > >  + struct mptcp_sock *msk = mptcp_sk(sk);
> > >  + struct rb_node *node, *prev;
> > >  + bool pruned = false;
> > >  + u64 mem;
> > >  +
> > >  + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
> > >  + return;
> > >  +
> > >  + node = &msk->ooo_last_skb->rbnode;
> > >  +
> > >  + do {
> > >  + struct sk_buff *skb = rb_to_skb(node);
> > >  +
> > >  + /* Stop pruning if the incoming skb would land in OoO tail. */
> > >  + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
> > >  + break;
> > >  +
> > >  + pruned = true;
> > >  + prev = rb_prev(node);
> > >  + rb_erase(node, &msk->out_of_order_queue);
> > >  + mptcp_drop(sk, skb);
> > >  + msk->ooo_last_skb = rb_to_skb(prev);
> > >  +
> > >  + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
> > >  + if (mem < sk->sk_rcvbuf)
> > >  + break;
> > > 
> >  Hi Paolo,
> >  
> >  Thanks for the v8. While going through the code, I ran into a
> >  point that I'm not entirely sure about.
> >  
> >  TCP‘s design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
> >  as a goal. It we use sk->sk_rcvbuf here, the loop may break after
> >  deleting just one packet, right? This may fail to free enough space
> >  for the incoming out‑of‑order packet, leading to repeated pruning
> >  calls and potential packet drops.
> > 
> Thank you for your review.
> 
> Yes, here there is some difference vs plain TCP and I think it's for the
> better.
> 
> TCP tries to make a "significant" amount of space in the receiver buffer
> while MPTCP tries to make room for a single packet.
> 
> Both strategies make sense in their respective context: TCP invokes
> tcp_prune_ofo_queue() only after collapsing, and the latter is very CPU
> intensive; if tcp_prune_ofo_queue() would make room for a single packet,
> the next incoming one could still trigger collapsing and burn a lot of
> CPU cycles (note that this bad chain of events could still happen if
> sk_rcvbuf / 8 is not bigger than 2 packets).
> 
> MPTCP (intentionally, as per discussion in the last iterations here)
> does not perform collapsing, and directly does pruning when over limit.
> Pruning is relatively cheap - computational complexity wise we could do
> it for each incoming packet with "no issues", at least compared to
> collapsing.
> 
> Dropping the minimal amount of packets to fit the incoming one, allows
> MPTCP to minimize the need for reinjections (for packets already acked
> at the TCP-level, which we really should avoid). I think overall this
> compromise is a better fit for MPTCP.
> 
> BTW did you have the chance to perform testing on top of this revision?
> 

Hi Paolo,

Yes, I've tested v8 many times — hundreds of runs. And the tls.c part works
fine too.

I also spotted some issues with MPTCP TLS adaptation, but those are unrelated
to v8. I'll reply in Geliang's thread about them soon.

Thanks for your help.

Cherrs
Gang

> Thanks,
> 
> Paolo
>

  reply	other threads:[~2026-05-27  5:30 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26  7:47   ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26  7:48   ` Matthieu Baerts
2026-05-26 15:10     ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26  6:10   ` Geliang Tang
2026-05-26  6:34     ` Paolo Abeni
2026-05-26  7:48       ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24  8:34   ` Paolo Abeni
2026-05-26  7:02     ` Matthieu Baerts
2026-05-26  7:49       ` Matthieu Baerts
2026-05-26 15:17         ` Paolo Abeni
2026-05-27  0:51           ` Matthieu Baerts
2026-05-26  7:48   ` Matthieu Baerts
2026-05-26 15:16     ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26  3:13   ` gang.yan
2026-05-26  6:50     ` Paolo Abeni
2026-05-27  5:30       ` gang.yan [this message]
2026-05-27 10:01         ` Paolo Abeni
2026-05-28  1:18           ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27  5:46   ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d38bd40e238fa4e578a629e8db789f36fc46ebb@linux.dev \
    --to=gang.yan@linux.dev \
    --cc=geliang@kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox