From: gang.yan@linux.dev
To: "Paolo Abeni" <pabeni@redhat.com>, mptcp@lists.linux.dev
Cc: "Matthieu Baerts" <matttbe@kernel.org>,
"Geliang Tang" <geliang@kernel.org>
Subject: Re: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Tue, 26 May 2026 03:13:37 +0000
Message-ID: <9c1d28dbc379eff6cc09e9b02a6b77beafcdc4f2@linux.dev>
In-Reply-To: <b67f85e91abafcdc87d4b0b8e479f715489ed6f2.1779485511.git.pabeni@redhat.com>

May 23, 2026 at 5:43 AM, "Paolo Abeni" <pabeni@redhat.com> wrote:
>
> Leverage the hybrid helpers to implement the receive queue and OoO queue
> collapsing at ingress time when reaching memory bounds.
>
> If the msk is owned by the user-space at incoming skb time, perform the
> pruning in the release_cb. The prune check is additionally performed
> when the skb reaches the msk-level queues.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v6 -> v7:
> - fix u64 -> u32 truncation
>
> v2 -> v3:
> - deal with unsynced TFO skb at prune time - only possible when pruning
> in mptcp_over_limit()
>
> v1 -> v2:
> - collapse rcv queue, too
> - deal with MPC map, too
> - drop left-over sentence in the commit message
>
> RFC -> v1:
> - use data_seq only when available
> - avoid ack_seq lockless access
> - drop limit on fallback
> - collapse rcvqueue, too
> - drop only when pruning is not possible and over rcvbuf * 2
>
> Note:
> - sashiko can be confused about fwd memory lifecycle (I can
> understand that :). Any exceeding amount of fwd allocated memory
> is always released by the next sk_mem_uncharge() - i.e. fwd memory
> is not tied to the current skb.
> - AFAICS KASAN handles bitmap variables in a sane way, and sashiko
> doesn't know about that
> ---
> net/mptcp/mib.c | 1 +
> net/mptcp/mib.h | 1 +
> net/mptcp/protocol.c | 46 +++++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> index ef65e2df709f..d9bd4f4afcc0 100644
> --- a/net/mptcp/mib.c
> +++ b/net/mptcp/mib.c
> @@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
> SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
> SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
> + SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
> };
>
> /* mptcp_mib_alloc - allocate percpu mib counters
> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> index c84eb853d499..18f35f7e0a2d 100644
> --- a/net/mptcp/mib.h
> +++ b/net/mptcp/mib.h
> @@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
> MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */
> MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */
> MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */
> + MPTCP_MIB_OFO_PRUNED, /* MPTCP-level OoO queue pruned */
> __MPTCP_MIB_MAX
> };
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 03d6f8658467..f446e22148b9 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
> skb_dst_drop(skb);
> }
>
> +/* "Inspired" from the TCP version */
> +static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sk);
> + struct rb_node *node, *prev;
> + bool pruned = false;
> + u64 mem;
> +
> + if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
> + return;
> +
> + node = &msk->ooo_last_skb->rbnode;
> +
> + do {
> + struct sk_buff *skb = rb_to_skb(node);
> +
> + /* Stop pruning if the incoming skb would land in OoO tail. */
> + if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
> + break;
> +
> + pruned = true;
> + prev = rb_prev(node);
> + rb_erase(node, &msk->out_of_order_queue);
> + mptcp_drop(sk, skb);
> + msk->ooo_last_skb = rb_to_skb(prev);
> +
> + mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
> + if (mem < sk->sk_rcvbuf)
> + break;

Hi Paolo,

Thanks for the v8. While going through the code, I ran into a
point that I'm not entirely sure about.

TCP's design uses sk->sk_rcvbuf >> 3 (one eighth of the buffer)
as a goal. If we use sk->sk_rcvbuf here, the loop may break after
deleting just one packet, right? That may fail to free enough space
for the incoming out-of-order packet, leading to repeated pruning
calls and potential packet drops.
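
To make the concern concrete, here is a toy userspace model (not kernel
code) of the two stop conditions. The array of truesize values and both
function names are made up for illustration, and the second function is
only an assumption about what a "free at least rcvbuf >> 3 per batch"
goal could look like; it is neither the TCP nor the MPTCP implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: the "OoO queue" is an array of skb truesize values,
 * pruned from the tail (index n - 1) downwards. *rmem stands in for
 * sk_rmem_alloc and rcvbuf for sk_rcvbuf.
 */

/* Stop as soon as the allocated memory drops below rcvbuf, like the
 * check in the quoted loop. Returns the number of entries dropped.
 */
static size_t prune_until_below(const unsigned int *truesize, size_t n,
				unsigned int *rmem, unsigned int rcvbuf)
{
	size_t dropped = 0;

	while (n-- > 0) {
		assert(*rmem >= truesize[n]);
		*rmem -= truesize[n];
		dropped++;
		if (*rmem < rcvbuf)
			break;
	}
	return dropped;
}

/* Free at least rcvbuf >> 3 bytes per batch before re-checking the
 * limit (hypothetical variant). Returns the number of entries dropped.
 */
static size_t prune_with_goal(const unsigned int *truesize, size_t n,
			      unsigned int *rmem, unsigned int rcvbuf)
{
	long goal = (long)(rcvbuf >> 3);
	size_t dropped = 0;

	while (n-- > 0) {
		assert(*rmem >= truesize[n]);
		*rmem -= truesize[n];
		goal -= (long)truesize[n];
		dropped++;
		if (goal <= 0) {
			if (*rmem < rcvbuf)
				break;
			/* Still over the limit: start another batch. */
			goal = (long)(rcvbuf >> 3);
		}
	}
	return dropped;
}
```

With rcvbuf = 8000, rmem = 8200 and eight 500-byte entries, the first
variant stops after a single drop (rmem = 7700), while the second drops
two entries (rmem = 7200) before it re-checks the limit.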

Perhaps you intend to resolve the differences between TCP and MPTCP
when refactoring this function later?

Thanks
Gang

> +
> + node = prev;
> + } while (node);
> +
> + if (pruned)
> + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_OFO_PRUNED);
> +}
> +
> static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> {
> u64 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
> @@ -386,9 +423,12 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> */
> if (unlikely(sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) &&
> !__mptcp_check_fallback(msk)) {
> - MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> - mptcp_drop(sk, skb);
> - return false;
> + mptcp_prune_ofo_queue(sk, MPTCP_SKB_CB(skb)->map_seq);
> + if (sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) {
> + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> + mptcp_drop(sk, skb);
> + return false;
> + }
> }
>
> if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
> --
> 2.54.0
>

Thread overview (30+ messages):
2026-05-22 21:43 [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios Paolo Abeni
2026-05-26 7:47 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:10 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd Paolo Abeni
2026-05-26 6:10 ` Geliang Tang
2026-05-26 6:34 ` Paolo Abeni
2026-05-26 7:48 ` Matthieu Baerts
2026-05-22 21:43 ` [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink Paolo Abeni
2026-05-24 8:34 ` Paolo Abeni
2026-05-26 7:02 ` Matthieu Baerts
2026-05-26 7:49 ` Matthieu Baerts
2026-05-26 15:17 ` Paolo Abeni
2026-05-27 0:51 ` Matthieu Baerts
2026-05-26 7:48 ` Matthieu Baerts
2026-05-26 15:16 ` Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning Paolo Abeni
2026-05-26 3:13 ` gang.yan [this message]
2026-05-26 6:50 ` Paolo Abeni
2026-05-27 5:30 ` gang.yan
2026-05-27 10:01 ` Paolo Abeni
2026-05-28 1:18 ` gang.yan
2026-05-22 21:43 ` [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper Paolo Abeni
2026-05-22 21:43 ` [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job Paolo Abeni
2026-05-27 5:46 ` Geliang Tang
2026-05-22 23:10 ` [PATCH v8 mptcp-next 0/9] mptcp: address stall under memory pressure MPTCP CI