MPTCP Linux Development
* [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing
@ 2025-10-27 14:57 Paolo Abeni
  2025-10-27 14:57 ` [PATCH RESENT v7 mptcp-next 1/4] mptcp: handle first subflow closing consistently Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:57 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

This series includes RX path improvements built around backlog processing.

The main goals are improving RX performance _and_ increasing long-term
maintainability.

Patches 1 and 2 refactor the memory accounting logic in the RX path, so
that the msk no longer needs to do fwd allocations, removing possible
drop sources.

Patches 3 and 4 deal with backlog processing. Patch 3 introduces the
helpers needed to manipulate the msk-level backlog, and the data
structure itself, without any actual functional change. Patch 4 finally
uses the backlog for RX skb processing. Note that MPTCP can't use the
sk_backlog, as the mptcp release callback can also release and
re-acquire the msk-level spinlock, while the core backlog processing
works under the assumption that such an event is not possible.
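
For reference, this is roughly the ordering in the core release path (a
simplified paraphrase of net/core/sock.c - error handling and lock
waiter wakeups are elided, so don't take it as a verbatim copy):

    void release_sock(struct sock *sk)
    {
            spin_lock_bh(&sk->sk_lock.slock);
            if (sk->sk_backlog.tail)
                    /* drains sk_backlog; drops and re-takes slock */
                    __release_sock(sk);
            if (sk->sk_prot->release_cb)
                    /* for MPTCP, this may drop and re-take slock too */
                    sk->sk_prot->release_cb(sk);
            sock_release_ownership(sk);
            spin_unlock_bh(&sk->sk_lock.slock);
    }

Skbs landing into sk_backlog while release_cb() has temporarily dropped
the spinlock would not be looked at again before the socket ownership
is released, hence the need for an mptcp-level backlog with its own
draining rules.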

A relevant point is the memory accounting for skbs in the backlog.

It's somewhat "original" due to MPTCP constraints. Such skbs use space
from the incoming subflow receive buffer, but are fwd memory accounted
on the msk: instead of doing new fwd allocations, the msk borrows the
memory from the subflow and reserves it for the backlog - see patches 2
and 4 for the gory details.
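
In pseudo-code, the pairing introduced by patch 2 looks like this
(simplified - the real helpers also deal with sub-page fragments and
the skb destructor, see below):

    /* subflow (ssk) side: the skb keeps charging the subflow rcvbuf
     * (ssk->sk_rmem_alloc), but its fwd memory is now on loan
     */
    mptcp_subflow_lend_fwdmem(subflow, skb);

    /* msk side, under the msk data lock: account the lent memory as
     * msk fwd allocation and release the subflow rmem charge
     */
    mptcp_borrow_fwdmem(sk, skb);
    skb_set_owner_r(skb, sk);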

Paolo Abeni (4):
  mptcp: handle first subflow closing consistently
  mptcp: borrow forward memory from subflow
  mptcp: introduce mptcp-level backlog
  mptcp: leverage the backlog for RX packet processing

 net/mptcp/fastopen.c   |   4 +-
 net/mptcp/mib.c        |   1 -
 net/mptcp/mib.h        |   1 -
 net/mptcp/mptcp_diag.c |   3 +-
 net/mptcp/protocol.c   | 231 +++++++++++++++++++++++++++++------------
 net/mptcp/protocol.h   |  40 ++++++-
 6 files changed, 208 insertions(+), 72 deletions(-)

-- 
2.51.0


* [PATCH RESENT v7 mptcp-next 1/4] mptcp: handle first subflow closing consistently
  2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
@ 2025-10-27 14:57 ` Paolo Abeni
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:57 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

Currently, as soon as the PM closes a subflow, the msk stops accepting data
from it, even if the TCP socket could still be formally open in the
incoming direction - with the notable exception of the first subflow.

The root cause of such behavior is that the code currently piggybacks two
separate semantics on the subflow->disposable bit: that the subflow
context must be released and that the subflow must stop accepting
incoming data.

The first subflow is never disposed, so it also never stops accepting
incoming data. Use a separate bit to mark the latter status and set
such a bit in __mptcp_close_ssk() for all subflows.

Beyond making the per-subflow behaviour more consistent, this will also
simplify the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 14 +++++++++-----
 net/mptcp/protocol.h |  3 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f4e3d0be7c87..74be417be980 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -842,10 +842,10 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	/* The peer can send data while we are shutting down this
-	 * subflow at msk destruction time, but we must avoid enqueuing
+	 * subflow at subflow destruction time, but we must avoid enqueuing
 	 * more data to the msk receive queue
 	 */
-	if (unlikely(subflow->disposable))
+	if (unlikely(subflow->closing))
 		return;
 
 	mptcp_data_lock(sk);
@@ -2429,6 +2429,13 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool dispose_it, need_push = false;
 
+	/* Do not pass RX data to the msk, even if the subflow socket is not
+	 * going to be freed (i.e. even for the first subflow on graceful
+	 * subflow close).
+	 */
+	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+	subflow->closing = 1;
+
 	/* If the first subflow moved to a close state before accept, e.g. due
 	 * to an incoming reset or listener shutdown, the subflow socket is
 	 * already deleted by inet_child_forget() and the mptcp socket can't
@@ -2439,7 +2446,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		/* ensure later check in mptcp_worker() will dispose the msk */
 		sock_set_flag(sk, SOCK_DEAD);
 		mptcp_set_close_tout(sk, tcp_jiffies32 - (mptcp_close_timeout(sk) + 1));
-		lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 		mptcp_subflow_drop_ctx(ssk);
 		goto out_release;
 	}
@@ -2448,8 +2454,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (dispose_it)
 		list_del(&subflow->node);
 
-	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
-
 	if ((flags & MPTCP_CF_FASTCLOSE) && !__mptcp_check_fallback(msk)) {
 		/* be sure to force the tcp_close path
 		 * to generate the egress reset
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index cd6350073144..9f7e5f2c964d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -536,12 +536,13 @@ struct mptcp_subflow_context {
 		send_infinite_map : 1,
 		remote_key_valid : 1,        /* received the peer key from */
 		disposable : 1,	    /* ctx can be free at ulp release time */
+		closing : 1,	    /* must not pass rx data to msk anymore */
 		stale : 1,	    /* unable to snd/rcv data, do not use for xmit */
 		valid_csum_seen : 1,        /* at least one csum validated */
 		is_mptfo : 1,	    /* subflow is doing TFO */
 		close_event_done : 1,       /* has done the post-closed part */
 		mpc_drop : 1,	    /* the MPC option has been dropped in a rtx */
-		__unused : 9;
+		__unused : 8;
 	bool	data_avail;
 	bool	scheduled;
 	bool	pm_listener;	    /* a listener managed by the kernel PM? */
-- 
2.51.0


* [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow
  2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  2025-10-27 14:57 ` [PATCH RESENT v7 mptcp-next 1/4] mptcp: handle first subflow closing consistently Paolo Abeni
@ 2025-10-27 14:58 ` Paolo Abeni
  2025-10-31 22:44   ` Mat Martineau
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 3/4] mptcp: introduce mptcp-level backlog Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:58 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

In the MPTCP receive path, we release the subflow-allocated fwd
memory just to allocate it again shortly after for the msk.

That increases the chances of allocation failures, especially once we
add backlog processing, as other actions could consume the just
released memory before the msk socket has a chance to do the
rcv allocation.

Replace the skb_orphan() call with an open-coded variant that
explicitly borrows the fwd memory from the subflow socket instead
of releasing it.

The borrowed memory does not have PAGE_SIZE granularity; rounding to
the page size would make the fwd allocated memory higher than what is
strictly required and could make the incoming subflow fwd mem
consistently negative. Instead, keep track of the accumulated frag and
borrow the full page at subflow close time.
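
A worked example, with hypothetical truesize values and PAGE_SIZE ==
4096; 'frag' and 'leftover' below stand for the subflow->lent_mem_frag
field and the local variable used at close time:

    /* frag = (frag + skb->truesize) & (PAGE_SIZE - 1) */
    frag = (   0 + 1280) & 4095;    /* -> 1280 */
    frag = (1280 + 3072) & 4095;    /* ->  256 (one full page lent) */

    /* at __mptcp_close_ssk() time the remainder of the partially
     * lent page is settled, zeroing the accumulated frag
     */
    leftover = 4096 - 256;          /* 3840 bytes added to the msk fwd
                                     * allocation and subtracted from
                                     * the subflow one
                                     */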

This allows removing the last drop in the TCP to MPTCP transition and
the associated, now unused, MIB.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/fastopen.c |  4 +++-
 net/mptcp/mib.c      |  1 -
 net/mptcp/mib.h      |  1 -
 net/mptcp/protocol.c | 23 +++++++++++++++--------
 net/mptcp/protocol.h | 23 +++++++++++++++++++++++
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index b9e451197902..82ec15bcfd7f 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -32,7 +32,8 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	/* dequeue the skb from sk receive queue */
 	__skb_unlink(skb, &ssk->sk_receive_queue);
 	skb_ext_reset(skb);
-	skb_orphan(skb);
+
+	mptcp_subflow_lend_fwdmem(subflow, skb);
 
 	/* We copy the fastopen data, but that don't belong to the mptcp sequence
 	 * space, need to offset it in the subflow sequence, see mptcp_subflow_get_map_offset()
@@ -50,6 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	mptcp_data_lock(sk);
 	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
 
+	mptcp_borrow_fwdmem(sk, skb);
 	skb_set_owner_r(skb, sk);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	mptcp_sk(sk)->bytes_received += skb->len;
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 171643815076..f23fda0c55a7 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -71,7 +71,6 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("MPFastcloseRx", MPTCP_MIB_MPFASTCLOSERX),
 	SNMP_MIB_ITEM("MPRstTx", MPTCP_MIB_MPRSTTX),
 	SNMP_MIB_ITEM("MPRstRx", MPTCP_MIB_MPRSTRX),
-	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
 	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index a1d3e9369fbb..812218b5ed2b 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -70,7 +70,6 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_MPFASTCLOSERX,	/* Received a MP_FASTCLOSE */
 	MPTCP_MIB_MPRSTTX,		/* Transmit a MP_RST */
 	MPTCP_MIB_MPRSTRX,		/* Received a MP_RST */
-	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
 	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
 	MPTCP_MIB_SNDWNDSHARED,		/* Subflow snd wnd is overridden by msk's one */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 74be417be980..f6d96cb01e00 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -349,7 +349,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
 			   int copy_len)
 {
-	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
 	/* the skb map_seq accounts for the skb offset:
@@ -374,11 +374,7 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *tail;
 
-	/* try to fetch required memory from subflow */
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		goto drop;
-	}
+	mptcp_borrow_fwdmem(sk, skb);
 
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
@@ -400,7 +396,6 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 	 * will retransmit as needed, if needed.
 	 */
 	MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
-drop:
 	mptcp_drop(sk, skb);
 	return false;
 }
@@ -701,7 +696,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			size_t len = skb->len - offset;
 
 			mptcp_init_skb(ssk, skb, offset, len);
-			skb_orphan(skb);
+			mptcp_subflow_lend_fwdmem(subflow, skb);
 			ret = __mptcp_move_skb(sk, skb) || ret;
 			seq += len;
 
@@ -2428,6 +2423,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool dispose_it, need_push = false;
+	int fwd_remaning;
 
 	/* Do not pass RX data to the msk, even if the subflow socket is not
 	 * going to be freed (i.e. even for the first subflow on graceful
@@ -2436,6 +2432,17 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 	subflow->closing = 1;
 
+	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
+	 * could be negative at this point, but will reach zero soon - when
+	 * the data allocated using such fragment is freed.
+	 */
+	if (subflow->lent_mem_frag) {
+		fwd_remaning = PAGE_SIZE - subflow->lent_mem_frag;
+		sk_forward_alloc_add(sk, fwd_remaning);
+		sk_forward_alloc_add(ssk, -fwd_remaning);
+		subflow->lent_mem_frag = 0;
+	}
+
 	/* If the first subflow moved to a close state before accept, e.g. due
 	 * to an incoming reset or listener shutdown, the subflow socket is
 	 * already deleted by inet_child_forget() and the mptcp socket can't
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9f7e5f2c964d..80d520888235 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -547,6 +547,7 @@ struct mptcp_subflow_context {
 	bool	scheduled;
 	bool	pm_listener;	    /* a listener managed by the kernel PM? */
 	bool	fully_established;  /* path validated */
+	u32	lent_mem_frag;
 	u32	remote_nonce;
 	u64	thmac;
 	u32	local_nonce;
@@ -646,6 +647,28 @@ mptcp_send_active_reset_reason(struct sock *sk)
 	tcp_send_active_reset(sk, GFP_ATOMIC, reason);
 }
 
+static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *skb)
+{
+	struct sock *ssk = skb->sk;
+
+	/* The subflow just lent the skb fwd memory, and we know that the skb
+	 * is only accounted on the incoming subflow rcvbuf.
+	 */
+	skb->sk = NULL;
+	sk_forward_alloc_add(sk, skb->truesize);
+	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
+}
+
+static inline void
+mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow,
+			  struct sk_buff *skb)
+{
+	int frag = (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1);
+
+	skb->destructor = NULL;
+	subflow->lent_mem_frag = frag;
+}
+
 static inline u64
 mptcp_subflow_get_map_offset(const struct mptcp_subflow_context *subflow)
 {
-- 
2.51.0


* [PATCH RESENT v7 mptcp-next 3/4] mptcp: introduce mptcp-level backlog
  2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  2025-10-27 14:57 ` [PATCH RESENT v7 mptcp-next 1/4] mptcp: handle first subflow closing consistently Paolo Abeni
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow Paolo Abeni
@ 2025-10-27 14:58 ` Paolo Abeni
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing Paolo Abeni
  2025-10-27 16:13 ` [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing MPTCP CI
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:58 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

We will soon use it for incoming data processing.
MPTCP can't leverage the sk_backlog, as the latter is processed
before the release callback, and such a callback for MPTCP releases
and re-acquires the socket spinlock, breaking the sk_backlog
processing assumption.

Add an skb backlog list inside the mptcp sock struct, and implement
basic helpers to transfer packets to, and purge, such a list.

Packets in the backlog are memory accounted and still use the incoming
subflow receive memory, to allow back-pressure. The backlog size is
implicitly bounded by the sum of the subflows' rcvbuf.
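
In other words (a sketch of the intended invariant, not code from this
patch):

    /* Every skb sitting in msk->backlog_list still has skb->sk set
     * to the originating ssk and is still counted in
     * ssk->sk_rmem_alloc; each subflow stops receiving at the TCP
     * level once its own rcvbuf is full, so roughly:
     *
     *   msk->backlog_len <= sum(ssk->sk_rcvbuf over all subflows)
     */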

When a subflow is closed, references from the backlog to such a sock
are removed.

No packet is currently added to the backlog, so no functional changes
are intended here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v6 -> v7:
  - real fwd memory accounting for the backlog
  - do not introduce a new destructor: we only have one call-site
    dropping packets from the backlog, handle that one explicitly
  - update to the new borrow fwd mem API

v5 -> v6:
  - call mptcp_bl_free() instead of inlining it.
  - report the bl mem in diag mem info
  - moved here the mptcp_close_ssk chunk from the next patch.
    (logically belongs here)

v4 -> v5:
  - split out of the next patch, to make the latter smaller
  - set a custom destructor for skbs in the backlog; this avoids
    duplicate code, and fixes a few places where the needed ssk
    cleanup was not performed.
  - factor out the backlog purge in a new helper,
    use spinlock protection, clear the backlog list and zero the
    backlog len
  - explicitly init the backlog_len at mptcp_init_sock() time
---
 net/mptcp/mptcp_diag.c |  3 +-
 net/mptcp/protocol.c   | 77 ++++++++++++++++++++++++++++++++++++++++--
 net/mptcp/protocol.h   | 25 ++++++++++----
 3 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c
index ac974299de71..136c2d05c0ee 100644
--- a/net/mptcp/mptcp_diag.c
+++ b/net/mptcp/mptcp_diag.c
@@ -195,7 +195,8 @@ static void mptcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_info *info = _info;
 
-	r->idiag_rqueue = sk_rmem_alloc_get(sk);
+	r->idiag_rqueue = sk_rmem_alloc_get(sk) +
+			  READ_ONCE(mptcp_sk(sk)->backlog_len);
 	r->idiag_wqueue = sk_wmem_alloc_get(sk);
 
 	if (inet_sk_state_load(sk) == TCP_LISTEN) {
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f6d96cb01e00..4c62de93e132 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -649,6 +649,39 @@ static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
 		mptcp_subflow_reset(ssk);
 	}
 }
+
+static void __mptcp_add_backlog(struct sock *sk,
+				struct mptcp_subflow_context *subflow,
+				struct sk_buff *skb)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *tail = NULL;
+	bool fragstolen;
+	int delta;
+
+	if (unlikely(sk->sk_state == TCP_CLOSE)) {
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
+		return;
+	}
+
+	/* Try to coalesce with the last skb in our backlog */
+	if (!list_empty(&msk->backlog_list))
+		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
+
+	if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
+	    skb->sk == tail->sk &&
+	    __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
+		skb->truesize -= delta;
+		kfree_skb_partial(skb, fragstolen);
+		__mptcp_subflow_lend_fwdmem(subflow, delta);
+		WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
+		return;
+	}
+
+	list_add_tail(&skb->list, &msk->backlog_list);
+	mptcp_subflow_lend_fwdmem(subflow, skb);
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
+}
 
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 					   struct sock *ssk)
@@ -696,8 +728,13 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			size_t len = skb->len - offset;
 
 			mptcp_init_skb(ssk, skb, offset, len);
-			mptcp_subflow_lend_fwdmem(subflow, skb);
-			ret = __mptcp_move_skb(sk, skb) || ret;
+
+			if (true) {
+				mptcp_subflow_lend_fwdmem(subflow, skb);
+				ret |= __mptcp_move_skb(sk, skb);
+			} else {
+				__mptcp_add_backlog(sk, subflow, skb);
+			}
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
@@ -2529,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		     struct mptcp_subflow_context *subflow)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *skb;
+
 	/* The first subflow can already be closed and still in the list */
 	if (subflow->close_event_done)
 		return;
@@ -2538,6 +2578,17 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
 
+	/* Remove any reference from the backlog to this ssk; backlog skbs consume
+	 * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
+	 */
+	list_for_each_entry(skb, &msk->backlog_list, list) {
+		if (skb->sk != ssk)
+			continue;
+
+		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+		skb->sk = NULL;
+	}
+
 	/* subflow aborted before reaching the fully_established status
 	 * attempt the creation of the next subflow
 	 */
@@ -2766,12 +2817,31 @@ static void mptcp_mp_fail_no_response(struct mptcp_sock *msk)
 	unlock_sock_fast(ssk, slow);
 }
 
+static void mptcp_backlog_purge(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *tmp, *skb;
+	LIST_HEAD(backlog);
+
+	mptcp_data_lock(sk);
+	list_splice_init(&msk->backlog_list, &backlog);
+	msk->backlog_len = 0;
+	mptcp_data_unlock(sk);
+
+	list_for_each_entry_safe(skb, tmp, &backlog, list) {
+		mptcp_borrow_fwdmem(sk, skb);
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
+	}
+	sk_mem_reclaim(sk);
+}
+
 static void mptcp_do_fastclose(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow, *tmp;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	mptcp_set_state(sk, TCP_CLOSE);
+	mptcp_backlog_purge(sk);
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow),
 				  subflow, MPTCP_CF_FASTCLOSE);
@@ -2829,11 +2899,13 @@ static void __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->conn_list);
 	INIT_LIST_HEAD(&msk->join_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
+	INIT_LIST_HEAD(&msk->backlog_list);
 	INIT_WORK(&msk->work, mptcp_worker);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
 	msk->timer_ival = TCP_RTO_MIN;
 	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+	msk->backlog_len = 0;
 
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -3210,6 +3282,7 @@ static void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 	struct sock *sk = (struct sock *)msk;
 
 	__mptcp_clear_xmit(sk);
+	mptcp_backlog_purge(sk);
 
 	/* join list will be eventually flushed (with rst) at sock lock release time */
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 80d520888235..cf82aefb5513 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -358,6 +358,9 @@ struct mptcp_sock {
 					 * allow_infinite_fallback and
 					 * allow_join
 					 */
+
+	struct list_head backlog_list;	/* protected by the data lock */
+	u32		backlog_len;
 };
 
 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
@@ -408,6 +411,7 @@ static inline int mptcp_space_from_win(const struct sock *sk, int win)
 static inline int __mptcp_space(const struct sock *sk)
 {
 	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
+				    READ_ONCE(mptcp_sk(sk)->backlog_len) -
 				    sk_rmem_alloc_get(sk));
 }
 
@@ -651,22 +655,31 @@ static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *skb)
 {
 	struct sock *ssk = skb->sk;
 
-	/* The subflow just lent the skb fwd memory, and we know that the skb
-	 * is only accounted on the incoming subflow rcvbuf.
+	/* The subflow just lent the skb fwd memory; if the subflow meanwhile
+	 * closed, mptcp_close_ssk() already released the ssk rcv memory.
 	 */
-	skb->sk = NULL;
 	sk_forward_alloc_add(sk, skb->truesize);
+	if (!ssk)
+		return;
+
 	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
+	skb->sk = NULL;
+}
+
+static inline void
+__mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow, int size)
+{
+	int frag = (subflow->lent_mem_frag + size) & (PAGE_SIZE - 1);
+
+	subflow->lent_mem_frag = frag;
 }
 
 static inline void
 mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow,
 			  struct sk_buff *skb)
 {
-	int frag = (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1);
-
+	__mptcp_subflow_lend_fwdmem(subflow, skb->truesize);
 	skb->destructor = NULL;
-	subflow->lent_mem_frag = frag;
 }
 
 static inline u64
-- 
2.51.0


* [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing
  2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 3/4] mptcp: introduce mptcp-level backlog Paolo Abeni
@ 2025-10-27 14:58 ` Paolo Abeni
  2025-11-01  0:04   ` Mat Martineau
  2025-10-27 16:13 ` [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing MPTCP CI
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:58 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

When the msk socket is owned or the msk receive buffer is full,
move the incoming skbs into an msk-level backlog list. This avoids
traversing the joined subflows and acquiring the subflow-level
socket lock at reception time, improving RX performance.

When processing the backlog, use the fwd alloc memory borrowed from
the incoming subflow. skbs exceeding the msk receive space are
not dropped; instead they are kept in the backlog until the receive
buffer is freed. Dropping packets already acked at the TCP level is
explicitly discouraged by the RFC and would corrupt the data stream
for fallback sockets.

Special care is needed to avoid adding skbs to the backlog of a closed
msk and to avoid leaving dangling references in the backlog
at subflow closing time.
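
The resulting drain loop - condensed here from the mptcp_move_skbs()
hunk below, with added commentary - follows a splice-out / process /
splice-back pattern:

    mptcp_data_lock(sk);
    while (mptcp_can_spool_backlog(sk, &skbs)) {
            /* process the spliced-out skbs without the data lock */
            mptcp_data_unlock(sk);
            enqueued |= __mptcp_move_skbs(sk, &skbs, &moved);

            /* leftovers (e.g. rcvbuf full) go back to the head of
             * the backlog, preserving in-order delivery
             */
            mptcp_data_lock(sk);
            mptcp_backlog_spooled(sk, moved, &skbs);
    }
    mptcp_data_unlock(sk);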

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v6 -> v7:
  - do not limit the overall backlog spooling loop: it's hard to do it
    right, and the pre-backlog code did not bound the similar existing
    loop

v5 -> v6:
  - do the backlog len update asap to advertise the correct window.
  - explicitly bound the backlog processing loop to the maximum BL len

v4 -> v5:
  - consolidate ssk rcvbuf accounting in __mptcp_move_skb(), remove
    some code duplication
  - return early in __mptcp_add_backlog() when dropping skbs due to
    the msk being closed. This avoids a later UaF
---
 net/mptcp/protocol.c | 121 ++++++++++++++++++++++++-------------------
 net/mptcp/protocol.h |   1 -
 2 files changed, 68 insertions(+), 54 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4c62de93e132..f93f973a4ffb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -683,7 +683,7 @@ static void __mptcp_add_backlog(struct sock *sk,
 }
 
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
-					   struct sock *ssk)
+					   struct sock *ssk, bool own_msk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
@@ -699,9 +699,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		struct sk_buff *skb;
 		bool fin;
 
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-			break;
-
 		/* try to move as much data as available */
 		map_remaining = subflow->map_data_len -
 				mptcp_subflow_get_map_offset(subflow);
@@ -729,7 +726,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 			mptcp_init_skb(ssk, skb, offset, len);
 
-			if (true) {
+			if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) {
 				mptcp_subflow_lend_fwdmem(subflow, skb);
 				ret |= __mptcp_move_skb(sk, skb);
 			} else {
@@ -853,7 +850,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	struct sock *sk = (struct sock *)msk;
 	bool moved;
 
-	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
+	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
 	__mptcp_ofo_queue(msk);
 	if (unlikely(ssk->sk_err))
 		__mptcp_subflow_error_report(sk, ssk);
@@ -886,7 +883,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
 			sk->sk_data_ready(sk);
 	} else {
-		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
+		__mptcp_move_skbs_from_subflow(msk, ssk, false);
 	}
 	mptcp_data_unlock(sk);
 }
@@ -2126,60 +2123,74 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static struct mptcp_subflow_context *
-__mptcp_first_ready_from(struct mptcp_sock *msk,
-			 struct mptcp_subflow_context *subflow)
+static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delta)
 {
-	struct mptcp_subflow_context *start_subflow = subflow;
+	struct sk_buff *skb = list_first_entry(skbs, struct sk_buff, list);
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	bool moved = false;
+
+	*delta = 0;
+	while (1) {
+		/* If the msk rcvbuf is full, stop; don't drop */
+		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+			break;
+
+		prefetch(skb->next);
+		list_del(&skb->list);
+		*delta += skb->truesize;
+
+		moved |= __mptcp_move_skb(sk, skb);
+		if (list_empty(skbs))
+			break;
 
-	while (!READ_ONCE(subflow->data_avail)) {
-		subflow = mptcp_next_subflow(msk, subflow);
-		if (subflow == start_subflow)
-			return NULL;
+		skb = list_first_entry(skbs, struct sk_buff, list);
 	}
-	return subflow;
+
+	__mptcp_ofo_queue(msk);
+	if (moved)
+		mptcp_check_data_fin((struct sock *)msk);
+	return moved;
 }
 
-static bool __mptcp_move_skbs(struct sock *sk)
+static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 {
-	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	bool ret = false;
 
-	if (list_empty(&msk->conn_list))
+	/* Don't spool the backlog if the rcvbuf is full. */
+	if (list_empty(&msk->backlog_list) ||
+	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 		return false;
 
-	subflow = list_first_entry(&msk->conn_list,
-				   struct mptcp_subflow_context, node);
-	for (;;) {
-		struct sock *ssk;
-		bool slowpath;
+	INIT_LIST_HEAD(skbs);
+	list_splice_init(&msk->backlog_list, skbs);
+	return true;
+}
 
-		/*
-		 * As an optimization avoid traversing the subflows list
-		 * and ev. acquiring the subflow socket lock before baling out
-		 */
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-			break;
+static void mptcp_backlog_spooled(struct sock *sk, u32 moved,
+				  struct list_head *skbs)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
 
-		subflow = __mptcp_first_ready_from(msk, subflow);
-		if (!subflow)
-			break;
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
+	list_splice(skbs, &msk->backlog_list);
+}
 
-		ssk = mptcp_subflow_tcp_sock(subflow);
-		slowpath = lock_sock_fast(ssk);
-		ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret;
-		if (unlikely(ssk->sk_err))
-			__mptcp_error_report(sk);
-		unlock_sock_fast(ssk, slowpath);
+static bool mptcp_move_skbs(struct sock *sk)
+{
+	struct list_head skbs;
+	bool enqueued = false;
+	u32 moved;
 
-		subflow = mptcp_next_subflow(msk, subflow);
-	}
+	mptcp_data_lock(sk);
+	while (mptcp_can_spool_backlog(sk, &skbs)) {
+		mptcp_data_unlock(sk);
+		enqueued |= __mptcp_move_skbs(sk, &skbs, &moved);
 
-	__mptcp_ofo_queue(msk);
-	if (ret)
-		mptcp_check_data_fin((struct sock *)msk);
-	return ret;
+		mptcp_data_lock(sk);
+		mptcp_backlog_spooled(sk, moved, &skbs);
+	}
+	mptcp_data_unlock(sk);
+	return enqueued;
 }
 
 static unsigned int mptcp_inq_hint(const struct sock *sk)
@@ -2245,7 +2256,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
 		copied += bytes_read;
 
-		if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk))
+		if (!list_empty(&msk->backlog_list) && mptcp_move_skbs(sk))
 			continue;
 
 		/* only the MPTCP socket status is relevant here. The exit
@@ -3530,8 +3541,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 
 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
 				      BIT(MPTCP_RETRANSMIT) | \
-				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
-				      BIT(MPTCP_DEQUEUE))
+				      BIT(MPTCP_FLUSH_JOIN_LIST))
 
 /* processes deferred events and flush wmem */
 static void mptcp_release_cb(struct sock *sk)
@@ -3541,9 +3551,12 @@ static void mptcp_release_cb(struct sock *sk)
 
 	for (;;) {
 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
-		struct list_head join_list;
+		struct list_head join_list, skbs;
+		bool spool_bl;
+		u32 moved;
 
-		if (!flags)
+		spool_bl = mptcp_can_spool_backlog(sk, &skbs);
+		if (!flags && !spool_bl)
 			break;
 
 		INIT_LIST_HEAD(&join_list);
@@ -3565,7 +3578,7 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_push_pending(sk, 0);
 		if (flags & BIT(MPTCP_RETRANSMIT))
 			__mptcp_retrans(sk);
-		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
+		if (spool_bl && __mptcp_move_skbs(sk, &skbs, &moved)) {
 			/* notify ack seq update */
 			mptcp_cleanup_rbuf(msk, 0);
 			sk->sk_data_ready(sk);
@@ -3573,6 +3586,8 @@ static void mptcp_release_cb(struct sock *sk)
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
+		if (spool_bl)
+			mptcp_backlog_spooled(sk, moved, &skbs);
 	}
 
 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
@@ -3805,7 +3820,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
 			return -EINVAL;
 
 		lock_sock(sk);
-		if (__mptcp_move_skbs(sk))
+		if (mptcp_move_skbs(sk))
 			mptcp_cleanup_rbuf(msk, 0);
 		*karg = mptcp_inq_hint(sk);
 		release_sock(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index cf82aefb5513..8e0f780e9210 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -124,7 +124,6 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_SYNC_STATE	6
 #define MPTCP_SYNC_SNDBUF	7
-#define MPTCP_DEQUEUE		8
 
 struct mptcp_skb_cb {
 	u64 map_seq;
-- 
2.51.0


* Re: [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing
  2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing Paolo Abeni
@ 2025-10-27 16:13 ` MPTCP CI
  4 siblings, 0 replies; 11+ messages in thread
From: MPTCP CI @ 2025-10-27 16:13 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/18846183864

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/a809ef68db21
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1016326


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

* Re: [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow Paolo Abeni
@ 2025-10-31 22:44   ` Mat Martineau
  2025-11-03 16:26     ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2025-10-31 22:44 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, geliang

On Mon, 27 Oct 2025, Paolo Abeni wrote:

> In the MPTCP receive path, we release the subflow-allocated fwd
> memory just to allocate it again shortly after for the msk.
>
> That increases the chances of allocation failures, especially once we
> add backlog processing, as other actions could consume the just
> released memory before the msk socket has a chance to do the
> rcv allocation.
>
> Replace the skb_orphan() call with an open-coded variant that
> explicitly borrows the fwd memory from the subflow socket instead
> of releasing it.
>
> The borrowed memory does not have PAGE_SIZE granularity; rounding to
> the page size would make the fwd allocated memory higher than what is
> strictly required and could make the incoming subflow fwd mem
> consistently negative. Instead, keep track of the accumulated frag and
> borrow the full page at subflow close time.
>
> This allows removing the last drop in the TCP to MPTCP transition and
> the associated, now unused, MIB.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/fastopen.c |  4 +++-
> net/mptcp/mib.c      |  1 -
> net/mptcp/mib.h      |  1 -
> net/mptcp/protocol.c | 23 +++++++++++++++--------
> net/mptcp/protocol.h | 23 +++++++++++++++++++++++
> 5 files changed, 41 insertions(+), 11 deletions(-)
>
> diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
> index b9e451197902..82ec15bcfd7f 100644
> --- a/net/mptcp/fastopen.c
> +++ b/net/mptcp/fastopen.c
> @@ -32,7 +32,8 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
> 	/* dequeue the skb from sk receive queue */
> 	__skb_unlink(skb, &ssk->sk_receive_queue);
> 	skb_ext_reset(skb);
> -	skb_orphan(skb);
> +
> +	mptcp_subflow_lend_fwdmem(subflow, skb);
>
> 	/* We copy the fastopen data, but that don't belong to the mptcp sequence
> 	 * space, need to offset it in the subflow sequence, see mptcp_subflow_get_map_offset()
> @@ -50,6 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
> 	mptcp_data_lock(sk);
> 	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
>
> +	mptcp_borrow_fwdmem(sk, skb);
> 	skb_set_owner_r(skb, sk);
> 	__skb_queue_tail(&sk->sk_receive_queue, skb);
> 	mptcp_sk(sk)->bytes_received += skb->len;
> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> index 171643815076..f23fda0c55a7 100644
> --- a/net/mptcp/mib.c
> +++ b/net/mptcp/mib.c
> @@ -71,7 +71,6 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> 	SNMP_MIB_ITEM("MPFastcloseRx", MPTCP_MIB_MPFASTCLOSERX),
> 	SNMP_MIB_ITEM("MPRstTx", MPTCP_MIB_MPRSTTX),
> 	SNMP_MIB_ITEM("MPRstRx", MPTCP_MIB_MPRSTRX),
> -	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
> 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
> 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
> 	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> index a1d3e9369fbb..812218b5ed2b 100644
> --- a/net/mptcp/mib.h
> +++ b/net/mptcp/mib.h
> @@ -70,7 +70,6 @@ enum linux_mptcp_mib_field {
> 	MPTCP_MIB_MPFASTCLOSERX,	/* Received a MP_FASTCLOSE */
> 	MPTCP_MIB_MPRSTTX,		/* Transmit a MP_RST */
> 	MPTCP_MIB_MPRSTRX,		/* Received a MP_RST */
> -	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
> 	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
> 	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
> 	MPTCP_MIB_SNDWNDSHARED,		/* Subflow snd wnd is overridden by msk's one */
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 74be417be980..f6d96cb01e00 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -349,7 +349,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
> static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
> 			   int copy_len)
> {
> -	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> 	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
>
> 	/* the skb map_seq accounts for the skb offset:
> @@ -374,11 +374,7 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct sk_buff *tail;
>
> -	/* try to fetch required memory from subflow */
> -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> -		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> -		goto drop;
> -	}
> +	mptcp_borrow_fwdmem(sk, skb);
>
> 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
> 		/* in sequence */
> @@ -400,7 +396,6 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> 	 * will retransmit as needed, if needed.
> 	 */
> 	MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
> -drop:
> 	mptcp_drop(sk, skb);
> 	return false;
> }
> @@ -701,7 +696,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> 			size_t len = skb->len - offset;
>
> 			mptcp_init_skb(ssk, skb, offset, len);
> -			skb_orphan(skb);
> +			mptcp_subflow_lend_fwdmem(subflow, skb);
> 			ret = __mptcp_move_skb(sk, skb) || ret;
> 			seq += len;
>
> @@ -2428,6 +2423,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> 	bool dispose_it, need_push = false;

Hi Paolo -

> +	int fwd_remaning;

One spelling fix, "fwd_remaining".

>
> 	/* Do not pass RX data to the msk, even if the subflow socket is not
> 	 * going to be freed (i.e. even for the first subflow on graceful
> @@ -2436,6 +2432,17 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
> 	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
> 	subflow->closing = 1;
>
> +	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
> +	 * could be negative at this point, but will reach zero soon - when
> +	 * the data allocated using such fragment is freed.
> +	 */
> +	if (subflow->lent_mem_frag) {
> +		fwd_remaning = PAGE_SIZE - subflow->lent_mem_frag;
> +		sk_forward_alloc_add(sk, fwd_remaning);
> +		sk_forward_alloc_add(ssk, -fwd_remaning);
> +		subflow->lent_mem_frag = 0;
> +	}
> +
> 	/* If the first subflow moved to a close state before accept, e.g. due
> 	 * to an incoming reset or listener shutdown, the subflow socket is
> 	 * already deleted by inet_child_forget() and the mptcp socket can't
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 9f7e5f2c964d..80d520888235 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -547,6 +547,7 @@ struct mptcp_subflow_context {
> 	bool	scheduled;
> 	bool	pm_listener;	    /* a listener managed by the kernel PM? */
> 	bool	fully_established;  /* path validated */
> +	u32	lent_mem_frag;
> 	u32	remote_nonce;
> 	u64	thmac;
> 	u32	local_nonce;
> @@ -646,6 +647,28 @@ mptcp_send_active_reset_reason(struct sock *sk)
> 	tcp_send_active_reset(sk, GFP_ATOMIC, reason);
> }
>
> +static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct sock *ssk = skb->sk;
> +
> +	/* The subflow just lent the skb fwd memory, and we know that the skb
> +	 * is only accounted on the incoming subflow rcvbuf.
> +	 */
> +	skb->sk = NULL;

Looks like the intent is to always pair mptcp_subflow_lend_fwdmem() with
mptcp_borrow_fwdmem() (in that order).

Given that skb->sk and skb->destructor are usually cleared together, 
should mptcp_borrow_fwdmem() have a WARN_ON_ONCE(skb->destructor)? Or is 
that excessive paranoia?

- Mat

> +	sk_forward_alloc_add(sk, skb->truesize);
> +	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
> +}
> +
> +static inline void
> +mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow,
> +			  struct sk_buff *skb)
> +{
> +	int frag = (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1);
> +
> +	skb->destructor = NULL;
> +	subflow->lent_mem_frag = frag;
> +}
> +
> static inline u64
> mptcp_subflow_get_map_offset(const struct mptcp_subflow_context *subflow)
> {
> -- 
> 2.51.0
>
>

* Re: [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing
  2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing Paolo Abeni
@ 2025-11-01  0:04   ` Mat Martineau
  2025-11-03 16:23     ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Mat Martineau @ 2025-11-01  0:04 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, geliang

On Mon, 27 Oct 2025, Paolo Abeni wrote:

> When the msk socket is owned or the msk receive buffer is full,
> move the incoming skbs into an msk-level backlog list. This avoids
> traversing the joined subflows and acquiring the subflow-level
> socket lock at reception time, improving RX performance.
>
> When processing the backlog, use the fwd alloc memory borrowed from
> the incoming subflow. skbs exceeding the msk receive space are
> not dropped; instead they are kept in the backlog until the receive
> buffer is freed. Dropping packets already acked at the TCP level is
> explicitly discouraged by the RFC and would corrupt the data stream
> for fallback sockets.
>
> Special care is needed to avoid adding skbs to the backlog of a closed
> msk and to avoid leaving dangling references in the backlog
> at subflow closing time.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v6 -> v7:
>  - do not limit the overall backlog spooling loop: it's hard to do it
>    right, and the pre-backlog code did not bound the similar existing
>    loop
>
> v5 -> v6:
>  - do the backlog len update asap to advertise the correct window.
>  - explicitly bound the backlog processing loop to the maximum BL len
>
> v4 -> v5:
>  - consolidate ssk rcvbuf accounting in __mptcp_move_skb(), remove
>    some code duplication
>  - return early in __mptcp_add_backlog() when dropping skbs due to
>    the msk being closed. This avoids a later UaF
> ---
> net/mptcp/protocol.c | 121 ++++++++++++++++++++++++-------------------
> net/mptcp/protocol.h |   1 -
> 2 files changed, 68 insertions(+), 54 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4c62de93e132..f93f973a4ffb 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -683,7 +683,7 @@ static void __mptcp_add_backlog(struct sock *sk,
> }
>
> static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> -					   struct sock *ssk)
> +					   struct sock *ssk, bool own_msk)
> {
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> 	struct sock *sk = (struct sock *)msk;
> @@ -699,9 +699,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> 		struct sk_buff *skb;
> 		bool fin;
>
> -		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> -			break;
> -
> 		/* try to move as much data as available */
> 		map_remaining = subflow->map_data_len -
> 				mptcp_subflow_get_map_offset(subflow);
> @@ -729,7 +726,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>
> 			mptcp_init_skb(ssk, skb, offset, len);
>
> -			if (true) {
> +			if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) {
> 				mptcp_subflow_lend_fwdmem(subflow, skb);
> 				ret |= __mptcp_move_skb(sk, skb);
> 			} else {
> @@ -853,7 +850,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
> 	struct sock *sk = (struct sock *)msk;
> 	bool moved;
>
> -	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
> +	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
> 	__mptcp_ofo_queue(msk);
> 	if (unlikely(ssk->sk_err))
> 		__mptcp_subflow_error_report(sk, ssk);
> @@ -886,7 +883,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> 		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
> 			sk->sk_data_ready(sk);
> 	} else {
> -		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
> +		__mptcp_move_skbs_from_subflow(msk, ssk, false);
> 	}
> 	mptcp_data_unlock(sk);
> }
> @@ -2126,60 +2123,74 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
> 	msk->rcvq_space.time = mstamp;
> }
>
> -static struct mptcp_subflow_context *
> -__mptcp_first_ready_from(struct mptcp_sock *msk,
> -			 struct mptcp_subflow_context *subflow)
> +static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delta)
> {
> -	struct mptcp_subflow_context *start_subflow = subflow;
> +	struct sk_buff *skb = list_first_entry(skbs, struct sk_buff, list);
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +	bool moved = false;
> +
> +	*delta = 0;
> +	while (1) {
> +		/* If the msk rcvbuf is full, stop; don't drop */
> +		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> +			break;
> +
> +		prefetch(skb->next);
> +		list_del(&skb->list);
> +		*delta += skb->truesize;
> +
> +		moved |= __mptcp_move_skb(sk, skb);
> +		if (list_empty(skbs))
> +			break;
>
> -	while (!READ_ONCE(subflow->data_avail)) {
> -		subflow = mptcp_next_subflow(msk, subflow);
> -		if (subflow == start_subflow)
> -			return NULL;
> +		skb = list_first_entry(skbs, struct sk_buff, list);
> 	}
> -	return subflow;
> +
> +	__mptcp_ofo_queue(msk);
> +	if (moved)
> +		mptcp_check_data_fin((struct sock *)msk);
> +	return moved;
> }
>
> -static bool __mptcp_move_skbs(struct sock *sk)
> +static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
> {
> -	struct mptcp_subflow_context *subflow;
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> -	bool ret = false;
>
> -	if (list_empty(&msk->conn_list))
> +	/* Don't spool the backlog if the rcvbuf is full. */
> +	if (list_empty(&msk->backlog_list) ||
> +	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> 		return false;
>
> -	subflow = list_first_entry(&msk->conn_list,
> -				   struct mptcp_subflow_context, node);
> -	for (;;) {
> -		struct sock *ssk;
> -		bool slowpath;
> +	INIT_LIST_HEAD(skbs);
> +	list_splice_init(&msk->backlog_list, skbs);
> +	return true;
> +}
>
> -		/*
> -		 * As an optimization avoid traversing the subflows list
> -		 * and ev. acquiring the subflow socket lock before baling out
> -		 */
> -		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> -			break;
> +static void mptcp_backlog_spooled(struct sock *sk, u32 moved,
> +				  struct list_head *skbs)
> +{
> +	struct mptcp_sock *msk = mptcp_sk(sk);
>
> -		subflow = __mptcp_first_ready_from(msk, subflow);
> -		if (!subflow)
> -			break;
> +	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
> +	list_splice(skbs, &msk->backlog_list);
> +}
>
> -		ssk = mptcp_subflow_tcp_sock(subflow);
> -		slowpath = lock_sock_fast(ssk);
> -		ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret;
> -		if (unlikely(ssk->sk_err))
> -			__mptcp_error_report(sk);
> -		unlock_sock_fast(ssk, slowpath);
> +static bool mptcp_move_skbs(struct sock *sk)
> +{
> +	struct list_head skbs;
> +	bool enqueued = false;
> +	u32 moved;
>
> -		subflow = mptcp_next_subflow(msk, subflow);
> -	}
> +	mptcp_data_lock(sk);
> +	while (mptcp_can_spool_backlog(sk, &skbs)) {
> +		mptcp_data_unlock(sk);
> +		enqueued |= __mptcp_move_skbs(sk, &skbs, &moved);
>
> -	__mptcp_ofo_queue(msk);
> -	if (ret)
> -		mptcp_check_data_fin((struct sock *)msk);
> -	return ret;
> +		mptcp_data_lock(sk);
> +		mptcp_backlog_spooled(sk, moved, &skbs);
> +	}
> +	mptcp_data_unlock(sk);
> +	return enqueued;
> }
>
> static unsigned int mptcp_inq_hint(const struct sock *sk)
> @@ -2245,7 +2256,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>
> 		copied += bytes_read;
>
> -		if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk))
> +		if (!list_empty(&msk->backlog_list) && mptcp_move_skbs(sk))
> 			continue;
>
> 		/* only the MPTCP socket status is relevant here. The exit
> @@ -3530,8 +3541,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
>
> #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
> 				      BIT(MPTCP_RETRANSMIT) | \
> -				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
> -				      BIT(MPTCP_DEQUEUE))
> +				      BIT(MPTCP_FLUSH_JOIN_LIST))
>
> /* processes deferred events and flush wmem */
> static void mptcp_release_cb(struct sock *sk)
> @@ -3541,9 +3551,12 @@ static void mptcp_release_cb(struct sock *sk)
>
> 	for (;;) {
> 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
> -		struct list_head join_list;
> +		struct list_head join_list, skbs;
> +		bool spool_bl;
> +		u32 moved;
>
> -		if (!flags)
> +		spool_bl = mptcp_can_spool_backlog(sk, &skbs);
> +		if (!flags && !spool_bl)
> 			break;
>
> 		INIT_LIST_HEAD(&join_list);
> @@ -3565,7 +3578,7 @@ static void mptcp_release_cb(struct sock *sk)
> 			__mptcp_push_pending(sk, 0);
> 		if (flags & BIT(MPTCP_RETRANSMIT))
> 			__mptcp_retrans(sk);
> -		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
> +		if (spool_bl && __mptcp_move_skbs(sk, &skbs, &moved)) {
> 			/* notify ack seq update */
> 			mptcp_cleanup_rbuf(msk, 0);
> 			sk->sk_data_ready(sk);
> @@ -3573,6 +3586,8 @@ static void mptcp_release_cb(struct sock *sk)
>
> 		cond_resched();
> 		spin_lock_bh(&sk->sk_lock.slock);
> +		if (spool_bl)
> +			mptcp_backlog_spooled(sk, moved, &skbs);

Hi Paolo -

Given the discussion in v5, to address the "wild producer" scenario this 
loop can keep track of total_moved and exit the loop when that gets too 
large.

The question is what the total limit would be - available rcvbuf when 
entering the loop, or some multiple of that?

- Mat

> 	}
>
> 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
> @@ -3805,7 +3820,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
> 			return -EINVAL;
>
> 		lock_sock(sk);
> -		if (__mptcp_move_skbs(sk))
> +		if (mptcp_move_skbs(sk))
> 			mptcp_cleanup_rbuf(msk, 0);
> 		*karg = mptcp_inq_hint(sk);
> 		release_sock(sk);
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index cf82aefb5513..8e0f780e9210 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -124,7 +124,6 @@
> #define MPTCP_FLUSH_JOIN_LIST	5
> #define MPTCP_SYNC_STATE	6
> #define MPTCP_SYNC_SNDBUF	7
> -#define MPTCP_DEQUEUE		8
>
> struct mptcp_skb_cb {
> 	u64 map_seq;
> -- 
> 2.51.0
>
>

* Re: [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing
  2025-11-01  0:04   ` Mat Martineau
@ 2025-11-03 16:23     ` Paolo Abeni
  2025-11-04 22:32       ` Mat Martineau
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2025-11-03 16:23 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp, geliang

On 11/1/25 1:04 AM, Mat Martineau wrote:
> On Mon, 27 Oct 2025, Paolo Abeni wrote:
>> @@ -3573,6 +3586,8 @@ static void mptcp_release_cb(struct sock *sk)
>>
>> 		cond_resched();
>> 		spin_lock_bh(&sk->sk_lock.slock);
>> +		if (spool_bl)
>> +			mptcp_backlog_spooled(sk, moved, &skbs);
> 
> Hi Paolo -
> 
> Given the discussion in v5, to address the "wild producer" scenario this 
> loop can keep track of total_moved and exit the loop when that gets too 
> large.
> 
> The question is what the total limit would be - available rcvbuf when 
> entering the loop, or some multiple of that?

I spent a lot of time around that choice. Limiting the loop here is
quite tricky. Stopping the loop after sk_rcvbuf bytes causes mptcp-level
OoO and adds additional complexity to the code (as the receiver would
need to check the backlog even when the receive queue is not empty).

Note that the code proposed in this series is as robust as the current
one (i.e. release_cb() can already be bothered by a wild producer in
exactly the same way).

I propose decoupling the solution to the wild producer problem from these patches.

/P


* Re: [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow
  2025-10-31 22:44   ` Mat Martineau
@ 2025-11-03 16:26     ` Paolo Abeni
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Abeni @ 2025-11-03 16:26 UTC (permalink / raw)
  To: Mat Martineau, Matthieu Baerts (NGI0); +Cc: mptcp, geliang

On 10/31/25 11:44 PM, Mat Martineau wrote:
> On Mon, 27 Oct 2025, Paolo Abeni wrote:
>> @@ -646,6 +647,28 @@ mptcp_send_active_reset_reason(struct sock *sk)
>> 	tcp_send_active_reset(sk, GFP_ATOMIC, reason);
>> }
>>
>> +static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *skb)
>> +{
>> +	struct sock *ssk = skb->sk;
>> +
>> +	/* The subflow just lent the skb fwd memory, and we know that the skb
>> +	 * is only accounted on the incoming subflow rcvbuf.
>> +	 */
>> +	skb->sk = NULL;
> 
> Looks like the intent is to always pair mptcp_subflow_lend_fwdmem() with
> mptcp_borrow_fwdmem() (in that order).
> 
> Given that skb->sk and skb->destructor are usually cleared together, 
> should mptcp_borrow_fwdmem() have a WARN_ON_ONCE(skb->destructor)? Or is 
> that excessive paranoia?

I thought about that, and indeed it felt a little bit paranoid. No
strong opinion against adding it anyway.

Please LMK if you prefer a new version.

/P


* Re: [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing
  2025-11-03 16:23     ` Paolo Abeni
@ 2025-11-04 22:32       ` Mat Martineau
  0 siblings, 0 replies; 11+ messages in thread
From: Mat Martineau @ 2025-11-04 22:32 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, geliang

On Mon, 3 Nov 2025, Paolo Abeni wrote:

> On 11/1/25 1:04 AM, Mat Martineau wrote:
>> On Mon, 27 Oct 2025, Paolo Abeni wrote:
>>> @@ -3573,6 +3586,8 @@ static void mptcp_release_cb(struct sock *sk)
>>>
>>> 		cond_resched();
>>> 		spin_lock_bh(&sk->sk_lock.slock);
>>> +		if (spool_bl)
>>> +			mptcp_backlog_spooled(sk, moved, &skbs);
>>
>> Hi Paolo -
>>
>> Given the discussion in v5, to address the "wild producer" scenario this
>> loop can keep track of total_moved and exit the loop when that gets too
>> large.
>>
>> The question is what the total limit would be - available rcvbuf when
>> entering the loop, or some multiple of that?
>
> I spent a lot of time around that choice. Limiting the loop here is
> quite tricky. Stopping the loop after sk_rcvbuf bytes causes mptcp-level
> OoO and adds additional complexity to the code (as the receiver would
> need to check the backlog even when the receive queue is not empty).
>
> Note that the code proposed in this series is as robust as the current
> one (i.e. release_cb() can already be bothered by a wild producer in
> exactly the same way).
>
> I propose decoupling the solution to the wild producer problem from these patches.

Sounds reasonable to me. After the v5 thread, I wanted to be sure I was
clearly understanding the tradeoffs. Thanks for the details!

- Mat

end of thread (newest: 2025-11-04 22:32 UTC)

Thread overview: 11+ messages
2025-10-27 14:57 [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
2025-10-27 14:57 ` [PATCH RESENT v7 mptcp-next 1/4] mptcp: handle first subflow closing consistently Paolo Abeni
2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 2/4] mptcp: borrow forward memory from subflow Paolo Abeni
2025-10-31 22:44   ` Mat Martineau
2025-11-03 16:26     ` Paolo Abeni
2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 3/4] mptcp: introduce mptcp-level backlog Paolo Abeni
2025-10-27 14:58 ` [PATCH RESENT v7 mptcp-next 4/4] mptcp: leverage the backlog for RX packet processing Paolo Abeni
2025-11-01  0:04   ` Mat Martineau
2025-11-03 16:23     ` Paolo Abeni
2025-11-04 22:32       ` Mat Martineau
2025-10-27 16:13 ` [PATCH RESENT v7 mptcp-next 0/4] mptcp: introduce backlog processing MPTCP CI
