* [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing
@ 2025-10-27 14:40 Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 1/4] DO-NOT-MERGE: mptcp: enabled by default Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:40 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

This series includes RX path improvements built around backlog processing.

The main goals are improving RX performance _and_ increasing long-term
maintainability.

Patches 1 and 2 refactor the memory accounting logic in the RX path, so
that the msk no longer needs to do fwd allocation, removing possible drop
sources.

Patches 3 and 4 deal with backlog processing. Patch 3 introduces the
helpers needed to manipulate the msk-level backlog, and the data struct
itself, without any actual functional change. Patch 4 finally uses the
backlog for RX skb processing. Note that MPTCP can't use the sk_backlog,
as the mptcp release callback can also release and re-acquire the
msk-level spinlock, and core backlog processing works under the
assumption that such an event is not possible.

A relevant point is memory accounting for skbs in the backlog.

It's somewhat "original" due to MPTCP constraints. Such skbs use space
from the incoming subflow receive buffer, but are fwd memory accounted
on the msk, using memory borrowed from the subflow.

Instead the msk borrows memory from the subflow and reserves it for
the backlog - see patch 3 and 11 for the gory details.
---
v6 -> v7:
 - dropped merged patches
 - added patch 1/4
 - refactor borrow/account logic, see individual patches for the details

Matthieu Baerts (1):
  DO-NOT-MERGE: mptcp: enabled by default

Paolo Abeni (3):
  mptcp: handle first subflow closing consistently
  mptcp: borrow forward memory from subflow
  mptcp: introduce mptcp-level backlog

 net/mptcp/Kconfig      |   1 +
 net/mptcp/fastopen.c   |   4 +-
 net/mptcp/mib.c        |   1 -
 net/mptcp/mib.h        |   1 -
 net/mptcp/mptcp_diag.c |   3 +-
 net/mptcp/protocol.c   | 112 +++++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.h   |  39 +++++++++++++-
 7 files changed, 142 insertions(+), 19 deletions(-)

-- 
2.51.0



* [PATCH v7 mptcp-next 1/4] DO-NOT-MERGE: mptcp: enabled by default
  2025-10-27 14:40 [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
@ 2025-10-27 14:40 ` Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 2/4] mptcp: handle first subflow closing consistently Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:40 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

From: Matthieu Baerts <matttbe@kernel.org>

This commit is useful for automated builds, e.g. from Intel's kbuild.

Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
---
 net/mptcp/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mptcp/Kconfig b/net/mptcp/Kconfig
index b755fc9b6660..f674915dc31e 100644
--- a/net/mptcp/Kconfig
+++ b/net/mptcp/Kconfig
@@ -5,6 +5,7 @@ config MPTCP
 	select SKB_EXTENSIONS
 	select CRYPTO_LIB_SHA256
 	select CRYPTO
+	default y
 	help
 	  Multipath TCP (MPTCP) connections send and receive data over multiple
 	  subflows in order to utilize multiple network paths. Each subflow
-- 
2.51.0



* [PATCH v7 mptcp-next 2/4] mptcp: handle first subflow closing consistently
  2025-10-27 14:40 [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 1/4] DO-NOT-MERGE: mptcp: enabled by default Paolo Abeni
@ 2025-10-27 14:40 ` Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 3/4] mptcp: borrow forward memory from subflow Paolo Abeni
  2025-10-27 14:54 ` [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:40 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

Currently, as soon as the PM closes a subflow, the msk stops accepting data
from it, even if the TCP socket could still be formally open in the
incoming direction, with the notable exception of the first subflow.

The root cause of such behavior is that the code currently piggybacks two
separate semantics on the subflow->disposable bit: that the subflow context
must be released and that the subflow must stop accepting incoming
data.

The first subflow is never disposed, so it also never stops accepting
incoming data. Use a separate bit to mark the latter status
and set such bit in __mptcp_close_ssk() for all subflows.

Beyond making per-subflow behaviour more consistent, this will also
simplify the next patch.
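
For illustration only, the split described above can be modelled in plain
userspace C. This is a simplified sketch, not the kernel code: the struct
and the model_* helpers are hypothetical names, and the rule "only
non-first subflows become disposable" is an assumption made to keep the
model small.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the two per-subflow status bits. */
struct subflow_model {
	unsigned int disposable : 1;	/* ctx can be freed at release time */
	unsigned int closing : 1;	/* must not pass rx data to msk anymore */
};

/* Model of closing a subflow: every subflow is marked as closing, while
 * (by assumption in this sketch) only non-first subflows get disposed.
 */
static void model_close_subflow(struct subflow_model *sf, bool is_first)
{
	sf->closing = 1;
	if (!is_first)
		sf->disposable = 1;
}

/* Old check: RX data is refused only when the ctx is disposable, so a
 * closed first subflow keeps feeding data to the msk.
 */
static bool model_accepts_rx_old(const struct subflow_model *sf)
{
	return !sf->disposable;
}

/* New check: RX data is refused as soon as the subflow is closing. */
static bool model_accepts_rx_new(const struct subflow_model *sf)
{
	return !sf->closing;
}
```

With this model, a closed first subflow still passes the old check but
fails the new one, which is the inconsistency the separate bit removes.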

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 14 +++++++++-----
 net/mptcp/protocol.h |  3 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f4e3d0be7c87..74be417be980 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -842,10 +842,10 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	/* The peer can send data while we are shutting down this
-	 * subflow at msk destruction time, but we must avoid enqueuing
+	 * subflow at subflow destruction time, but we must avoid enqueuing
 	 * more data to the msk receive queue
 	 */
-	if (unlikely(subflow->disposable))
+	if (unlikely(subflow->closing))
 		return;
 
 	mptcp_data_lock(sk);
@@ -2429,6 +2429,13 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool dispose_it, need_push = false;
 
+	/* Do not pass RX data to the msk, even if the subflow socket is not
+	 * going to be freed (i.e. even for the first subflow on graceful
+	 * subflow close).
+	 */
+	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+	subflow->closing = 1;
+
 	/* If the first subflow moved to a close state before accept, e.g. due
 	 * to an incoming reset or listener shutdown, the subflow socket is
 	 * already deleted by inet_child_forget() and the mptcp socket can't
@@ -2439,7 +2446,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		/* ensure later check in mptcp_worker() will dispose the msk */
 		sock_set_flag(sk, SOCK_DEAD);
 		mptcp_set_close_tout(sk, tcp_jiffies32 - (mptcp_close_timeout(sk) + 1));
-		lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 		mptcp_subflow_drop_ctx(ssk);
 		goto out_release;
 	}
@@ -2448,8 +2454,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (dispose_it)
 		list_del(&subflow->node);
 
-	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
-
 	if ((flags & MPTCP_CF_FASTCLOSE) && !__mptcp_check_fallback(msk)) {
 		/* be sure to force the tcp_close path
 		 * to generate the egress reset
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index cd6350073144..9f7e5f2c964d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -536,12 +536,13 @@ struct mptcp_subflow_context {
 		send_infinite_map : 1,
 		remote_key_valid : 1,        /* received the peer key from */
 		disposable : 1,	    /* ctx can be free at ulp release time */
+		closing : 1,	    /* must not pass rx data to msk anymore */
 		stale : 1,	    /* unable to snd/rcv data, do not use for xmit */
 		valid_csum_seen : 1,        /* at least one csum validated */
 		is_mptfo : 1,	    /* subflow is doing TFO */
 		close_event_done : 1,       /* has done the post-closed part */
 		mpc_drop : 1,	    /* the MPC option has been dropped in a rtx */
-		__unused : 9;
+		__unused : 8;
 	bool	data_avail;
 	bool	scheduled;
 	bool	pm_listener;	    /* a listener managed by the kernel PM? */
-- 
2.51.0



* [PATCH v7 mptcp-next 3/4] mptcp: borrow forward memory from subflow
  2025-10-27 14:40 [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 1/4] DO-NOT-MERGE: mptcp: enabled by default Paolo Abeni
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 2/4] mptcp: handle first subflow closing consistently Paolo Abeni
@ 2025-10-27 14:40 ` Paolo Abeni
  2025-10-27 14:54 ` [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:40 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

In the MPTCP receive path, we release the subflow allocated fwd
memory just to allocate it again shortly after for the msk.

That could increase the chance of failure, especially once we
add backlog processing, as other actions could consume the just
released memory before the msk socket has a chance to do the
rcv allocation.

Replace the skb_orphan() call with an open-coded variant that
explicitly borrows the fwd memory from the subflow socket instead
of releasing it.

The borrowed memory does not have PAGE_SIZE granularity; rounding to
the page size will make the fwd allocated memory higher than what is
strictly required and could make the incoming subflow fwd mem
consistently negative. Instead, keep track of the accumulated frag and
borrow the full page at subflow close time.
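
As a rough illustration of the fragment tracking described above, the
arithmetic can be modelled in userspace C. This is a sketch under stated
assumptions, not the kernel implementation: MODEL_PAGE_SIZE is fixed at
4096, the structs and model_* helpers are hypothetical, and plain integer
counters stand in for the sockets' forward-allocation accounting.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

struct subflow_model {
	uint32_t lent_mem_frag;	/* bytes lent within the current page */
	int64_t fwd_alloc;	/* stand-in for the subflow fwd memory */
};

struct msk_model {
	int64_t fwd_alloc;	/* stand-in for the msk fwd memory */
};

/* Subflow side: remember how far into a page the lent memory reaches. */
static void model_lend(struct subflow_model *sf, uint32_t truesize)
{
	sf->lent_mem_frag = (sf->lent_mem_frag + truesize) &
			    (MODEL_PAGE_SIZE - 1);
}

/* msk side: take over exactly truesize bytes, with no page rounding. */
static void model_borrow(struct msk_model *msk, uint32_t truesize)
{
	msk->fwd_alloc += truesize;
}

/* At subflow close time, move the unused remainder of the last page so
 * that the total handed to the msk is a whole number of pages.
 */
static void model_close(struct msk_model *msk, struct subflow_model *sf)
{
	if (sf->lent_mem_frag) {
		int64_t remaining = MODEL_PAGE_SIZE - sf->lent_mem_frag;

		msk->fwd_alloc += remaining;
		sf->fwd_alloc -= remaining;
		sf->lent_mem_frag = 0;
	}
}
```

For example, lending skbs with truesize 3000 and 2000 leaves a fragment
of 904 bytes (5000 modulo 4096); closing the subflow then moves the
remaining 3192 bytes, so the msk ends up holding 8192 bytes, i.e. two
full pages.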

This allows removing the last drop in the TCP to MPTCP transition and
the associated, now unused, MIB.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/fastopen.c |  4 +++-
 net/mptcp/mib.c      |  1 -
 net/mptcp/mib.h      |  1 -
 net/mptcp/protocol.c | 23 +++++++++++++++--------
 net/mptcp/protocol.h | 23 +++++++++++++++++++++++
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index b9e451197902..82ec15bcfd7f 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -32,7 +32,8 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	/* dequeue the skb from sk receive queue */
 	__skb_unlink(skb, &ssk->sk_receive_queue);
 	skb_ext_reset(skb);
-	skb_orphan(skb);
+
+	mptcp_subflow_lend_fwdmem(subflow, skb);
 
 	/* We copy the fastopen data, but that don't belong to the mptcp sequence
 	 * space, need to offset it in the subflow sequence, see mptcp_subflow_get_map_offset()
@@ -50,6 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	mptcp_data_lock(sk);
 	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
 
+	mptcp_borrow_fwdmem(sk, skb);
 	skb_set_owner_r(skb, sk);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	mptcp_sk(sk)->bytes_received += skb->len;
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 171643815076..f23fda0c55a7 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -71,7 +71,6 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("MPFastcloseRx", MPTCP_MIB_MPFASTCLOSERX),
 	SNMP_MIB_ITEM("MPRstTx", MPTCP_MIB_MPRSTTX),
 	SNMP_MIB_ITEM("MPRstRx", MPTCP_MIB_MPRSTRX),
-	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
 	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index a1d3e9369fbb..812218b5ed2b 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -70,7 +70,6 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_MPFASTCLOSERX,	/* Received a MP_FASTCLOSE */
 	MPTCP_MIB_MPRSTTX,		/* Transmit a MP_RST */
 	MPTCP_MIB_MPRSTRX,		/* Received a MP_RST */
-	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
 	MPTCP_MIB_SUBFLOWSTALE,		/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER,	/* Subflows returned to active status after being stale */
 	MPTCP_MIB_SNDWNDSHARED,		/* Subflow snd wnd is overridden by msk's one */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 74be417be980..f6d96cb01e00 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -349,7 +349,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
 			   int copy_len)
 {
-	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
 	/* the skb map_seq accounts for the skb offset:
@@ -374,11 +374,7 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *tail;
 
-	/* try to fetch required memory from subflow */
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		goto drop;
-	}
+	mptcp_borrow_fwdmem(sk, skb);
 
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
@@ -400,7 +396,6 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 	 * will retransmit as needed, if needed.
 	 */
 	MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
-drop:
 	mptcp_drop(sk, skb);
 	return false;
 }
@@ -701,7 +696,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			size_t len = skb->len - offset;
 
 			mptcp_init_skb(ssk, skb, offset, len);
-			skb_orphan(skb);
+			mptcp_subflow_lend_fwdmem(subflow, skb);
 			ret = __mptcp_move_skb(sk, skb) || ret;
 			seq += len;
 
@@ -2428,6 +2423,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool dispose_it, need_push = false;
+	int fwd_remaning;
 
 	/* Do not pass RX data to the msk, even if the subflow socket is not
 	 * going to be freed (i.e. even for the first subflow on graceful
@@ -2436,6 +2432,17 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 	subflow->closing = 1;
 
+	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
+	 * could be negative at this point, but will reach zero soon - when
+	 * the data allocated using such fragment will be freed.
+	 */
+	if (subflow->lent_mem_frag) {
+		fwd_remaning = PAGE_SIZE - subflow->lent_mem_frag;
+		sk_forward_alloc_add(sk, fwd_remaning);
+		sk_forward_alloc_add(ssk, -fwd_remaning);
+		subflow->lent_mem_frag = 0;
+	}
+
 	/* If the first subflow moved to a close state before accept, e.g. due
 	 * to an incoming reset or listener shutdown, the subflow socket is
 	 * already deleted by inet_child_forget() and the mptcp socket can't
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9f7e5f2c964d..80d520888235 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -547,6 +547,7 @@ struct mptcp_subflow_context {
 	bool	scheduled;
 	bool	pm_listener;	    /* a listener managed by the kernel PM? */
 	bool	fully_established;  /* path validated */
+	u32	lent_mem_frag;
 	u32	remote_nonce;
 	u64	thmac;
 	u32	local_nonce;
@@ -646,6 +647,28 @@ mptcp_send_active_reset_reason(struct sock *sk)
 	tcp_send_active_reset(sk, GFP_ATOMIC, reason);
 }
 
+static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *skb)
+{
+	struct sock *ssk = skb->sk;
+
+	/* The subflow just lent the skb fwd memory, and we know that the skb
+	 * is only accounted on the incoming subflow rcvbuf.
+	 */
+	skb->sk = NULL;
+	sk_forward_alloc_add(sk, skb->truesize);
+	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
+}
+
+static inline void
+mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow,
+			  struct sk_buff *skb)
+{
+	int frag = (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1);
+
+	skb->destructor = NULL;
+	subflow->lent_mem_frag = frag;
+}
+
 static inline u64
 mptcp_subflow_get_map_offset(const struct mptcp_subflow_context *subflow)
 {
-- 
2.51.0



* Re: [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing
  2025-10-27 14:40 [PATCH v7 mptcp-next 0/4] mptcp: introduce backlog processing Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-10-27 14:40 ` [PATCH v7 mptcp-next 3/4] mptcp: borrow forward memory from subflow Paolo Abeni
@ 2025-10-27 14:54 ` Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-10-27 14:54 UTC (permalink / raw)
  To: mptcp; +Cc: Mat Martineau, geliang

On 10/27/25 3:40 PM, Paolo Abeni wrote:
> This series includes RX path improvement built around backlog processing
> 
> The main goals are improving the RX performances _and_ increase the
> long term maintainability.
> 
> Patch 1 and 2 refactor the memory account logic in the RX path, so that
> the msk don't need anymore to do fwd allocation, removing possible drop
> sources.
> 
> Patch 3 and 4 cope with backlog processing. Patch 3 introduces the
> helpers needed to manipulate the msk-level backlog, and the data struct
> itself, without any actual functional change. Patch 4 finally use the
> backlog for RX skb processing. Note that MPTCP can't use the sk_backlog,
> as the mptcp release callback can also release and re-acquire the
> msk-level spinlock and core backlog processing works under the
> assumption that such event is not possible.
> 
> A relevant point is memory accounts for skbs in the backlog.
> 
> It's somewhat "original" due to MPTCP constraints. Such skbs use space
> from the incoming subflow receive buffer, but are fwd memory accounted
> on the msk, using memory borrowed by the subflow.
> 
> Instead the msk borrows memory from the subflow and reserve it for
> the backlog - see patch 3 and 11 for the gory details.

Whoops, sorry, PEBKAC here. Please ignore. I'll resend shortly, hopefully
in a less corrupted form.

/P

