[MPTCP next v3 00/12] mptcp: receive path improvement

* [MPTCP next v3 00/12] mptcp: receive path improvement
@ 2025-09-19 15:53 Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 01/12] mptcp: leverage skb deferral free Paolo Abeni
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

This series includes several changes to the MPTCP RX path.

The main goals are improving RX performance _and_ increasing long-term
maintainability.

Some changes reflect recent (or not so recent) improvements introduced in
the TCP stack: patches 1, 2 and 3 are the MPTCP counterpart of the skb
deferral free and auto-tuning improvements.

Note that patch 3 could possibly fix issues/574 and, overall, should
prevent similar issues from arising in the future.

All the other patches aim at introducing socket backlog usage to process
the packets received by the subflows while the msk socket is owned. That
(almost completely) replaces the processing currently happening in
mptcp_release_cb().
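
As a rough userspace illustration of the backlog pattern described above
(a sketch only, not the code from patch 10): while the owner holds the
socket, incoming packets are queued on a backlog, and they are processed
when the owner releases it. All names (toy_sock, toy_rx, toy_lock,
toy_release) are hypothetical, and locking is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BACKLOG_MAX 16

/* Hypothetical, single-threaded stand-in for a socket. */
struct toy_sock {
	bool owned;                          /* true while "user context" owns it */
	int backlog[TOY_BACKLOG_MAX];        /* packets deferred while owned */
	size_t backlog_len;
	int processed[TOY_BACKLOG_MAX * 2];  /* packets handled, in order */
	size_t processed_len;
};

static void toy_process(struct toy_sock *sk, int pkt)
{
	assert(sk->processed_len < TOY_BACKLOG_MAX * 2);
	sk->processed[sk->processed_len++] = pkt;
}

/* RX path: process directly when the socket is not owned, else defer.
 * Returns false when the packet is dropped because the backlog is full.
 */
static bool toy_rx(struct toy_sock *sk, int pkt)
{
	if (!sk->owned) {
		toy_process(sk, pkt);
		return true;
	}
	if (sk->backlog_len == TOY_BACKLOG_MAX)
		return false;
	sk->backlog[sk->backlog_len++] = pkt;
	return true;
}

static void toy_lock(struct toy_sock *sk)
{
	sk->owned = true;
}

/* Release: drain the backlog in arrival order, then drop ownership. */
static void toy_release(struct toy_sock *sk)
{
	for (size_t i = 0; i < sk->backlog_len; i++)
		toy_process(sk, sk->backlog[i]);
	sk->backlog_len = 0;
	sk->owned = false;
}
```

In this model a packet arriving between toy_lock() and toy_release() is
never processed concurrently with the owner; it is only delayed until the
release.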

The actual job is done in patch 10; the others are cleanups needed to
keep that change tidy, plus some follow-up cleanups.

Sharing early, with known issues (at least on fallback sockets), to raise
awareness about this upcoming work.
---
v2 -> v3:
  - (hopefully) addressed CI failures
  - reordered to avoid transiently breaking fallback
  - refactor patch 3/12

v1 -> v2:
  - fix compile warn in patch 3
  - removed unneeded arg in patch 4
  - commit msg clarification and rebase

Paolo Abeni (12):
  mptcp: leverage skb deferral free
  tcp: make tcp_rcvbuf_grow() accessible to mptcp code
  mptcp: rcvbuf auto-tuning improvement
  mptcp: introduce the mptcp_init_skb helper.
  mptcp: remove unneeded mptcp_move_skb()
  mptcp: factor out a basic skb coalesce helper
  mptcp: minor move_skbs_to_msk() cleanup
  mptcp: cleanup fallback data fin reception
  mptcp: cleanup fallback dummy mapping generation.
  mptcp: leverage the sk backlog for RX packet processing.
  mptcp: prevent __mptcp_move_skbs() interfering with the fastpath
  mptcp: borrow forward memory from subflow

 include/net/tcp.h    |   1 +
 net/ipv4/tcp_input.c |   2 +-
 net/mptcp/mib.c      |   2 +
 net/mptcp/mib.h      |   4 +
 net/mptcp/protocol.c | 343 ++++++++++++++++++++++++-------------------
 net/mptcp/protocol.h |   8 +-
 net/mptcp/subflow.c  |  12 +-
 7 files changed, 218 insertions(+), 154 deletions(-)

-- 
2.51.0


* [MPTCP next v3 01/12] mptcp: leverage skb deferral free
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 02/12] tcp: make tcp_rcvbuf_grow() accessible to mptcp code Paolo Abeni
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

Usage of the skb deferral API is straightforward; with multiple active
subflows this allows moving part of the receive-side application load
onto multiple CPUs.

Also fix a typo in the related comment.
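
For readers unfamiliar with skb deferral free, the idea can be modelled in
userspace as follows. This is a toy sketch under simplifying assumptions,
not the kernel implementation: a buffer remembers the CPU that allocated
it, and a reader running on a different CPU hands the buffer back to that
CPU instead of freeing it locally. All toy_* names and the fixed-size
arrays are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS   4
#define TOY_DEFER_MAX 8

struct toy_buf {
	int alloc_cpu;  /* CPU that allocated the buffer */
	int freed_on;   /* CPU that actually freed it, -1 while live */
};

/* Per-CPU list of buffers waiting to be freed by their allocating CPU. */
struct toy_cpu {
	struct toy_buf *defer[TOY_DEFER_MAX];
	size_t defer_len;
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

static void toy_free_now(struct toy_buf *b, int cpu)
{
	b->freed_on = cpu;
}

/* Called by the reader running on cur_cpu. Frees locally when the buffer
 * was allocated on this CPU or the owner's list is full; otherwise queues
 * the buffer on the allocating CPU's deferred list.
 */
static void toy_attempt_defer_free(struct toy_buf *b, int cur_cpu)
{
	struct toy_cpu *owner;

	assert(b->alloc_cpu >= 0 && b->alloc_cpu < TOY_NR_CPUS);
	owner = &toy_cpus[b->alloc_cpu];

	if (b->alloc_cpu == cur_cpu || owner->defer_len == TOY_DEFER_MAX) {
		toy_free_now(b, cur_cpu);
		return;
	}
	owner->defer[owner->defer_len++] = b;
}

/* Run on `cpu`: free everything deferred to it, return how many. */
static size_t toy_flush_deferred(int cpu)
{
	struct toy_cpu *c = &toy_cpus[cpu];
	size_t n = c->defer_len;

	for (size_t i = 0; i < n; i++)
		toy_free_now(c->defer[i], cpu);
	c->defer_len = 0;
	return n;
}
```

With several subflows receiving on different CPUs, the deferred lists
spread the freeing cost across those CPUs rather than concentrating it on
the one running recvmsg().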

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7933291e991ce..9d95d24781509 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1943,12 +1943,13 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 		}
 
 		if (!(flags & MSG_PEEK)) {
-			/* avoid the indirect call, we know the destructor is sock_wfree */
+			/* avoid the indirect call, we know the destructor is sock_rfree */
 			skb->destructor = NULL;
+			skb->sk = NULL;
 			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 			sk_mem_uncharge(sk, skb->truesize);
 			__skb_unlink(skb, &sk->sk_receive_queue);
-			__kfree_skb(skb);
+			skb_attempt_defer_free(skb);
 			msk->bytes_consumed += count;
 		}
 
-- 
2.51.0



* [MPTCP next v3 02/12] tcp: make tcp_rcvbuf_grow() accessible to mptcp code
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 01/12] mptcp: leverage skb deferral free Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 03/12] mptcp: rcvbuf auto-tuning improvement Paolo Abeni
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

To leverage the auto-tuning improvements brought by commit 2da35e4b4df9
("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack needs
to access the mentioned helper.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/tcp.h    | 1 +
 net/ipv4/tcp_input.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e25340459ce4a..db2a4e05147fa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -356,6 +356,7 @@ void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, int *karg);
 enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb);
+void tcp_rcvbuf_grow(struct sock *sk);
 void tcp_rcv_space_adjust(struct sock *sk);
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
 void tcp_twsk_destructor(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b2793e749cfd9..ad09995a1aaec 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -701,7 +701,7 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 	}
 }
 
-static void tcp_rcvbuf_grow(struct sock *sk)
+void tcp_rcvbuf_grow(struct sock *sk)
 {
 	const struct net *net = sock_net(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-- 
2.51.0



* [MPTCP next v3 03/12] mptcp: rcvbuf auto-tuning improvement
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 01/12] mptcp: leverage skb deferral free Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 02/12] tcp: make tcp_rcvbuf_grow() accessible to mptcp code Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 21:10   ` Matthieu Baerts
  2025-09-19 15:53 ` [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper Paolo Abeni
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

Apply to the MPTCP auto-tuning the same improvements introduced for the
TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch
'tcp-receive-side-improvements'").

The main difference is that the TCP subflows and the main MPTCP socket
need to account for OoO separately: MPTCP does not care about TCP-level
OoO and vice versa. As a consequence, do not reflect MPTCP-level rcvbuf
increases due to OoO packets at the subflow level.

This refactor additionally allows dropping the msk receive buffer update
at receive time, as the latter was only intended to cope with subflow
receive buffer increases due to OoO packets.
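
The msk-level computation can be sketched in userspace as below. This is a
simplified model of the arithmetic in mptcp_rcvbuf_grow() from the diff,
under stated assumptions: toy_space_from_win() is a hypothetical stand-in
for mptcp_space_from_win() that simply assumes half of the buffer is
overhead, the sysctl and SOCK_RCVBUF_LOCK checks are left out, and the
OoO queue is reduced to a flag plus the end sequence of its last skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the msk fields used by the computation. */
struct toy_msk {
	int rcvq_space;        /* bytes copied to user space in the last window */
	bool has_ooo;          /* true when the MPTCP-level OoO queue is not empty */
	uint64_t ooo_end_seq;  /* end sequence of the last OoO skb */
	uint64_t ack_seq;      /* next in-sequence MPTCP-level byte expected */
	int rcvbuf;            /* current receive buffer size */
};

/* Assumption: buffer space is twice the window it can advertise. */
static int toy_space_from_win(int win)
{
	return win * 2;
}

/* Returns true when the receive buffer is actually updated. */
static bool toy_rcvbuf_grow(struct toy_msk *msk, int cap)
{
	int rcvwin, rcvbuf;

	assert(msk->rcvq_space >= 0 && cap > 0);

	/* Room for twice what the application consumed last window... */
	rcvwin = msk->rcvq_space << 1;

	/* ...plus the span currently held up by MPTCP-level OoO data. */
	if (msk->has_ooo)
		rcvwin += (int)(msk->ooo_end_seq - msk->ack_seq);

	rcvbuf = toy_space_from_win(rcvwin);
	if (rcvbuf > cap)
		rcvbuf = cap;

	if (rcvbuf > msk->rcvbuf) {
		msk->rcvbuf = rcvbuf;
		return true;
	}
	return false;
}
```

Note that only MPTCP-level OoO state enters the msk computation; in the
patch the subflows are grown separately through tcp_rcvbuf_grow(), which
looks at their own TCP-level state.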

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - copy tcp_rcvbuf_grow() implementation, verbatim
  - intentionally omitted Mat's tag due to the many changes

v1 -> v2:
  - fix unused variable
  - reword the commit message
---
 net/mptcp/protocol.c | 97 +++++++++++++++++++++-----------------------
 net/mptcp/protocol.h |  4 +-
 2 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9d95d24781509..c9fcdbaf50874 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -179,6 +179,35 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
 	return mptcp_try_coalesce((struct sock *)msk, to, from);
 }
 
+/* "inspired" by tcp_rcvbuf_grow(), main difference:
+ * - mptcp does not maintain a msk-level window clamp
+ * - returns true when  the receive buffer is actually updated
+ */
+static bool mptcp_rcvbuf_grow(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	const struct net *net = sock_net(sk);
+	int rcvwin, rcvbuf, cap;
+
+	if (!READ_ONCE(net->ipv4.sysctl_tcp_moderate_rcvbuf) ||
+	    (sk->sk_userlocks & SOCK_RCVBUF_LOCK))
+		return false;
+
+	rcvwin = msk->rcvq_space.space << 1;
+
+	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue))
+		rcvwin += MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq;
+
+	cap = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
+
+	rcvbuf = min_t(u32, mptcp_space_from_win(sk, rcvwin), cap);
+	if (rcvbuf > sk->sk_rcvbuf) {
+		WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+		return true;
+	}
+	return false;
+}
+
 /* "inspired" by tcp_data_queue_ofo(), main differences:
  * - use mptcp seqs
  * - don't cope with sacks
@@ -292,6 +321,9 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 end:
 	skb_condense(skb);
 	skb_set_owner_r(skb, sk);
+	/* do not grow rcvbuf for not-yet-accepted or orphaned sockets. */
+	if (sk->sk_socket)
+		mptcp_rcvbuf_grow(sk);
 }
 
 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -784,18 +816,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
-static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk)
-{
-	if (unlikely(ssk->sk_rcvbuf > sk->sk_rcvbuf))
-		WRITE_ONCE(sk->sk_rcvbuf, ssk->sk_rcvbuf);
-}
-
 static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	__mptcp_rcvbuf_update(sk, ssk);
-
 	/* Wake-up the reader only for in-sequence data */
 	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
@@ -2014,48 +2038,26 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
 		goto new_measure;
 
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
-	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-		u64 rcvwin, grow;
-		int rcvbuf;
-
-		rcvwin = ((u64)msk->rcvq_space.copied << 1) + 16 * advmss;
-
-		grow = rcvwin * (msk->rcvq_space.copied - msk->rcvq_space.space);
-
-		do_div(grow, msk->rcvq_space.space);
-		rcvwin += (grow << 1);
-
-		rcvbuf = min_t(u64, mptcp_space_from_win(sk, rcvwin),
-			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-
-		if (rcvbuf > sk->sk_rcvbuf) {
-			u32 window_clamp;
-
-			window_clamp = mptcp_win_from_space(sk, rcvbuf);
-			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+	msk->rcvq_space.space = msk->rcvq_space.copied;
+	if (mptcp_rcvbuf_grow(sk)) {
 
-			/* Make subflows follow along.  If we do not do this, we
-			 * get drops at subflow level if skbs can't be moved to
-			 * the mptcp rx queue fast enough (announced rcv_win can
-			 * exceed ssk->sk_rcvbuf).
-			 */
-			mptcp_for_each_subflow(msk, subflow) {
-				struct sock *ssk;
-				bool slow;
+		/* Make subflows follow along.  If we do not do this, we
+		 * get drops at subflow level if skbs can't be moved to
+		 * the mptcp rx queue fast enough (announced rcv_win can
+		 * exceed ssk->sk_rcvbuf).
+		 */
+		mptcp_for_each_subflow(msk, subflow) {
+			struct sock *ssk;
+			bool slow;
 
-				ssk = mptcp_subflow_tcp_sock(subflow);
-				slow = lock_sock_fast(ssk);
-				WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf);
-				WRITE_ONCE(tcp_sk(ssk)->window_clamp, window_clamp);
-				if (tcp_can_send_ack(ssk))
-					tcp_cleanup_rbuf(ssk, 1);
-				unlock_sock_fast(ssk, slow);
-			}
+			ssk = mptcp_subflow_tcp_sock(subflow);
+			slow = lock_sock_fast(ssk);
+			tcp_sk(ssk)->rcvq_space.space = msk->rcvq_space.copied;
+			tcp_rcvbuf_grow(ssk);
+			unlock_sock_fast(ssk, slow);
 		}
 	}
 
-	msk->rcvq_space.space = msk->rcvq_space.copied;
 new_measure:
 	msk->rcvq_space.copied = 0;
 	msk->rcvq_space.time = mstamp;
@@ -2084,11 +2086,6 @@ static bool __mptcp_move_skbs(struct sock *sk)
 	if (list_empty(&msk->conn_list))
 		return false;
 
-	/* verify we can move any data from the subflow, eventually updating */
-	if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK))
-		mptcp_for_each_subflow(msk, subflow)
-			__mptcp_rcvbuf_update(sk, subflow->tcp_sock);
-
 	subflow = list_first_entry(&msk->conn_list,
 				   struct mptcp_subflow_context, node);
 	for (;;) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9b5a248bad404..6ac58e92a1aa3 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -342,8 +342,8 @@ struct mptcp_sock {
 	struct mptcp_pm_data	pm;
 	struct mptcp_sched_ops	*sched;
 	struct {
-		u32	space;	/* bytes copied in last measurement window */
-		u32	copied; /* bytes copied in this measurement window */
+		int	space;	/* bytes copied in last measurement window */
+		int	copied; /* bytes copied in this measurement window */
 		u64	time;	/* start time of measurement window */
 		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
-- 
2.51.0



* Re: [MPTCP next v3 03/12] mptcp: rcvbuf auto-tuning improvement
  2025-09-19 15:53 ` [MPTCP next v3 03/12] mptcp: rcvbuf auto-tuning improvement Paolo Abeni
@ 2025-09-19 21:10   ` Matthieu Baerts
  0 siblings, 0 replies; 31+ messages in thread
From: Matthieu Baerts @ 2025-09-19 21:10 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 19/09/2025 17:53, Paolo Abeni wrote:
> Apply to the MPTCP auto-tuning the same improvements introduced for the
> TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch
> 'tcp-receive-side-improvements'").
> 
> The main difference is that the TCP subflows and the main MPTCP socket
> need to account for OoO separately: MPTCP does not care about TCP-level
> OoO and vice versa. As a consequence, do not reflect MPTCP-level rcvbuf
> increases due to OoO packets at the subflow level.
> 
> This refactor additionally allows dropping the msk receive buffer update
> at receive time, as the latter was only intended to cope with subflow
> receive buffer increases due to OoO packets.
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559
> Reviewed-by: Geliang Tang <geliang@kernel.org>
> Tested-by: Geliang Tang <geliang@kernel.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>   - copy tcp_rcvbuf_grow() implementation, verbatim
>   - intentionally omitted Mat's tag due to the many changes

Thank you for the v3, it's easier to spot the differences now!

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 03/12] mptcp: rcvbuf auto-tuning improvement Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-20  0:01   ` Geliang Tang
  2025-09-20  0:03   ` Geliang Tang
  2025-09-19 15:53 ` [MPTCP next v3 05/12] mptcp: remove unneeded mptcp_move_skb() Paolo Abeni
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

Factor out all the skb initialization steps into a new helper and
use it. Note that this change moves the MPTCP CB initialization
earlier: we can do this step as soon as the skb leaves the
subflow socket receive queue.
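
A small userspace sketch of why the split works: once the control block
records the mapping, the later "move" step no longer needs the offset and
length arguments, because the length can be recovered from the CB alone.
The field names follow the diff (map_seq, end_seq, offset); the exact
assignments done by the helper are not visible in the diff context and
are assumed here, and the toy_* types are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced control block and skb. */
struct toy_cb {
	uint64_t map_seq;  /* MPTCP-level sequence of the first usable byte */
	uint64_t end_seq;  /* MPTCP-level sequence just past the last byte */
	int offset;        /* bytes to skip at the head of the skb */
};

struct toy_skb {
	int len;
	struct toy_cb cb;
};

/* Step 1: record the mapping while the subflow state is still at hand. */
static void toy_init_skb(struct toy_skb *skb, uint64_t mapped_dsn,
			 int offset, int copy_len)
{
	assert(offset >= 0 && copy_len >= 0 && offset + copy_len <= skb->len);

	skb->cb.map_seq = mapped_dsn;
	skb->cb.end_seq = mapped_dsn + (uint64_t)copy_len;
	skb->cb.offset = offset;
}

/* Step 2: needs only the skb; the length comes back out of the CB. */
static uint64_t toy_copy_len(const struct toy_skb *skb)
{
	return skb->cb.end_seq - skb->cb.map_seq;
}
```

This mirrors the new __mptcp_move_skb(), which derives copy_len from
end_seq - map_seq instead of taking it as a parameter.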

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - drop subflow argument from mptcp_init_skb()
  - change msk args to sock arg in __mptcp_move_skb()
---
 net/mptcp/protocol.c | 46 ++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c9fcdbaf50874..3aa03da781ba3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 		mptcp_rcvbuf_grow(sk);
 }
 
-static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
-			     struct sk_buff *skb, unsigned int offset,
-			     size_t copy_len)
+static void mptcp_init_skb(struct sock *ssk,
+			   struct sk_buff *skb, int offset, int copy_len)
 {
-	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-	struct sock *sk = (struct sock *)msk;
-	struct sk_buff *tail;
-	bool has_rxtstamp;
-
-	__skb_unlink(skb, &ssk->sk_receive_queue);
-
-	skb_ext_reset(skb);
-	skb_orphan(skb);
-
-	/* try to fetch required memory from subflow */
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		goto drop;
-	}
-
-	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
+	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
 	/* the skb map_seq accounts for the skb offset:
 	 * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
@@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
 	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
 
+	__skb_unlink(skb, &ssk->sk_receive_queue);
+
+	skb_ext_reset(skb);
+	skb_dst_drop(skb);
+}
+
+static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
+{
+	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *tail;
+
+	/* try to fetch required memory from subflow */
+	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
+		goto drop;
+	}
+
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
 		msk->bytes_received += copy_len;
@@ -678,7 +680,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		if (offset < skb->len) {
 			size_t len = skb->len - offset;
 
-			ret = __mptcp_move_skb(msk, ssk, skb, offset, len) || ret;
+			mptcp_init_skb(ssk, skb, offset, len);
+			skb_orphan(skb);
+			ret = __mptcp_move_skb(sk, skb) || ret;
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
-- 
2.51.0



* Re: [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-19 15:53 ` [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper Paolo Abeni
@ 2025-09-20  0:01   ` Geliang Tang
  2025-09-22 10:44     ` Paolo Abeni
  2025-09-20  0:03   ` Geliang Tang
  1 sibling, 1 reply; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  0:01 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

Thank you for the v3. All tests pass on my end, including my "implement
mptcp read_sock v11" test case.

I have a few minor cleanup comments to make the patch cleaner. There is
no need to send a v4 or to block the merging of this series. Matt can make
the changes when he merges it, or I can send some squash-to patches to
address these after merging. Either approach would be fine.

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> Factor out all the skb initialization steps into a new helper and
> use it. Note that this change moves the MPTCP CB initialization
> earlier: we can do this step as soon as the skb leaves the
> subflow socket receive queue.
> 
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>   - drop subflow argument from mptcp_init_skb()
>   - change msk args to sock arg in __mptcp_move_skb()
> ---
>  net/mptcp/protocol.c | 46 ++++++++++++++++++++++++------------------
> --
>  1 file changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c9fcdbaf50874..3aa03da781ba3 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct
> mptcp_sock *msk, struct sk_buff *skb)
>  		mptcp_rcvbuf_grow(sk);
>  }
>  
> -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock
> *ssk,
> -			     struct sk_buff *skb, unsigned int
> offset,
> -			     size_t copy_len)
> +static void mptcp_init_skb(struct sock *ssk,
> +			   struct sk_buff *skb, int offset, int
> copy_len)

nit:

static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb,
                           int offset, int copy_len)

is better.

>  {
> -	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> -	struct sock *sk = (struct sock *)msk;
> -	struct sk_buff *tail;
> -	bool has_rxtstamp;
> -
> -	__skb_unlink(skb, &ssk->sk_receive_queue);
> -
> -	skb_ext_reset(skb);
> -	skb_orphan(skb);
> -
> -	/* try to fetch required memory from subflow */
> -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> -		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> -		goto drop;
> -	}
> -
> -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> +	const struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> +	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;

nit, how about keeping it as the original code to make this patch
smaller:

-	struct mptcp_subflow_context *subflow =
mptcp_subflow_ctx(ssk);
-	bool has_rxtstamp;
-
-	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

>  
>  	/* the skb map_seq accounts for the skb offset:
>  	 * mptcp_subflow_get_mapped_dsn() is based on the current
> tp->copied_seq
> @@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct mptcp_sock
> *msk, struct sock *ssk,
>  	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
>  	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
>  
> +	__skb_unlink(skb, &ssk->sk_receive_queue);
> +
> +	skb_ext_reset(skb);
> +	skb_dst_drop(skb);
> +}
> +
> +static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> MPTCP_SKB_CB(skb)->map_seq;
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +	struct sk_buff *tail;
> +
> +	/* try to fetch required memory from subflow */
> +	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> +		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> +		goto drop;
> +	}
> +
>  	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
>  		/* in sequence */
>  		msk->bytes_received += copy_len;
> @@ -678,7 +680,9 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
>  		if (offset < skb->len) {
>  			size_t len = skb->len - offset;
>  
> -			ret = __mptcp_move_skb(msk, ssk, skb,
> offset, len) || ret;
> +			mptcp_init_skb(ssk, skb, offset, len);
> +			skb_orphan(skb);
> +			ret = __mptcp_move_skb(sk, skb) || ret;
>  			seq += len;
>  
>  			if (unlikely(map_remaining < len)) {


* Re: [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-20  0:01   ` Geliang Tang
@ 2025-09-22 10:44     ` Paolo Abeni
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-22 10:44 UTC (permalink / raw)
  To: Geliang Tang, mptcp

On 9/20/25 2:01 AM, Geliang Tang wrote:
> On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
>> Factor out all the skb initialization steps into a new helper and
>> use it. Note that this change moves the MPTCP CB initialization
>> earlier: we can do this step as soon as the skb leaves the
>> subflow socket receive queue.
>>
>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> v1 -> v2:
>>   - drop subflow argument from mptcp_init_skb()
>>   - change msk args to sock arg in __mptcp_move_skb()
>> ---
>>  net/mptcp/protocol.c | 46 ++++++++++++++++++++++++------------------
>> --
>>  1 file changed, 25 insertions(+), 21 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index c9fcdbaf50874..3aa03da781ba3 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct
>> mptcp_sock *msk, struct sk_buff *skb)
>>  		mptcp_rcvbuf_grow(sk);
>>  }
>>  
>> -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock
>> *ssk,
>> -			     struct sk_buff *skb, unsigned int
>> offset,
>> -			     size_t copy_len)
>> +static void mptcp_init_skb(struct sock *ssk,
>> +			   struct sk_buff *skb, int offset, int
>> copy_len)
> 
> nit:
> 
> static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb,
>                            int offset, int copy_len)
> 
> is better.
> 
>>  {
>> -	struct mptcp_subflow_context *subflow =
>> mptcp_subflow_ctx(ssk);
>> -	struct sock *sk = (struct sock *)msk;
>> -	struct sk_buff *tail;
>> -	bool has_rxtstamp;
>> -
>> -	__skb_unlink(skb, &ssk->sk_receive_queue);
>> -
>> -	skb_ext_reset(skb);
>> -	skb_orphan(skb);
>> -
>> -	/* try to fetch required memory from subflow */
>> -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
>> -		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
>> -		goto drop;
>> -	}
>> -
>> -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
>> +	const struct mptcp_subflow_context *subflow =
>> mptcp_subflow_ctx(ssk);
>> +	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> 
> nit, how about keeping it as the original code to make this patch
> smaller:
> 
> -	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> -	bool has_rxtstamp;
> -
> -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;

I would like to keep the 'struct mptcp_subflow_context' constness; it
helps GCC generate better code.

> 
> Reviewed-by: Geliang Tang <geliang@kernel.org>
> Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,

Paolo



* Re: [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-19 15:53 ` [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper Paolo Abeni
  2025-09-20  0:01   ` Geliang Tang
@ 2025-09-20  0:03   ` Geliang Tang
  2025-09-21  0:23     ` Geliang Tang
  1 sibling, 1 reply; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  0:03 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

Thank you for the v3. All tests pass on my end, including my "implement
mptcp read_sock v11" test case.

I have a few minor cleanup comments to make the patch cleaner. There is
no need to send a v4 or to block the merging of this series. Matt can make
the changes when he merges it, or I can send some squash-to patches to
address these after merging. Either approach would be fine.

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> Factor out all the skb initialization steps into a new helper and
> use it. Note that this change moves the MPTCP CB initialization
> earlier: we can do this step as soon as the skb leaves the
> subflow socket receive queue.
> 
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>   - drop subflow argument from mptcp_init_skb()
>   - change msk args to sock arg in __mptcp_move_skb()
> ---
>  net/mptcp/protocol.c | 46 ++++++++++++++++++++++++------------------
> --
>  1 file changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c9fcdbaf50874..3aa03da781ba3 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct
> mptcp_sock *msk, struct sk_buff *skb)
>  		mptcp_rcvbuf_grow(sk);
>  }
>  
> -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock
> *ssk,
> -			     struct sk_buff *skb, unsigned int
> offset,
> -			     size_t copy_len)
> +static void mptcp_init_skb(struct sock *ssk,
> +			   struct sk_buff *skb, int offset, int
> copy_len)

nit:

static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb,
                           int offset, int copy_len)

is better.

>  {
> -	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> -	struct sock *sk = (struct sock *)msk;
> -	struct sk_buff *tail;
> -	bool has_rxtstamp;
> -
> -	__skb_unlink(skb, &ssk->sk_receive_queue);
> -
> -	skb_ext_reset(skb);
> -	skb_orphan(skb);
> -
> -	/* try to fetch required memory from subflow */
> -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> -		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> -		goto drop;
> -	}
> -
> -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> +	const struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> +	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;

nit, how about keeping it as the original code to make this patch
smaller:

-	struct mptcp_subflow_context *subflow =
mptcp_subflow_ctx(ssk);
-	bool has_rxtstamp;
-
-	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

>  
>  	/* the skb map_seq accounts for the skb offset:
>  	 * mptcp_subflow_get_mapped_dsn() is based on the current
> tp->copied_seq
> @@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct mptcp_sock
> *msk, struct sock *ssk,
>  	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
>  	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
>  
> +	__skb_unlink(skb, &ssk->sk_receive_queue);
> +
> +	skb_ext_reset(skb);
> +	skb_dst_drop(skb);
> +}
> +
> +static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> MPTCP_SKB_CB(skb)->map_seq;
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +	struct sk_buff *tail;
> +
> +	/* try to fetch required memory from subflow */
> +	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> +		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
> +		goto drop;
> +	}
> +
>  	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
>  		/* in sequence */
>  		msk->bytes_received += copy_len;
> @@ -678,7 +680,9 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
>  		if (offset < skb->len) {
>  			size_t len = skb->len - offset;
>  
> -			ret = __mptcp_move_skb(msk, ssk, skb,
> offset, len) || ret;
> +			mptcp_init_skb(ssk, skb, offset, len);
> +			skb_orphan(skb);
> +			ret = __mptcp_move_skb(sk, skb) || ret;
>  			seq += len;
>  
>  			if (unlikely(map_remaining < len)) {


* Re: [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-20  0:03   ` Geliang Tang
@ 2025-09-21  0:23     ` Geliang Tang
  2025-09-21  0:48       ` Geliang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  0:23 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Sat, 2025-09-20 at 08:03 +0800, Geliang Tang wrote:
> Hi Paolo,
> 
> Thank you for the v3. All tests pass on my end, including my
> "implement
> mptcp read_sock v11" test case.
> 
> I have a few minor cleanup comments to make the patch clean. There is
> no need to send a v4 or block the merging of this series. Matt can
> make
> the changes when he merges it, or I can send some squash-to patches
> to
> address these after merging. Either approach would be fine.
> 
> On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> > Factor out all the skb initialization step in a new helper and
> > use it. Note that this change moves the MPTCP CB initialization
> > earlier: we can do such step as soon as the skb leaves the
> > subflow socket receive queues.
> > 
> > Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v1 -> v2:
> >   - drop subflow argument from mptcp_init_skb()
> >   - change msk args to sock arg in __mptcp_move_skb()
> > ---
> >  net/mptcp/protocol.c | 46 ++++++++++++++++++++++++----------------
> > --
> > --
> >  1 file changed, 25 insertions(+), 21 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index c9fcdbaf50874..3aa03da781ba3 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct
> > mptcp_sock *msk, struct sk_buff *skb)
> >  		mptcp_rcvbuf_grow(sk);
> >  }
> >  
> > -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock
> > *ssk,
> > -			     struct sk_buff *skb, unsigned int
> > offset,
> > -			     size_t copy_len)
> > +static void mptcp_init_skb(struct sock *ssk,
> > +			   struct sk_buff *skb, int offset, int
> > copy_len)
> 
> nit:
> 
> static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb,
>                            int offset, int copy_len)
> 
> is better.
> 
> >  {
> > -	struct mptcp_subflow_context *subflow =
> > mptcp_subflow_ctx(ssk);
> > -	struct sock *sk = (struct sock *)msk;
> > -	struct sk_buff *tail;
> > -	bool has_rxtstamp;
> > -
> > -	__skb_unlink(skb, &ssk->sk_receive_queue);
> > -
> > -	skb_ext_reset(skb);
> > -	skb_orphan(skb);
> > -
> > -	/* try to fetch required memory from subflow */
> > -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> > -		MPTCP_INC_STATS(sock_net(sk),
> > MPTCP_MIB_RCVPRUNED);
> > -		goto drop;
> > -	}
> > -
> > -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> > +	const struct mptcp_subflow_context *subflow =
> > mptcp_subflow_ctx(ssk);
> > +	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> 
> nit, how about keeping it as the original code to make this patch
> smaller:
> 
> -	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> -	bool has_rxtstamp;
> -
> -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> 
> Reviewed-by: Geliang Tang <geliang@kernel.org>
> Tested-by: Geliang Tang <geliang@kernel.org>
> 
> Thanks,
> -Geliang
> 
> >  
> >  	/* the skb map_seq accounts for the skb offset:
> >  	 * mptcp_subflow_get_mapped_dsn() is based on the current
> > tp->copied_seq
> > @@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct mptcp_sock
> > *msk, struct sock *ssk,
> >  	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
> >  	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
> >  
> > +	__skb_unlink(skb, &ssk->sk_receive_queue);
> > +
> > +	skb_ext_reset(skb);
> > +	skb_dst_drop(skb);
> > +}
> > +
> > +static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> > MPTCP_SKB_CB(skb)->map_seq;
> > +	struct mptcp_sock *msk = mptcp_sk(sk);
> > +	struct sk_buff *tail;
> > +
> > +	/* try to fetch required memory from subflow */
> > +	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> > +		MPTCP_INC_STATS(sock_net(sk),
> > MPTCP_MIB_RCVPRUNED);
> > +		goto drop;
> > +	}
> > +
> >  	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
> >  		/* in sequence */
> >  		msk->bytes_received += copy_len;
> > @@ -678,7 +680,9 @@ static bool
> > __mptcp_move_skbs_from_subflow(struct
> > mptcp_sock *msk,
> >  		if (offset < skb->len) {
> >  			size_t len = skb->len - offset;
> >  
> > -			ret = __mptcp_move_skb(msk, ssk, skb,
> > offset, len) || ret;
> > +			mptcp_init_skb(ssk, skb, offset, len);
> > +			skb_orphan(skb);

One more small request: could you also put skb_orphan(skb); into
mptcp_init_skb, placing it on the last line, and then replace it in
patch 12.

Thanks,
-Geliang

> > +			ret = __mptcp_move_skb(sk, skb) || ret;
> >  			seq += len;
> >  
> >  			if (unlikely(map_remaining < len)) {

* Re: [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper.
  2025-09-21  0:23     ` Geliang Tang
@ 2025-09-21  0:48       ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  0:48 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Sun, 2025-09-21 at 08:23 +0800, Geliang Tang wrote:
> On Sat, 2025-09-20 at 08:03 +0800, Geliang Tang wrote:
> > Hi Paolo,
> > 
> > Thank you for the v3. All tests pass on my end, including my
> > "implement
> > mptcp read_sock v11" test case.
> > 
> > I have a few minor cleanup comments to make the patch clean. There
> > is
> > no need to send a v4 or block the merging of this series. Matt can
> > make
> > the changes when he merges it, or I can send some squash-to patches
> > to
> > address these after merging. Either approach would be fine.
> > 
> > On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> > > Factor out all the skb initialization step in a new helper and
> > > use it. Note that this change moves the MPTCP CB initialization
> > > earlier: we can do such step as soon as the skb leaves the
> > > subflow socket receive queues.
> > > 
> > > Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > v1 -> v2:
> > >   - drop subflow argument from mptcp_init_skb()
> > >   - change msk args to sock arg in __mptcp_move_skb()
> > > ---
> > >  net/mptcp/protocol.c | 46 ++++++++++++++++++++++++--------------
> > > --
> > > --
> > > --
> > >  1 file changed, 25 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index c9fcdbaf50874..3aa03da781ba3 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct
> > > mptcp_sock *msk, struct sk_buff *skb)
> > >  		mptcp_rcvbuf_grow(sk);
> > >  }
> > >  
> > > -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock
> > > *ssk,
> > > -			     struct sk_buff *skb, unsigned int
> > > offset,
> > > -			     size_t copy_len)
> > > +static void mptcp_init_skb(struct sock *ssk,
> > > +			   struct sk_buff *skb, int offset, int
> > > copy_len)
> > 
> > nit:
> > 
> > static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb,
> >                            int offset, int copy_len)
> > 
> > is better.
> > 
> > >  {
> > > -	struct mptcp_subflow_context *subflow =
> > > mptcp_subflow_ctx(ssk);
> > > -	struct sock *sk = (struct sock *)msk;
> > > -	struct sk_buff *tail;
> > > -	bool has_rxtstamp;
> > > -
> > > -	__skb_unlink(skb, &ssk->sk_receive_queue);
> > > -
> > > -	skb_ext_reset(skb);
> > > -	skb_orphan(skb);
> > > -
> > > -	/* try to fetch required memory from subflow */
> > > -	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> > > -		MPTCP_INC_STATS(sock_net(sk),
> > > MPTCP_MIB_RCVPRUNED);
> > > -		goto drop;
> > > -	}
> > > -
> > > -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> > > +	const struct mptcp_subflow_context *subflow =
> > > mptcp_subflow_ctx(ssk);
> > > +	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> > 
> > nit, how about keeping it as the original code to make this patch
> > smaller:
> > 
> > -	struct mptcp_subflow_context *subflow =
> > mptcp_subflow_ctx(ssk);
> > -	bool has_rxtstamp;
> > -
> > -	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> > 
> > Reviewed-by: Geliang Tang <geliang@kernel.org>
> > Tested-by: Geliang Tang <geliang@kernel.org>
> > 
> > Thanks,
> > -Geliang
> > 
> > >  
> > >  	/* the skb map_seq accounts for the skb offset:
> > >  	 * mptcp_subflow_get_mapped_dsn() is based on the
> > > current
> > > tp->copied_seq
> > > @@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct
> > > mptcp_sock
> > > *msk, struct sock *ssk,
> > >  	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
> > >  	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
> > >  
> > > +	__skb_unlink(skb, &ssk->sk_receive_queue);
> > > +
> > > +	skb_ext_reset(skb);
> > > +	skb_dst_drop(skb);
> > > +}
> > > +
> > > +static bool __mptcp_move_skb(struct sock *sk, struct sk_buff
> > > *skb)
> > > +{
> > > +	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> > > MPTCP_SKB_CB(skb)->map_seq;
> > > +	struct mptcp_sock *msk = mptcp_sk(sk);
> > > +	struct sk_buff *tail;
> > > +
> > > +	/* try to fetch required memory from subflow */
> > > +	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
> > > +		MPTCP_INC_STATS(sock_net(sk),
> > > MPTCP_MIB_RCVPRUNED);
> > > +		goto drop;
> > > +	}
> > > +
> > >  	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
> > >  		/* in sequence */
> > >  		msk->bytes_received += copy_len;
> > > @@ -678,7 +680,9 @@ static bool
> > > __mptcp_move_skbs_from_subflow(struct
> > > mptcp_sock *msk,
> > >  		if (offset < skb->len) {
> > >  			size_t len = skb->len - offset;
> > >  
> > > -			ret = __mptcp_move_skb(msk, ssk, skb,
> > > offset, len) || ret;
> > > +			mptcp_init_skb(ssk, skb, offset, len);
> > > +			skb_orphan(skb);
> 
> One more small request: could you also put skb_orphan(skb); into
> mptcp_init_skb, placing it on the last line, and then replace it in
> patch 12.

And please delete the trailing period in the subject of this patch too.

Thanks,
-Geliang

> 
> Thanks,
> -Geliang
> 
> > > +			ret = __mptcp_move_skb(sk, skb) || ret;
> > >  			seq += len;
> > >  
> > >  			if (unlikely(map_remaining < len)) {

* [MPTCP next v3 05/12] mptcp: remove unneeded mptcp_move_skb()
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (3 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 04/12] mptcp: introduce the mptcp_init_skb helper Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 06/12] mptcp: factor out a basic skb coalesce helper Paolo Abeni
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"),
sk_shutdown can't change during the main recvmsg loop, so we can drop
the related race breaker.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3aa03da781ba3..909c611d5b528 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2207,14 +2207,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 				break;
 			}
 
-			if (sk->sk_shutdown & RCV_SHUTDOWN) {
-				/* race breaker: the shutdown could be after the
-				 * previous receive queue check
-				 */
-				if (__mptcp_move_skbs(sk))
-					continue;
+			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;
-			}
 
 			if (sk->sk_state == TCP_CLOSE) {
 				copied = -ENOTCONN;
-- 
2.51.0


* [MPTCP next v3 06/12] mptcp: factor out a basic skb coalesce helper
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (4 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 05/12] mptcp: remove unneeded mptcp_move_skb() Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 07/12] mptcp: minor move_skbs_to_msk() cleanup Paolo Abeni
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

An upcoming patch will introduce backlog processing for MPTCP
sockets, and we want to leverage coalescing in that data path.

Factor out the relevant bits not touching memory accounting to
deal with such use-case.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 909c611d5b528..bd83abefe4965 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -142,22 +142,34 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
-			       struct sk_buff *from)
+static int __mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
+				struct sk_buff *from, bool *fragstolen)
 {
-	bool fragstolen;
+	int limit = READ_ONCE(sk->sk_rcvbuf);
 	int delta;
 
 	if (unlikely(MPTCP_SKB_CB(to)->cant_coalesce) ||
 	    MPTCP_SKB_CB(from)->offset ||
-	    ((to->len + from->len) > (sk->sk_rcvbuf >> 3)) ||
-	    !skb_try_coalesce(to, from, &fragstolen, &delta))
-		return false;
+	    ((to->len + from->len) > (limit >> 3)) ||
+	    !skb_try_coalesce(to, from, fragstolen, &delta))
+		return 0;
 
 	pr_debug("colesced seq %llx into %llx new len %d new end seq %llx\n",
 		 MPTCP_SKB_CB(from)->map_seq, MPTCP_SKB_CB(to)->map_seq,
 		 to->len, MPTCP_SKB_CB(from)->end_seq);
 	MPTCP_SKB_CB(to)->end_seq = MPTCP_SKB_CB(from)->end_seq;
+	return delta;
+}
+
+static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
+			       struct sk_buff *from)
+{
+	bool fragstolen;
+	int delta;
+
+	delta = __mptcp_try_coalesce(sk, to, from, &fragstolen);
+	if (!delta)
+		return false;
 
 	/* note the fwd memory can reach a negative value after accounting
 	 * for the delta, but the later skb free will restore a non
-- 
2.51.0
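
For readers following the refactor in this patch, the split can be modelled
in plain userspace C. This is only a sketch under stated assumptions, not the
kernel code: struct buf, struct rx_sock, coalesce_core() and
coalesce_and_charge() are hypothetical stand-ins for the skb, the msk socket,
__mptcp_try_coalesce() and mptcp_try_coalesce(). The point it shows is that
the core step reports the memory delta and leaves accounting to the caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an skb: payload length plus memory footprint. */
struct buf {
	int len;
	int truesize;
	bool cant_coalesce;
};

/* Hypothetical stand-in for the socket: receive limit and charged memory. */
struct rx_sock {
	int rcvbuf;
	int rmem_alloc;
};

/* Core step: merge 'from' into 'to' without touching memory accounting.
 * Returns the truesize delta added to 'to', or 0 if the merge was refused.
 */
static int coalesce_core(const struct rx_sock *sk, struct buf *to,
			 const struct buf *from)
{
	if (to->cant_coalesce || (to->len + from->len) > (sk->rcvbuf >> 3))
		return 0;

	to->len += from->len;
	to->truesize += from->truesize;
	return from->truesize;
}

/* Receive-queue wrapper: same merge, plus accounting on the socket. */
static bool coalesce_and_charge(struct rx_sock *sk, struct buf *to,
				const struct buf *from)
{
	int delta = coalesce_core(sk, to, from);

	if (!delta)
		return false;

	sk->rmem_alloc += delta;
	return true;
}
```

A caller that keeps its own accounting (a backlog, in the series) would use
the core step directly and add the returned delta to its own counter.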


* [MPTCP next v3 07/12] mptcp: minor move_skbs_to_msk() cleanup
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (5 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 06/12] mptcp: factor out a basic skb coalesce helper Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-19 15:53 ` [MPTCP next v3 08/12] mptcp: cleanup fallback data fin reception Paolo Abeni
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

This function is called only by __mptcp_data_ready(), which in turn
is always invoked when the msk is not owned by the user: we can drop the
redundant, related check.

Additionally, MPTCP needs to propagate the socket error only for the
current subflow.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index bd83abefe4965..fce70cdad2a7f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -815,12 +815,8 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 
 	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
 	__mptcp_ofo_queue(msk);
-	if (unlikely(ssk->sk_err)) {
-		if (!sock_owned_by_user(sk))
-			__mptcp_error_report(sk);
-		else
-			__set_bit(MPTCP_ERROR_REPORT,  &msk->cb_flags);
-	}
+	if (unlikely(ssk->sk_err))
+		__mptcp_subflow_error_report(sk, ssk);
 
 	/* If the moves have caught up with the DATA_FIN sequence number
 	 * it's time to ack the DATA_FIN and change socket state, but
-- 
2.51.0


* [MPTCP next v3 08/12] mptcp: cleanup fallback data fin reception
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (6 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 07/12] mptcp: minor move_skbs_to_msk() cleanup Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-20  0:04   ` Geliang Tang
  2025-09-19 15:53 ` [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation Paolo Abeni
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

MPTCP currently generates a dummy data_fin for fallback sockets
when the fallback subflow has completed data reception, using
the current ack_seq.

We are going to introduce backlog usage for the msk soon, even
for fallback sockets: the ack_seq value will not match the most recent
sequence number seen by the fallback subflow socket, as it will ignore
data_seq sitting in the backlog.

Instead use the last map sequence number to set the data_fin,
as fallback (dummy) map sequences are always in sequence.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
  - keep the close check in subflow_sched_work_if_closed, fix
    CI failures
---
 net/mptcp/subflow.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8325890a3223..b9455c04e8a46 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1285,6 +1285,7 @@ static bool subflow_is_done(const struct sock *sk)
 /* sched mptcp worker for subflow cleanup if no more data is pending */
 static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk)
 {
+	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
 
 	if (likely(ssk->sk_state != TCP_CLOSE &&
@@ -1303,7 +1304,8 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
 	 */
 	if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) &&
 	    msk->first == ssk &&
-	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
+	    mptcp_update_rcv_data_fin(msk, subflow->map_seq +
+				      subflow->map_data_len, true))
 		mptcp_schedule_work(sk);
 }
 
-- 
2.51.0
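
The arithmetic behind this change can be shown with a small userspace model.
It is a sketch only: struct fallback_rx and fallback_data_fin_seq() are
hypothetical names, and the fields merely mirror ack_seq, map_seq and
map_data_len from the patch. It assumes, as the commit message states, that
fallback mappings are always in sequence, so the end of the last mapping is
the data_fin position even when ack_seq has not caught up yet.

```c
#include <stdint.h>

/* Hypothetical snapshot of the fallback receive state. */
struct fallback_rx {
	uint64_t ack_seq;      /* advanced only when data reaches the msk queue */
	uint64_t map_seq;      /* start of the last dummy mapping */
	uint32_t map_data_len; /* length of the last dummy mapping */
};

/* data_fin sequence derived from the last mapping instead of ack_seq. */
static uint64_t fallback_data_fin_seq(const struct fallback_rx *rx)
{
	return rx->map_seq + rx->map_data_len;
}
```

With 500 bytes still pending (ack_seq 1000, map_seq 1000, map_data_len 500),
the helper yields 1500 while ack_seq is still 1000.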


* Re: [MPTCP next v3 08/12] mptcp: cleanup fallback data fin reception
  2025-09-19 15:53 ` [MPTCP next v3 08/12] mptcp: cleanup fallback data fin reception Paolo Abeni
@ 2025-09-20  0:04   ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  0:04 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> MPTCP currently generate a dummy data_fin for fallback socket
> when the fallback subflow has completed data reception using
> the current ack_seq.
> 
> We are going to introduce backlog usage for the msk soon, even
> for fallback sockets: the ack_seq value will not match the most
> recent
> sequence number seen by the fallback subflow socket, as it will
> ignore
> data_seq sitting in the backlog.
> 
> Instead use the last map sequence number to set the data_fin,
> as fallback (dummy) map sequences are always in sequence.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

LGTM.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> ---
> v2 -> v3:
>   - keep the close check in subflow_sched_work_if_closed, fix
>     CI failures
> ---
>  net/mptcp/subflow.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index e8325890a3223..b9455c04e8a46 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1285,6 +1285,7 @@ static bool subflow_is_done(const struct sock
> *sk)
>  /* sched mptcp worker for subflow cleanup if no more data is pending
> */
>  static void subflow_sched_work_if_closed(struct mptcp_sock *msk,
> struct sock *ssk)
>  {
> +	const struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
>  	struct sock *sk = (struct sock *)msk;
>  
>  	if (likely(ssk->sk_state != TCP_CLOSE &&
> @@ -1303,7 +1304,8 @@ static void subflow_sched_work_if_closed(struct
> mptcp_sock *msk, struct sock *ss
>  	 */
>  	if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) &&
>  	    msk->first == ssk &&
> -	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq),
> true))
> +	    mptcp_update_rcv_data_fin(msk, subflow->map_seq +
> +				      subflow->map_data_len, true))
>  		mptcp_schedule_work(sk);
>  }
>  

* [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation.
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (7 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 08/12] mptcp: cleanup fallback data fin reception Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-20  0:06   ` Geliang Tang
  2025-09-19 15:53 ` [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing Paolo Abeni
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

MPTCP currently accesses ack_seq outside the msk socket lock scope to
generate the dummy mapping for fallback sockets. Soon we are going
to introduce backlog usage, and even for fallback sockets the ack_seq
value will be significantly off outside of the msk socket lock scope.

Avoid relying on ack_seq for dummy mapping generation, using instead
the subflow sequence number. Note that in case of disconnect() and
(re)connect() we must ensure that any previous state is re-set.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - reordered before the backlog introduction to avoid transiently
   breaking the fallback
 - explicitly reset ack_seq
---
 net/mptcp/protocol.c | 3 +++
 net/mptcp/subflow.c  | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fce70cdad2a7f..c8b02048126a9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3224,6 +3224,9 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	msk->bytes_retrans = 0;
 	msk->rcvspace_init = 0;
 
+	/* for fallback's sake */
+	WRITE_ONCE(msk->ack_seq, 0);
+
 	WRITE_ONCE(sk->sk_shutdown, 0);
 	sk_error_report(sk);
 	return 0;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b9455c04e8a46..ac8616e7521e8 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct mptcp_sock *msk,
 	mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow->iasn);
 	subflow->iasn++;
 
+	/* for fallback's sake */
+	subflow->map_seq = subflow->iasn;
+
 	WRITE_ONCE(msk->remote_key, subflow->remote_key);
 	WRITE_ONCE(msk->ack_seq, subflow->iasn);
 	WRITE_ONCE(msk->can_ack, true);
@@ -1435,9 +1438,12 @@ static bool subflow_check_data_avail(struct sock *ssk)
 
 	skb = skb_peek(&ssk->sk_receive_queue);
 	subflow->map_valid = 1;
-	subflow->map_seq = READ_ONCE(msk->ack_seq);
 	subflow->map_data_len = skb->len;
 	subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset;
+	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
+					      subflow->iasn +
+					      TCP_SKB_CB(skb)->seq -
+					      subflow->ssn_offset - 1);
 	WRITE_ONCE(subflow->data_avail, true);
 	return true;
 }
-- 
2.51.0


* Re: [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation.
  2025-09-19 15:53 ` [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation Paolo Abeni
@ 2025-09-20  0:06   ` Geliang Tang
  2025-09-21  1:01     ` Geliang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  0:06 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> MPTCP currently access ack_seq outside the msk socket log scope to
> generate the dummy mapping for fallback socket. Soon we are going
> to introduce backlog usage and even for fallback socket the ack_seq
> value will be significantly off outside of the msk socket lock scope.
> 
> Avoid relying on ack_seq for dummy mapping generation, using instead
> the subflow sequence number. Note that in case of disconnect() and
> (re)connect() we must ensure that any previous state is re-set.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
>  - reordered before the backlog introduction to avoid transiently
>    break the fallback
>  - explicitly reset ack_seq
> ---
>  net/mptcp/protocol.c | 3 +++
>  net/mptcp/subflow.c  | 8 +++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index fce70cdad2a7f..c8b02048126a9 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3224,6 +3224,9 @@ static int mptcp_disconnect(struct sock *sk,
> int flags)
>  	msk->bytes_retrans = 0;
>  	msk->rcvspace_init = 0;
>  
> +	/* for fallback's sake */
> +	WRITE_ONCE(msk->ack_seq, 0);
> +
>  	WRITE_ONCE(sk->sk_shutdown, 0);
>  	sk_error_report(sk);
>  	return 0;
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index b9455c04e8a46..ac8616e7521e8 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct
> mptcp_sock *msk,
>  	mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow-
> >iasn);
>  	subflow->iasn++;
>  
> +	/* for fallback's sake */
> +	subflow->map_seq = subflow->iasn;
> +
>  	WRITE_ONCE(msk->remote_key, subflow->remote_key);
>  	WRITE_ONCE(msk->ack_seq, subflow->iasn);
>  	WRITE_ONCE(msk->can_ack, true);
> @@ -1435,9 +1438,12 @@ static bool subflow_check_data_avail(struct
> sock *ssk)
>  
>  	skb = skb_peek(&ssk->sk_receive_queue);
>  	subflow->map_valid = 1;
> -	subflow->map_seq = READ_ONCE(msk->ack_seq);

nit:

How about replacing this line in place? I mean, do not move it to a
later position:

+	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
+					      subflow->iasn +
+					      TCP_SKB_CB(skb)->seq -
+					      subflow->ssn_offset -
1);

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

>  	subflow->map_data_len = skb->len;
>  	subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq -
> subflow->ssn_offset;
> +	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
> +					      subflow->iasn +
> +					      TCP_SKB_CB(skb)->seq -
> +					      subflow->ssn_offset -
> 1);
>  	WRITE_ONCE(subflow->data_avail, true);
>  	return true;
>  }

* Re: [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation.
  2025-09-20  0:06   ` Geliang Tang
@ 2025-09-21  1:01     ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  1:01 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Sat, 2025-09-20 at 08:06 +0800, Geliang Tang wrote:
> On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> > MPTCP currently access ack_seq outside the msk socket log scope to
> > generate the dummy mapping for fallback socket. Soon we are going
> > to introduce backlog usage and even for fallback socket the ack_seq
> > value will be significantly off outside of the msk socket lock
> > scope.
> > 
> > Avoid relying on ack_seq for dummy mapping generation, using
> > instead
> > the subflow sequence number. Note that in case of disconnect() and
> > (re)connect() we must ensure that any previous state is re-set.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v2 -> v3:
> >  - reordered before the backlog introduction to avoid transiently
> >    break the fallback
> >  - explicitly reset ack_seq
> > ---
> >  net/mptcp/protocol.c | 3 +++
> >  net/mptcp/subflow.c  | 8 +++++++-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index fce70cdad2a7f..c8b02048126a9 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -3224,6 +3224,9 @@ static int mptcp_disconnect(struct sock *sk,
> > int flags)
> >  	msk->bytes_retrans = 0;
> >  	msk->rcvspace_init = 0;
> >  
> > +	/* for fallback's sake */
> > +	WRITE_ONCE(msk->ack_seq, 0);
> > +
> >  	WRITE_ONCE(sk->sk_shutdown, 0);
> >  	sk_error_report(sk);
> >  	return 0;
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index b9455c04e8a46..ac8616e7521e8 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct
> > mptcp_sock *msk,
> >  	mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow-
> > > iasn);
> >  	subflow->iasn++;
> >  
> > +	/* for fallback's sake */
> > +	subflow->map_seq = subflow->iasn;
> > +
> >  	WRITE_ONCE(msk->remote_key, subflow->remote_key);
> >  	WRITE_ONCE(msk->ack_seq, subflow->iasn);
> >  	WRITE_ONCE(msk->can_ack, true);
> > @@ -1435,9 +1438,12 @@ static bool subflow_check_data_avail(struct
> > sock *ssk)
> >  
> >  	skb = skb_peek(&ssk->sk_receive_queue);
> >  	subflow->map_valid = 1;
> > -	subflow->map_seq = READ_ONCE(msk->ack_seq);
> 
> nit:
> 
> How about replacing this line in place? I mean, do not move it to a
> later position:
> 
> +	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
> +					      subflow->iasn +
> +					      TCP_SKB_CB(skb)->seq -
> +					      subflow->ssn_offset -
> 1);
> 
> Reviewed-by: Geliang Tang <geliang@kernel.org>
> Tested-by: Geliang Tang <geliang@kernel.org>

Please delete the trailing period in the subject of this patch too.

Thanks,
-Geliang

> 
> Thanks,
> -Geliang
> 
> >  	subflow->map_data_len = skb->len;
> >  	subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq -
> > subflow->ssn_offset;
> > +	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
> > +					      subflow->iasn +
> > +					      TCP_SKB_CB(skb)->seq
> > -
> > +					      subflow->ssn_offset
> > -
> > 1);
> >  	WRITE_ONCE(subflow->data_avail, true);
> >  	return true;
> >  }

* [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing.
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (8 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 09/12] mptcp: cleanup fallback dummy mapping generation Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-20  0:09   ` Geliang Tang
  2025-09-19 15:53 ` [MPTCP next v3 11/12] mptcp: prevernt __mptcp_move_skbs() interfering with the fastpath Paolo Abeni
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

This streamlines the RX path implementation and improves RX
performance by reducing the subflow-level locking and the amount of
work done under the msk socket lock; the implementation closely mirrors
the TCP backlog processing.

Note that MPTCP now needs to traverse the existing subflows, looking for
data left there because the msk receive buffer was full, only after
recvmsg has completely emptied the receive queue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 103 ++++++++++++++++++++++++++++++-------------
 net/mptcp/protocol.h |   2 +-
 2 files changed, 73 insertions(+), 32 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c8b02048126a9..201e6ac5fe631 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -360,6 +360,27 @@ static void mptcp_init_skb(struct sock *ssk,
 	skb_dst_drop(skb);
 }
 
+static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk,
+				struct sk_buff *skb)
+{
+	struct sk_buff *tail = sk->sk_backlog.tail;
+	bool fragstolen;
+	int delta;
+
+	if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq) {
+		delta = __mptcp_try_coalesce(sk, tail, skb, &fragstolen);
+		if (delta) {
+			sk->sk_backlog.len += delta;
+			kfree_skb_partial(skb, fragstolen);
+			return;
+		}
+	}
+
+	/* mptcp checks the limit before adding the skb to the backlog */
+	__sk_add_backlog(sk, skb);
+	sk->sk_backlog.len += skb->truesize;
+}
+
 static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 {
 	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
@@ -648,7 +669,7 @@ static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
 }
 
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
-					   struct sock *ssk)
+					   struct sock *ssk, bool own_msk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
@@ -659,12 +680,13 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	pr_debug("msk=%p ssk=%p\n", msk, ssk);
 	tp = tcp_sk(ssk);
 	do {
+		int mem = own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len;
 		u32 map_remaining, offset;
 		u32 seq = tp->copied_seq;
 		struct sk_buff *skb;
 		bool fin;
 
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+		if (mem > READ_ONCE(sk->sk_rcvbuf))
 			break;
 
 		/* try to move as much data as available */
@@ -694,7 +716,11 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 			mptcp_init_skb(ssk, skb, offset, len);
 			skb_orphan(skb);
-			ret = __mptcp_move_skb(sk, skb) || ret;
+
+			if (own_msk)
+				ret |= __mptcp_move_skb(sk, skb);
+			else
+				__mptcp_add_backlog(sk, ssk, skb);
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
@@ -715,7 +741,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 	} while (more_data_avail);
 
-	if (ret)
+	if (ret && own_msk)
 		msk->last_data_recv = tcp_jiffies32;
 	return ret;
 }
@@ -813,7 +839,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	struct sock *sk = (struct sock *)msk;
 	bool moved;
 
-	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
+	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
 	__mptcp_ofo_queue(msk);
 	if (unlikely(ssk->sk_err))
 		__mptcp_subflow_error_report(sk, ssk);
@@ -828,18 +854,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
-static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	/* Wake-up the reader only for in-sequence data */
-	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
-		sk->sk_data_ready(sk);
-}
-
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	/* The peer can send data while we are shutting down this
 	 * subflow at msk destruction time, but we must avoid enqueuing
@@ -849,13 +867,33 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	mptcp_data_lock(sk);
-	if (!sock_owned_by_user(sk))
-		__mptcp_data_ready(sk, ssk);
-	else
-		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
+	if (!sock_owned_by_user(sk)) {
+		/* Wake-up the reader only for in-sequence data */
+		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
+			sk->sk_data_ready(sk);
+	} else {
+		__mptcp_move_skbs_from_subflow(msk, ssk, false);
+		if (unlikely(ssk->sk_err))
+			__set_bit(MPTCP_ERROR_REPORT,  &msk->cb_flags);
+	}
 	mptcp_data_unlock(sk);
 }
 
+static int mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	if (__mptcp_move_skb(sk, skb)) {
+		msk->last_data_recv = tcp_jiffies32;
+		__mptcp_ofo_queue(msk);
+		/* notify ack seq update */
+		mptcp_cleanup_rbuf(msk, 0);
+		mptcp_check_data_fin(sk);
+		sk->sk_data_ready(sk);
+	}
+	return 0;
+}
+
 static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk)
 {
 	mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
@@ -2117,7 +2155,7 @@ static bool __mptcp_move_skbs(struct sock *sk)
 
 		ssk = mptcp_subflow_tcp_sock(subflow);
 		slowpath = lock_sock_fast(ssk);
-		ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret;
+		ret = __mptcp_move_skbs_from_subflow(msk, ssk, true) || ret;
 		if (unlikely(ssk->sk_err))
 			__mptcp_error_report(sk);
 		unlock_sock_fast(ssk, slowpath);
@@ -2193,8 +2231,12 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
 		copied += bytes_read;
 
-		if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk))
-			continue;
+		if (skb_queue_empty(&sk->sk_receive_queue)) {
+			__sk_flush_backlog(sk);
+			if (!skb_queue_empty(&sk->sk_receive_queue) ||
+			    __mptcp_move_skbs(sk))
+				continue;
+		}
 
 		/* only the MPTCP socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
@@ -2542,7 +2584,6 @@ static void __mptcp_close_subflow(struct sock *sk)
 
 		mptcp_close_ssk(sk, ssk, subflow);
 	}
-
 }
 
 static bool mptcp_close_tout_expired(const struct sock *sk)
@@ -3126,6 +3167,13 @@ bool __mptcp_close(struct sock *sk, long timeout)
 	pr_debug("msk=%p state=%d\n", sk, sk->sk_state);
 	mptcp_pm_connection_closed(msk);
 
+	/* process the backlog; note that it never destroys the msk */
+	local_bh_disable();
+	bh_lock_sock(sk);
+	__release_sock(sk);
+	bh_unlock_sock(sk);
+	local_bh_enable();
+
 	if (sk->sk_state == TCP_CLOSE) {
 		__mptcp_destroy_sock(sk);
 		do_cancel_work = true;
@@ -3429,8 +3477,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 
 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
 				      BIT(MPTCP_RETRANSMIT) | \
-				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
-				      BIT(MPTCP_DEQUEUE))
+				      BIT(MPTCP_FLUSH_JOIN_LIST))
 
 /* processes deferred events and flush wmem */
 static void mptcp_release_cb(struct sock *sk)
@@ -3464,11 +3511,6 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_push_pending(sk, 0);
 		if (flags & BIT(MPTCP_RETRANSMIT))
 			__mptcp_retrans(sk);
-		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
-			/* notify ack seq update */
-			mptcp_cleanup_rbuf(msk, 0);
-			sk->sk_data_ready(sk);
-		}
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
@@ -3704,8 +3746,6 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
 			return -EINVAL;
 
 		lock_sock(sk);
-		if (__mptcp_move_skbs(sk))
-			mptcp_cleanup_rbuf(msk, 0);
 		*karg = mptcp_inq_hint(sk);
 		release_sock(sk);
 		break;
@@ -3817,6 +3857,7 @@ static struct proto mptcp_prot = {
 	.sendmsg	= mptcp_sendmsg,
 	.ioctl		= mptcp_ioctl,
 	.recvmsg	= mptcp_recvmsg,
+	.backlog_rcv	= mptcp_move_skb,
 	.release_cb	= mptcp_release_cb,
 	.hash		= mptcp_hash,
 	.unhash		= mptcp_unhash,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 6ac58e92a1aa3..7bfd4e0d21a8a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -124,7 +124,6 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_SYNC_STATE	6
 #define MPTCP_SYNC_SNDBUF	7
-#define MPTCP_DEQUEUE		8
 
 struct mptcp_skb_cb {
 	u64 map_seq;
@@ -408,6 +407,7 @@ static inline int mptcp_space_from_win(const struct sock *sk, int win)
 static inline int __mptcp_space(const struct sock *sk)
 {
 	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
+				    READ_ONCE(sk->sk_backlog.len) -
 				    sk_rmem_alloc_get(sk));
 }
 
-- 
2.51.0



* Re: [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing.
  2025-09-19 15:53 ` [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing Paolo Abeni
@ 2025-09-20  0:09   ` Geliang Tang
  2025-09-21  0:27     ` Geliang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  0:09 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> This streamline the RX path implementation and improves the RX
> performances by reducing the subflow-level locking and the amount of
> work done under the msk socket lock; the implementation mirror
> closely
> the TCP backlog processing.
> 
> Note that MPTCP needs now to traverse the existing subflow looking
> for
> data that was left there due to the msk receive buffer full, only
> after
> that recvmsg completely empties the receive queue.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 103 ++++++++++++++++++++++++++++++-----------
> --
>  net/mptcp/protocol.h |   2 +-
>  2 files changed, 73 insertions(+), 32 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c8b02048126a9..201e6ac5fe631 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -360,6 +360,27 @@ static void mptcp_init_skb(struct sock *ssk,
>  	skb_dst_drop(skb);
>  }
>  
> +static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk,
> +				struct sk_buff *skb)
> +{
> +	struct sk_buff *tail = sk->sk_backlog.tail;
> +	bool fragstolen;
> +	int delta;
> +
> +	if (tail && MPTCP_SKB_CB(skb)->map_seq ==
> MPTCP_SKB_CB(tail)->end_seq) {
> +		delta = __mptcp_try_coalesce(sk, tail, skb,
> &fragstolen);
> +		if (delta) {
> +			sk->sk_backlog.len += delta;
> +			kfree_skb_partial(skb, fragstolen);
> +			return;
> +		}
> +	}
> +
> +	/* mptcp checks the limit before adding the skb to the
> backlog */
> +	__sk_add_backlog(sk, skb);
> +	sk->sk_backlog.len += skb->truesize;
> +}
> +
>  static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
>  {
>  	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> MPTCP_SKB_CB(skb)->map_seq;
> @@ -648,7 +669,7 @@ static void mptcp_dss_corruption(struct
> mptcp_sock *msk, struct sock *ssk)
>  }
>  
>  static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> -					   struct sock *ssk)
> +					   struct sock *ssk, bool
> own_msk)
>  {
>  	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
>  	struct sock *sk = (struct sock *)msk;
> @@ -659,12 +680,13 @@ static bool
> __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>  	pr_debug("msk=%p ssk=%p\n", msk, ssk);
>  	tp = tcp_sk(ssk);
>  	do {
> +		int mem = own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len;
>  		u32 map_remaining, offset;
>  		u32 seq = tp->copied_seq;
>  		struct sk_buff *skb;
>  		bool fin;
>  
> -		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> +		if (mem > READ_ONCE(sk->sk_rcvbuf))
>  			break;
>  
>  		/* try to move as much data as available */
> @@ -694,7 +716,11 @@ static bool
> __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>  
>  			mptcp_init_skb(ssk, skb, offset, len);
>  			skb_orphan(skb);
> -			ret = __mptcp_move_skb(sk, skb) || ret;
> +
> +			if (own_msk)
> +				ret |= __mptcp_move_skb(sk, skb);
> +			else
> +				__mptcp_add_backlog(sk, ssk, skb);
>  			seq += len;
>  
>  			if (unlikely(map_remaining < len)) {
> @@ -715,7 +741,7 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
>  
>  	} while (more_data_avail);
>  
> -	if (ret)
> +	if (ret && own_msk)
>  		msk->last_data_recv = tcp_jiffies32;
>  	return ret;
>  }
> @@ -813,7 +839,7 @@ static bool move_skbs_to_msk(struct mptcp_sock
> *msk, struct sock *ssk)
>  	struct sock *sk = (struct sock *)msk;
>  	bool moved;
>  
> -	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
> +	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
>  	__mptcp_ofo_queue(msk);
>  	if (unlikely(ssk->sk_err))
>  		__mptcp_subflow_error_report(sk, ssk);
> @@ -828,18 +854,10 @@ static bool move_skbs_to_msk(struct mptcp_sock
> *msk, struct sock *ssk)
>  	return moved;
>  }
>  
> -static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
> -{
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -
> -	/* Wake-up the reader only for in-sequence data */
> -	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
> -		sk->sk_data_ready(sk);
> -}
> -
>  void mptcp_data_ready(struct sock *sk, struct sock *ssk)
>  {
>  	struct mptcp_subflow_context *subflow =
> mptcp_subflow_ctx(ssk);
> +	struct mptcp_sock *msk = mptcp_sk(sk);
>  
>  	/* The peer can send data while we are shutting down this
>  	 * subflow at msk destruction time, but we must avoid
> enqueuing
> @@ -849,13 +867,33 @@ void mptcp_data_ready(struct sock *sk, struct
> sock *ssk)
>  		return;
>  
>  	mptcp_data_lock(sk);
> -	if (!sock_owned_by_user(sk))
> -		__mptcp_data_ready(sk, ssk);
> -	else
> -		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
> +	if (!sock_owned_by_user(sk)) {
> +		/* Wake-up the reader only for in-sequence data */
> +		if (move_skbs_to_msk(msk, ssk) &&
> mptcp_epollin_ready(sk))
> +			sk->sk_data_ready(sk);
> +	} else {
> +		__mptcp_move_skbs_from_subflow(msk, ssk, false);
> +		if (unlikely(ssk->sk_err))
> +			__set_bit(MPTCP_ERROR_REPORT,  &msk->cb_flags);
> +	}
>  	mptcp_data_unlock(sk);
>  }
>  
> +static int mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +
> +	if (__mptcp_move_skb(sk, skb)) {
> +		msk->last_data_recv = tcp_jiffies32;
> +		__mptcp_ofo_queue(msk);
> +		/* notify ack seq update */
> +		mptcp_cleanup_rbuf(msk, 0);
> +		mptcp_check_data_fin(sk);
> +		sk->sk_data_ready(sk);
> +	}
> +	return 0;
> +}
> +
>  static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock
> *ssk)
>  {
>  	mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
> @@ -2117,7 +2155,7 @@ static bool __mptcp_move_skbs(struct sock *sk)
>  
>  		ssk = mptcp_subflow_tcp_sock(subflow);
>  		slowpath = lock_sock_fast(ssk);
> -		ret = __mptcp_move_skbs_from_subflow(msk, ssk) ||
> ret;
> +		ret = __mptcp_move_skbs_from_subflow(msk, ssk, true)
> || ret;
>  		if (unlikely(ssk->sk_err))
>  			__mptcp_error_report(sk);
>  		unlock_sock_fast(ssk, slowpath);
> @@ -2193,8 +2231,12 @@ static int mptcp_recvmsg(struct sock *sk,
> struct msghdr *msg, size_t len,
>  
>  		copied += bytes_read;
>  
> -		if (skb_queue_empty(&sk->sk_receive_queue) &&
> __mptcp_move_skbs(sk))
> -			continue;
> +		if (skb_queue_empty(&sk->sk_receive_queue)) {
> +			__sk_flush_backlog(sk);
> +			if (!skb_queue_empty(&sk->sk_receive_queue)
> ||
> +			    __mptcp_move_skbs(sk))
> +				continue;
> +		}
>  
>  		/* only the MPTCP socket status is relevant here.
> The exit
>  		 * conditions mirror closely tcp_recvmsg()
> @@ -2542,7 +2584,6 @@ static void __mptcp_close_subflow(struct sock
> *sk)
>  
>  		mptcp_close_ssk(sk, ssk, subflow);
>  	}
> -

Deleting this blank line is unrelated to the current patch. Let's make
the change later when we modify the __mptcp_close_subflow function
together.

I need to spend some more time understanding and testing patches 10-12.

Thanks,
-Geliang

>  }
>  
>  static bool mptcp_close_tout_expired(const struct sock *sk)
> @@ -3126,6 +3167,13 @@ bool __mptcp_close(struct sock *sk, long
> timeout)
>  	pr_debug("msk=%p state=%d\n", sk, sk->sk_state);
>  	mptcp_pm_connection_closed(msk);
>  
> +	/* process the backlog; note that it never destroies the msk
> */
> +	local_bh_disable();
> +	bh_lock_sock(sk);
> +	__release_sock(sk);
> +	bh_unlock_sock(sk);
> +	local_bh_enable();
> +
>  	if (sk->sk_state == TCP_CLOSE) {
>  		__mptcp_destroy_sock(sk);
>  		do_cancel_work = true;
> @@ -3429,8 +3477,7 @@ void __mptcp_check_push(struct sock *sk, struct
> sock *ssk)
>  
>  #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
>  				      BIT(MPTCP_RETRANSMIT) | \
> -				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
> -				      BIT(MPTCP_DEQUEUE))
> +				      BIT(MPTCP_FLUSH_JOIN_LIST))
>  
>  /* processes deferred events and flush wmem */
>  static void mptcp_release_cb(struct sock *sk)
> @@ -3464,11 +3511,6 @@ static void mptcp_release_cb(struct sock *sk)
>  			__mptcp_push_pending(sk, 0);
>  		if (flags & BIT(MPTCP_RETRANSMIT))
>  			__mptcp_retrans(sk);
> -		if ((flags & BIT(MPTCP_DEQUEUE)) &&
> __mptcp_move_skbs(sk)) {
> -			/* notify ack seq update */
> -			mptcp_cleanup_rbuf(msk, 0);
> -			sk->sk_data_ready(sk);
> -		}
>  
>  		cond_resched();
>  		spin_lock_bh(&sk->sk_lock.slock);
> @@ -3704,8 +3746,6 @@ static int mptcp_ioctl(struct sock *sk, int
> cmd, int *karg)
>  			return -EINVAL;
>  
>  		lock_sock(sk);
> -		if (__mptcp_move_skbs(sk))
> -			mptcp_cleanup_rbuf(msk, 0);
>  		*karg = mptcp_inq_hint(sk);
>  		release_sock(sk);
>  		break;
> @@ -3817,6 +3857,7 @@ static struct proto mptcp_prot = {
>  	.sendmsg	= mptcp_sendmsg,
>  	.ioctl		= mptcp_ioctl,
>  	.recvmsg	= mptcp_recvmsg,
> +	.backlog_rcv	= mptcp_move_skb,
>  	.release_cb	= mptcp_release_cb,
>  	.hash		= mptcp_hash,
>  	.unhash		= mptcp_unhash,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 6ac58e92a1aa3..7bfd4e0d21a8a 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -124,7 +124,6 @@
>  #define MPTCP_FLUSH_JOIN_LIST	5
>  #define MPTCP_SYNC_STATE	6
>  #define MPTCP_SYNC_SNDBUF	7
> -#define MPTCP_DEQUEUE		8
>  
>  struct mptcp_skb_cb {
>  	u64 map_seq;
> @@ -408,6 +407,7 @@ static inline int mptcp_space_from_win(const
> struct sock *sk, int win)
>  static inline int __mptcp_space(const struct sock *sk)
>  {
>  	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
> +				    READ_ONCE(sk->sk_backlog.len) -
>  				    sk_rmem_alloc_get(sk));
>  }
>  


* Re: [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing.
  2025-09-20  0:09   ` Geliang Tang
@ 2025-09-21  0:27     ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  0:27 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Sat, 2025-09-20 at 08:09 +0800, Geliang Tang wrote:
> On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> > This streamline the RX path implementation and improves the RX
> > performances by reducing the subflow-level locking and the amount
> > of
> > work done under the msk socket lock; the implementation mirror
> > closely
> > the TCP backlog processing.
> > 
> > Note that MPTCP needs now to traverse the existing subflow looking
> > for
> > data that was left there due to the msk receive buffer full, only
> > after
> > that recvmsg completely empties the receive queue.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/mptcp/protocol.c | 103 ++++++++++++++++++++++++++++++---------
> > --
> > --
> >  net/mptcp/protocol.h |   2 +-
> >  2 files changed, 73 insertions(+), 32 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index c8b02048126a9..201e6ac5fe631 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -360,6 +360,27 @@ static void mptcp_init_skb(struct sock *ssk,
> >  	skb_dst_drop(skb);
> >  }
> >  
> > +static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk,
> > +				struct sk_buff *skb)
> > +{
> > +	struct sk_buff *tail = sk->sk_backlog.tail;
> > +	bool fragstolen;
> > +	int delta;
> > +
> > +	if (tail && MPTCP_SKB_CB(skb)->map_seq ==
> > MPTCP_SKB_CB(tail)->end_seq) {
> > +		delta = __mptcp_try_coalesce(sk, tail, skb,
> > &fragstolen);
> > +		if (delta) {
> > +			sk->sk_backlog.len += delta;
> > +			kfree_skb_partial(skb, fragstolen);
> > +			return;
> > +		}
> > +	}
> > +
> > +	/* mptcp checks the limit before adding the skb to the
> > backlog */
> > +	__sk_add_backlog(sk, skb);
> > +	sk->sk_backlog.len += skb->truesize;
> > +}
> > +
> >  static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> >  	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq -
> > MPTCP_SKB_CB(skb)->map_seq;
> > @@ -648,7 +669,7 @@ static void mptcp_dss_corruption(struct
> > mptcp_sock *msk, struct sock *ssk)
> >  }
> >  
> >  static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> > -					   struct sock *ssk)
> > +					   struct sock *ssk, bool
> > own_msk)
> >  {
> >  	struct mptcp_subflow_context *subflow =
> > mptcp_subflow_ctx(ssk);
> >  	struct sock *sk = (struct sock *)msk;
> > @@ -659,12 +680,13 @@ static bool
> > __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> >  	pr_debug("msk=%p ssk=%p\n", msk, ssk);
> >  	tp = tcp_sk(ssk);
> >  	do {
> > +		int mem = own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len;
> >  		u32 map_remaining, offset;
> >  		u32 seq = tp->copied_seq;
> >  		struct sk_buff *skb;
> >  		bool fin;
> >  
> > -		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> > +		if (mem > READ_ONCE(sk->sk_rcvbuf))
> >  			break;
> >  
> >  		/* try to move as much data as available */
> > @@ -694,7 +716,11 @@ static bool
> > __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> >  
> >  			mptcp_init_skb(ssk, skb, offset, len);
> >  			skb_orphan(skb);
> > -			ret = __mptcp_move_skb(sk, skb) || ret;
> > +
> > +			if (own_msk)
> > +				ret |= __mptcp_move_skb(sk, skb);
> > +			else
> > +				__mptcp_add_backlog(sk, ssk, skb);
> >  			seq += len;
> >  
> >  			if (unlikely(map_remaining < len)) {
> > @@ -715,7 +741,7 @@ static bool
> > __mptcp_move_skbs_from_subflow(struct
> > mptcp_sock *msk,
> >  
> >  	} while (more_data_avail);
> >  
> > -	if (ret)
> > +	if (ret && own_msk)
> >  		msk->last_data_recv = tcp_jiffies32;
> >  	return ret;
> >  }
> > @@ -813,7 +839,7 @@ static bool move_skbs_to_msk(struct mptcp_sock
> > *msk, struct sock *ssk)
> >  	struct sock *sk = (struct sock *)msk;
> >  	bool moved;
> >  
> > -	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
> > +	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
> >  	__mptcp_ofo_queue(msk);
> >  	if (unlikely(ssk->sk_err))
> >  		__mptcp_subflow_error_report(sk, ssk);
> > @@ -828,18 +854,10 @@ static bool move_skbs_to_msk(struct
> > mptcp_sock
> > *msk, struct sock *ssk)
> >  	return moved;
> >  }
> >  
> > -static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
> > -{
> > -	struct mptcp_sock *msk = mptcp_sk(sk);
> > -
> > -	/* Wake-up the reader only for in-sequence data */
> > -	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
> > -		sk->sk_data_ready(sk);
> > -}
> > -
> >  void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> >  {
> >  	struct mptcp_subflow_context *subflow =
> > mptcp_subflow_ctx(ssk);
> > +	struct mptcp_sock *msk = mptcp_sk(sk);
> >  
> >  	/* The peer can send data while we are shutting down this
> >  	 * subflow at msk destruction time, but we must avoid
> > enqueuing
> > @@ -849,13 +867,33 @@ void mptcp_data_ready(struct sock *sk, struct
> > sock *ssk)
> >  		return;
> >  
> >  	mptcp_data_lock(sk);
> > -	if (!sock_owned_by_user(sk))
> > -		__mptcp_data_ready(sk, ssk);
> > -	else
> > -		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
> > +	if (!sock_owned_by_user(sk)) {
> > +		/* Wake-up the reader only for in-sequence data */
> > +		if (move_skbs_to_msk(msk, ssk) &&
> > mptcp_epollin_ready(sk))
> > +			sk->sk_data_ready(sk);
> > +	} else {
> > +		__mptcp_move_skbs_from_subflow(msk, ssk, false);
> > +		if (unlikely(ssk->sk_err))
> > +			__set_bit(MPTCP_ERROR_REPORT,  &msk->cb_flags);
> > +	}
> >  	mptcp_data_unlock(sk);
> >  }
> >  
> > +static int mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	struct mptcp_sock *msk = mptcp_sk(sk);
> > +
> > +	if (__mptcp_move_skb(sk, skb)) {
> > +		msk->last_data_recv = tcp_jiffies32;
> > +		__mptcp_ofo_queue(msk);
> > +		/* notify ack seq update */
> > +		mptcp_cleanup_rbuf(msk, 0);
> > +		mptcp_check_data_fin(sk);
> > +		sk->sk_data_ready(sk);
> > +	}
> > +	return 0;
> > +}
> > +
> >  static void mptcp_subflow_joined(struct mptcp_sock *msk, struct
> > sock
> > *ssk)
> >  {
> >  	mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
> > @@ -2117,7 +2155,7 @@ static bool __mptcp_move_skbs(struct sock
> > *sk)
> >  
> >  		ssk = mptcp_subflow_tcp_sock(subflow);
> >  		slowpath = lock_sock_fast(ssk);
> > -		ret = __mptcp_move_skbs_from_subflow(msk, ssk) ||
> > ret;
> > +		ret = __mptcp_move_skbs_from_subflow(msk, ssk, true) || ret;
> >  		if (unlikely(ssk->sk_err))
> >  			__mptcp_error_report(sk);
> >  		unlock_sock_fast(ssk, slowpath);
> > @@ -2193,8 +2231,12 @@ static int mptcp_recvmsg(struct sock *sk,
> > struct msghdr *msg, size_t len,
> >  
> >  		copied += bytes_read;
> >  
> > -		if (skb_queue_empty(&sk->sk_receive_queue) &&
> > __mptcp_move_skbs(sk))
> > -			continue;
> > +		if (skb_queue_empty(&sk->sk_receive_queue)) {
> > +			__sk_flush_backlog(sk);
> > +			if (!skb_queue_empty(&sk->sk_receive_queue) ||
> > +			    __mptcp_move_skbs(sk))
> > +				continue;
> > +		}
> >  
> >  		/* only the MPTCP socket status is relevant here.
> > The exit
> >  		 * conditions mirror closely tcp_recvmsg()
> > @@ -2542,7 +2584,6 @@ static void __mptcp_close_subflow(struct sock
> > *sk)
> >  
> >  		mptcp_close_ssk(sk, ssk, subflow);
> >  	}
> > -
> 
> Deleting this blank line is unrelated to the current patch. Let's
> make
> the change later when we modify the __mptcp_close_subflow function
> together.
> 
> I need to spend some more time understanding and testing patches 10-
> 12.

Please delete the trailing period in the subject of this patch to
maintain consistency with the other patches in the series.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> 
> Thanks,
> -Geliang
> 
> >  }
> >  
> >  static bool mptcp_close_tout_expired(const struct sock *sk)
> > @@ -3126,6 +3167,13 @@ bool __mptcp_close(struct sock *sk, long
> > timeout)
> >  	pr_debug("msk=%p state=%d\n", sk, sk->sk_state);
> >  	mptcp_pm_connection_closed(msk);
> >  
> > +	/* process the backlog; note that it never destroies the
> > msk
> > */
> > +	local_bh_disable();
> > +	bh_lock_sock(sk);
> > +	__release_sock(sk);
> > +	bh_unlock_sock(sk);
> > +	local_bh_enable();
> > +
> >  	if (sk->sk_state == TCP_CLOSE) {
> >  		__mptcp_destroy_sock(sk);
> >  		do_cancel_work = true;
> > @@ -3429,8 +3477,7 @@ void __mptcp_check_push(struct sock *sk,
> > struct
> > sock *ssk)
> >  
> >  #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
> >  				      BIT(MPTCP_RETRANSMIT) | \
> > -				      BIT(MPTCP_FLUSH_JOIN_LIST) |
> > \
> > -				      BIT(MPTCP_DEQUEUE))
> > +				      BIT(MPTCP_FLUSH_JOIN_LIST))
> >  
> >  /* processes deferred events and flush wmem */
> >  static void mptcp_release_cb(struct sock *sk)
> > @@ -3464,11 +3511,6 @@ static void mptcp_release_cb(struct sock
> > *sk)
> >  			__mptcp_push_pending(sk, 0);
> >  		if (flags & BIT(MPTCP_RETRANSMIT))
> >  			__mptcp_retrans(sk);
> > -		if ((flags & BIT(MPTCP_DEQUEUE)) &&
> > __mptcp_move_skbs(sk)) {
> > -			/* notify ack seq update */
> > -			mptcp_cleanup_rbuf(msk, 0);
> > -			sk->sk_data_ready(sk);
> > -		}
> >  
> >  		cond_resched();
> >  		spin_lock_bh(&sk->sk_lock.slock);
> > @@ -3704,8 +3746,6 @@ static int mptcp_ioctl(struct sock *sk, int
> > cmd, int *karg)
> >  			return -EINVAL;
> >  
> >  		lock_sock(sk);
> > -		if (__mptcp_move_skbs(sk))
> > -			mptcp_cleanup_rbuf(msk, 0);
> >  		*karg = mptcp_inq_hint(sk);
> >  		release_sock(sk);
> >  		break;
> > @@ -3817,6 +3857,7 @@ static struct proto mptcp_prot = {
> >  	.sendmsg	= mptcp_sendmsg,
> >  	.ioctl		= mptcp_ioctl,
> >  	.recvmsg	= mptcp_recvmsg,
> > +	.backlog_rcv	= mptcp_move_skb,
> >  	.release_cb	= mptcp_release_cb,
> >  	.hash		= mptcp_hash,
> >  	.unhash		= mptcp_unhash,
> > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> > index 6ac58e92a1aa3..7bfd4e0d21a8a 100644
> > --- a/net/mptcp/protocol.h
> > +++ b/net/mptcp/protocol.h
> > @@ -124,7 +124,6 @@
> >  #define MPTCP_FLUSH_JOIN_LIST	5
> >  #define MPTCP_SYNC_STATE	6
> >  #define MPTCP_SYNC_SNDBUF	7
> > -#define MPTCP_DEQUEUE		8
> >  
> >  struct mptcp_skb_cb {
> >  	u64 map_seq;
> > @@ -408,6 +407,7 @@ static inline int mptcp_space_from_win(const
> > struct sock *sk, int win)
> >  static inline int __mptcp_space(const struct sock *sk)
> >  {
> >  	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
> > +				    READ_ONCE(sk->sk_backlog.len)
> > -
> >  				    sk_rmem_alloc_get(sk));
> >  }
> >  


* [MPTCP next v3 11/12] mptcp: prevent __mptcp_move_skbs() interfering with the fastpath
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (9 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 10/12] mptcp: leverage the sk backlog for RX packet processing Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-21  0:27   ` Geliang Tang
  2025-09-19 15:53 ` [MPTCP next v3 12/12] mptcp: borrow forward memory from subflow Paolo Abeni
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

skbs are left waiting in the subflow only in exceptional cases. We
want to avoid interfering with the fast path by unintentionally
processing, in __mptcp_move_skbs(), packets that landed in the
subflows after the last check.

Use a separate flag to mark delayed skbs and only process subflows
with that flag set. Also add new MIBs to track the exceptional events.
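
The idea can be illustrated with a small user-space sketch. It is a toy
model only: the toy_* names and the simplified byte accounting are
assumptions made for illustration and do not match the kernel data
structures. The move routine records whether it was stopped by the
receive buffer limit, and the slow path revisits only the subflows
flagged that way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the subflow and the msk. */
struct toy_subflow {
	int pending;		/* bytes still sitting in the subflow */
	bool data_delayed;	/* set when a move was stopped by the limit */
};

struct toy_msk {
	int rmem;	/* bytes currently queued on the msk */
	int rcvbuf;	/* receive buffer limit */
};

/* Move the pending data unless the msk is over its limit, and record
 * the outcome in the per-subflow flag. Returns true if data moved.
 */
static bool toy_move_from_subflow(struct toy_msk *msk, struct toy_subflow *sf)
{
	assert(msk && sf);
	sf->data_delayed = msk->rmem > msk->rcvbuf;
	if (sf->data_delayed || !sf->pending)
		return false;

	msk->rmem += sf->pending;
	sf->pending = 0;
	return true;
}

/* Slow path: skip subflows that were never delayed, so data that
 * arrived later stays on the fast path. Returns the number of
 * subflows actually visited.
 */
static int toy_move_delayed(struct toy_msk *msk, struct toy_subflow *sfs,
			    size_t n)
{
	int visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (msk->rmem > msk->rcvbuf)
			break;
		if (!sfs[i].data_delayed)
			continue;

		visited++;
		toy_move_from_subflow(msk, &sfs[i]);
	}
	return visited;
}
```

The visited count in this sketch is loosely analogous to what the new
DelayedProcess counter is meant to expose: how often the slow path had
real work to do.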

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - rebased
---
 net/mptcp/mib.c      |  2 ++
 net/mptcp/mib.h      |  4 ++++
 net/mptcp/protocol.c | 40 ++++++++++++----------------------------
 net/mptcp/protocol.h |  1 +
 4 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 6003e47c770a7..ac5ccf81159de 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -85,6 +85,8 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSSFALLBACK),
 	SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK),
 	SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED),
+	SNMP_MIB_ITEM("RcvDelayed", MPTCP_MIB_RCVDELAYED),
+	SNMP_MIB_ITEM("DelayedProcess", MPTCP_MIB_DELAYED_PROCESS),
 };
 
 /* mptcp_mib_alloc - allocate percpu mib counters
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 309bac6fea325..f6d0eaea463e5 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -88,6 +88,10 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_DSSFALLBACK,		/* Bad or missing DSS */
 	MPTCP_MIB_SIMULTCONNFALLBACK,	/* Simultaneous connect */
 	MPTCP_MIB_FALLBACKFAILED,	/* Can't fallback due to msk status */
+	MPTCP_MIB_RCVDELAYED,		/* Data move from subflow is delayed due to msk
+					 * receive buffer full
+					 */
+	MPTCP_MIB_DELAYED_PROCESS,	/* Delayed data moved in slowpath */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 201e6ac5fe631..2a025c0c4ca0c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -681,13 +681,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	tp = tcp_sk(ssk);
 	do {
 		int mem = own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len;
+		bool over_limit = mem > READ_ONCE(sk->sk_rcvbuf);
 		u32 map_remaining, offset;
 		u32 seq = tp->copied_seq;
 		struct sk_buff *skb;
 		bool fin;
 
-		if (mem > READ_ONCE(sk->sk_rcvbuf))
+		WRITE_ONCE(subflow->data_delayed, over_limit);
+		if (subflow->data_delayed) {
+			MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVDELAYED);
 			break;
+		}
 
 		/* try to move as much data as available */
 		map_remaining = subflow->map_data_len -
@@ -2113,32 +2117,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static struct mptcp_subflow_context *
-__mptcp_first_ready_from(struct mptcp_sock *msk,
-			 struct mptcp_subflow_context *subflow)
-{
-	struct mptcp_subflow_context *start_subflow = subflow;
-
-	while (!READ_ONCE(subflow->data_avail)) {
-		subflow = mptcp_next_subflow(msk, subflow);
-		if (subflow == start_subflow)
-			return NULL;
-	}
-	return subflow;
-}
-
 static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool ret = false;
 
-	if (list_empty(&msk->conn_list))
-		return false;
-
-	subflow = list_first_entry(&msk->conn_list,
-				   struct mptcp_subflow_context, node);
-	for (;;) {
+	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk;
 		bool slowpath;
 
@@ -2149,23 +2134,22 @@ static bool __mptcp_move_skbs(struct sock *sk)
 		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 			break;
 
-		subflow = __mptcp_first_ready_from(msk, subflow);
-		if (!subflow)
-			break;
+		if (!subflow->data_delayed)
+			continue;
 
 		ssk = mptcp_subflow_tcp_sock(subflow);
 		slowpath = lock_sock_fast(ssk);
-		ret = __mptcp_move_skbs_from_subflow(msk, ssk, true) || ret;
+		ret |= __mptcp_move_skbs_from_subflow(msk, ssk, true);
 		if (unlikely(ssk->sk_err))
 			__mptcp_error_report(sk);
 		unlock_sock_fast(ssk, slowpath);
-
-		subflow = mptcp_next_subflow(msk, subflow);
 	}
 
 	__mptcp_ofo_queue(msk);
-	if (ret)
+	if (ret) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DELAYED_PROCESS);
 		mptcp_check_data_fin((struct sock *)msk);
+	}
 	return ret;
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7bfd4e0d21a8a..a295ce11774ea 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -560,6 +560,7 @@ struct mptcp_subflow_context {
 	u8	reset_transient:1;
 	u8	reset_reason:4;
 	u8	stale_count;
+	bool	data_delayed;
 
 	u32	subflow_id;
 
-- 
2.51.0
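
A minimal userspace sketch of the flag logic in patch 11/12 above, assuming
a byte-counting model of the queues. Every toy_* name, the 'chunk' parameter
and the two plain counters are hypothetical stand-ins for illustration;
locking, skb handling and the backlog case are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the subflow and msk state; not kernel types. */
struct toy_subflow {
	int pending;		/* bytes still queued on the subflow */
	bool data_delayed;	/* last move stopped on a full msk rcvbuf */
};

struct toy_msk {
	int rmem;		/* bytes charged to the msk receive queue */
	int rcvbuf;		/* msk receive buffer limit */
	int rcv_delayed;	/* models the RcvDelayed counter */
	int delayed_process;	/* models the DelayedProcess counter */
};

/* Move up to 'chunk' bytes per iteration until the subflow is drained or
 * the msk is over its limit; the flag ends up set only in the latter case.
 */
static bool toy_move_from_subflow(struct toy_msk *msk,
				  struct toy_subflow *sf, int chunk)
{
	bool moved = false;

	assert(chunk > 0);
	do {
		bool over_limit = msk->rmem > msk->rcvbuf;
		int len;

		sf->data_delayed = over_limit;
		if (over_limit) {
			msk->rcv_delayed++;
			break;
		}

		len = sf->pending < chunk ? sf->pending : chunk;
		sf->pending -= len;
		msk->rmem += len;
		moved |= len > 0;
	} while (sf->pending > 0);

	return moved;
}

/* Slow path: only revisit subflows that were explicitly marked as delayed,
 * so data that landed later is left to the fast path.
 */
static bool toy_move_skbs(struct toy_msk *msk, struct toy_subflow *sfs,
			  size_t n, int chunk)
{
	bool ret = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (msk->rmem > msk->rcvbuf)
			break;
		if (!sfs[i].data_delayed)
			continue;
		ret |= toy_move_from_subflow(msk, &sfs[i], chunk);
	}
	if (ret)
		msk->delayed_process++;
	return ret;
}
```

In this model a subflow whose move was cut short keeps data_delayed set, and
a later toy_move_skbs() call drains only that subflow; a subflow that was
never flagged is skipped even if it has pending bytes.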



* Re: [MPTCP next v3 11/12] mptcp: prevent __mptcp_move_skbs() interfering with the fastpath
  2025-09-19 15:53 ` [MPTCP next v3 11/12] mptcp: prevent __mptcp_move_skbs() interfering with the fastpath Paolo Abeni
@ 2025-09-21  0:27   ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  0:27 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> skbs will be left waiting in the subflow only in exceptional cases;
> we want to avoid messing with the fast path by unintentionally
> processing in __mptcp_move_skbs() packets that landed in the subflows
> after the last check.
> 
> Use a separate flag to mark delayed skbs and only process subflows
> with such flag set. Also add new MIBs to track the exceptional
> events.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

LGTM!

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> ---
> v1 -> v2:
>   - rebased
> ---
>  net/mptcp/mib.c      |  2 ++
>  net/mptcp/mib.h      |  4 ++++
>  net/mptcp/protocol.c | 40 ++++++++++++----------------------------
>  net/mptcp/protocol.h |  1 +
>  4 files changed, 19 insertions(+), 28 deletions(-)
> 
> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> index 6003e47c770a7..ac5ccf81159de 100644
> --- a/net/mptcp/mib.c
> +++ b/net/mptcp/mib.c
> @@ -85,6 +85,8 @@ static const struct snmp_mib mptcp_snmp_list[] = {
>  	SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSSFALLBACK),
>  	SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK),
>  	SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED),
> +	SNMP_MIB_ITEM("RcvDelayed", MPTCP_MIB_RCVDELAYED),
> +	SNMP_MIB_ITEM("DelayedProcess", MPTCP_MIB_DELAYED_PROCESS),
>  };
>  
>  /* mptcp_mib_alloc - allocate percpu mib counters
> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> index 309bac6fea325..f6d0eaea463e5 100644
> --- a/net/mptcp/mib.h
> +++ b/net/mptcp/mib.h
> @@ -88,6 +88,10 @@ enum linux_mptcp_mib_field {
>  	MPTCP_MIB_DSSFALLBACK,		/* Bad or missing DSS */
>  	MPTCP_MIB_SIMULTCONNFALLBACK,	/* Simultaneous connect */
>  	MPTCP_MIB_FALLBACKFAILED,	/* Can't fallback due to msk status */
> +	MPTCP_MIB_RCVDELAYED,		/* Data move from subflow is delayed due to msk
> +					 * receive buffer full
> +					 */
> +	MPTCP_MIB_DELAYED_PROCESS,	/* Delayed data moved in slowpath */
>  	__MPTCP_MIB_MAX
>  };
>  
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 201e6ac5fe631..2a025c0c4ca0c 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -681,13 +681,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>  	tp = tcp_sk(ssk);
>  	do {
>  		int mem = own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len;
> +		bool over_limit = mem > READ_ONCE(sk->sk_rcvbuf);
>  		u32 map_remaining, offset;
>  		u32 seq = tp->copied_seq;
>  		struct sk_buff *skb;
>  		bool fin;
>  
> -		if (mem > READ_ONCE(sk->sk_rcvbuf))
> +		WRITE_ONCE(subflow->data_delayed, over_limit);
> +		if (subflow->data_delayed) {
> +			MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVDELAYED);
>  			break;
> +		}
>  
>  		/* try to move as much data as available */
>  		map_remaining = subflow->map_data_len -
> @@ -2113,32 +2117,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
>  	msk->rcvq_space.time = mstamp;
>  }
>  
> -static struct mptcp_subflow_context *
> -__mptcp_first_ready_from(struct mptcp_sock *msk,
> -			 struct mptcp_subflow_context *subflow)
> -{
> -	struct mptcp_subflow_context *start_subflow = subflow;
> -
> -	while (!READ_ONCE(subflow->data_avail)) {
> -		subflow = mptcp_next_subflow(msk, subflow);
> -		if (subflow == start_subflow)
> -			return NULL;
> -	}
> -	return subflow;
> -}
> -
>  static bool __mptcp_move_skbs(struct sock *sk)
>  {
>  	struct mptcp_subflow_context *subflow;
>  	struct mptcp_sock *msk = mptcp_sk(sk);
>  	bool ret = false;
>  
> -	if (list_empty(&msk->conn_list))
> -		return false;
> -
> -	subflow = list_first_entry(&msk->conn_list,
> -				   struct mptcp_subflow_context, node);
> -	for (;;) {
> +	mptcp_for_each_subflow(msk, subflow) {
>  		struct sock *ssk;
>  		bool slowpath;
>  
> @@ -2149,23 +2134,22 @@ static bool __mptcp_move_skbs(struct sock *sk)
>  		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
>  			break;
>  
> -		subflow = __mptcp_first_ready_from(msk, subflow);
> -		if (!subflow)
> -			break;
> +		if (!subflow->data_delayed)
> +			continue;
>  
>  		ssk = mptcp_subflow_tcp_sock(subflow);
>  		slowpath = lock_sock_fast(ssk);
> -		ret = __mptcp_move_skbs_from_subflow(msk, ssk, true) || ret;
> +		ret |= __mptcp_move_skbs_from_subflow(msk, ssk, true);
>  		if (unlikely(ssk->sk_err))
>  			__mptcp_error_report(sk);
>  		unlock_sock_fast(ssk, slowpath);
> -
> -		subflow = mptcp_next_subflow(msk, subflow);
>  	}
>  
>  	__mptcp_ofo_queue(msk);
> -	if (ret)
> +	if (ret) {
> +		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DELAYED_PROCESS);
>  		mptcp_check_data_fin((struct sock *)msk);
> +	}
>  	return ret;
>  }
>  
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 7bfd4e0d21a8a..a295ce11774ea 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -560,6 +560,7 @@ struct mptcp_subflow_context {
>  	u8	reset_transient:1;
>  	u8	reset_reason:4;
>  	u8	stale_count;
> +	bool	data_delayed;
>  
>  	u32	subflow_id;
>  


* [MPTCP next v3 12/12] mptcp: borrow forward memory from subflow
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (10 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 11/12] mptcp: prevent __mptcp_move_skbs() interfering with the fastpath Paolo Abeni
@ 2025-09-19 15:53 ` Paolo Abeni
  2025-09-21  0:28   ` Geliang Tang
  2025-09-19 18:36 ` [MPTCP next v3 00/12] mptcp: receive path improvement MPTCP CI
  2025-09-19 21:13 ` Matthieu Baerts
  13 siblings, 1 reply; 31+ messages in thread
From: Paolo Abeni @ 2025-09-19 15:53 UTC (permalink / raw)
  To: mptcp

In the MPTCP receive path, we release the fwd memory allocated by
the subflow just to allocate it again shortly after for the msk.

That could increase the chance of failure, especially during
backlog processing, when other actions could consume the just
released memory before the msk socket has a chance to do the
rcv allocation.

Replace the skb_orphan() call with an open-coded variant that
explicitly borrows, with PAGE_SIZE granularity, the fwd memory
from the subflow socket instead of releasing it. During backlog
processing the borrowed memory is accounted at release_cb time.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
  - rebased
  - explain why skb_orphan is removed
---
 net/mptcp/protocol.c | 27 +++++++++++++++++++++------
 net/mptcp/protocol.h |  1 +
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2a025c0c4ca0c..7db5adb43d41b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -338,11 +338,12 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 		mptcp_rcvbuf_grow(sk);
 }
 
-static void mptcp_init_skb(struct sock *ssk,
-			   struct sk_buff *skb, int offset, int copy_len)
+static int mptcp_init_skb(struct sock *ssk,
+			  struct sk_buff *skb, int offset, int copy_len)
 {
 	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
+	int borrowed;
 
 	/* the skb map_seq accounts for the skb offset:
 	 * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
@@ -358,6 +359,15 @@ static void mptcp_init_skb(struct sock *ssk,
 
 	skb_ext_reset(skb);
 	skb_dst_drop(skb);
+
+	/* "borrow" the fwd memory from the subflow, instead of reclaiming it */
+	skb->destructor = NULL;
+	skb->sk = NULL;
+	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
+	borrowed = ssk->sk_forward_alloc - sk_unused_reserved_mem(ssk);
+	borrowed &= ~(PAGE_SIZE - 1);
+	sk_forward_alloc_add(ssk, skb->truesize - borrowed);
+	return borrowed;
 }
 
 static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk,
@@ -717,14 +727,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 		if (offset < skb->len) {
 			size_t len = skb->len - offset;
+			int bmem;
 
-			mptcp_init_skb(ssk, skb, offset, len);
-			skb_orphan(skb);
+			bmem = mptcp_init_skb(ssk, skb, offset, len);
 
-			if (own_msk)
+			if (own_msk) {
+				sk_forward_alloc_add(sk, bmem);
 				ret |= __mptcp_move_skb(sk, skb);
-			else
+			} else {
+				msk->borrowed_fwd_mem += bmem;
 				__mptcp_add_backlog(sk, ssk, skb);
+			}
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
@@ -3514,6 +3527,8 @@ static void mptcp_release_cb(struct sock *sk)
 		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
 			__mptcp_sync_sndbuf(sk);
 	}
+	sk_forward_alloc_add(sk, msk->borrowed_fwd_mem);
+	msk->borrowed_fwd_mem = 0;
 }
 
 /* MP_JOIN client subflow must wait for 4th ack before sending any data:
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a295ce11774ea..ff87dd9a0da5a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -298,6 +298,7 @@ struct mptcp_sock {
 	u32		last_data_sent;
 	u32		last_data_recv;
 	u32		last_ack_recv;
+	int		borrowed_fwd_mem;
 	unsigned long	timer_ival;
 	u32		token;
 	unsigned long	flags;
-- 
2.51.0
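
The borrowing arithmetic described above can be sketched in userspace C.
This is only an illustration under stated assumptions: the toy_* structs and
functions are hypothetical, TOY_PAGE_SIZE assumes 4 KiB pages, the skb is
reduced to its truesize, and the destructor/owner reset done by the patch is
not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the socket memory accounting fields. */
struct toy_sock {
	int fwd_alloc;		/* models sk_forward_alloc */
	int reserved;		/* models the unused reserved memory */
	int rmem_alloc;		/* models sk_rmem_alloc */
};

struct toy_msk {
	struct toy_sock sk;
	int borrowed_fwd_mem;	/* borrowed while the msk was not owned */
};

/* Uncharge 'truesize' from the subflow and lend the page-aligned part of
 * its spare forward memory; the remainder stays with the subflow.
 */
static int toy_borrow(struct toy_sock *ssk, int truesize)
{
	int borrowed;

	assert(truesize >= 0 && ssk->rmem_alloc >= truesize);
	assert(ssk->fwd_alloc >= ssk->reserved);

	ssk->rmem_alloc -= truesize;
	borrowed = ssk->fwd_alloc - ssk->reserved;
	borrowed &= ~(TOY_PAGE_SIZE - 1);
	ssk->fwd_alloc += truesize - borrowed;
	return borrowed;
}

/* Credit the msk right away when it is owned by the caller, otherwise
 * park the amount until the release callback runs.
 */
static void toy_account(struct toy_msk *msk, int bmem, bool own_msk)
{
	if (own_msk)
		msk->sk.fwd_alloc += bmem;
	else
		msk->borrowed_fwd_mem += bmem;
}

/* Models the tail of the release callback: flush the parked amount. */
static void toy_release_cb(struct toy_msk *msk)
{
	msk->sk.fwd_alloc += msk->borrowed_fwd_mem;
	msk->borrowed_fwd_mem = 0;
}
```

In this model the sum of the subflow forward memory and the borrowed amount
after toy_borrow() equals the previous forward memory plus truesize, so
nothing is returned to the global pool in between.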



* Re: [MPTCP next v3 12/12] mptcp: borrow forward memory from subflow
  2025-09-19 15:53 ` [MPTCP next v3 12/12] mptcp: borrow forward memory from subflow Paolo Abeni
@ 2025-09-21  0:28   ` Geliang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-21  0:28 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

On Fri, 2025-09-19 at 17:53 +0200, Paolo Abeni wrote:
> In the MPTCP receive path, we release the fwd memory allocated by
> the subflow just to allocate it again shortly after for the msk.
> 
> That could increase the chance of failure, especially during
> backlog processing, when other actions could consume the just
> released memory before the msk socket has a chance to do the
> rcv allocation.
> 
> Replace the skb_orphan() call with an open-coded variant that
> explicitly borrows, with PAGE_SIZE granularity, the fwd memory
> from the subflow socket instead of releasing it. During backlog
> processing the borrowed memory is accounted at release_cb time.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

LGTM!

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> ---
> v1 -> v2:
>   - rebased
>   - explain why skb_orphan is removed
> ---
>  net/mptcp/protocol.c | 27 +++++++++++++++++++++------
>  net/mptcp/protocol.h |  1 +
>  2 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2a025c0c4ca0c..7db5adb43d41b 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -338,11 +338,12 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
>  		mptcp_rcvbuf_grow(sk);
>  }
>  
> -static void mptcp_init_skb(struct sock *ssk,
> -			   struct sk_buff *skb, int offset, int copy_len)
> +static int mptcp_init_skb(struct sock *ssk,
> +			  struct sk_buff *skb, int offset, int copy_len)
>  {
>  	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>  	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> +	int borrowed;
>  
>  	/* the skb map_seq accounts for the skb offset:
>  	 * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
> @@ -358,6 +359,15 @@ static void mptcp_init_skb(struct sock *ssk,
>  
>  	skb_ext_reset(skb);
>  	skb_dst_drop(skb);
> +
> +	/* "borrow" the fwd memory from the subflow, instead of reclaiming it */
> +	skb->destructor = NULL;
> +	skb->sk = NULL;
> +	atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
> +	borrowed = ssk->sk_forward_alloc - sk_unused_reserved_mem(ssk);
> +	borrowed &= ~(PAGE_SIZE - 1);
> +	sk_forward_alloc_add(ssk, skb->truesize - borrowed);
> +	return borrowed;
>  }
>  
>  static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk,
> @@ -717,14 +727,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>  
>  		if (offset < skb->len) {
>  			size_t len = skb->len - offset;
> +			int bmem;
>  
> -			mptcp_init_skb(ssk, skb, offset, len);
> -			skb_orphan(skb);
> +			bmem = mptcp_init_skb(ssk, skb, offset, len);
>  
> -			if (own_msk)
> +			if (own_msk) {
> +				sk_forward_alloc_add(sk, bmem);
>  				ret |= __mptcp_move_skb(sk, skb);
> -			else
> +			} else {
> +				msk->borrowed_fwd_mem += bmem;
>  				__mptcp_add_backlog(sk, ssk, skb);
> +			}
>  			seq += len;
>  
>  			if (unlikely(map_remaining < len)) {
> @@ -3514,6 +3527,8 @@ static void mptcp_release_cb(struct sock *sk)
>  		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
>  			__mptcp_sync_sndbuf(sk);
>  	}
> +	sk_forward_alloc_add(sk, msk->borrowed_fwd_mem);
> +	msk->borrowed_fwd_mem = 0;
>  }
>  
>  /* MP_JOIN client subflow must wait for 4th ack before sending any data:
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index a295ce11774ea..ff87dd9a0da5a 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -298,6 +298,7 @@ struct mptcp_sock {
>  	u32		last_data_sent;
>  	u32		last_data_recv;
>  	u32		last_ack_recv;
> +	int		borrowed_fwd_mem;
>  	unsigned long	timer_ival;
>  	u32		token;
>  	unsigned long	flags;


* Re: [MPTCP next v3 00/12] mptcp: receive path improvement
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (11 preceding siblings ...)
  2025-09-19 15:53 ` [MPTCP next v3 12/12] mptcp: borrow forward memory from subflow Paolo Abeni
@ 2025-09-19 18:36 ` MPTCP CI
  2025-09-19 21:13 ` Matthieu Baerts
  13 siblings, 0 replies; 31+ messages in thread
From: MPTCP CI @ 2025-09-19 18:36 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Unstable: 1 failed test(s): selftest_mptcp_connect 🔴
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/17863851247

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/fe9e565e924a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1004322


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)


* Re: [MPTCP next v3 00/12] mptcp: receive path improvement
  2025-09-19 15:53 [MPTCP next v3 00/12] mptcp: receive path improvement Paolo Abeni
                   ` (12 preceding siblings ...)
  2025-09-19 18:36 ` [MPTCP next v3 00/12] mptcp: receive path improvement MPTCP CI
@ 2025-09-19 21:13 ` Matthieu Baerts
  2025-09-20  4:13   ` Geliang Tang
                     ` (2 more replies)
  13 siblings, 3 replies; 31+ messages in thread
From: Matthieu Baerts @ 2025-09-19 21:13 UTC (permalink / raw)
  To: Paolo Abeni, mptcp

Hi Paolo,

On 19/09/2025 17:53, Paolo Abeni wrote:
> This series includes several changes to the MPTCP RX path.
> 
> The main goals are improving RX performance _and_ increasing
> long-term maintainability.
> 
> Some changes reflect recent (or not so recent) improvements introduced
> in the TCP stack: patches 1, 2 and 3 are the MPTCP counterpart of skb
> deferral free and auto-tuning improvements.
> 
> Note that patch 3 could possibly fix issues/574, and overall it
> should prevent similar issues from arising in the future.
> 
> All the other patches are aimed at introducing socket backlog usage
> to process the packets received by the subflows while the msk socket
> is owned. That (almost completely) replaces the processing currently
> happening in mptcp_release_cb().
> 
> The actual job is done in patch 10, while the others are cleanups
> needed to make the change tidy, plus further follow-up cleanups.
> 
> Sharing early, with known issues (at least on fallback sockets), to
> raise awareness about this upcoming work.
> ---
> v2 -> v3:
>   - (hopefully) addressed CI failures

Sadly, the CI doesn't seem that happy, but only in debug mode:

https://github.com/multipath-tcp/mptcp_net-next/actions/runs/17863851247

If it is easier for you, I can already apply (and send to netdev?) some
of these patches? e.g. 1-7/12?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [MPTCP next v3 00/12] mptcp: receive path improvement
  2025-09-19 21:13 ` Matthieu Baerts
@ 2025-09-20  4:13   ` Geliang Tang
  2025-09-20  4:15   ` Geliang Tang
  2025-09-23 16:15   ` Paolo Abeni
  2 siblings, 0 replies; 31+ messages in thread
From: Geliang Tang @ 2025-09-20  4:13 UTC (permalink / raw)
  To: Matthieu Baerts, Paolo Abeni, mptcp

Hi Paolo, Matt,

On Fri, 2025-09-19 at 23:13 +0200, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 19/09/2025 17:53, Paolo Abeni wrote:
> > This series includes several changes to the MPTCP RX path.
> > 
> > The main goals are improving RX performance _and_ increasing
> > long-term maintainability.
> > 
> > Some changes reflect recent (or not so recent) improvements introduced
> > in the TCP stack: patches 1, 2 and 3 are the MPTCP counterpart of skb
> > deferral free and auto-tuning improvements.
> > 
> > Note that patch 3 could possibly fix issues/574, and overall it
> > should prevent similar issues from arising in the future.
> > 
> > All the other patches are aimed at introducing socket backlog usage
> > to process the packets received by the subflows while the msk socket
> > is owned. That (almost completely) replaces the processing currently
> > happening in mptcp_release_cb().
> > 
> > The actual job is done in patch 10, while the others are cleanups
> > needed to make the change tidy, plus further follow-up cleanups.
> > 
> > Sharing early, with known issues (at least on fallback sockets), to
> > raise awareness about this upcoming work.
> > ---
> > v2 -> v3:
> >   - (hopefully) addressed CI failures
> 
> Sadly, the CI doesn't seem that happy, but only in debug mode:
> 
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/17863851247

I just sent two fixes for these CI failures to our mailing list. It works
on my end. Still waiting for the CI results - hope everything goes
well.

Thanks,
-Geliang

> 
> If it is easier for you, I can already apply (and send to netdev?)
> some
> of these patches? e.g. 1-7/12?
> 
> Cheers,
> Matt



* Re: [MPTCP next v3 00/12] mptcp: receive path improvement
  2025-09-19 21:13 ` Matthieu Baerts
  2025-09-20  4:13   ` Geliang Tang
  2025-09-20  4:15   ` Geliang Tang
@ 2025-09-23 16:15   ` Paolo Abeni
  2 siblings, 0 replies; 31+ messages in thread
From: Paolo Abeni @ 2025-09-23 16:15 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp

On 9/19/25 11:13 PM, Matthieu Baerts wrote:
> On 19/09/2025 17:53, Paolo Abeni wrote:
>> This series includes several changes to the MPTCP RX path.
>>
>> The main goals are improving RX performance _and_ increasing
>> long-term maintainability.
>>
>> Some changes reflect recent (or not so recent) improvements introduced
>> in the TCP stack: patches 1, 2 and 3 are the MPTCP counterpart of skb
>> deferral free and auto-tuning improvements.
>>
>> Note that patch 3 could possibly fix issues/574, and overall it
>> should prevent similar issues from arising in the future.
>>
>> All the other patches are aimed at introducing socket backlog usage
>> to process the packets received by the subflows while the msk socket
>> is owned. That (almost completely) replaces the processing currently
>> happening in mptcp_release_cb().
>>
>> The actual job is done in patch 10, while the others are cleanups
>> needed to make the change tidy, plus further follow-up cleanups.
>>
>> Sharing early, with known issues (at least on fallback sockets), to
>> raise awareness about this upcoming work.
>> ---
>> v2 -> v3:
>>   - (hopefully) addressed CI failures
> 
> Sadly, the CI doesn't seem that happy, but only in debug mode:
> 
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/17863851247
> 
> If it is easier for you, I can already apply (and send to netdev?) some
> of these patches? e.g. 1-7/12?

I did a bit more testing. AFAICS patch 10 introduces a few chances of
additional failures (even on top of Geliang's patches). I *think* patches
1-7 and 12 are safe.

Thanks,

Paolo


