MPTCP Linux Development
 help / color / mirror / Atom feed
From: Mat Martineau <martineau@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH v5 mptcp-next 10/10] mptcp: leverage the backlog for RX packet processing
Date: Mon, 20 Oct 2025 16:32:49 -0700 (PDT)	[thread overview]
Message-ID: <156d60d8-ee80-49fb-8b8e-d0be56f7a02a@kernel.org> (raw)
In-Reply-To: <1cb19d6e65591b81610cc0dd8ef5d0a38605b3f1.1759737859.git.pabeni@redhat.com>


On Mon, 6 Oct 2025, Paolo Abeni wrote:

> When the msk socket is owned or the msk receive buffer is full,
> move the incoming skbs in a msk level backlog list. This avoid
> traversing the joined subflows and acquiring the subflow level
> socket lock at reception time, improving the RX performances.
>
> when processing the backlog, use the fwd alloc memory borrowed from
> the incoming subflow. skbs exceeding the msk receive space are
> not dropped; instead they are kept into the backlog until the receive
> buffer is freed. Dropping packets already acked at the TCP level is
> explicitly discouraged by the RFC and would corrupt the data stream
> for fallback sockets.
>
> Special care is needed to avoid adding skbs to the backlog of a closed
> msk, and to avoid leaving dangling references into the backlog
> at subflow closing time.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v4 -> v5:
>  - consolidate ssk rcvbuf accunting in __mptcp_move_skb(), remove
>    some code duplication
>  - return soon in __mptcp_add_backlog() when dropping skbs due to
>    the msk closed. This avoid later UaF
> ---
> net/mptcp/protocol.c | 137 ++++++++++++++++++++++++-------------------
> net/mptcp/protocol.h |   2 +-
> 2 files changed, 79 insertions(+), 60 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2d5d3da67d1ac..a97a92eccc502 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c

...

> @@ -3509,23 +3519,29 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
>
> #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
> 				      BIT(MPTCP_RETRANSMIT) | \
> -				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
> -				      BIT(MPTCP_DEQUEUE))
> +				      BIT(MPTCP_FLUSH_JOIN_LIST))
>
> /* processes deferred events and flush wmem */
> static void mptcp_release_cb(struct sock *sk)
> 	__must_hold(&sk->sk_lock.slock)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> +	u32 delta = 0;
>
> 	for (;;) {
> 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
> -		struct list_head join_list;
> +		LIST_HEAD(join_list);
> +		LIST_HEAD(skbs);
> +
> +		sk_forward_alloc_add(sk, msk->borrowed_mem);
> +		msk->borrowed_mem = 0;
> +
> +		if (sk_rmem_alloc_get(sk) < sk->sk_rcvbuf)
> +			list_splice_init(&msk->backlog_list, &skbs);
>
> -		if (!flags)
> +		if (!flags && list_empty(&skbs))
> 			break;
>
> -		INIT_LIST_HEAD(&join_list);
> 		list_splice_init(&msk->join_list, &join_list);
>
> 		/* the following actions acquire the subflow socket lock
> @@ -3544,7 +3560,8 @@ static void mptcp_release_cb(struct sock *sk)
> 			__mptcp_push_pending(sk, 0);
> 		if (flags & BIT(MPTCP_RETRANSMIT))
> 			__mptcp_retrans(sk);
> -		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
> +		if (!list_empty(&skbs) &&
> +		    __mptcp_move_skbs(sk, &skbs, &delta)) {
> 			/* notify ack seq update */
> 			mptcp_cleanup_rbuf(msk, 0);
> 			sk->sk_data_ready(sk);
> @@ -3552,7 +3569,9 @@ static void mptcp_release_cb(struct sock *sk)
>
> 		cond_resched();
> 		spin_lock_bh(&sk->sk_lock.slock);
> +		list_splice(&skbs, &msk->backlog_list);
> 	}
> +	WRITE_ONCE(msk->backlog_len, msk->backlog_len - delta);

Hi Paolo -

Given the possible multiple calls to __mptcp_move_skbs() and that the 
spinlock is released/reacquired (and the cond_resched) in the middle, 
would it make sense to update msk->backlog_len for each iteration of the 
loop so __mptcp_space() and mptcp_space() don't under-report available 
space and mptcp_cleanup_rbuf() can make incremental progress?

I know we don't want to WRITE_ONCE() more than necessary, but it seems 
like there won't typically be more than one loop iteration. In the cases 
where it does repeat the loop that means data is arriving quickly and 
reporting mptcp_space accurately will be important.

- Mat


>
> 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
> 		__mptcp_clean_una_wakeup(sk);

  reply	other threads:[~2025-10-20 23:32 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06  8:11 [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 01/10] mptcp: borrow forward memory from subflow Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 02/10] mptcp: cleanup fallback data fin reception Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 03/10] mptcp: cleanup fallback dummy mapping generation Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 04/10] mptcp: fix MSG_PEEK stream corruption Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 05/10] mptcp: ensure the kernel PM does not take action too late Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 06/10] mptcp: do not miss early first subflow close event notification Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 07/10] mptcp: make mptcp_destroy_common() static Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 08/10] mptcp: drop the __mptcp_data_ready() helper Paolo Abeni
2025-10-06  8:12 ` [PATCH v5 mptcp-next 09/10] mptcp: introduce mptcp-level backlog Paolo Abeni
2025-10-08  3:09   ` Geliang Tang
2025-10-20 19:45   ` Mat Martineau
2025-10-06  8:12 ` [PATCH v5 mptcp-next 10/10] mptcp: leverage the backlog for RX packet processing Paolo Abeni
2025-10-20 23:32   ` Mat Martineau [this message]
2025-10-21 17:21     ` Paolo Abeni
2025-10-21 23:53       ` Mat Martineau
2025-10-06 17:07 ` [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing Matthieu Baerts
2025-10-08  3:07   ` Geliang Tang
2025-10-08  7:30     ` Paolo Abeni
2025-10-09  6:54       ` Geliang Tang
2025-10-09  7:52         ` Paolo Abeni
2025-10-09  9:02           ` Geliang Tang
2025-10-09 10:23             ` Paolo Abeni
2025-10-09 13:58               ` Paolo Abeni
2025-10-10  8:21                 ` Paolo Abeni
2025-10-10 12:22                   ` Geliang Tang
2025-10-13  9:07                     ` Geliang Tang
2025-10-13 13:29                       ` Paolo Abeni
2025-10-13 17:07                         ` Paolo Abeni
2025-10-15  9:00                       ` Paolo Abeni
2025-10-17  6:38                         ` Geliang Tang
2025-10-18  0:16                           ` Mat Martineau
2025-10-06 17:43 ` MPTCP CI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156d60d8-ee80-49fb-8b8e-d0be56f7a02a@kernel.org \
    --to=martineau@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox