All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthieu Baerts <matttbe@kernel.org>
To: Kalpan Jani <kalpan.jani@mpiricsoftware.com>, mptcp@lists.linux.dev
Cc: shardul.b@mpiricsoftware.com, janak@mpiric.us,
	kalpanjani009@gmail.com, shardulsb08@gmail.com
Subject: Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
Date: Sun, 31 May 2026 16:11:51 +1000	[thread overview]
Message-ID: <dc26df7b-5a00-4c55-bf5f-a402e079b7ef@kernel.org> (raw)
In-Reply-To: <20260519092243.1242351-1-kalpan.jani@mpiricsoftware.com>

Hi Kalpan,

On 19/05/2026 19:22, Kalpan Jani wrote:
> The backlog list is updated by mptcp_data_ready() under
> mptcp_data_lock(), but the cleanup of references to a closing subflow
> in mptcp_close_ssk() was done after release_sock(ssk) without holding
> that lock.

Is it possible the commit message needs to be updated?

AI assisted review is reporting this:

The commit message states the cleanup "was done after release_sock(ssk)"
Is this description of the old code accurate?

Looking at the pre-patch code, the backlog traversal in mptcp_close_ssk
ran before __mptcp_close_ssk() was called. __mptcp_close_ssk() is where
lock_sock_nested(ssk) happens. The traversal was never preceded by
lock_sock(ssk) nor followed by release_sock(ssk) - it simply ran with
neither ssk lock nor mptcp_data_lock held.

The phrase "after release_sock(ssk)" does not match the actual old code
structure. The correct statement would be: "was done without holding the
ssk lock or mptcp_data_lock, before __mptcp_close_ssk() acquired the ssk
lock."

> Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
> call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
> before the close path reaches its list_for_each traversal. That skb is
> missed by cleanup, leaving skb->sk pointing to the freed ssk.

AI assisted review is reporting this:

The race scenario is real, but is the ordering description accurate?

In the old code, the list_for_each traversal happened before
lock_sock(ssk) (inside mptcp_close_ssk(), before calling
__mptcp_close_ssk()). There was no prior release_sock(ssk) to trigger
the race.

The actual race is: the traversal runs without mptcp_data_lock, so
concurrent softirq processing on another CPU can call subflow_data_ready
-> mptcp_data_ready -> __mptcp_add_backlog (under mptcp_data_lock) while
the traversal is in progress. The traversal can miss newly-added entries
or worse, encounter list corruption. The bug is real, but the mechanism
description is inaccurate.

> Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
> subflow->closing is set to 1 and while the ssk lock is still held,
> serialized under mptcp_data_lock(). The cleanup runs only on the push
> path (MPTCP_CF_PUSH), where backlog references accumulate; on other
> teardown paths the caller already handles cleanup.
> 
> With both locks held simultaneously, any concurrent mptcp_data_ready()
> either completes its enqueue before the purge runs and gets caught, or
> observes closing=1 while the ssk lock is still held and bails without
> enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
> The cleanup is exhaustive.
> 
> Remove the unprotected traversal from mptcp_close_ssk() entirely.
> 
> Tested with the MPTCP kernel selftests on the patched kernel:
> - tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
> - tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass

Do not forget about the Fixes tag.

> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
> Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
> ---
> v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
> v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
> v1: incorrect race analysis around subflow->closing flag.
> 
>  net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 718e910ff..149f816fe 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
>  	}
>  }
>  
> +static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
> +{
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +	struct sk_buff *skb;
> +
> +	mptcp_data_lock(sk);
> +	list_for_each_entry(skb, &msk->backlog_list, list) {
> +		if (skb->sk != ssk)
> +			continue;
> +
> +		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
> +		skb->sk = NULL;
> +	}
> +	mptcp_data_unlock(sk);
> +}
> +
>  /* subflow sockets can be either outgoing (connect) or incoming
>   * (accept).
>   *
> @@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
>  	subflow->closing = 1;
>  
> +	if (flags & MPTCP_CF_PUSH)
> +		mptcp_cleanup_ssk_backlog(sk, ssk);
> +
>  	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
>  	 * could be negative at this point, but will be reach zero soon - when
>  	 * the data allocated using such fragment will be freed.
> @@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  		     struct mptcp_subflow_context *subflow)
>  {
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -	struct sk_buff *skb;
> -
>  	/* The first subflow can already be closed or disconnected */
>  	if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
>  		return;
> @@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  	if (sk->sk_state == TCP_ESTABLISHED)
>  		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
>  
> -	/* Remove any reference from the backlog to this ssk; backlog skbs consume
> -	 * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
> -	 */
> -	list_for_each_entry(skb, &msk->backlog_list, list) {
> -		if (skb->sk != ssk)
> -			continue;
> -
> -		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
> -		skb->sk = NULL;
> -	}
> -
>  	/* subflow aborted before reaching the fully_established status
>  	 * attempt the creation of the next subflow
>  	 */

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


  parent reply	other threads:[~2026-05-31  6:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  9:22 [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close Kalpan Jani
2026-05-19 10:34 ` MPTCP CI
2026-05-27  5:49 ` Kalpan Jani
2026-05-27  9:40   ` [PATCH " Paolo Abeni
2026-05-27 11:31     ` Matthieu Baerts
2026-05-29 10:38       ` Paolo Abeni
2026-05-31  6:14         ` Matthieu Baerts
2026-05-31  6:11 ` Matthieu Baerts [this message]
2026-06-10 10:45   ` Kalpan Jani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc26df7b-5a00-4c55-bf5f-a402e079b7ef@kernel.org \
    --to=matttbe@kernel.org \
    --cc=janak@mpiric.us \
    --cc=kalpan.jani@mpiricsoftware.com \
    --cc=kalpanjani009@gmail.com \
    --cc=mptcp@lists.linux.dev \
    --cc=shardul.b@mpiricsoftware.com \
    --cc=shardulsb08@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.