All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
@ 2026-05-19  9:22 Kalpan Jani
  2026-05-19 10:34 ` MPTCP CI
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Kalpan Jani @ 2026-05-19  9:22 UTC (permalink / raw)
  To: mptcp; +Cc: shardul.b, janak, kalpanjani009, shardulsb08

The backlog list is updated by mptcp_data_ready() under
mptcp_data_lock(), but the cleanup of references to a closing subflow
in mptcp_close_ssk() was done after release_sock(ssk) without holding
that lock.

Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
before the close path reaches its list_for_each traversal. That skb is
missed by cleanup, leaving skb->sk pointing to the freed ssk.

Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
subflow->closing is set to 1 and while the ssk lock is still held,
serialized under mptcp_data_lock(). The cleanup runs only on the push
path (MPTCP_CF_PUSH), where backlog references accumulate; on other
teardown paths the caller already handles cleanup.

With both locks held simultaneously, any concurrent mptcp_data_ready()
either completes its enqueue before the purge runs and gets caught, or
observes closing=1 while the ssk lock is still held and bails without
enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
The cleanup is exhaustive.

Remove the unprotected traversal from mptcp_close_ssk() entirely.

Tested with the MPTCP kernel selftests on the patched kernel:
- tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
- tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
---
v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
v1: incorrect race analysis around subflow->closing flag.

 net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 718e910ff..149f816fe 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
 	}
 }
 
+static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *skb;
+
+	mptcp_data_lock(sk);
+	list_for_each_entry(skb, &msk->backlog_list, list) {
+		if (skb->sk != ssk)
+			continue;
+
+		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+		skb->sk = NULL;
+	}
+	mptcp_data_unlock(sk);
+}
+
 /* subflow sockets can be either outgoing (connect) or incoming
  * (accept).
  *
@@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 	subflow->closing = 1;
 
+	if (flags & MPTCP_CF_PUSH)
+		mptcp_cleanup_ssk_backlog(sk, ssk);
+
 	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
 	 * could be negative at this point, but will be reach zero soon - when
 	 * the data allocated using such fragment will be freed.
@@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		     struct mptcp_subflow_context *subflow)
 {
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct sk_buff *skb;
-
 	/* The first subflow can already be closed or disconnected */
 	if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
 		return;
@@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
 
-	/* Remove any reference from the backlog to this ssk; backlog skbs consume
-	 * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
-	 */
-	list_for_each_entry(skb, &msk->backlog_list, list) {
-		if (skb->sk != ssk)
-			continue;
-
-		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
-		skb->sk = NULL;
-	}
-
 	/* subflow aborted before reaching the fully_established status
 	 * attempt the creation of the next subflow
 	 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-19  9:22 [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close Kalpan Jani
@ 2026-05-19 10:34 ` MPTCP CI
  2026-05-27  5:49 ` Kalpan Jani
  2026-05-31  6:11 ` Matthieu Baerts
  2 siblings, 0 replies; 9+ messages in thread
From: MPTCP CI @ 2026-05-19 10:34 UTC (permalink / raw)
  To: Kalpan Jani; +Cc: mptcp

Hi Kalpan,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26089006573

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/473e5e16d4df
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1097212


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:[PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-19  9:22 [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close Kalpan Jani
  2026-05-19 10:34 ` MPTCP CI
@ 2026-05-27  5:49 ` Kalpan Jani
  2026-05-27  9:40   ` [PATCH " Paolo Abeni
  2026-05-31  6:11 ` Matthieu Baerts
  2 siblings, 1 reply; 9+ messages in thread
From: Kalpan Jani @ 2026-05-27  5:49 UTC (permalink / raw)
  To: mptcp, matttbe, pabeni; +Cc: shardul.b, janak, kalpanjani009

Hi all,
Gentle ping on this patch sent on 19 May 2026 — I haven't seen any review feedback yet. If anyone has had a chance to look, I'd appreciate comments; otherwise I'm happy to rebase or resend if it slipped through.
Thanks,
Kalpan Jani


From: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
To: <mptcp@lists.linux.dev>
Cc: <shardul.b@mpiricsoftware.com>, <janak@mpiric.us>, <kalpanjani009@gmail.com>, <shardulsb08@gmail.com>
Date: Tue, 19 May 2026 14:52:43 +0530
Subject: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close

 > The backlog list is updated by mptcp_data_ready() under
 > mptcp_data_lock(), but the cleanup of references to a closing subflow
 > in mptcp_close_ssk() was done after release_sock(ssk) without holding
 > that lock.
 > 
 > Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
 > call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
 > before the close path reaches its list_for_each traversal. That skb is
 > missed by cleanup, leaving skb->sk pointing to the freed ssk.
 > 
 > Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
 > subflow->closing is set to 1 and while the ssk lock is still held,
 > serialized under mptcp_data_lock(). The cleanup runs only on the push
 > path (MPTCP_CF_PUSH), where backlog references accumulate; on other
 > teardown paths the caller already handles cleanup.
 > 
 > With both locks held simultaneously, any concurrent mptcp_data_ready()
 > either completes its enqueue before the purge runs and gets caught, or
 > observes closing=1 while the ssk lock is still held and bails without
 > enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
 > The cleanup is exhaustive.
 > 
 > Remove the unprotected traversal from mptcp_close_ssk() entirely.
 > 
 > Tested with the MPTCP kernel selftests on the patched kernel:
 > - tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
 > - tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass
 > 
 > Suggested-by: Paolo Abeni <pabeni@redhat.com>
 > Reported-by: syzkaller <syzkaller@googlegroups.com>
 > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
 > Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
 > ---
 > v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
 > v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
 > v1: incorrect race analysis around subflow->closing flag.
 > 
 >  net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
 >  1 file changed, 19 insertions(+), 14 deletions(-)
 > 
 > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
 > index 718e910ff..149f816fe 100644
 > --- a/net/mptcp/protocol.c
 > +++ b/net/mptcp/protocol.c
 > @@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
 >      }
 >  }
 >  
 > +static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
 > +{
 > +    struct mptcp_sock *msk = mptcp_sk(sk);
 > +    struct sk_buff *skb;
 > +
 > +    mptcp_data_lock(sk);
 > +    list_for_each_entry(skb, &msk->backlog_list, list) {
 > +        if (skb->sk != ssk)
 > +            continue;
 > +
 > +        atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
 > +        skb->sk = NULL;
 > +    }
 > +    mptcp_data_unlock(sk);
 > +}
 > +
 >  /* subflow sockets can be either outgoing (connect) or incoming
 >   * (accept).
 >   *
 > @@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 >      lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 >      subflow->closing = 1;
 >  
 > +    if (flags & MPTCP_CF_PUSH)
 > +        mptcp_cleanup_ssk_backlog(sk, ssk);
 > +
 >      /* Borrow the fwd allocated page left-over; fwd memory for the subflow
 >       * could be negative at this point, but will be reach zero soon - when
 >       * the data allocated using such fragment will be freed.
 > @@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 >  void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 >               struct mptcp_subflow_context *subflow)
 >  {
 > -    struct mptcp_sock *msk = mptcp_sk(sk);
 > -    struct sk_buff *skb;
 > -
 >      /* The first subflow can already be closed or disconnected */
 >      if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
 >          return;
 > @@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 >      if (sk->sk_state == TCP_ESTABLISHED)
 >          mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
 >  
 > -    /* Remove any reference from the backlog to this ssk; backlog skbs consume
 > -     * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
 > -     */
 > -    list_for_each_entry(skb, &msk->backlog_list, list) {
 > -        if (skb->sk != ssk)
 > -            continue;
 > -
 > -        atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
 > -        skb->sk = NULL;
 > -    }
 > -
 >      /* subflow aborted before reaching the fully_established status
 >       * attempt the creation of the next subflow
 >       */
 > -- 
 > 2.43.0
 > 
 > 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-27  5:49 ` Kalpan Jani
@ 2026-05-27  9:40   ` Paolo Abeni
  2026-05-27 11:31     ` Matthieu Baerts
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2026-05-27  9:40 UTC (permalink / raw)
  To: Kalpan Jani, mptcp, matttbe; +Cc: shardul.b, janak, kalpanjani009

On 5/27/26 7:49 AM, Kalpan Jani wrote:
> Gentle ping on this patch sent on 19 May 2026 — I haven't seen any review feedback yet. If anyone has had a chance to look, I'd appreciate comments; otherwise I'm happy to rebase or resend if it slipped through.

Note that maintainers are under an ever increasing influx of AI
generated contents that make processing incoming patches more
challenging every day.

FTR this revision LGTM.

/P



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-27  9:40   ` [PATCH " Paolo Abeni
@ 2026-05-27 11:31     ` Matthieu Baerts
  2026-05-29 10:38       ` Paolo Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: Matthieu Baerts @ 2026-05-27 11:31 UTC (permalink / raw)
  To: Paolo Abeni, Kalpan Jani, mptcp; +Cc: shardul.b, janak, kalpanjani009

Hi Paolo,

Thank you for your review!

On 27/05/2026 19:40, Paolo Abeni wrote:
> On 5/27/26 7:49 AM, Kalpan Jani wrote:
>> Gentle ping on this patch sent on 19 May 2026 — I haven't seen any review feedback yet. If anyone has had a chance to look, I'd appreciate comments; otherwise I'm happy to rebase or resend if it slipped through.
> 
> Note that maintainers are under an ever increasing influx of AI
> generated contents that make processing incoming patches more
> challenging every day.

+1

> FTR this revision LGTM.

Just to be sure, does it mean that Sashiko was wrong?

 https://sashiko.dev/#/patchset/20260519092243.1242351-1-kalpan.jani%40mpiricsoftware.com

If yes, can I convert this "LGTM" as an Acked-by? :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-27 11:31     ` Matthieu Baerts
@ 2026-05-29 10:38       ` Paolo Abeni
  2026-05-31  6:14         ` Matthieu Baerts
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2026-05-29 10:38 UTC (permalink / raw)
  To: Matthieu Baerts, Kalpan Jani, mptcp; +Cc: shardul.b, janak, kalpanjani009

On 5/27/26 1:31 PM, Matthieu Baerts wrote:
> Hi Paolo,
> 
> Thank you for your review!
> 
> On 27/05/2026 19:40, Paolo Abeni wrote:
>> On 5/27/26 7:49 AM, Kalpan Jani wrote:
>>> Gentle ping on this patch sent on 19 May 2026 — I haven't seen any review feedback yet. If anyone has had a chance to look, I'd appreciate comments; otherwise I'm happy to rebase or resend if it slipped through.
>>
>> Note that maintainers are under an ever increasing influx of AI
>> generated contents that make processing incoming patches more
>> challenging every day.
> 
> +1
> 
>> FTR this revision LGTM.
> 
> Just to be sure, does it mean that Sashiko was wrong?
> 
>  https://sashiko.dev/#/patchset/20260519092243.1242351-1-kalpan.jani%40mpiricsoftware.com

AFAICS, yes: the msk socket is set to TCP_CLOSE status before calling
mptcp_backlog_purge(): __mptcp_add_backlog() can't add anything to the
backlog.

> If yes, can I convert this "LGTM" as an Acked-by? :)

There is already my Suggested-by: tag, I think the latter would be
redundant?!?

/P


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-19  9:22 [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close Kalpan Jani
  2026-05-19 10:34 ` MPTCP CI
  2026-05-27  5:49 ` Kalpan Jani
@ 2026-05-31  6:11 ` Matthieu Baerts
  2026-06-10 10:45   ` Kalpan Jani
  2 siblings, 1 reply; 9+ messages in thread
From: Matthieu Baerts @ 2026-05-31  6:11 UTC (permalink / raw)
  To: Kalpan Jani, mptcp; +Cc: shardul.b, janak, kalpanjani009, shardulsb08

Hi Kalpan,

On 19/05/2026 19:22, Kalpan Jani wrote:
> The backlog list is updated by mptcp_data_ready() under
> mptcp_data_lock(), but the cleanup of references to a closing subflow
> in mptcp_close_ssk() was done after release_sock(ssk) without holding
> that lock.

Is it possible the commit message needs to be updated?

AI assisted review is reporting this:

The commit message states the cleanup "was done after release_sock(ssk)"
Is this description of the old code accurate?

Looking at the pre-patch code, the backlog traversal in mptcp_close_ssk
ran before __mptcp_close_ssk() was called. __mptcp_close_ssk() is where
lock_sock_nested(ssk) happens. The traversal was never preceded by
lock_sock(ssk) nor followed by release_sock(ssk) - it simply ran with
neither ssk lock nor mptcp_data_lock held.

The phrase "after release_sock(ssk)" does not match the actual old code
structure. The correct statement would be: "was done without holding the
ssk lock or mptcp_data_lock, before __mptcp_close_ssk() acquired the ssk
lock."

> Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
> call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
> before the close path reaches its list_for_each traversal. That skb is
> missed by cleanup, leaving skb->sk pointing to the freed ssk.

AI assisted review is reporting this:

The race scenario is real, but is the ordering description accurate?

In the old code, the list_for_each traversal happened before
lock_sock(ssk) (inside mptcp_close_ssk(), before calling
__mptcp_close_ssk()). There was no prior release_sock(ssk) to trigger
the race.

The actual race is: the traversal runs without mptcp_data_lock, so
concurrent softirq processing on another CPU can call subflow_data_ready
-> mptcp_data_ready -> __mptcp_add_backlog (under mptcp_data_lock) while
the traversal is in progress. The traversal can miss newly-added entries
or worse, encounter list corruption. The bug is real, but the mechanism
description is inaccurate.

> Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
> subflow->closing is set to 1 and while the ssk lock is still held,
> serialized under mptcp_data_lock(). The cleanup runs only on the push
> path (MPTCP_CF_PUSH), where backlog references accumulate; on other
> teardown paths the caller already handles cleanup.
> 
> With both locks held simultaneously, any concurrent mptcp_data_ready()
> either completes its enqueue before the purge runs and gets caught, or
> observes closing=1 while the ssk lock is still held and bails without
> enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
> The cleanup is exhaustive.
> 
> Remove the unprotected traversal from mptcp_close_ssk() entirely.
> 
> Tested with the MPTCP kernel selftests on the patched kernel:
> - tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
> - tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass

Do not forget about the Fixes tag.

> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
> Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
> ---
> v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
> v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
> v1: incorrect race analysis around subflow->closing flag.
> 
>  net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 718e910ff..149f816fe 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
>  	}
>  }
>  
> +static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
> +{
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +	struct sk_buff *skb;
> +
> +	mptcp_data_lock(sk);
> +	list_for_each_entry(skb, &msk->backlog_list, list) {
> +		if (skb->sk != ssk)
> +			continue;
> +
> +		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
> +		skb->sk = NULL;
> +	}
> +	mptcp_data_unlock(sk);
> +}
> +
>  /* subflow sockets can be either outgoing (connect) or incoming
>   * (accept).
>   *
> @@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
>  	subflow->closing = 1;
>  
> +	if (flags & MPTCP_CF_PUSH)
> +		mptcp_cleanup_ssk_backlog(sk, ssk);
> +
>  	/* Borrow the fwd allocated page left-over; fwd memory for the subflow
>  	 * could be negative at this point, but will be reach zero soon - when
>  	 * the data allocated using such fragment will be freed.
> @@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  		     struct mptcp_subflow_context *subflow)
>  {
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -	struct sk_buff *skb;
> -
>  	/* The first subflow can already be closed or disconnected */
>  	if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
>  		return;
> @@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  	if (sk->sk_state == TCP_ESTABLISHED)
>  		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
>  
> -	/* Remove any reference from the backlog to this ssk; backlog skbs consume
> -	 * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
> -	 */
> -	list_for_each_entry(skb, &msk->backlog_list, list) {
> -		if (skb->sk != ssk)
> -			continue;
> -
> -		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
> -		skb->sk = NULL;
> -	}
> -
>  	/* subflow aborted before reaching the fully_established status
>  	 * attempt the creation of the next subflow
>  	 */

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-29 10:38       ` Paolo Abeni
@ 2026-05-31  6:14         ` Matthieu Baerts
  0 siblings, 0 replies; 9+ messages in thread
From: Matthieu Baerts @ 2026-05-31  6:14 UTC (permalink / raw)
  To: Paolo Abeni, Kalpan Jani, mptcp; +Cc: shardul.b, janak, kalpanjani009

Hi Paolo,

On 29/05/2026 20:38, Paolo Abeni wrote:
> On 5/27/26 1:31 PM, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> Thank you for your review!
>>
>> On 27/05/2026 19:40, Paolo Abeni wrote:
>>> On 5/27/26 7:49 AM, Kalpan Jani wrote:
>>>> Gentle ping on this patch sent on 19 May 2026 — I haven't seen any review feedback yet. If anyone has had a chance to look, I'd appreciate comments; otherwise I'm happy to rebase or resend if it slipped through.
>>>
>>> Note that maintainers are under an ever increasing influx of AI
>>> generated contents that make processing incoming patches more
>>> challenging every day.
>>
>> +1
>>
>>> FTR this revision LGTM.
>>
>> Just to be sure, does it mean that Sashiko was wrong?
>>
>>  https://sashiko.dev/#/patchset/20260519092243.1242351-1-kalpan.jani%40mpiricsoftware.com
> 
> AFAICS, yes: the msk socket is set to TCP_CLOSE status before calling
> mptcp_backlog_purge(): __mptcp_add_backlog() can't add anything to the
> backlog.

Thank you for having checked!

The code then looks good, but it seems the commit message is no longer
correct with the last version (and a Fixes tag is still missing). I sent
a dedicated message for that.

>> If yes, can I convert this "LGTM" as an Acked-by? :)
> 
> There is already my Suggested-by: tag, I think the latter would be
> redundant?!?

Personally, I think it is clearer with an extra Acked-by, just to say
that what you had in mind (the suggestion) is what is actually done in
the patch. But if you prefer without, fine by me :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close
  2026-05-31  6:11 ` Matthieu Baerts
@ 2026-06-10 10:45   ` Kalpan Jani
  0 siblings, 0 replies; 9+ messages in thread
From: Kalpan Jani @ 2026-06-10 10:45 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: mptcp, shardul.b, janak, kalpanjani009, shardulsb08

Hi Matt, Paolo,

Thanks both for the review and for catching the inaccurate commit message. I've just sent v4:

https://lore.kernel.org/all/20260601083010.924938-1-kalpan.jani@mpiricsoftware.com/

Changes since v3:
- Rewrote the commit message: the old backlog traversal in mptcp_close_ssk() ran before __mptcp_close_ssk() took the ssk lock, holding neither the ssk lock nor mptcp_data_lock(). The race is a concurrent softirq RX path (subflow_data_ready() -> mptcp_data_ready() -> __mptcp_add_backlog(), under mptcp_data_lock()) adding to the backlog mid-traversal, not anything tied to a release_sock(ssk).
- Added the missing Fixes tag (ee458a3f314e).

The diff is otherwise unchanged from v3. I kept your Suggested-by, Paolo; happy to add an Acked-by too if you'd rather it be explicit.

Cheers,
Kalpan Jani

From: Matthieu Baerts <matttbe@kernel.org>
To: "Kalpan Jani"<kalpan.jani@mpiricsoftware.com>, <mptcp@lists.linux.dev>
Cc: <shardul.b@mpiricsoftware.com>, <janak@mpiric.us>, <kalpanjani009@gmail.com>, <shardulsb08@gmail.com>
Date: Sun, 31 May 2026 11:41:51 +0530
Subject: Re: [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close

 > Hi Kalpan,
 > 
 > On 19/05/2026 19:22, Kalpan Jani wrote:
 > > The backlog list is updated by mptcp_data_ready() under
 > > mptcp_data_lock(), but the cleanup of references to a closing subflow
 > > in mptcp_close_ssk() was done after release_sock(ssk) without holding
 > > that lock.
 > 
 > Is it possible the commit message needs to be updated?
 > 
 > AI assisted review is reporting this:
 > 
 > The commit message states the cleanup "was done after release_sock(ssk)"
 > Is this description of the old code accurate?
 > 
 > Looking at the pre-patch code, the backlog traversal in mptcp_close_ssk
 > ran before __mptcp_close_ssk() was called. __mptcp_close_ssk() is where
 > lock_sock_nested(ssk) happens. The traversal was never preceded by
 > lock_sock(ssk) nor followed by release_sock(ssk) - it simply ran with
 > neither ssk lock nor mptcp_data_lock held.
 > 
 > The phrase "after release_sock(ssk)" does not match the actual old code
 > structure. The correct statement would be: "was done without holding the
 > ssk lock or mptcp_data_lock, before __mptcp_close_ssk() acquired the ssk
 > lock."
 > 
 > > Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
 > > call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
 > > before the close path reaches its list_for_each traversal. That skb is
 > > missed by cleanup, leaving skb->sk pointing to the freed ssk.
 > 
 > AI assisted review is reporting this:
 > 
 > The race scenario is real, but is the ordering description accurate?
 > 
 > In the old code, the list_for_each traversal happened before
 > lock_sock(ssk) (inside mptcp_close_ssk(), before calling
 > __mptcp_close_ssk()). There was no prior release_sock(ssk) to trigger
 > the race.
 > 
 > The actual race is: the traversal runs without mptcp_data_lock, so
 > concurrent softirq processing on another CPU can call subflow_data_ready
 > -> mptcp_data_ready -> __mptcp_add_backlog (under mptcp_data_lock) while
 > the traversal is in progress. The traversal can miss newly-added entries
 > or worse, encounter list corruption. The bug is real, but the mechanism
 > description is inaccurate.
 > 
 > > Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
 > > subflow->closing is set to 1 and while the ssk lock is still held,
 > > serialized under mptcp_data_lock(). The cleanup runs only on the push
 > > path (MPTCP_CF_PUSH), where backlog references accumulate; on other
 > > teardown paths the caller already handles cleanup.
 > > 
 > > With both locks held simultaneously, any concurrent mptcp_data_ready()
 > > either completes its enqueue before the purge runs and gets caught, or
 > > observes closing=1 while the ssk lock is still held and bails without
 > > enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
 > > The cleanup is exhaustive.
 > > 
 > > Remove the unprotected traversal from mptcp_close_ssk() entirely.
 > > 
 > > Tested with the MPTCP kernel selftests on the patched kernel:
 > > - tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
 > > - tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass
 > 
 > Do not forget about the Fixes tag.
 > 
 > > Suggested-by: Paolo Abeni <pabeni@redhat.com>
 > > Reported-by: syzkaller <syzkaller@googlegroups.com>
 > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
 > > Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
 > > ---
 > > v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
 > > v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
 > > v1: incorrect race analysis around subflow->closing flag.
 > > 
 > >  net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
 > >  1 file changed, 19 insertions(+), 14 deletions(-)
 > > 
 > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
 > > index 718e910ff..149f816fe 100644
 > > --- a/net/mptcp/protocol.c
 > > +++ b/net/mptcp/protocol.c
 > > @@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
 > >      }
 > >  }
 > >  
 > > +static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
 > > +{
 > > +    struct mptcp_sock *msk = mptcp_sk(sk);
 > > +    struct sk_buff *skb;
 > > +
 > > +    mptcp_data_lock(sk);
 > > +    list_for_each_entry(skb, &msk->backlog_list, list) {
 > > +        if (skb->sk != ssk)
 > > +            continue;
 > > +
 > > +        atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
 > > +        skb->sk = NULL;
 > > +    }
 > > +    mptcp_data_unlock(sk);
 > > +}
 > > +
 > >  /* subflow sockets can be either outgoing (connect) or incoming
 > >   * (accept).
 > >   *
 > > @@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 > >      lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
 > >      subflow->closing = 1;
 > >  
 > > +    if (flags & MPTCP_CF_PUSH)
 > > +        mptcp_cleanup_ssk_backlog(sk, ssk);
 > > +
 > >      /* Borrow the fwd allocated page left-over; fwd memory for the subflow
 > >       * could be negative at this point, but will be reach zero soon - when
 > >       * the data allocated using such fragment will be freed.
 > > @@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 > >  void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 > >               struct mptcp_subflow_context *subflow)
 > >  {
 > > -    struct mptcp_sock *msk = mptcp_sk(sk);
 > > -    struct sk_buff *skb;
 > > -
 > >      /* The first subflow can already be closed or disconnected */
 > >      if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
 > >          return;
 > > @@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 > >      if (sk->sk_state == TCP_ESTABLISHED)
 > >          mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
 > >  
 > > -    /* Remove any reference from the backlog to this ssk; backlog skbs consume
 > > -     * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
 > > -     */
 > > -    list_for_each_entry(skb, &msk->backlog_list, list) {
 > > -        if (skb->sk != ssk)
 > > -            continue;
 > > -
 > > -        atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
 > > -        skb->sk = NULL;
 > > -    }
 > > -
 > >      /* subflow aborted before reaching the fully_established status
 > >       * attempt the creation of the next subflow
 > >       */
 > 
 > Cheers,
 > Matt
 > -- 
 > Sponsored by the NGI0 Core fund.
 > 
 > 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-06-10 10:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19  9:22 [PATCH net v3] mptcp: fix stale skb->sk reference on subflow close Kalpan Jani
2026-05-19 10:34 ` MPTCP CI
2026-05-27  5:49 ` Kalpan Jani
2026-05-27  9:40   ` [PATCH " Paolo Abeni
2026-05-27 11:31     ` Matthieu Baerts
2026-05-29 10:38       ` Paolo Abeni
2026-05-31  6:14         ` Matthieu Baerts
2026-05-31  6:11 ` Matthieu Baerts
2026-06-10 10:45   ` Kalpan Jani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.