Netdev List
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Ren Wei <n05ec@lzu.edu.cn>,
	netdev@vger.kernel.org, mptcp@lists.linux.dev
Cc: matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	horms@kernel.org, ncardwell@google.com, kuniyu@google.com,
	daniel@iogearbox.net, kafai@fb.com, yuantan098@gmail.com,
	yifanwucs@gmail.com, tomapufckgml@gmail.com, bird@lzu.edu.cn,
	caoruide123@gmail.com, enjou1224z@gmail.com
Subject: Re: [PATCH net v2 1/1] mptcp: fix request ownership when cloning reqsk
Date: Wed, 13 May 2026 11:37:46 +0200	[thread overview]
Message-ID: <be1b6f66-207a-4ed6-8188-72ec11f222a4@redhat.com> (raw)
In-Reply-To: <40fd38e7a368e5b7bc9bc83364a32241f977d53f.1778404619.git.caoruide123@gmail.com>

On 5/10/26 4:14 PM, Ren Wei wrote:
> From: Ruide Cao <caoruide123@gmail.com>
> 
> TCP request migration clones pending request sockets with
> inet_reqsk_clone(). For MPTCP MP_JOIN requests this raw-copies
> subflow_req->msk, but the cloned request does not take a new reference.
> 
> Both the original and the cloned request can later drop the same msk in
> subflow_req_destructor(), and a migrated request may keep a dangling msk
> pointer after the original owner has already been released.
> 
> MP_CAPABLE requests have a similar ownership issue for token_node. The
> original request is hashed in the token table, and inet_reqsk_clone()
> raw-copies the hlist pointers into the clone. The clone must not inherit
> that hashed state on migration failure, and the token ownership must move
> only after the new req has actually replaced the old one in the ehash.
> 
> Fix both cases while keeping the change local to MPTCP and the TCP
> migration path: grab a reference for cloned subflow requests that carry an
> msk, clear any raw-copied token hash state from the clone, and move the
> hashed token request ownership only after migration succeeds.
> 
> Fixes: c905dee62232 ("tcp: Migrate TCP_NEW_SYN_RECV requests at retransmitting SYN+ACKs.")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Ruide Cao <caoruide123@gmail.com>
> Tested-by: Ren Wei <enjou1224z@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
> changes in v2:
>   - drop the generic request_sock clone callback
>   - call MPTCP directly from inet_reqsk_clone() under the TCP protocol check
>   - keep cloned MP_JOIN requests holding an msk reference
>   - clear raw-copied MP_CAPABLE token hash state in the clone
>   - move MP_CAPABLE token ownership only after successful req migration
>   - avoid exposing token internals to inet_connection_sock.c
>   - update the commit message accordingly
>   - v1 link: https://lore.kernel.org/all/86e2514b533bf4d55d4aa2fdbf1404022e8c9430.1776149210.git.caoruide123@gmail.com/
> 
>  include/net/mptcp.h             | 14 ++++++++++++++
>  net/ipv4/inet_connection_sock.c | 12 ++++++++++--
>  net/mptcp/protocol.h            |  2 ++
>  net/mptcp/subflow.c             | 20 ++++++++++++++++++++
>  net/mptcp/token.c               | 25 +++++++++++++++++++++++++
>  5 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index f7263fe2a2e4..4bb24963afef 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -219,6 +219,10 @@ int mptcp_subflow_init_cookie_req(struct request_sock *req,
>  struct request_sock *mptcp_subflow_reqsk_alloc(const struct request_sock_ops *ops,
>  					       struct sock *sk_listener,
>  					       bool attach_listener);
> +void mptcp_subflow_reqsk_clone(struct request_sock *req,
> +			       struct request_sock *new_req);
> +void mptcp_subflow_reqsk_migrated(struct request_sock *req,
> +				  struct request_sock *new_req);
>  
>  __be32 mptcp_get_reset_option(const struct sk_buff *skb);
>  
> @@ -314,6 +318,16 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
>  	return NULL;
>  }
>  
> +static inline void mptcp_subflow_reqsk_clone(struct request_sock *req,
> +					     struct request_sock *new_req)
> +{
> +}
> +
> +static inline void mptcp_subflow_reqsk_migrated(struct request_sock *req,
> +						struct request_sock *new_req)
> +{
> +}
> +
>  static inline __be32 mptcp_reset_option(const struct sk_buff *skb)  { return htonl(0u); }
>  
>  static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index e961936b6be7..578471bb8a2b 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -17,6 +17,7 @@
>  #include <net/inet_timewait_sock.h>
>  #include <net/ip.h>
>  #include <net/route.h>
> +#include <net/mptcp.h>
>  #include <net/tcp_states.h>
>  #include <net/xfrm.h>
>  #include <net/tcp.h>
> @@ -951,8 +952,13 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req,
>  	/* We need not acquire fastopenq->lock
>  	 * because the child socket is locked in inet_csk_listen_stop().
>  	 */
> -	if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(nreq)->tfo_listener)
> -		rcu_assign_pointer(tcp_sk(nreq->sk)->fastopen_rsk, nreq);
> +	if (sk->sk_protocol == IPPROTO_TCP) {
> +		if (tcp_rsk(nreq)->tfo_listener)
> +			rcu_assign_pointer(tcp_sk(nreq->sk)->fastopen_rsk, nreq);
> +
> +		if (rsk_is_mptcp(req))
> +			mptcp_subflow_reqsk_clone(req, nreq);
> +	}
>  
>  	return nreq;
>  }
> @@ -1115,6 +1121,8 @@ static void reqsk_timer_handler(struct timer_list *t)
>  			goto no_ownership;
>  		}
>  
> +		if (rsk_is_mptcp(oreq))
> +			mptcp_subflow_reqsk_migrated(oreq, nreq);

Sashiko says:

---
Can this sequence trigger a race condition with network RX processing?
In reqsk_timer_handler(), nreq is inserted into the global ehash table
via inet_ehash_insert() right before this block. The ehash lock is
immediately dropped, making nreq globally visible to concurrent network
RX softirq processing.
However, the MPTCP token state is not migrated to nreq until the call to
mptcp_subflow_reqsk_migrated(oreq, nreq).
If a concurrent RX thread processes an incoming ACK for the connection
in this narrow window, it could find nreq in the ehash table,
successfully process it, and invoke mptcp_token_accept(nreq).
Inside mptcp_token_accept(), __token_lookup_req() searches the token
bucket and finds oreq, because the timer thread hasn't moved the token yet.
The code then asserts ownership:
mptcp_token_accept()
	pos = __token_lookup_req(bucket, req->token);
	if (!WARN_ON_ONCE(pos != req))
		hlist_nulls_del_init_rcu(&req->token_node);
Since pos is oreq and req is nreq, would this fail the assertion and
trigger a kernel warning?
If the calls were reordered so that mptcp_subflow_reqsk_migrated()
happens before inet_ehash_insert(), could that merely shift the race? If
a concurrent ACK is already processing oreq, it might call
mptcp_token_accept(oreq) after the token was moved to nreq, triggering
the same warning, since the ehash table and the token bucket are
protected by different locks and the updates are non-atomic.
Could a remote attacker time ACKs to hit this window during a listener
migration, potentially leading to a denial of service on systems
configured to panic on warnings?
---

I _think_ the above race could/should be solved squashing
mptcp_subflow_reqsk_migrated() into mptcp_subflow_reqsk_clone() and
adjusting  mptcp_token_destroy_request() and mptcp_token_accept() to
deal with already removed token, checking (again) for unhased token_node
under the bucket lock.

The latter will require a little more work, returning a success status
to the caller - mptcp_sk_clone_init() - and make the latter fail on
mptcp_token_accept() failures.

The above would also make the fix more encapsulated into the mptcp code.

/P


      reply	other threads:[~2026-05-13  9:37 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-10 14:14 [PATCH net v2 1/1] mptcp: fix request ownership when cloning reqsk Ren Wei
2026-05-13  9:37 ` Paolo Abeni [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be1b6f66-207a-4ed6-8188-72ec11f222a4@redhat.com \
    --to=pabeni@redhat.com \
    --cc=bird@lzu.edu.cn \
    --cc=caoruide123@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=enjou1224z@gmail.com \
    --cc=geliang@kernel.org \
    --cc=horms@kernel.org \
    --cc=kafai@fb.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=martineau@kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=n05ec@lzu.edu.cn \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=tomapufckgml@gmail.com \
    --cc=yifanwucs@gmail.com \
    --cc=yuantan098@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox