public inbox for netdev@vger.kernel.org
* [PATCH net v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()
@ 2026-03-11  2:24 Jiayuan Chen
  2026-03-11  9:17 ` Eric Dumazet
  2026-03-12  3:05 ` [net,v4] " Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-03-11  2:24 UTC (permalink / raw)
  To: netdev
  Cc: Jiayuan Chen, syzbot+827ae2bfb3a3529333e9, Eric Dumazet,
	Jiayuan Chen, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	Mahanta Jambigi, Tony Lu, Wen Gu, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, linux-rdma, linux-s390, linux-kernel

From: Jiayuan Chen <jiayuan.chen@shopee.com>

Syzkaller reported a panic in smc_tcp_syn_recv_sock() [1].

smc_tcp_syn_recv_sock() is called in the TCP receive path
(softirq) via icsk_af_ops->syn_recv_sock on the clcsock (TCP
listening socket). It reads sk_user_data to get the smc_sock
pointer. However, when the SMC listen socket is being closed
concurrently, smc_close_active() sets clcsock->sk_user_data
to NULL under sk_callback_lock, and then the smc_sock itself
can be freed via sock_put() in smc_release().

This leads to two issues:

1) NULL pointer dereference: sk_user_data is NULL when
   accessed.
2) Use-after-free: sk_user_data is read as non-NULL, but the
   smc_sock is freed before its fields (e.g., queued_smc_hs,
   ori_af_ops) are accessed.

The race window looks like this:

  CPU A (softirq)              CPU B (process ctx)

  tcp_v4_rcv()
    TCP_NEW_SYN_RECV:
    sk = req->rsk_listener
    sock_hold(sk)
    /* No lock on listener */
                               smc_close_active():
                                 write_lock_bh(cb_lock)
                                 sk_user_data = NULL
                                 write_unlock_bh(cb_lock)
                                 ...
                                 smc_clcsock_release()
                                 sock_put(smc->sk) x2
                                   -> smc_sock freed!
    tcp_check_req()
      smc_tcp_syn_recv_sock():
        smc = user_data(sk)
          -> NULL or dangling
        smc->queued_smc_hs
          -> crash!

Note that the clcsock and smc_sock are two independent objects
with separate refcounts. TCP stack holds a reference on the
clcsock, which keeps it alive, but this does NOT prevent the
smc_sock from being freed.

Fix this by using RCU and refcount_inc_not_zero() to safely
access smc_sock. Since smc_tcp_syn_recv_sock() is called in
the TCP three-way handshake path, taking read_lock_bh on
sk_callback_lock is too heavy and would not survive a SYN
flood attack. Using rcu_read_lock() is much more lightweight.

- Set SOCK_RCU_FREE on the SMC listen socket so that
  smc_sock freeing is deferred until after the RCU grace
  period. This guarantees the memory is still valid when
  accessed inside rcu_read_lock().
- Use rcu_read_lock() to protect reading sk_user_data.
- Use refcount_inc_not_zero(&smc->sk.sk_refcnt) to pin the
  smc_sock. If the refcount has already reached zero (close
  path completed), it returns false and we bail out safely.

Note: smc_hs_congested() has a similar lockless read of
sk_user_data without rcu_read_lock(), but it only checks for
NULL and accesses the global smc_hs_wq, never dereferencing
any smc_sock field, so it is not affected.

The reproducer was verified with mdelay injection and smc_run;
with this patch applied, the issue no longer occurs.

[1] https://syzkaller.appspot.com/bug?extid=827ae2bfb3a3529333e9

Fixes: 8270d9c21041 ("net/smc: Limit backlog connections")
Reported-by: syzbot+827ae2bfb3a3529333e9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67eaf9b8.050a0220.3c3d88.004a.GAE@google.com/T/
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
v4:
- Add a new smc_clcsock_user_data_rcu() helper instead of
  modifying the existing smc_clcsock_user_data(), to allow
  gradual conversion of callers. Only smc_tcp_syn_recv_sock()
  uses the RCU variant; other callers under sk_callback_lock
  remain unchanged, avoiding lockdep warnings.
- Use rcu_dereference_sk_user_data() for reading and
  rcu_assign_sk_user_data() for writing sk_user_data to
  prevent load/store tearing.

v3: https://lore.kernel.org/netdev/20260310120053.136594-1-jiayuan.chen@linux.dev/

v3:
- Write sk_user_data with RCU to prevent store tearing.

v2: https://lore.kernel.org/netdev/20260309023846.18516-1-jiayuan.chen@linux.dev/

v2:
- Use rcu_read_lock() + refcount_inc_not_zero() instead of
  read_lock_bh(sk_callback_lock) + sock_hold(), since this
  is the TCP handshake hot path and read_lock_bh is too
  expensive under SYN flood.
- Set SOCK_RCU_FREE on SMC listen socket to ensure
  RCU-deferred freeing.

v1: https://lore.kernel.org/netdev/20260307032158.372165-1-jiayuan.chen@linux.dev/
---
 net/smc/af_smc.c    | 21 +++++++++++++++------
 net/smc/smc.h       |  5 +++++
 net/smc/smc_close.c |  2 +-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d0119afcc6a1..808a107d3354 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -131,7 +131,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
 	struct smc_sock *smc;
 	struct sock *child;
 
-	smc = smc_clcsock_user_data(sk);
+	rcu_read_lock();
+	smc = smc_clcsock_user_data_rcu(sk);
+	if (!smc || !refcount_inc_not_zero(&smc->sk.sk_refcnt)) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	rcu_read_unlock();
 
 	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->queued_smc_hs) >
 				sk->sk_max_ack_backlog)
@@ -153,11 +159,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
 		if (inet_csk(child)->icsk_af_ops == inet_csk(sk)->icsk_af_ops)
 			inet_csk(child)->icsk_af_ops = smc->ori_af_ops;
 	}
+	sock_put(&smc->sk);
 	return child;
 
 drop:
 	dst_release(dst);
 	tcp_listendrop(sk);
+	sock_put(&smc->sk);
 	return NULL;
 }
 
@@ -254,7 +262,7 @@ static void smc_fback_restore_callbacks(struct smc_sock *smc)
 	struct sock *clcsk = smc->clcsock->sk;
 
 	write_lock_bh(&clcsk->sk_callback_lock);
-	clcsk->sk_user_data = NULL;
+	rcu_assign_sk_user_data(clcsk, NULL);
 
 	smc_clcsock_restore_cb(&clcsk->sk_state_change, &smc->clcsk_state_change);
 	smc_clcsock_restore_cb(&clcsk->sk_data_ready, &smc->clcsk_data_ready);
@@ -902,7 +910,7 @@ static void smc_fback_replace_callbacks(struct smc_sock *smc)
 	struct sock *clcsk = smc->clcsock->sk;
 
 	write_lock_bh(&clcsk->sk_callback_lock);
-	clcsk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+	__rcu_assign_sk_user_data_with_flags(clcsk, smc, SK_USER_DATA_NOCOPY);
 
 	smc_clcsock_replace_cb(&clcsk->sk_state_change, smc_fback_state_change,
 			       &smc->clcsk_state_change);
@@ -2665,8 +2673,8 @@ int smc_listen(struct socket *sock, int backlog)
 	 * smc-specific sk_data_ready function
 	 */
 	write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
-	smc->clcsock->sk->sk_user_data =
-		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+	__rcu_assign_sk_user_data_with_flags(smc->clcsock->sk, smc,
+					     SK_USER_DATA_NOCOPY);
 	smc_clcsock_replace_cb(&smc->clcsock->sk->sk_data_ready,
 			       smc_clcsock_data_ready, &smc->clcsk_data_ready);
 	write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
@@ -2687,10 +2695,11 @@ int smc_listen(struct socket *sock, int backlog)
 		write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
 		smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
 				       &smc->clcsk_data_ready);
-		smc->clcsock->sk->sk_user_data = NULL;
+		rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
 		write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
 		goto out;
 	}
+	sock_set_flag(sk, SOCK_RCU_FREE);
 	sk->sk_max_ack_backlog = backlog;
 	sk->sk_ack_backlog = 0;
 	sk->sk_state = SMC_LISTEN;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 9e6af72784ba..52145df83f6e 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -346,6 +346,11 @@ static inline struct smc_sock *smc_clcsock_user_data(const struct sock *clcsk)
 	       ((uintptr_t)clcsk->sk_user_data & ~SK_USER_DATA_NOCOPY);
 }
 
+static inline struct smc_sock *smc_clcsock_user_data_rcu(const struct sock *clcsk)
+{
+	return (struct smc_sock *)rcu_dereference_sk_user_data(clcsk);
+}
+
 /* save target_cb in saved_cb, and replace target_cb with new_cb */
 static inline void smc_clcsock_replace_cb(void (**target_cb)(struct sock *),
 					  void (*new_cb)(struct sock *),
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 10219f55aad1..bb0313ef5f7c 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -218,7 +218,7 @@ int smc_close_active(struct smc_sock *smc)
 			write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
 			smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
 					       &smc->clcsk_data_ready);
-			smc->clcsock->sk->sk_user_data = NULL;
+			rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
 			write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
 			rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR);
 		}
-- 
2.43.0



* Re: [PATCH net v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()
  2026-03-11  2:24 [PATCH net v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock() Jiayuan Chen
@ 2026-03-11  9:17 ` Eric Dumazet
  2026-03-12  3:05 ` [net,v4] " Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2026-03-11  9:17 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: netdev, Jiayuan Chen, syzbot+827ae2bfb3a3529333e9, D. Wythe,
	Dust Li, Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi, Tony Lu,
	Wen Gu, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-rdma, linux-s390, linux-kernel

On Wed, Mar 11, 2026 at 3:25 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> Syzkaller reported a panic in smc_tcp_syn_recv_sock() [1].
>
> smc_tcp_syn_recv_sock() is called in the TCP receive path
> (softirq) via icsk_af_ops->syn_recv_sock on the clcsock (TCP
> listening socket). It reads sk_user_data to get the smc_sock
> pointer. However, when the SMC listen socket is being closed
> concurrently, smc_close_active() sets clcsock->sk_user_data
> to NULL under sk_callback_lock, and then the smc_sock itself
> can be freed via sock_put() in smc_release().
>
> This leads to two issues:
>
> 1) NULL pointer dereference: sk_user_data is NULL when
>    accessed.
> 2) Use-after-free: sk_user_data is read as non-NULL, but the
>    smc_sock is freed before its fields (e.g., queued_smc_hs,
>    ori_af_ops) are accessed.
>
> The race window looks like this:
>
>   CPU A (softirq)              CPU B (process ctx)
>
>   tcp_v4_rcv()
>     TCP_NEW_SYN_RECV:
>     sk = req->rsk_listener
>     sock_hold(sk)
>     /* No lock on listener */
>                                smc_close_active():
>                                  write_lock_bh(cb_lock)
>                                  sk_user_data = NULL
>                                  write_unlock_bh(cb_lock)
>                                  ...
>                                  smc_clcsock_release()
>                                  sock_put(smc->sk) x2
>                                    -> smc_sock freed!
>     tcp_check_req()
>       smc_tcp_syn_recv_sock():
>         smc = user_data(sk)
>           -> NULL or dangling
>         smc->queued_smc_hs
>           -> crash!
>
> Note that the clcsock and smc_sock are two independent objects
> with separate refcounts. TCP stack holds a reference on the
> clcsock, which keeps it alive, but this does NOT prevent the
> smc_sock from being freed.
>
> Fix this by using RCU and refcount_inc_not_zero() to safely
> access smc_sock. Since smc_tcp_syn_recv_sock() is called in
> the TCP three-way handshake path, taking read_lock_bh on
> sk_callback_lock is too heavy and would not survive a SYN
> flood attack. Using rcu_read_lock() is much more lightweight.
>
> - Set SOCK_RCU_FREE on the SMC listen socket so that
>   smc_sock freeing is deferred until after the RCU grace
>   period. This guarantees the memory is still valid when
>   accessed inside rcu_read_lock().
> - Use rcu_read_lock() to protect reading sk_user_data.
> - Use refcount_inc_not_zero(&smc->sk.sk_refcnt) to pin the
>   smc_sock. If the refcount has already reached zero (close
>   path completed), it returns false and we bail out safely.
>
> Note: smc_hs_congested() has a similar lockless read of
> sk_user_data without rcu_read_lock(), but it only checks for
> NULL and accesses the global smc_hs_wq, never dereferencing
> any smc_sock field, so it is not affected.
>
> Reproducer was verified with mdelay injection and smc_run,
> the issue no longer occurs with this patch applied.
>
> [1] https://syzkaller.appspot.com/bug?extid=827ae2bfb3a3529333e9
>
> Fixes: 8270d9c21041 ("net/smc: Limit backlog connections")
> Reported-by: syzbot+827ae2bfb3a3529333e9@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/67eaf9b8.050a0220.3c3d88.004a.GAE@google.com/T/
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [net,v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()
  2026-03-11  2:24 [PATCH net v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock() Jiayuan Chen
  2026-03-11  9:17 ` Eric Dumazet
@ 2026-03-12  3:05 ` Jakub Kicinski
  2026-03-12  3:27   ` Jiayuan Chen
  1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2026-03-12  3:05 UTC (permalink / raw)
  To: jiayuan.chen
  Cc: Jakub Kicinski, jiayuan.chen, mjambigi, wenjia, horms, sidraya,
	guwen, davem, linux-s390, netdev, linux-rdma, dust.li,
	syzbot+827ae2bfb3a3529333e9, pabeni, tonylu, linux-kernel,
	edumazet, alibuda

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()

This patch fixes a race condition in smc_tcp_syn_recv_sock() where
sk_user_data can be nullified or the smc_sock can be freed during
concurrent socket closure, using RCU and refcount_inc_not_zero() for
safe access.

Closes: https://lore.kernel.org/all/67eaf9b8.050a0220.3c3d88.004a.GAE@google.com/T/

> The race window looks like this:
>
>   CPU A (softirq)              CPU B (process ctx)
>
>   tcp_v4_rcv()
>     TCP_NEW_SYN_RECV:
>     sk = req->rsk_listener
>     sock_hold(sk)
>     /* No lock on listener */
>                                smc_close_active():
>                                  write_lock_bh(cb_lock)
>                                  sk_user_data = NULL
>                                  write_unlock_bh(cb_lock)
>                                  ...
>                                  smc_clcsock_release()
>                                  sock_put(smc->sk) x2
>                                    -> smc_sock freed!
>     tcp_check_req()
>       smc_tcp_syn_recv_sock():
>         smc = user_data(sk)
>           -> NULL or dangling
>         smc->queued_smc_hs
>           -> crash!

The race diagram describes the tcp_check_req() path (TCP_NEW_SYN_RECV
handling), but the actual syzbot crash occurred through a different path.
Looking at the syzbot stacktrace, the crash shows:

smc_tcp_syn_recv_sock+0xa7/0x4b0 called from tcp_get_cookie_sock+0xd5/0x790

This is the SYN cookie path:

tcp_v4_rcv() -> tcp_v4_do_rcv() -> tcp_v4_cookie_check() ->
cookie_v4_check() -> tcp_get_cookie_sock() -> smc_tcp_syn_recv_sock()

The syzbot log even shows 'Possible SYN flooding on port [::]:20002.
Sending cookies.' confirming syncookies were active. Both paths are
vulnerable to the race, but the race diagram doesn't match the actual
crash path that the fuzzer triggered.

Would it be more accurate to note that both paths (tcp_check_req for
normal SYN handling and tcp_get_cookie_sock for SYN cookies) can trigger
the race?

> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -131,7 +131,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
>  	struct smc_sock *smc;
>  	struct sock *child;
>
> -	smc = smc_clcsock_user_data(sk);
> +	rcu_read_lock();
> +	smc = smc_clcsock_user_data_rcu(sk);
> +	if (!smc || !refcount_inc_not_zero(&smc->sk.sk_refcnt)) {
> +		rcu_read_unlock();
> +		return NULL;

When smc_tcp_syn_recv_sock() is called from the SYN cookie path
(tcp_get_cookie_sock() in net/ipv4/syncookies.c:205 and
net/ipv6/syncookies.c:275), the dst parameter is non-NULL.

The new early return here returns NULL without calling dst_release(dst).
The syn_recv_sock contract requires the callee to either consume dst
(attach to child socket on success) or release it (call dst_release on
failure).

Looking at tcp_get_cookie_sock() in net/ipv4/syncookies.c:

    child = af_ops->syn_recv_sock(..., &dst, ...)
    if (!child) {
        reqsk_free(req);
        return NULL;
    }

When syn_recv_sock returns NULL, tcp_get_cookie_sock immediately returns
NULL without releasing dst, expecting the syn_recv_sock callee to have
already handled it.

The existing 'drop:' label at line 165 correctly calls dst_release(dst),
but the new early return at line 138 bypasses it. This causes a dst_entry
reference count leak under SYN flood conditions on an SMC listener.

Should this be: goto drop instead of return NULL, or should dst_release(dst)
be added before the return NULL?

> +	}
> +	rcu_read_unlock();
>
>  	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->queued_smc_hs) >
>  				sk->sk_max_ack_backlog)
> @@ -153,11 +159,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
>  		if (inet_csk(child)->icsk_af_ops == inet_csk(sk)->icsk_af_ops)
>  			inet_csk(child)->icsk_af_ops = smc->ori_af_ops;
>  	}
> +	sock_put(&smc->sk);
>  	return child;
>
>  drop:
>  	dst_release(dst);
>  	tcp_listendrop(sk);
> +	sock_put(&smc->sk);
>  	return NULL;
>  }
>
> @@ -254,7 +262,7 @@ static void smc_fback_restore_callbacks(struct smc_sock *smc)
>  	struct sock *clcsk = smc->clcsock->sk;
>
>  	write_lock_bh(&clcsk->sk_callback_lock);
> -	clcsk->sk_user_data = NULL;
> +	rcu_assign_sk_user_data(clcsk, NULL);
>
>  	smc_clcsock_restore_cb(&clcsk->sk_state_change, &smc->clcsk_state_change);
>  	smc_clcsock_restore_cb(&clcsk->sk_data_ready, &smc->clcsk_data_ready);

[ ... ]

> @@ -2687,10 +2695,11 @@ int smc_listen(struct socket *sock, int backlog)
>  		write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
>  		smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
>  				       &smc->clcsk_data_ready);
> -		smc->clcsock->sk->sk_user_data = NULL;
> +		rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
>  		write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
>  		goto out;
>  	}
> +	sock_set_flag(sk, SOCK_RCU_FREE);
>  	sk->sk_max_ack_backlog = backlog;
>  	sk->sk_ack_backlog = 0;
>  	sk->sk_state = SMC_LISTEN;

[ ... ]
-- 
pw-bot: cr


* Re: [net,v4] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()
  2026-03-12  3:05 ` [net,v4] " Jakub Kicinski
@ 2026-03-12  3:27   ` Jiayuan Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-03-12  3:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: jiayuan.chen, mjambigi, wenjia, horms, sidraya, guwen, davem,
	linux-s390, netdev, linux-rdma, dust.li,
	syzbot+827ae2bfb3a3529333e9, pabeni, tonylu, linux-kernel,
	edumazet, alibuda

Thanks for the review.


On 3/12/26 11:05 AM, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock()
>
> This patch fixes a race condition in smc_tcp_syn_recv_sock() where
> sk_user_data can be nullified or the smc_sock can be freed during
> concurrent socket closure, using RCU and refcount_inc_not_zero() for
> safe access.
>
> Closes: https://lore.kernel.org/all/67eaf9b8.050a0220.3c3d88.004a.GAE@google.com/T/
>
>> The race window looks like this:
>>
>>    CPU A (softirq)              CPU B (process ctx)
>>
>>    tcp_v4_rcv()
>>      TCP_NEW_SYN_RECV:
>>      sk = req->rsk_listener
>>      sock_hold(sk)
>>      /* No lock on listener */
>>                                 smc_close_active():
>>                                   write_lock_bh(cb_lock)
>>                                   sk_user_data = NULL
>>                                   write_unlock_bh(cb_lock)
>>                                   ...
>>                                   smc_clcsock_release()
>>                                   sock_put(smc->sk) x2
>>                                     -> smc_sock freed!
>>      tcp_check_req()
>>        smc_tcp_syn_recv_sock():
>>          smc = user_data(sk)
>>            -> NULL or dangling
>>          smc->queued_smc_hs
>>            -> crash!
> The race diagram describes the tcp_check_req() path (TCP_NEW_SYN_RECV
> handling), but the actual syzbot crash occurred through a different path.
> Looking at the syzbot stacktrace, the crash shows:
>
> smc_tcp_syn_recv_sock+0xa7/0x4b0 called from tcp_get_cookie_sock+0xd5/0x790
>
> This is the SYN cookie path:
>
> tcp_v4_rcv() -> tcp_v4_do_rcv() -> tcp_v4_cookie_check() ->
> cookie_v4_check() -> tcp_get_cookie_sock() -> smc_tcp_syn_recv_sock()
>
> The syzbot log even shows 'Possible SYN flooding on port [::]:20002.
> Sending cookies.' confirming syncookies were active. Both paths are
> vulnerable to the race, but the race diagram doesn't match the actual
> crash path that the fuzzer triggered.
>
> Would it be more accurate to note that both paths (tcp_check_req for
> normal SYN handling and tcp_get_cookie_sock for SYN cookies) can trigger
> the race?


For the first point about the call stack: both the tcp_check_req()
path (normal SYN handling) and the tcp_get_cookie_sock() path (SYN
cookies) call syn_recv_sock() without holding the listener lock, so
the race condition is the same. The diagram uses tcp_check_req() as
an example because that is the path I reproduced in my environment,
but the fix covers both paths.

>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -131,7 +131,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
>>   	struct smc_sock *smc;
>>   	struct sock *child;
>>
>> -	smc = smc_clcsock_user_data(sk);
>> +	rcu_read_lock();
>> +	smc = smc_clcsock_user_data_rcu(sk);
>> +	if (!smc || !refcount_inc_not_zero(&smc->sk.sk_refcnt)) {
>> +		rcu_read_unlock();
>> +		return NULL;
> When smc_tcp_syn_recv_sock() is called from the SYN cookie path
> (tcp_get_cookie_sock() in net/ipv4/syncookies.c:205 and
> net/ipv6/syncookies.c:275), the dst parameter is non-NULL.
>
> The new early return here returns NULL without calling dst_release(dst).
> The syn_recv_sock contract requires the callee to either consume dst
> (attach to child socket on success) or release it (call dst_release on
> failure).
>
> Looking at tcp_get_cookie_sock() in net/ipv4/syncookies.c:
>
>      child = af_ops->syn_recv_sock(..., &dst, ...)
>      if (!child) {
>          reqsk_free(req);
>          return NULL;
>      }
>
> When syn_recv_sock returns NULL, tcp_get_cookie_sock immediately returns
> NULL without releasing dst, expecting the syn_recv_sock callee to have
> already handled it.
>
> The existing 'drop:' label at line 165 correctly calls dst_release(dst),
> but the new early return at line 138 bypasses it. This causes a dst_entry
> reference count leak under SYN flood conditions on an SMC listener.
>
> Should this be: goto drop instead of return NULL, or should dst_release(dst)
> be added before the return NULL?



The second point about the dst leak is valid. Will fix in v5.


