From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-174.mta1.migadu.com (out-174.mta1.migadu.com [95.215.58.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 51A5540C934; Tue, 10 Mar 2026 12:01:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.174 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773144076; cv=none; b=HnSZLRPZHN6Iakf2JwQ6bEt9nOCt5K9o9fQW84BXeh49db77RAVl83KALDUuwM6yaluC1LQbzBVIkEplUY4SwAF8FweFI0WBdaqaGU431tJxETEFf7FXAULJWXMV72V0H1noRFrJoFBPzeQP74YhqqG0HyueBRlzpWckkWQ3YF4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773144076; c=relaxed/simple; bh=4rtpGxnqa/+J2jc45qo7Qkpscf/0KDo8MaHhqRm1vCM=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=Na5kkDdCNOtYrg47nBhANk/3zKlqkuvzVkVcMI35lWzPIoiiUylcal7IvRYSQXypGMpnwxGQzKJXdJNxtQij/exFxQoKwsBCfvBfDJ3+5E8pvGaS+p6KGduuI9mn9KRoCaD3yuUh/DCtEsqdMCx1mIUwCsxF7WDzQ+mswjPU8Ec= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=qB7zQoYe; arc=none smtp.client-ip=95.215.58.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="qB7zQoYe" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1773144072; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=k4y1F1LBf0520Nj01Rtkf0g/BeLRhTx2La2nFQXGurI=; b=qB7zQoYeSMKtYOxSFqMnD2IiD8GgOF75sdp9QpxuTb5RJBBdFhoN9vfQT7jThKAfSxavnJ hBHUyEvPvIN6E/e6l2IUeeajSF+SiMGljncSpBStsMQ566tEOLQRPhFz4yaICiVRCZBuZk 7pNbT9uSRMC+h/iIDfXy/5EIwPKKxz8= From: Jiayuan Chen To: netdev@vger.kernel.org Cc: Jiayuan Chen , syzbot+827ae2bfb3a3529333e9@syzkaller.appspotmail.com, Eric Dumazet , Jiayuan Chen , "D. Wythe" , Dust Li , Sidraya Jayagond , Wenjia Zhang , Mahanta Jambigi , Tony Lu , Wen Gu , "David S. Miller" , Jakub Kicinski , Paolo Abeni , Simon Horman , linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net v3] net/smc: fix NULL dereference and UAF in smc_tcp_syn_recv_sock() Date: Tue, 10 Mar 2026 20:00:51 +0800 Message-ID: <20260310120053.136594-1-jiayuan.chen@linux.dev> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT From: Jiayuan Chen Syzkaller reported a panic in smc_tcp_syn_recv_sock() [1]. smc_tcp_syn_recv_sock() is called in the TCP receive path (softirq) via icsk_af_ops->syn_recv_sock on the clcsock (TCP listening socket). It reads sk_user_data to get the smc_sock pointer. However, when the SMC listen socket is being closed concurrently, smc_close_active() sets clcsock->sk_user_data to NULL under sk_callback_lock, and then the smc_sock itself can be freed via sock_put() in smc_release(). This leads to two issues: 1) NULL pointer dereference: sk_user_data is NULL when accessed. 2) Use-after-free: sk_user_data is read as non-NULL, but the smc_sock is freed before its fields (e.g., queued_smc_hs, ori_af_ops) are accessed. The race window looks like this: CPU A (softirq) CPU B (process ctx) tcp_v4_rcv() TCP_NEW_SYN_RECV: sk = req->rsk_listener sock_hold(sk) /* No lock on listener */ smc_close_active(): write_lock_bh(cb_lock) sk_user_data = NULL write_unlock_bh(cb_lock) ... smc_clcsock_release() sock_put(smc->sk) x2 -> smc_sock freed! tcp_check_req() smc_tcp_syn_recv_sock(): smc = user_data(sk) -> NULL or dangling smc->queued_smc_hs -> crash! Note that the clcsock and smc_sock are two independent objects with separate refcounts. TCP stack holds a reference on the clcsock, which keeps it alive, but this does NOT prevent the smc_sock from being freed. Fix this by using RCU and refcount_inc_not_zero() to safely access smc_sock. Since smc_tcp_syn_recv_sock() is called in the TCP three-way handshake path, taking read_lock_bh on sk_callback_lock is too heavy and would not survive a SYN flood attack. Using rcu_read_lock() is much more lightweight. - Set SOCK_RCU_FREE on the SMC listen socket so that smc_sock freeing is deferred until after the RCU grace period. This guarantees the memory is still valid when accessed inside rcu_read_lock(). - Use rcu_read_lock() to protect reading sk_user_data. - Use refcount_inc_not_zero(&smc->sk.sk_refcnt) to pin the smc_sock. If the refcount has already reached zero (close path completed), it returns false and we bail out safely. Note: smc_hs_congested() has a similar lockless read of sk_user_data without rcu_read_lock(), but it only checks for NULL and accesses the global smc_hs_wq, never dereferencing any smc_sock field, so it is not affected. Reproducer was verified with mdelay injection and smc_run, the issue no longer occurs with this patch applied. [1] https://syzkaller.appspot.com/bug?extid=827ae2bfb3a3529333e9 Fixes: 8270d9c21041 ("net/smc: Limit backlog connections") Reported-by: syzbot+827ae2bfb3a3529333e9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67eaf9b8.050a0220.3c3d88.004a.GAE@google.com/T/ Suggested-by: Eric Dumazet Cc: Jiayuan Chen Signed-off-by: Jiayuan Chen --- v2 -> v3: - Use rcu_dereference_sk_user_data() for reading and rcu_assign_sk_user_data() for writing sk_user_data to prevent load/store tearing, as pointed out by Eric Dumazet. v2: https://lore.kernel.org/netdev/20260309023846.18516-1-jiayuan.chen@linux.dev/ v1 -> v2: - Use rcu_read_lock() + refcount_inc_not_zero() instead of read_lock_bh(sk_callback_lock) + sock_hold(), since this is the TCP handshake hot path and read_lock_bh is too expensive under SYN flood. - Set SOCK_RCU_FREE on SMC listen socket to ensure RCU-deferred freeing. v1: https://lore.kernel.org/netdev/20260307032158.372165-1-jiayuan.chen@linux.dev/ --- net/smc/af_smc.c | 19 ++++++++++++++----- net/smc/smc.h | 3 +-- net/smc/smc_close.c | 2 +- 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index d0119afcc6a1..078927ba068a 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -131,7 +131,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct smc_sock *smc; struct sock *child; + rcu_read_lock(); smc = smc_clcsock_user_data(sk); + if (!smc || !refcount_inc_not_zero(&smc->sk.sk_refcnt)) { + rcu_read_unlock(); + return NULL; + } + rcu_read_unlock(); if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->queued_smc_hs) > sk->sk_max_ack_backlog) @@ -153,11 +159,13 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, if (inet_csk(child)->icsk_af_ops == inet_csk(sk)->icsk_af_ops) inet_csk(child)->icsk_af_ops = smc->ori_af_ops; } + sock_put(&smc->sk); return child; drop: dst_release(dst); tcp_listendrop(sk); + sock_put(&smc->sk); return NULL; } @@ -254,7 +262,7 @@ static void smc_fback_restore_callbacks(struct smc_sock *smc) struct sock *clcsk = smc->clcsock->sk; write_lock_bh(&clcsk->sk_callback_lock); - clcsk->sk_user_data = NULL; + rcu_assign_sk_user_data(clcsk, NULL); smc_clcsock_restore_cb(&clcsk->sk_state_change, &smc->clcsk_state_change); smc_clcsock_restore_cb(&clcsk->sk_data_ready, &smc->clcsk_data_ready); @@ -902,7 +910,7 @@ static void smc_fback_replace_callbacks(struct smc_sock *smc) struct sock *clcsk = smc->clcsock->sk; write_lock_bh(&clcsk->sk_callback_lock); - clcsk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); + __rcu_assign_sk_user_data_with_flags(clcsk, smc, SK_USER_DATA_NOCOPY); smc_clcsock_replace_cb(&clcsk->sk_state_change, smc_fback_state_change, &smc->clcsk_state_change); @@ -2665,8 +2673,8 @@ int smc_listen(struct socket *sock, int backlog) * smc-specific sk_data_ready function */ write_lock_bh(&smc->clcsock->sk->sk_callback_lock); - smc->clcsock->sk->sk_user_data = - (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); + __rcu_assign_sk_user_data_with_flags(smc->clcsock->sk, smc, + SK_USER_DATA_NOCOPY); smc_clcsock_replace_cb(&smc->clcsock->sk->sk_data_ready, smc_clcsock_data_ready, &smc->clcsk_data_ready); write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); @@ -2687,10 +2695,11 @@ int smc_listen(struct socket *sock, int backlog) write_lock_bh(&smc->clcsock->sk->sk_callback_lock); smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready, &smc->clcsk_data_ready); - smc->clcsock->sk->sk_user_data = NULL; + rcu_assign_sk_user_data(smc->clcsock->sk, NULL); write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); goto out; } + sock_set_flag(sk, SOCK_RCU_FREE); sk->sk_max_ack_backlog = backlog; sk->sk_ack_backlog = 0; sk->sk_state = SMC_LISTEN; diff --git a/net/smc/smc.h b/net/smc/smc.h index 9e6af72784ba..8b3eabcdb542 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -342,8 +342,7 @@ static inline void smc_init_saved_callbacks(struct smc_sock *smc) static inline struct smc_sock *smc_clcsock_user_data(const struct sock *clcsk) { - return (struct smc_sock *) - ((uintptr_t)clcsk->sk_user_data & ~SK_USER_DATA_NOCOPY); + return (struct smc_sock *)rcu_dereference_sk_user_data(clcsk); } /* save target_cb in saved_cb, and replace target_cb with new_cb */ diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 10219f55aad1..bb0313ef5f7c 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -218,7 +218,7 @@ int smc_close_active(struct smc_sock *smc) write_lock_bh(&smc->clcsock->sk->sk_callback_lock); smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready, &smc->clcsk_data_ready); - smc->clcsock->sk->sk_user_data = NULL; + rcu_assign_sk_user_data(smc->clcsock->sk, NULL); write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR); } -- 2.43.0