Netdev List
 help / color / mirror / Atom feed
From: "D. Wythe" <alibuda@linux.alibaba.com    >
To: Sechang Lim <rhkrqnwk98@gmail.com>
Cc: "D . Wythe" <alibuda@linux.alibaba.com>,
	Dust Li <dust.li@linux.alibaba.com>,
	Sidraya Jayagond <sidraya@linux.ibm.com>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jiayuan Chen <jiayuan.chen@linux.dev>,
	Mahanta Jambigi <mjambigi@linux.ibm.com>,
	Tony Lu <tonylu@linux.alibaba.com>,
	Wen Gu <guwen@linux.alibaba.com>, Simon Horman <horms@kernel.org>,
	Karsten Graul <kgraul@linux.ibm.com>,
	Guvenc Gulce <guvenc@linux.ibm.com>,
	Ursula Braun <ubraun@linux.ibm.com>,
	linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH net v3] net/smc: fix out-of-bounds read when sk_user_data holds a sk_psock
Date: Thu, 2 Jul 2026 11:14:50 +0800	[thread overview]
Message-ID: <20260702031450.GA61525@j66a10360.sqa.eu95> (raw)
In-Reply-To: <20260629095140.679754-1-rhkrqnwk98@gmail.com>

On Mon, Jun 29, 2026 at 09:51:33AM +0000, Sechang Lim wrote:
> A passive-open child inherits the listener's smc_clcsock_data_ready().
> sk_clone_lock() clears its sk_user_data to NULL because the listener tagged
> it SK_USER_DATA_NOCOPY. Until accept restores the callback, a BPF sock_ops
> program can add the established child to a sockmap, and sk_psock_init()
> installs a sk_psock into the NULL sk_user_data. The inherited callback then
> reads it back through smc_clcsock_user_data(), which strips only NOCOPY,
> takes the sk_psock for an smc_sock, and dereferences a clcsk_* field past
> its end:
> 
>   BUG: KASAN: slab-out-of-bounds in smc_clcsock_data_ready+0x84/0x200 net/smc/af_smc.c:2637
>   Read of size 8 at addr ffff8880013b8674 by task syz.6.12484/67930
>    <IRQ>
>    smc_clcsock_data_ready+0x84/0x200 net/smc/af_smc.c:2637
>    tcp_urg+0x24d/0x360 net/ipv4/tcp_input.c:6264
>    tcp_rcv_state_process+0x280d/0x4940 net/ipv4/tcp_input.c:7336
>    tcp_child_process+0x371/0xa50 net/ipv4/tcp_minisocks.c:1002
>    tcp_v4_rcv+0x1eaa/0x2a00 net/ipv4/tcp_ipv4.c:2186
>    [...]
>    </IRQ>
> 
>   Allocated by task 67930:
>    sk_psock_init+0x142/0x740 net/core/skmsg.c:766
>    sock_hash_update_common+0xd3/0x990 net/core/sock_map.c:1010
>    bpf_sock_hash_update+0x114/0x170 net/core/sock_map.c:1229
>    __cgroup_bpf_run_filter_sock_ops+0x74/0xa0 kernel/bpf/cgroup.c:1727
>    tcp_init_transfer+0x1085/0x1100 net/ipv4/tcp_input.c:6693
>    [...]
> 
> Resolve the conflict on the write path. Reserve the child's sk_user_data
> with a NULL pointer tagged SK_USER_DATA_NOCOPY so sk_psock_init() returns
> -EBUSY, and release it at accept. smc_clcsock_user_data() still strips the
> tag to NULL, so the inherited callback stays a no-op.
> 
> Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> ---
> v3:
>  - reserve sk_user_data on the write path instead of the read-side check (D. Wythe)
> 
> v2:
>  - https://lore.kernel.org/netdev/20260619150342.3626224-1-rhkrqnwk98@gmail.com/
> 
> v1:
>  - https://lore.kernel.org/netdev/20260614120931.4041687-1-rhkrqnwk98@gmail.com/
> 
>  net/smc/af_smc.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index b5db69073e20..78f162344fe3 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -154,7 +154,11 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk,
>  					       own_req, opt_child_init);
>  	/* child must not inherit smc or its ops */
>  	if (child) {
> -		rcu_assign_sk_user_data(child, NULL);
> +		/* reserve sk_user_data so sockmap cannot claim the slot */
> +		write_lock_bh(&child->sk_callback_lock);
> +		__rcu_assign_sk_user_data_with_flags(child, NULL,
> +						     SK_USER_DATA_NOCOPY);
> +		write_unlock_bh(&child->sk_callback_lock);
>  
>  		/* v4-mapped sockets don't inherit parent ops. Don't restore. */
>  		if (inet_csk(child)->icsk_af_ops == inet_csk(sk)->icsk_af_ops)
> @@ -1773,6 +1777,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
>  	/* new clcsock has inherited the smc listen-specific sk_data_ready
>  	 * function; switch it back to the original sk_data_ready function
>  	 */
> +	write_lock_bh(&new_clcsock->sk->sk_callback_lock);
>  	new_clcsock->sk->sk_data_ready = lsmc->clcsk_data_ready;
>  
>  	/* if new clcsock has also inherited the fallback-specific callback
> @@ -1786,6 +1791,9 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
>  		if (lsmc->clcsk_error_report)
>  			new_clcsock->sk->sk_error_report = lsmc->clcsk_error_report;
>  	}
> +	/* release the slot reserved in smc_tcp_syn_recv_sock() */
> +	rcu_assign_sk_user_data(new_clcsock->sk, NULL);
> +	write_unlock_bh(&new_clcsock->sk->sk_callback_lock);
>  
>  	(*new_smc)->clcsock = new_clcsock;
>  out:
> -- 


I do not think this is the right direction.

You have now sent three versions of essentially the same patch, but I
still do not see a clear explanation of why this approach is supposed to
solve the problem you described. At this point, it looks like you are
repeatedly posting changes without demonstrating that you fully
understand the ownership and lifetime rules involved here.

This is not the approach I suggested. My preference is to keep the
sk_user_data reserved from the time the clcsock is created, by not
setting NOCOPY, so that it remains inherited by the passive socket.
That avoids the release-then-reacquire pattern entirely. The problem
is more complicated than it may appear, and if you choose this direction,
you also need to account for how SMC fallback continues to work with
sockmap.

If you do not think you can fully address that approach, then feel free
to pursue other options that you are able to handle, but I will not ack
any workaround version. You may seek ack from other maintainers.

D. Wythe

> 2.43.0

      parent reply	other threads:[~2026-07-02  3:14 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-29  9:51 [PATCH net v3] net/smc: fix out-of-bounds read when sk_user_data holds a sk_psock Sechang Lim
2026-07-01 15:27 ` Paolo Abeni
2026-07-02  3:17   ` D. Wythe
2026-07-02  3:14 ` D. Wythe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260702031450.GA61525@j66a10360.sqa.eu95 \
    --to=alibuda@linux.alibaba.com \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=dust.li@linux.alibaba.com \
    --cc=edumazet@google.com \
    --cc=guvenc@linux.ibm.com \
    --cc=guwen@linux.alibaba.com \
    --cc=horms@kernel.org \
    --cc=jiayuan.chen@linux.dev \
    --cc=kgraul@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjambigi@linux.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rhkrqnwk98@gmail.com \
    --cc=sidraya@linux.ibm.com \
    --cc=tonylu@linux.alibaba.com \
    --cc=ubraun@linux.ibm.com \
    --cc=wenjia@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox