* [PATCH net] net: initialize sk_rx_queue_mapping in sk_clone()
@ 2026-04-07 8:42 Jiayuan Chen
2026-04-07 9:41 ` Eric Dumazet
From: Jiayuan Chen @ 2026-04-07 8:42 UTC (permalink / raw)
To: netdev
Cc: Jiayuan Chen, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S. Miller, Jakub Kicinski, Simon Horman,
Soheil Hassas Yeganeh, linux-kernel
sk_clone() initializes sk_tx_queue_mapping via sk_tx_queue_clear()
but does not initialize sk_rx_queue_mapping. Since this field is in
the sk_dontcopy region, it is neither copied from the parent socket
by sock_copy() nor zeroed by sk_prot_alloc() (called without
__GFP_ZERO from sk_clone).
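
Editorial illustration (not part of the patch): the copy behaviour
described above can be modeled in user space. This is a minimal
sketch under stated assumptions. struct mini_sock, POISON_BYTE and
the mini_*() helpers are hypothetical names invented for this
example; the real struct sock layout and the real sock_copy() are
considerably more involved. The poison byte stands in for stale slab
contents left by an allocation made without __GFP_ZERO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct mini_sock {
	int            family;           /* before the no-copy region: copied */
	unsigned short rx_queue_mapping; /* no-copy region starts here */
	unsigned short tx_queue_mapping; /* no-copy region ends after this */
	int            rcvbuf;           /* after the no-copy region: copied */
};

#define POISON_BYTE 0xAA /* stands in for stale slab contents */

/* Models an allocation without zeroing: the memory keeps old bytes. */
static void mini_alloc_nozero(struct mini_sock *sk)
{
	memset(sk, POISON_BYTE, sizeof(*sk));
}

/* Models the copy: everything except the no-copy region. */
static void mini_sock_copy(struct mini_sock *nsk, const struct mini_sock *osk)
{
	/* Part 1: all bytes before the no-copy region. */
	memcpy(nsk, osk, offsetof(struct mini_sock, rx_queue_mapping));
	/* Part 2: all bytes after the no-copy region. */
	memcpy(&nsk->rcvbuf, &osk->rcvbuf,
	       sizeof(*nsk) - offsetof(struct mini_sock, rcvbuf));
}

/* Clone with no explicit initialization of the skipped fields. */
static struct mini_sock mini_clone(const struct mini_sock *parent)
{
	struct mini_sock child;

	mini_alloc_nozero(&child);
	mini_sock_copy(&child, parent);
	return child;
}
```

In this model, a child returned by mini_clone() carries the parent's
family and rcvbuf, while both queue-mapping fields still hold the
poison pattern. Any field in the skipped region therefore needs an
explicit initializer in the clone path.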
Commit 03cfda4fa6ea ("tcp: fix another uninit-value
(sk_rx_queue_mapping)") attempted to fix this by introducing
sk_mark_napi_id_set() with force_set=true in tcp_child_process().
However, sk_mark_napi_id_set() -> sk_rx_queue_set() only writes
when skb_rx_queue_recorded(skb) is true. If the 3-way handshake
ACK arrives through a device that does not record rx_queue (e.g.
loopback or veth), sk_rx_queue_mapping remains uninitialized.
When a subsequent data packet arrives with a recorded rx_queue,
sk_mark_napi_id() -> sk_rx_queue_update() reads the uninitialized
field for comparison (force_set=false path), triggering KMSAN.
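
Editorial illustration (not part of the patch): the sequence above
can be traced with a small user-space model. This is a sketch, not
the kernel implementation. The toy_* types and functions are
hypothetical, and the rx_initialized flag is bookkeeping that exists
only in this model so that a read of a never-written field can be
detected without actually reading indeterminate memory. The model
assumes, as in the kernel helpers, that a queue_mapping of 0 means
"no rx queue recorded" and that a recorded queue N is stored as N + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_QUEUE_MAPPING UINT16_MAX /* "no queue" sentinel */

/* Hypothetical stand-ins holding only what this sketch needs. */
struct toy_skb {
	uint16_t queue_mapping;    /* 0 means "rx queue not recorded" */
};

struct toy_sock {
	uint16_t rx_queue_mapping;
	bool     rx_initialized;   /* model-only: was the field ever written? */
};

static void toy_rx_queue_write(struct toy_sock *sk, uint16_t queue)
{
	sk->rx_queue_mapping = queue;
	sk->rx_initialized = true;
}

static void toy_rx_queue_clear(struct toy_sock *sk)
{
	toy_rx_queue_write(sk, TOY_NO_QUEUE_MAPPING);
}

/*
 * Models the set/update helper. Returns true when the field was
 * compared while it had never been written (the KMSAN condition).
 */
static bool toy_rx_queue_set(struct toy_sock *sk, const struct toy_skb *skb,
			     bool force_set)
{
	uint16_t queue;
	bool read_uninit;

	if (skb->queue_mapping == 0)
		return false;      /* not recorded: no read, no write */

	queue = (uint16_t)(skb->queue_mapping - 1);
	if (force_set) {
		toy_rx_queue_write(sk, queue); /* unconditional write */
		return false;
	}

	read_uninit = !sk->rx_initialized;     /* comparison reads the field */
	if (read_uninit || sk->rx_queue_mapping != queue)
		toy_rx_queue_write(sk, queue);
	return read_uninit;
}

/* Handshake ACK without a recorded queue, then a data packet with one. */
static bool toy_handshake_then_data(bool clear_in_clone)
{
	struct toy_sock child = { .rx_queue_mapping = 0xAAAA,
				  .rx_initialized = false };
	const struct toy_skb ack = { .queue_mapping = 0 };  /* e.g. loopback */
	const struct toy_skb data = { .queue_mapping = 1 }; /* queue 0 */

	if (clear_in_clone)
		toy_rx_queue_clear(&child);

	toy_rx_queue_set(&child, &ack, true);          /* force_set path */
	return toy_rx_queue_set(&child, &data, false); /* update path */
}
```

In this model, toy_handshake_then_data(false) reports a read of the
never-written field, while toy_handshake_then_data(true) does not,
because the clear performed at clone time gives the later comparison
a defined value to read.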
This was reproduced by establishing a TCP connection over loopback
(which does not call skb_record_rx_queue), then attaching a BPF TC
program on lo ingress to set skb->queue_mapping on data packets:

  BUG: KMSAN: uninit-value in tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875)
   tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2287)
   ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207)
   ip_local_deliver_finish (net/ipv4/ip_input.c:242)
   ip_local_deliver (net/ipv4/ip_input.c:262)
   ip_rcv (net/ipv4/ip_input.c:573)
   __netif_receive_skb (net/core/dev.c:6294)
   process_backlog (net/core/dev.c:6646)
   __napi_poll (net/core/dev.c:7710)
   net_rx_action (net/core/dev.c:7929)
   handle_softirqs (kernel/softirq.c:623)
   do_softirq (kernel/softirq.c:523)
   __local_bh_enable_ip (kernel/softirq.c:?)
   __dev_queue_xmit (net/core/dev.c:?)
   ip_finish_output2 (net/ipv4/ip_output.c:237)
   ip_output (net/ipv4/ip_output.c:438)
   __ip_queue_xmit (net/ipv4/ip_output.c:534)
   __tcp_transmit_skb (net/ipv4/tcp_output.c:1693)
   tcp_write_xmit (net/ipv4/tcp_output.c:3064)
   tcp_sendmsg_locked (net/ipv4/tcp.c:?)
   tcp_sendmsg (net/ipv4/tcp.c:1465)
   inet_sendmsg (net/ipv4/af_inet.c:865)
   sock_write_iter (net/socket.c:1195)
   vfs_write (fs/read_write.c:688)
   ...

  Uninit was created at:
   kmem_cache_alloc_noprof (mm/slub.c:4873)
   sk_prot_alloc (net/core/sock.c:2239)
   sk_alloc (net/core/sock.c:2301)
   inet_create (net/ipv4/af_inet.c:334)
   __sock_create (net/socket.c:1605)
   __sys_socket (net/socket.c:1747)

Fix this at the root by adding sk_rx_queue_clear() alongside
sk_tx_queue_clear() in sk_clone().
Fixes: 342159ee394d ("net: avoid dirtying sk->sk_rx_queue_mapping")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/core/sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5976100a9d55..a12c5eca88f2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2583,6 +2583,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority,
 	sk_set_socket(newsk, NULL);
 	sk_tx_queue_clear(newsk);
+	sk_rx_queue_clear(newsk);
 	RCU_INIT_POINTER(newsk->sk_wq, NULL);
 	if (newsk->sk_prot->sockets_allocated)
--
2.43.0
* Re: [PATCH net] net: initialize sk_rx_queue_mapping in sk_clone()
From: Eric Dumazet @ 2026-04-07 9:41 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
David S. Miller, Jakub Kicinski, Simon Horman,
Soheil Hassas Yeganeh, linux-kernel
On Tue, Apr 7, 2026 at 1:43 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> sk_clone() initializes sk_tx_queue_mapping via sk_tx_queue_clear()
> but does not initialize sk_rx_queue_mapping. Since this field is in
> the sk_dontcopy region, it is neither copied from the parent socket
> by sock_copy() nor zeroed by sk_prot_alloc() (called without
> __GFP_ZERO from sk_clone).
>
> Commit 03cfda4fa6ea ("tcp: fix another uninit-value
> (sk_rx_queue_mapping)") attempted to fix this by introducing
> sk_mark_napi_id_set() with force_set=true in tcp_child_process().
> However, sk_mark_napi_id_set() -> sk_rx_queue_set() only writes
> when skb_rx_queue_recorded(skb) is true. If the 3-way handshake
> ACK arrives through a device that does not record rx_queue (e.g.
> loopback or veth), sk_rx_queue_mapping remains uninitialized.
>
> When a subsequent data packet arrives with a recorded rx_queue,
> sk_mark_napi_id() -> sk_rx_queue_update() reads the uninitialized
> field for comparison (force_set=false path), triggering KMSAN.
>
> This was reproduced by establishing a TCP connection over loopback
> (which does not call skb_record_rx_queue), then attaching a BPF TC
> program on lo ingress to set skb->queue_mapping on data packets:
Ok, it's a somewhat convoluted way, and no real harm but KMSAN :)
Reviewed-by: Eric Dumazet <edumazet@google.com>