Date: Wed, 12 Nov 2025 10:27:18 -0800
From: Bobby Eshleman
To: Stefano Garzarella
Cc: Shuah Khan, "David S. Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, "Michael S. Tsirkin",
	Jason Wang, Xuan Zhuo, Eugenio Pérez, "K. Y. Srinivasan",
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-hyperv@vger.kernel.org,
	Sargun Dhillon, berrange@redhat.com, Bobby Eshleman
Subject: Re: [PATCH net-next v9 06/14] vsock/loopback: add netns support
References: <20251111-vsock-vmtest-v9-0-852787a37bed@meta.com>
	<20251111-vsock-vmtest-v9-6-852787a37bed@meta.com>

On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
> On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman
> >
> > Add NS support to vsock loopback. Sockets in a global mode netns
> > communicate with each other, regardless of namespace.
> > Sockets in a local mode netns may only communicate with other sockets
> > within the same namespace.
> >
> > Signed-off-by: Bobby Eshleman
> > ---
> > Changes in v9:
> > - remove per-netns vsock_loopback and workqueues, just re-using
> >   the net and net_mode in skb->cb achieved the same result in a simpler
> >   way. Also removed need for pernet_subsys.
> > - properly track net references
> >
> > Changes in v7:
> > - drop for_each_net() init/exit, drop net_rwsem, the pernet registration
> >   handles this automatically and race-free
> > - flush workqueue before destruction, purge pkt list
> > - remember net_mode instead of current net mode
> > - keep space after INIT_WORK()
> > - change vsock_loopback in netns_vsock to ->priv void ptr
> > - rename `orig_net_mode` to `net_mode`
> > - remove useless comment
> > - protect `register_pernet_subsys()` with `net_rwsem`
> > - do cleanup before releasing `net_rwsem` when failure happens
> > - call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
> > - call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`
> >
> > Changes in v6:
> > - init pernet ops for vsock_loopback module
> > - vsock_loopback: add space in struct to clarify lock protection
> > - do proper cleanup/unregister on vsock_loopback_exit()
> > - vsock_loopback: use virtio_vsock_skb_net()
> >
> > Changes in v5:
> > - add callbacks code to avoid reverse dependency
> > - add logic for handling vsock_loopback setup for already existing
> >   namespaces
> > ---
> >  net/vmw_vsock/vsock_loopback.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index d3ac056663ea..e62f6c516992 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -32,6 +32,9 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
> >  	struct vsock_loopback *vsock = &the_vsock_loopback;
> >  	int len = skb->len;
> >
> > +	virtio_vsock_skb_set_net(skb, net);
> > +	virtio_vsock_skb_set_net_mode(skb, net_mode);
> > +
> >  	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
> >  	queue_work(vsock->workqueue, &vsock->pkt_work);
> >
> > @@ -116,8 +119,10 @@ static void vsock_loopback_work(struct work_struct *work)
> >  {
> >  	struct vsock_loopback *vsock =
> >  		container_of(work, struct vsock_loopback, pkt_work);
> > +	enum vsock_net_mode net_mode;
> >  	struct sk_buff_head pkts;
> >  	struct sk_buff *skb;
> > +	struct net *net;
> >
> >  	skb_queue_head_init(&pkts);
> >
> > @@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
> >  		 */
> >  		virtio_transport_consume_skb_sent(skb, false);
> >  		virtio_transport_deliver_tap_pkt(skb);
> > -		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
> > +
> > +		/* In the case of virtio_transport_reset_no_sock(), the skb
> > +		 * does not hold a reference on the socket, and so does not
> > +		 * transitively hold a reference on the net.
> > +		 *
> > +		 * There is an ABA race condition in this sequence:
> > +		 * 1. the sender sends a packet
> > +		 * 2. worker calls virtio_transport_recv_pkt(), using the
> > +		 *    sender's net
> > +		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
> > +		 *    sender's net
> > +		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
> > +		 *    reference to the socket
> > +		 * 5. the socket closes, frees its reference to the net
> > +		 * 6. Finally, the worker for the second t->send_pkt() call
> > +		 *    processes the skb, and uses the now stale net pointer for
> > +		 *    socket lookups.
> > +		 *
> > +		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
> > +		 * and hold it until virtio_transport_recv_pkt() completes.
> > +		 *
> > +		 * Additionally, we must grab a reference on the skb before
> > +		 * calling virtio_transport_recv_pkt() to prevent it from
> > +		 * freeing the skb before we have a chance to release the net.
> > +		 */
> > +		net_mode = virtio_vsock_skb_net_mode(skb);
> > +		net = virtio_vsock_skb_net(skb);
>
> Wait, we are adding those just for loopback (in theory used only for
> testing/debugging)? And only to support virtio_transport_reset_no_sock() use
> case?

Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist
for vhost-vsock because vhost_vsock holds a net reference, and it
doesn't exist for non-reset_no_sock calls because after looking up the
socket we transfer skb ownership to it, which holds down the
skb -> sk -> net reference chain.

> Honestly I don't like this, do we have any alternative?
>
> I'll also try to think something else.
>
> Stefano

I've been thinking about this all morning... maybe we can do something
like this:

```
virtio_transport_recv_pkt(..., struct sock *reply_sk) { ... }

virtio_transport_reset_no_sock(..., reply_sk)
{
	if (reply_sk)
		skb_set_owner_sk_safe(reply, reply_sk);

	t->send_pkt(reply);
}

vsock_loopback_work(...)
{
	virtio_transport_recv_pkt(..., skb, skb->sk);
}

/* for other transports: */
virtio_transport_recv_pkt(..., skb, NULL);
```

This way 'reply' keeps the sk and sk->net alive even after
virtio_transport_recv_pkt() frees 'skb'. The net won't be released until
after 'reply' is freed back on the other side, removing the race.

It makes semantic sense too... for loopback, we already know which sk
the reply is going back to. For other transports, we don't because
they're across the virt boundary.

WDYT?

I hate to suggest this, but another option might be to just do nothing?
For this race to have any real effect, a loopback socket must send a pkt
to a non-existent socket and immediately close(), then the namespace
must be deleted, a new namespace created at the same pointer address,
and finally a new socket with the same port created in that namespace,
all before the reply RST reaches recv_pkt()... at which point the newly
created socket would wrongfully receive the RST.

Best,
Bobby
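
For illustration only, the ownership idea above can be modelled in
user space. This is a minimal sketch, not kernel code: every toy_* name
is hypothetical, only the reference counts are modelled, and it assumes
the socket holds the last reference on the net. It replays the sequence
from the patch comment and reports whether the net is still alive when
the queued reply is finally processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, struct sock and struct sk_buff. */
struct toy_net  { int refs; bool freed; };
struct toy_sock { int refs; bool freed; struct toy_net *net; };
struct toy_skb  { struct toy_sock *sk; /* owning socket, may be NULL */ };

static void toy_net_put(struct toy_net *net)
{
	assert(net->refs > 0);
	if (--net->refs == 0)
		net->freed = true;	/* the real kernel would free the netns */
}

static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refs > 0);
	if (--sk->refs == 0) {
		sk->freed = true;
		toy_net_put(sk->net);	/* socket drops its net reference */
	}
}

/* Model of giving an skb an owning socket: takes a socket reference. */
static void toy_skb_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
	sk->refs++;
	skb->sk = sk;
}

/* Model of freeing an skb: drops the socket reference, if any. */
static void toy_skb_free(struct toy_skb *skb)
{
	if (skb->sk) {
		toy_sock_put(skb->sk);
		skb->sk = NULL;
	}
}

/* Returns true if the net is still alive when the reply is processed. */
static bool toy_net_alive_at_reply(bool reply_owns_sk)
{
	struct toy_net net = { .refs = 1, .freed = false };	/* held by sk */
	struct toy_sock sk = { .refs = 1, .freed = false, .net = &net };
	struct toy_skb skb = { NULL }, reply = { NULL };
	bool alive;

	toy_skb_set_owner(&skb, &sk);		/* sender queues a packet */
	if (reply_owns_sk)
		toy_skb_set_owner(&reply, skb.sk);	/* proposed change */
	toy_skb_free(&skb);			/* recv_pkt() frees the skb */
	toy_sock_put(&sk);			/* the sender close()s */

	alive = !net.freed;			/* worker processes the reply */
	toy_skb_free(&reply);
	return alive;
}
```

In this model toy_net_alive_at_reply(false) reports the net as already
freed when the reply is handled, while toy_net_alive_at_reply(true)
keeps it alive until the reply itself is freed, which is the property
the reply_sk proposal is after.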