From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Shuah Khan" <shuah@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Bryan Tan" <bryan-bt.tan@broadcom.com>,
	"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
	"Broadcom internal kernel review list"
	<bcm-kernel-feedback-list@broadcom.com>,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-hyperv@vger.kernel.org,
	"Sargun Dhillon" <sargun@sargun.me>,
	berrange@redhat.com, "Bobby Eshleman" <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v9 06/14] vsock/loopback: add netns support
Date: Thu, 13 Nov 2025 10:26:04 -0800
Message-ID: <aRYivEKsa44u5Mh+@devvm11784.nha0.facebook.com>
In-Reply-To: <g5dcyor4aryvtcnqxm5aekldbettetlmog3c7sj7sjx3yp2pgy@hcpxyubied2n>

On Thu, Nov 13, 2025 at 04:24:44PM +0100, Stefano Garzarella wrote:
> On Wed, Nov 12, 2025 at 10:27:18AM -0800, Bobby Eshleman wrote:
> > On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
> > > On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
> > > > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > > >
> > > > Add NS support to vsock loopback. Sockets in a global mode netns
> > > > communicate with each other, regardless of namespace. Sockets in a local
> > > > mode netns may only communicate with other sockets within the same
> > > > namespace.
> > > >
> > > > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

[...]

> > > > @@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
> > > > 		 */
> > > > 		virtio_transport_consume_skb_sent(skb, false);
> > > > 		virtio_transport_deliver_tap_pkt(skb);
> > > > -		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
> > > > +
> > > > +		/* In the case of virtio_transport_reset_no_sock(), the skb
> > > > +		 * does not hold a reference on the socket, and so does not
> > > > +		 * transitively hold a reference on the net.
> > > > +		 *
> > > > +		 * There is an ABA race condition in this sequence:
> > > > +		 * 1. the sender sends a packet
> > > > +		 * 2. worker calls virtio_transport_recv_pkt(), using the
> > > > +		 *    sender's net
> > > > +		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
> > > > +		 *    sender's net
> > > > +		 * 4. virtio_transport_recv_pkt() frees the skb, dropping the
> > > > +		 *    reference to the socket
> > > > +		 * 5. the socket closes, frees its reference to the net
> > > > +		 * 6. Finally, the worker for the second t->send_pkt() call
> > > > +		 *    processes the skb, and uses the now stale net pointer for
> > > > +		 *    socket lookups.
> > > > +		 *
> > > > +		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
> > > > +		 * and hold it until virtio_transport_recv_pkt() completes.
> > > > +		 *
> > > > +		 * Additionally, we must grab a reference on the skb before
> > > > +		 * calling virtio_transport_recv_pkt() to prevent it from
> > > > +		 * freeing the skb before we have a chance to release the net.
> > > > +		 */
> > > > +		net_mode = virtio_vsock_skb_net_mode(skb);
> > > > +		net = virtio_vsock_skb_net(skb);
> > > 
> > > Wait, we are adding those just for loopback (in theory used only for
> > > testing/debugging)? And only to support virtio_transport_reset_no_sock() use
> > > case?
> > 
> > Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist
> > for vhost-vsock because vhost_vsock holds a net reference, and it
> > doesn't exist for non-reset_no_sock calls because after looking up the
> > socket we transfer skb ownership to it, which holds down the skb -> sk ->
> > net reference chain.
> > 
> > > 
> > > Honestly I don't like this, do we have any alternative?
> > > 
> > > I'll also try to think something else.
> > > 
> > > Stefano
> > 
> > 
> > I've been thinking about this all morning... maybe
> > we can do something like this:
> > 
> > ```
> > 
> > virtio_transport_recv_pkt(...,  struct sock *reply_sk) {... }
> > 
> > virtio_transport_reset_no_sock(..., reply_sk)
> > {
> > 	if (reply_sk)
> > 		skb_set_owner_sk_safe(reply, reply_sk)
> 
> Interesting, but what about if we call skb_set_owner_sk_safe() in
> vsock_loopback.c just before calling virtio_transport_recv_pkt() for every
> skb?

I think the issue with this is that, at the time vsock_loopback calls
virtio_transport_recv_pkt(), the reply skb hasn't been allocated yet (that
happens inside virtio_transport_reset_no_sock()), and we can't wait for
virtio_transport_recv_pkt() to return because the original skb may already
be freed by then.

We might be able to keep it all in vsock_loopback if we removed the need
to use the original skb or sk by just using the net. But to do that we
would need to add a netns_tracker per net somewhere. I guess that would
end up in a list or hashmap in struct vsock_loopback.
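
As a rough illustration of that last idea, here is a userspace sketch (not
kernel code) of a per-net table that pins each namespace while it has
packets queued. Everything in it is hypothetical: struct model_net,
struct model_loopback, model_track() and model_untrack() are made-up
stand-ins, the plain int refcount stands in for what get_net_track() and
put_net_track() would do with a netns_tracker, and there is no locking,
which a real version would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NETS 8	/* arbitrary bound for the sketch */

/* Hypothetical stand-in for struct net; refs models the netns refcount. */
struct model_net { int refs; };

/* One slot per namespace that currently has packets queued. */
struct model_net_slot {
	struct model_net *net;
	unsigned int inflight;
};

/* Hypothetical per-loopback table; the real struct has no such field. */
struct model_loopback {
	struct model_net_slot slots[MODEL_MAX_NETS];
};

/* Return the slot tracking net, or a free slot when net is NULL. */
static struct model_net_slot *model_find(struct model_loopback *lo,
					 struct model_net *net)
{
	for (size_t i = 0; i < MODEL_MAX_NETS; i++)
		if (lo->slots[i].net == net)
			return &lo->slots[i];
	return NULL;
}

/* Enqueue time: pin the net on first use, then count the packet.
 * net must be non-NULL.
 */
static bool model_track(struct model_loopback *lo, struct model_net *net)
{
	struct model_net_slot *slot = model_find(lo, net);

	if (!slot) {
		slot = model_find(lo, NULL);	/* first free slot */
		if (!slot)
			return false;		/* table full */
		slot->net = net;
		net->refs++;			/* one ref per tracked net */
	}
	slot->inflight++;
	return true;
}

/* Worker is done with a packet: unpin the net after the last one. */
static void model_untrack(struct model_loopback *lo, struct model_net *net)
{
	struct model_net_slot *slot = model_find(lo, net);

	assert(slot && slot->inflight > 0);
	if (--slot->inflight == 0) {
		slot->net->refs--;
		slot->net = NULL;
	}
}
```

The point of the sketch is only that the table, not the skb or sk, would
own the net reference, so the worker could rely on the net alone.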

Another option, which simplifies things a little but unfortunately still
doesn't keep everything in loopback:

@@ -1205,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;
 
-	return t->send_pkt(reply, net, net_mode);
+	return t->send_pkt(reply, net, net_mode, skb->sk);
 }

@@ -27,11 +27,16 @@ static u32 vsock_loopback_get_local_cid(void)
 }

 static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
-				   enum vsock_net_mode net_mode)
+				   enum vsock_net_mode net_mode,
+				   struct sock *rst_owner)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;
 
+	if (!skb->sk && rst_owner)
+		WARN_ONCE(!skb_set_owner_sk_safe(skb, rst_owner),
+			  "loopback socket has sk_refcnt == 0\n");
+
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);
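
To make the ownership argument concrete, here is a userspace model of the
skb -> sk -> net chain (not kernel code; all model_* names are
hypothetical). It assumes the "safe" owner helper only succeeds while the
socket's refcount is still non-zero, which is the condition the
WARN_ONCE() above reports on, and that a live socket pins its net.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects; all names are hypothetical. */
struct model_net  { atomic_int refs; };
struct model_sock { atomic_int refs; struct model_net *net; };
struct model_skb  { struct model_sock *sk; };

/* Take a reference only if the count is still non-zero. */
static bool model_inc_not_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Create a socket that pins its net for its whole lifetime. */
static void model_sock_init(struct model_sock *sk, struct model_net *net)
{
	atomic_fetch_add(&net->refs, 1);
	sk->net = net;
	atomic_init(&sk->refs, 1);
}

/* Drop one socket reference; the last one also releases the net. */
static void model_sock_put(struct model_sock *sk)
{
	int old = atomic_fetch_sub(&sk->refs, 1);

	assert(old > 0);
	if (old == 1)
		atomic_fetch_sub(&sk->net->refs, 1);
}

/* Attach skb to sk only if sk is still alive. */
static bool model_skb_set_owner_safe(struct model_skb *skb,
				     struct model_sock *sk)
{
	if (!sk || !model_inc_not_zero(&sk->refs))
		return false;
	skb->sk = sk;
	return true;
}

/* Free-time hook: release the socket reference held by the skb. */
static void model_skb_free(struct model_skb *skb)
{
	if (skb->sk) {
		model_sock_put(skb->sk);
		skb->sk = NULL;
	}
}
```

In this model, once the reply is attached to rst_owner, the socket closing
no longer drops the last reference, so the net stays valid until the
worker frees the reply.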

> 
> Maybe we should refactor a bit virtio_transport_recv_pkt() e.g. moving
> `skb_set_owner_sk_safe()` to be sure it's called only when we are sure it's
> the right socket (e.g. after checking SOCK_DONE).
> 
> WDYT?

I agree, it is called a little prematurely.

Thanks,
Bobby
