From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Bryan Tan" <bryan-bt.tan@broadcom.com>,
	"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
	"Broadcom internal kernel review list"
	<bcm-kernel-feedback-list@broadcom.com>,
	"David S. Miller" <davem@davemloft.net>,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 0/3] vsock: add namespace support to vhost-vsock
Date: Thu, 13 Mar 2025 09:20:14 -0700
Message-ID: <Z9MFvkALRY/k3ITG@devvm6277.cco0.facebook.com>
In-Reply-To: <zz5fnulfljo6hyxaveseq3dxfgljfs33m7ncsw6uod6wouaqhl@jzdmg6s7g5dw>

On Thu, Mar 13, 2025 at 04:37:16PM +0100, Stefano Garzarella wrote:
> Hi Bobby,
> first of all, thank you for starting this work again!
> 

You're welcome, thank you for your work getting it started!

> On Wed, Mar 12, 2025 at 07:28:33PM -0700, Bobby Eshleman wrote:
> > Hey all,
> > 
> > Apologies for forgetting the 'net-next' prefix on this one. Should I
> > resend or no?
> 
> I'd say let's do a first review cycle on this, then you can re-post.
> Please also check the maintainers CCed, it looks like someone is missing:
> https://patchwork.kernel.org/project/netdevbpf/patch/20250312-vsock-netns-v2-1-84bffa1aa97a@gmail.com/
> 

Duly noted, I'll double-check the ccs next time. sgtm on the re-post!

> > On Wed, Mar 12, 2025 at 01:59:34PM -0700, Bobby Eshleman wrote:
> > > Picking up Stefano's v1 [1], this series adds netns support to
> > > vhost-vsock. Unlike v1, this series does not address guest-to-host (g2h)
> > > namespaces, deferring that for future implementation and discussion.
> > > 
> > > Any vsock created with /dev/vhost-vsock is a global vsock, accessible
> > > from any namespace. Any vsock created with /dev/vhost-vsock-netns is a
> > > "scoped" vsock, accessible only to sockets in its namespace. If a global
> > > vsock or scoped vsock share the same CID, the scoped vsock takes
> > > precedence.
> 
> This inside the netns, right?
> I mean if we are in a netns, and there is a VM A attached to
> /dev/vhost-vsock-netns with CID=42 and a VM B attached to /dev/vhost-vsock
> also with CID=42, this means that VM A will not be accessible in the netns,
> but it can be accessible outside of the netns,
> right?
> 

In this scenario, CID=42 goes to VM A (/dev/vhost-vsock-netns) for any
socket in its namespace.  For any other namespace, CID=42 will go to VM
B (/dev/vhost-vsock).

If I understand your setup correctly:

	Namespace 1:
		VM A - /dev/vhost-vsock-netns, CID=42
		Process X
	Namespace 2:
		VM B - /dev/vhost-vsock, CID=42
		Process Y
	Namespace 3:
		Process Z

In this scenario, taking connect() as an example:
	Process X connect(CID=42) goes to VM A
	Process Y connect(CID=42) goes to VM B
	Process Z connect(CID=42) goes to VM B

If VM A goes away (migration, shutdown, etc...):
	Process X connect(CID=42) also goes to VM B
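
The lookup order in the scenario above can be modeled as a minimal
sketch. This is a Python illustration only, not the kernel code from the
series; the class and method names (CidTable, add_scoped, add_global,
remove_scoped, resolve) are hypothetical.

```python
from typing import Dict, Optional, Tuple


class CidTable:
    """Toy model of CID resolution: scoped VM first, then global VM."""

    def __init__(self) -> None:
        # VMs opened via /dev/vhost-vsock-netns, keyed by (netns, cid).
        self._scoped: Dict[Tuple[str, int], str] = {}
        # VMs opened via /dev/vhost-vsock, keyed by cid only.
        self._global: Dict[int, str] = {}

    def add_scoped(self, netns: str, cid: int, vm: str) -> None:
        self._scoped[(netns, cid)] = vm

    def add_global(self, cid: int, vm: str) -> None:
        self._global[cid] = vm

    def remove_scoped(self, netns: str, cid: int) -> None:
        # Models the scoped VM going away (migration, shutdown, ...).
        self._scoped.pop((netns, cid), None)

    def resolve(self, netns: str, cid: int) -> Optional[str]:
        # A scoped VM in the caller's namespace takes precedence.
        vm = self._scoped.get((netns, cid))
        if vm is not None:
            return vm
        # Otherwise fall back to the global VM, visible from any namespace.
        return self._global.get(cid)


table = CidTable()
table.add_scoped("ns1", 42, "VM A")   # /dev/vhost-vsock-netns
table.add_global(42, "VM B")          # /dev/vhost-vsock
for ns in ("ns1", "ns2", "ns3"):
    print(ns, "->", table.resolve(ns, 42))
```

The fallback in resolve() is what makes Process X reach VM B once VM A
is gone.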

> > > 
> > > If a socket in a namespace connects with a global vsock, the CID becomes
> > > unavailable to any VMM in that namespace when creating new vsocks. If
> > > disconnected, the CID becomes available again.
> 
> IIUC if an application in the host running in a netns is connected to a
> guest attached to /dev/vhost-vsock (e.g. CID=42), a new guest can't ask
> for the same CID (42) on /dev/vhost-vsock-netns in the same netns while that
> connection is active. Is that right?
> 

Right. Here is the scenario I am trying to avoid:

Step 1: namespace 1, VM A allocated with CID 42 on /dev/vhost-vsock
Step 2: namespace 2, connect(CID=42) (this is legal, preserves old
        behavior)
Step 3: namespace 2, VM B allocated with CID 42 on
        /dev/vhost-vsock-netns

After step 3, CID=42 in namespace 2 should belong to VM B, but
the connection from step 2 would still be with VM A.

I think we have some options:
1. disallow the new VM B because the namespace already has an active
   connection with VM A
2. allow the existing connection to continue, but make sure that new
   connections go to VM B
3. close the connection from namespace 2, spin up VM B, hope user
   manages connection retry
4. auto-retry connect to the new VM B? (seems like doing too much on the
   kernel side to me)

I chose option 1 for this rev mostly for the simplicity but definitely
open to suggestions. I think option 3 is also a simple implementation.
Option 2 would require adding some concept of "vhost-vsock ns at time of
connection" to each socket, so the transport would know which vhost_vsock
to use for which socket.
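
Option 1 can be illustrated with a minimal sketch, assuming a
per-namespace count of live connections to global CIDs. This is a Python
model only, not the implementation in the series; NamespaceCids, CidBusy,
and the method names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


class CidBusy(Exception):
    """Raised when a CID is in use by a global connection in this netns."""


class NamespaceCids:
    def __init__(self) -> None:
        # Live connections from a namespace to a global vsock,
        # keyed by (netns, cid).
        self._global_conns: Counter = Counter()
        # Scoped VMs (/dev/vhost-vsock-netns), keyed by (netns, cid).
        self._scoped: Dict[Tuple[str, int], str] = {}

    def connect_global(self, netns: str, cid: int) -> None:
        self._global_conns[(netns, cid)] += 1

    def disconnect_global(self, netns: str, cid: int) -> None:
        key = (netns, cid)
        if self._global_conns[key] > 0:
            self._global_conns[key] -= 1

    def allocate_scoped(self, netns: str, cid: int, vm: str) -> None:
        # Option 1: refuse the new scoped VM while the namespace still
        # has a connection to a global vsock with the same CID.
        if self._global_conns[(netns, cid)] > 0:
            raise CidBusy(f"CID {cid} is busy in {netns}")
        self._scoped[(netns, cid)] = vm

    def scoped_owner(self, netns: str, cid: int) -> Optional[str]:
        return self._scoped.get((netns, cid))
```

With this model, step 3 fails with CidBusy while the connection from
step 2 is open, and succeeds once disconnect_global() has been called.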

> > > 
> > > Testing
> > > 
> > > QEMU with /dev/vhost-vsock-netns support:
> > > 	https://github.com/beshleman/qemu/tree/vsock-netns
> 
> You can also use unmodified QEMU using `vhostfd` parameter of
> `vhost-vsock-pci` device:
> 
> # FD will contain the file descriptor to /dev/vhost-vsock-netns
> exec {FD}<>/dev/vhost-vsock-netns
> 
> # pass FD to the device, this is used for example by libvirt
> qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>   -drive file=fedora.qcow2,format=qcow2,if=virtio \
>   -object memory-backend-memfd,id=mem,size=512M \
>   -device vhost-vsock-pci,vhostfd=${FD},guest-cid=42 -nographic
> 

Very nice, thanks, I didn't realize that!

> That said, I agree we can extend QEMU with `netns` param too.
> 

I'm open to either. Your solution above is super elegant.

> BTW, I'm traveling, I'll be back next Tuesday and I hope to take a deeper
> look at the patches.
> 
> Thanks,
> Stefano
> 

Thanks Stefano! Enjoy the travel.

Best,
Bobby
