* [PATCH v7] virtio-vsock: Add support for multi devices
@ 2025-04-12 14:28 Xuewei Niu
  2025-04-12 14:38 ` Xuewei Niu
                   ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-04-12 14:28 UTC (permalink / raw)
  To: sgarzare, parav, mst, fupan.lfp; +Cc: virtio-comment, Xuewei Niu

This patch brings a new feature, called "multi devices", to the virtio
vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
"device_order" field to the config for the virtio vsock.

== Motivation ==

Vsock is a lightweight and widely used data exchange mechanism between host
and guest. Currently, virtio-vsock supports only one device per guest, so it
is not possible to enable more than one backend. For instance, a setup may
require two devices: one to transfer data to the VMM via virtio-vsock, and
another to a user process via vhost-user-vsock.

A side benefit is that performance may improve, since each device has its
own queues, although the gain depends on the implementation.

== Typical Usages ==

Assume there are two virtio-vsock devices in the guest, with CIDs 3 and 4
respectively, and that the device with CID 3 is the default.

Connect to the host using the device with CID 3.

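```c
// Pseudocode: arguments are (fd, CID, port).
// CID 2 is the host (VMADDR_CID_HOST); port -1 is VMADDR_PORT_ANY.

// use default one (no bind)
fd = socket(AF_VSOCK);
connect(fd, 2, 1234);
n = write(fd, buffer);

// or bind explicitly
fd = socket(AF_VSOCK);
bind(fd, 3, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```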

Connect to the host using the device with CID 4.

```c
// must bind explicitly as the device with CID 4 is not default.
fd = socket(AF_VSOCK);
bind(fd, 4, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
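
For reference, the pseudocode above could be written against the Linux
AF_VSOCK socket API roughly as follows. This is only a sketch: the helper
names vsock_addr() and vsock_connect_from() are made up for illustration,
and binding to a CID in order to select a device is the behaviour this
proposal assumes, not something current kernels are known to provide.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address for the given CID and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Connect to (remote_cid, port). If local_cid is VMADDR_CID_ANY the socket
 * is left unbound, so the driver chooses the source CID (the default
 * device's CID under this proposal). Returns a connected fd, or -1 on error.
 */
int vsock_connect_from(unsigned int local_cid, unsigned int remote_cid,
		       unsigned int port)
{
	struct sockaddr_vm local = vsock_addr(local_cid, VMADDR_PORT_ANY);
	struct sockaddr_vm remote = vsock_addr(remote_cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	/* Explicit bind selects the device that owns local_cid. */
	if (local_cid != VMADDR_CID_ANY &&
	    bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
		goto err;
	if (connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0)
		goto err;
	return fd;
err:
	close(fd);
	return -1;
}
```

With these helpers, vsock_connect_from(VMADDR_CID_ANY, VMADDR_CID_HOST, 1234)
corresponds to the "no bind" case, and vsock_connect_from(4, VMADDR_CID_HOST,
1234) to the explicit bind to CID 4.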

The first version of the multi-devices implementation is available at [1].

v6 -> v7:
- Addresses minor review comments from Stefano.

[1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
 device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
index 7d91d15..392dc76 100644
--- a/device-types/vsock/description.tex
+++ b/device-types/vsock/description.tex
@@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
 \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
 \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
 \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
+\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
 \end{description}
 
 \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
@@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
 VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
 VIRTIO_VSOCK_F_STREAM has also been negotiated.
 
+The driver SHOULD ignore devices that do not have
+VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
+
+The driver SHOULD ignore all subsequent devices if a device without
+VIRTIO_VSOCK_F_MULTI_DEVICES is present.
+
 \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
 
 The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
@@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
 \begin{lstlisting}
 struct virtio_vsock_config {
 	le64 guest_cid;
+	le16 device_order;
 };
 \end{lstlisting}
 
@@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
 \hline
 \end{tabular}
 
+The \field{device_order} is used to identify the default device. Up to
+65,535 devices can be supported due to the size.
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
+
+The device MUST provide a distinct \field{device_order} if
+VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
+
+The driver MUST treat the device with the lowest \field{device_order} as
+the default device.
+
 \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
 
 \begin{enumerate}
 \item The guest's cid is read from \field{guest_cid}.
 
+\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
+order will be read from \field{device_order}.
+
 \item Buffers are added to the event virtqueue to receive events from the device.
 
 \item Buffers are added to the rx virtqueue to start receiving packets.
@@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
 
 \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
 
-The \field{guest_cid} configuration field MUST be used as the source CID when
-sending outgoing packets.
+If the source socket is not bound to any source CID, the driver MUST assign
+one. If more than one device is present, the driver SHOULD use the default
+device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
+the \field{guest_cid} of the only available device.
 
 A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
 unknown \field{type} value.
-- 
2.34.1



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-04-12 14:28 [PATCH v7] virtio-vsock: Add support for multi devices Xuewei Niu
@ 2025-04-12 14:38 ` Xuewei Niu
  2025-05-18 21:52 ` Michael S. Tsirkin
  2025-06-13  4:23 ` Jason Wang
  2 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-04-12 14:38 UTC (permalink / raw)
  To: niuxuewei97
  Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

I would rather not merge VMADDR_CID_HYPERVISOR support into this patch,
because that feature depends on the multi-devices feature, and folding it in
would require a lengthy rework of this patch. I prefer to send a separate
patch after this one is merged.

Thanks,
Xuewei


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-04-12 14:28 [PATCH v7] virtio-vsock: Add support for multi devices Xuewei Niu
  2025-04-12 14:38 ` Xuewei Niu
@ 2025-05-18 21:52 ` Michael S. Tsirkin
  2025-05-19  9:37   ` Xuewei Niu
  2025-06-13  4:23 ` Jason Wang
  2 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-05-18 21:52 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: sgarzare, parav, fupan.lfp, virtio-comment, Xuewei Niu

On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> This patch brings a new feature, called "multi devices", to the virtio
> vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> "device_order" field to the config for the virtio vsock.
> 
> == Motivation ==
> 
> Vsock is a lightweight and widely used data exchange mechanism between host
> and guest. Currently, the virtio-vsock only supports one device, resulting
> in the inability to enable more than one backend. For instance, two devices
> are required: one to transfer data to the VMM via virtio-vsock, and another
> to a user process via vhost-user-vsock.
> 
> Apart from that, a side gain is that theoretically the performance might be
> improved since each device has its own queue. But it varies depending on
> the implementation.
> 
> == Typical Usages ==
> 
> Assuming there are two virtio-vsock devices on the guest, with CIDs 3 and 4
> respectively. And the device with CID 3 is default.
> 
> Connect to the host using the device with CID 3.
> 
> ```c
> // use default one (no bind)
> fd = socket(AF_VSOCK);
> connect(fd, 2, 1234);
> n = write(fd, buffer);
> 
> // or bind explicitly
> fd = socket(AF_VSOCK);
> bind(fd, 3, -1);
> connect(fd, 2, 1234);
> n = write(fd, buffer);
> ```
> 
> Connect to the host using the device with CID 4.
> 
> ```c
> // must bind explicitly as the device with CID 4 is not default.
> fd = socket(AF_VSOCK);
> bind(fd, 4, -1);
> connect(fd, 2, 1234);
> n = write(fd, buffer);
> ```
> 
> The first version of multi-devices implementation is available at [1].
> 
> v6 -> v7:
> - Addresses minor review comments from Stefano.
> 
> [1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com
> 
> Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> ---
>  device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> index 7d91d15..392dc76 100644
> --- a/device-types/vsock/description.tex
> +++ b/device-types/vsock/description.tex
> @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
>  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
>  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
>  \end{description}
>  
>  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> @@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>  VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
>  VIRTIO_VSOCK_F_STREAM has also been negotiated.
>  
> +The driver SHOULD ignore devices that do not have
> +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> +
> +The driver SHOULD ignore all subsequent devices if a device without
> +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> +

all this is really vague. any better way to put it?

what are subsequent devices? if the feature has been negotiated where?
what does ignore mean? you can not know features without interacting
with the device.



>  \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
>  
>  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
> @@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
>  \begin{lstlisting}
>  struct virtio_vsock_config {
>  	le64 guest_cid;
> +	le16 device_order;
>  };
>  \end{lstlisting}
>  
> @@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
>  \hline
>  \end{tabular}
>  
> +The \field{device_order} is used to identify the default device.

no explanation what is the default device.
is it just for the cid?

> Up to
> +65,535 devices can be supported due to the size.

can be -> are
drop "due to the size".

> +
> +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> +
> +The device MUST provide a distinct \field{device_order} if
> +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.

distinct to what?

> +
> +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> +
> +The driver MUST treat the device with the lowest \field{device_order} as
> +the default device.
> +
>  \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
>  
>  \begin{enumerate}
>  \item The guest's cid is read from \field{guest_cid}.
>  
> +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
> +order will be read from \field{device_order}.
> +
>  \item Buffers are added to the event virtqueue to receive events from the device.
>  
>  \item Buffers are added to the rx virtqueue to start receiving packets.
> @@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
>  
>  \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
>  
> -The \field{guest_cid} configuration field MUST be used as the source CID when
> -sending outgoing packets.
> +If the source socket is not bound to any source CID, the driver MUST assign
> +one. If more than one device is present, the driver SHOULD use the default
> +device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
> +the \field{guest_cid} of the only available device.

why did you drop requirement about outgoing packets?

>  
>  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>  unknown \field{type} value.
> -- 
> 2.34.1



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-05-18 21:52 ` Michael S. Tsirkin
@ 2025-05-19  9:37   ` Xuewei Niu
  2025-06-11 14:51     ` Xuewei Niu
                       ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-05-19  9:37 UTC (permalink / raw)
  To: mst; +Cc: fupan.lfp, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > This patch brings a new feature, called "multi devices", to the virtio
> > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > "device_order" field to the config for the virtio vsock.
> > 
> > == Motivation ==
> > 
> > Vsock is a lightweight and widely used data exchange mechanism between host
> > and guest. Currently, the virtio-vsock only supports one device, resulting
> > in the inability to enable more than one backend. For instance, two devices
> > are required: one to transfer data to the VMM via virtio-vsock, and another
> > to a user process via vhost-user-vsock.
> > 
> > Apart from that, a side gain is that theoretically the performance might be
> > improved since each device has its own queue. But it varies depending on
> > the implementation.
> > 
> > == Typical Usages ==
> > 
> > Assuming there are two virtio-vsock devices on the guest, with CIDs 3 and 4
> > respectively. And the device with CID 3 is default.
> > 
> > Connect to the host using the device with CID 3.
> > 
> > ```c
> > // use default one (no bind)
> > fd = socket(AF_VSOCK);
> > connect(fd, 2, 1234);
> > n = write(fd, buffer);
> > 
> > // or bind explicitly
> > fd = socket(AF_VSOCK);
> > bind(fd, 3, -1);
> > connect(fd, 2, 1234);
> > n = write(fd, buffer);
> > ```
> > 
> > Connect to the host using the device with CID 4.
> > 
> > ```c
> > // must bind explicitly as the device with CID 4 is not default.
> > fd = socket(AF_VSOCK);
> > bind(fd, 4, -1);
> > connect(fd, 2, 1234);
> > n = write(fd, buffer);
> > ```
> > 
> > The first version of multi-devices implementation is available at [1].
> > 
> > v6 -> v7:
> > - Addresses minor review comments from Stefano.
> > 
> > [1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com
> > 
> > Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> > ---
> >  device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
> >  1 file changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > index 7d91d15..392dc76 100644
> > --- a/device-types/vsock/description.tex
> > +++ b/device-types/vsock/description.tex
> > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
> >  \end{description}
> >  
> >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > @@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >  VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
> >  VIRTIO_VSOCK_F_STREAM has also been negotiated.
> >  
> > +The driver SHOULD ignore devices that do not have
> > +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> > +
> > +The driver SHOULD ignore all subsequent devices if a device without
> > +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> > +
> 
> all this is really vague. any better way to put it?
> 
> what are subsequent devices? if the feature has been negotiated where?
> what does ignore mean? you can not know features without interacting
> with the device.

The original idea is that a mix of devices, some with the multi-devices
feature enabled and others without, is unacceptable.

The driver determines the state based on the first device present in the
guest.

There are two possible cases:

- If the first device has negotiated the multi-devices feature, the driver
considers the feature enabled and ignores all devices that have not
negotiated it.
- If the first device has not negotiated the feature, the driver considers
the feature disabled and ignores any subsequent devices.
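
The two cases above can be sketched in C as follows. This is only an
illustration: struct vsock_dev, its fields, and apply_first_device_rule()
are hypothetical names, "first" is taken to mean index 0 of an array in
probe order, and "ignore" is modelled as a flag because its exact meaning
is not defined here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state, stored in probe order. */
struct vsock_dev {
	bool multi_negotiated;	/* VIRTIO_VSOCK_F_MULTI_DEVICES negotiated */
	bool ignored;		/* set when the driver skips this device */
};

/* Apply the "first device decides" rule; returns the number of devices kept. */
size_t apply_first_device_rule(struct vsock_dev *devs, size_t n)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0)
			devs[i].ignored = false;	/* the first device is always kept */
		else if (devs[0].multi_negotiated)
			devs[i].ignored = !devs[i].multi_negotiated;
		else
			devs[i].ignored = true;		/* feature disabled: one device only */
		if (!devs[i].ignored)
			used++;
	}
	return used;
}
```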

====

Here is the revised version:

To ensure consistency, all devices MUST have the same multi-devices feature
status; a mix of enabled and disabled devices is not acceptable. The driver
determines whether the multi-devices feature is enabled based on the first
device present in the guest: if the first device has negotiated the
feature, the driver enables it and ignores any devices that have not; if
the first device has not negotiated the feature, the driver treats the
feature as disabled and ignores any subsequent devices.

Does this look better to you?

> >  \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> >  
> >  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
> > @@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> >  \begin{lstlisting}
> >  struct virtio_vsock_config {
> >  	le64 guest_cid;
> > +	le16 device_order;
> >  };
> >  \end{lstlisting}
> >  
> > @@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> >  \hline
> >  \end{tabular}
> >  
> > +The \field{device_order} is used to identify the default device.
> 
> no explanation what is the default device.
> is it just for the cid?

Yes.

A socket is allowed to leave its local CID unspecified. In this case, the
driver will use the default device's CID as the local CID for the socket.

The details are listed in the "Receive and Transmit" section, where you
left a comment.
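
A minimal sketch of how a driver might pick the default device, assuming it
keeps a copy of each device's config. struct vsock_cfg and default_device()
are illustrative names, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-side copy of struct virtio_vsock_config. */
struct vsock_cfg {
	uint64_t guest_cid;
	uint16_t device_order;
};

/* Returns the device with the lowest device_order, or NULL if n == 0. */
const struct vsock_cfg *default_device(const struct vsock_cfg *devs, size_t n)
{
	const struct vsock_cfg *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || devs[i].device_order < best->device_order)
			best = &devs[i];
	return best;
}
```

The guest_cid of the returned entry would then serve as the local CID for
sockets that were not bound explicitly.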

> > Up to
> > +65,535 devices can be supported due to the size.
> 
> can be -> are
> drop "due to the size".

Will do in the next version.

> > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > +
> > +The device MUST provide a distinct \field{device_order} if
> > +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
> 
> distinct to what?

In the scope of the guest VM, the device_order should be unique. This means
that the device_order should be distinct for each device.
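
As a sketch of that uniqueness constraint, a check over the device_order
values of all vsock devices in one guest could look like this.
device_orders_distinct() is a hypothetical helper, not something the spec
text defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if no two devices in the guest share a device_order value. */
bool device_orders_distinct(const uint16_t *orders, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (orders[i] == orders[j])
				return false;
	return true;
}
```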

> > +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > +
> > +The driver MUST treat the device with the lowest \field{device_order} as
> > +the default device.
> > +
> >  \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
> >  
> >  \begin{enumerate}
> >  \item The guest's cid is read from \field{guest_cid}.
> >  
> > +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
> > +order will be read from \field{device_order}.
> > +
> >  \item Buffers are added to the event virtqueue to receive events from the device.
> >  
> >  \item Buffers are added to the rx virtqueue to start receiving packets.
> > @@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> >  
> >  \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> >  
> > -The \field{guest_cid} configuration field MUST be used as the source CID when
> > -sending outgoing packets.
> > +If the source socket is not bound to any source CID, the driver MUST assign
> > +one. If more than one device is present, the driver SHOULD use the default
> > +device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
> > +the \field{guest_cid} of the only available device.
> 
> why did you drop requirement about outgoing packets?

The driver prefers to use the CID provided by the user. That is, if the
user binds to a source CID, the driver will use it and does not need to do
anything. If not, the driver will use one from the configuration.
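
A sketch of that selection, assuming an unbound socket is represented by the
same value as Linux's VMADDR_CID_ANY. The names CID_ANY and source_cid() are
made up for illustration.

```c
#include <stdint.h>

/* Marks a socket with no bound source CID; same value as VMADDR_CID_ANY. */
#define CID_ANY UINT32_MAX

/*
 * Pick the source CID for outgoing packets: the CID the user bound to, if
 * any, otherwise the guest_cid of the default device.
 */
uint64_t source_cid(uint32_t bound_cid, uint64_t default_guest_cid)
{
	if (bound_cid != CID_ANY)
		return bound_cid;
	return default_guest_cid;
}
```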

Thanks,
Xuewei

> >  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
> > -- 
> > 2.34.1


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-05-19  9:37   ` Xuewei Niu
@ 2025-06-11 14:51     ` Xuewei Niu
  2025-06-11 14:53       ` Parav Pandit
  2025-06-11 18:02     ` Michael S. Tsirkin
  2025-06-14 17:11     ` Parav Pandit
  2 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-11 14:51 UTC (permalink / raw)
  To: niuxuewei97
  Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

Hi, Michael and Stefano,

No comments since last month. Does this patch look good to you?

Thanks,
Xuewei


* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-11 14:51     ` Xuewei Niu
@ 2025-06-11 14:53       ` Parav Pandit
  0 siblings, 0 replies; 59+ messages in thread
From: Parav Pandit @ 2025-06-11 14:53 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev


> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: Wednesday, June 11, 2025 8:22 PM
> 
> Hi, Michael and Stefano,
> 
> No comments since last month. Does this patch look good to you?
> 
> Thanks,
> Xuewei

I missed reviewing v7, as I was working on and off when you posted it.
I will review it before 14 June.

Parav


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-05-19  9:37   ` Xuewei Niu
  2025-06-11 14:51     ` Xuewei Niu
@ 2025-06-11 18:02     ` Michael S. Tsirkin
  2025-06-12  3:28       ` Xuewei Niu
  2025-06-14 17:11     ` Parav Pandit
  2 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2025-06-11 18:02 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Mon, May 19, 2025 at 05:37:36PM +0800, Xuewei Niu wrote:
> > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > This patch brings a new feature, called "multi devices", to the virtio
> > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > "device_order" field to the config for the virtio vsock.
> > > 
> > > == Motivation ==
> > > 
> > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > in the inability to enable more than one backend. For instance, two devices
> > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > to a user process via vhost-user-vsock.
> > > 
> > > Apart from that, a side gain is that theoretically the performance might be
> > > improved since each device has its own queue. But it varies depending on
> > > the implementation.
> > > 
> > > == Typical Usages ==
> > > 
> > > Assuming there are two virtio-vsock devices on the guest, with CIDs 3 and 4
> > > respectively. And the device with CID 3 is default.
> > > 
> > > Connect to the host using the device with CID 3.
> > > 
> > > ```c
> > > // use default one (no bind)
> > > fd = socket(AF_VSOCK);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > > 
> > > // or bind explicitly
> > > fd = socket(AF_VSOCK);
> > > bind(fd, 3, -1);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > > ```
> > > 
> > > Connect to the host using the device with CID 4.
> > > 
> > > ```c
> > > // must bind explicitly as the device with CID 4 is not default.
> > > fd = socket(AF_VSOCK);
> > > bind(fd, 4, -1);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > > ```
> > > 
> > > The first version of multi-devices implementation is available at [1].
> > > 
> > > v6 -> v7:
> > > - Addresses minor review comments from Stefano.
> > > 
> > > [1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com
> > > 
> > > Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> > > ---
> > >  device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
> > >  1 file changed, 28 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > index 7d91d15..392dc76 100644
> > > --- a/device-types/vsock/description.tex
> > > +++ b/device-types/vsock/description.tex
> > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
> > >  \end{description}
> > >  
> > >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > @@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > >  VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
> > >  VIRTIO_VSOCK_F_STREAM has also been negotiated.
> > >  
> > > +The driver SHOULD ignore devices that do not have
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> > > +
> > > +The driver SHOULD ignore all subsequent devices if a device without
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> > > +
> > 
> > all this is really vague. any better way to put it?
> > 
> > what are subsequent devices? if the feature has been negotiated where?
> > what does ignore mean? you can not know features without interacting
> > with the device.
> 
> The original idea is: Some devices have enabled the multi-devices feature,
> while others have not, and this situation is unacceptable.
> 
> The driver determines the states based on the first device present in the
> guest.
> 
> There are two possible cases:
> 
> - If the first device has negotiated the multi-devices feature, then the
> driver considers the multi-devices feature as enabled. Then, the driver will
> ignore all devices that do not negotiate the feature.
> - If the first device has not negotiated, it indicates that the multi-devices
> feature is disabled. Consequently, the driver will ignore any subsequent
> devices.
> 
> ====
> 
> Here is the revised version:
> 
> To ensure consistency, all devices MUST have the same multi-devices feature
> status; a mix of enabled and disabled devices is not acceptable. The driver
> determines whether the multi-devices feature is enabled based on the first
> device present in the guest: if the first device has negotiated the
> feature, the driver enables it and ignores any devices that have not; if
> the first device has not negotiated the feature, the driver treats the
> feature as disabled and ignores any subsequent devices.
> 
> Does this look better to you?

I can't say I like this, since it is not clear what "first device
present" means. Also, it is not clear what "ignores" means.


I propose instead simply specifying something like this for devices:


	All socket devices used with a specific driver MUST be consistent
	with respect to offering VIRTIO_VSOCK_F_MULTI_DEVICES.
	In other words, either all of the devices offer VIRTIO_VSOCK_F_MULTI_DEVICES or
	none of them do.

And similarly for drivers:

	When used with multiple socket devices, a driver MUST be consistent
	with respect to negotiating VIRTIO_VSOCK_F_MULTI_DEVICES.
	In other words, either the driver negotiates VIRTIO_VSOCK_F_MULTI_DEVICES
	with all of the devices, or with none of them.



There is no need to specify behaviour when the spec is violated.
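
To illustrate the "all or none" rule, a consistency check over the feature
bits of every socket device could look like the sketch below. The bit number
3 comes from the patch; multi_devices_consistent() and the idea of collecting
feature words in an array are assumptions made for the example. The same
check applies to offered features (device side) and negotiated features
(driver side).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_VSOCK_F_MULTI_DEVICES 3 /* bit number proposed in the patch */

/*
 * Returns true if either every entry in features[] has
 * VIRTIO_VSOCK_F_MULTI_DEVICES set, or none of them does.
 */
bool multi_devices_consistent(const uint64_t *features, size_t n)
{
	const uint64_t bit = UINT64_C(1) << VIRTIO_VSOCK_F_MULTI_DEVICES;

	for (size_t i = 1; i < n; i++)
		if ((features[i] & bit) != (features[0] & bit))
			return false;
	return true;
}
```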



> > >  \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > >  
> > >  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
> > > @@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> > >  \begin{lstlisting}
> > >  struct virtio_vsock_config {
> > >  	le64 guest_cid;
> > > +	le16 device_order;
> > >  };
> > >  \end{lstlisting}
> > >  
> > > @@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> > >  \hline
> > >  \end{tabular}
> > >  
> > > +The \field{device_order} is used to identify the default device.
> > 
> > no explanation what is the default device.
> > is it just for the cid?
> 
> Yes.
> 
> It is allowed to not specify the local CID for a socket. In this case, the
> driver will use the default device's CID as the local CID for the socket.
> 
> The details are listed in the "Receive and Transmit" section, where you
> left a comment.
> 
> > > Up to
> > > +65,535 devices can be supported due to the size.
> > 
> > can be -> are
> > drop "due to the size".
> 
> Will do in the next version.
> 
> > > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > > +
> > > +The device MUST provide a distinct \field{device_order} if
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
> > 
> > distinct to what?
> 
> In the scope of the guest VM, the device_order should be unique. This means
> that the device_order should be distinct for each device.
> 
> > > +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > > +
> > > +The driver MUST treat the device with the lowest \field{device_order} as
> > > +the default device.
> > > +
> > >  \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
> > >  
> > >  \begin{enumerate}
> > >  \item The guest's cid is read from \field{guest_cid}.
> > >  
> > > +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
> > > +order will be read from \field{device_order}.
> > > +
> > >  \item Buffers are added to the event virtqueue to receive events from the device.
> > >  
> > >  \item Buffers are added to the rx virtqueue to start receiving packets.
> > > @@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> > >  
> > >  \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> > >  
> > > -The \field{guest_cid} configuration field MUST be used as the source CID when
> > > -sending outgoing packets.
> > > +If the source socket is not bound to any source CID, the driver MUST assign
> > > +one. If more than one device is present, the driver SHOULD use the default
> > > +device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
> > > +the \field{guest_cid} of the only available device.
> > 
> > why did you drop requirement about outgoing packets?
> 
> The driver prefers to use the CID provided by the user. That is, if the
> user binds to a source CID, the driver will use it and does not need to do
> anything. If not, the driver will use one from the configuration.
> 
> Thanks,
> Xuewei
> 
> > >  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > >  unknown \field{type} value.
> > > -- 
> > > 2.34.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-11 18:02     ` Michael S. Tsirkin
@ 2025-06-12  3:28       ` Xuewei Niu
  0 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-12  3:28 UTC (permalink / raw)
  To: mst; +Cc: fupan.lfp, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Mon, May 19, 2025 at 05:37:36PM +0800, Xuewei Niu wrote:
> > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > "device_order" field to the config for the virtio vsock.
> > > > 
> > > > == Motivation ==
> > > > 
> > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > in the inability to enable more than one backend. For instance, two devices
> > > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > > to a user process via vhost-user-vsock.
> > > > 
> > > > Apart from that, a side gain is that theoretically the performance might be
> > > > improved since each device has its own queue. But it varies depending on
> > > > the implementation.
> > > > 
> > > > == Typical Usages ==
> > > > 
> > > > Assuming there are two virtio-vsock devices on the guest, with CIDs 3 and 4
> > > > respectively. And the device with CID 3 is default.
> > > > 
> > > > Connect to the host using the device with CID 3.
> > > > 
> > > > ```c
> > > > // use default one (no bind)
> > > > fd = socket(AF_VSOCK);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > > 
> > > > // or bind explicitly
> > > > fd = socket(AF_VSOCK);
> > > > bind(fd, 3, -1);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > > ```
> > > > 
> > > > Connect to the host using the device with CID 4.
> > > > 
> > > > ```c
> > > > // must bind explicitly as the device with CID 4 is not default.
> > > > fd = socket(AF_VSOCK);
> > > > bind(fd, 4, -1);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > > ```
> > > > 
> > > > The first version of multi-devices implementation is available at [1].
> > > > 
> > > > v6 -> v7:
> > > > - Addresses minor review comments from Stefano.
> > > > 
> > > > [1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com
> > > > 
> > > > Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> > > > ---
> > > >  device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
> > > >  1 file changed, 28 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > > index 7d91d15..392dc76 100644
> > > > --- a/device-types/vsock/description.tex
> > > > +++ b/device-types/vsock/description.tex
> > > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > > +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
> > > >  \end{description}
> > > >  
> > > >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > > @@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > >  VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
> > > >  VIRTIO_VSOCK_F_STREAM has also been negotiated.
> > > >  
> > > > +The driver SHOULD ignore devices that do not have
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> > > > +
> > > > +The driver SHOULD ignore all subsequent devices if a device without
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> > > > +
> > > 
> > > all this is really vague. any better way to put it?
> > > 
> > > what are subsequent devices? if the feature has been negotiated where?
> > > what does ignore mean? you can not know features without interacting
> > > with the device.
> > 
> > The original idea is: Some devices have enabled the multi-devices feature,
> > while others have not, and this situation is unacceptable.
> > 
> > The driver determines the states based on the first device present in the
> > guest.
> > 
> > There are two possible cases:
> > 
> > - If the first device has negotiated the multi-devices feature, then the
> > driver considers the multi-devices feature as enabled. Then, the driver will
> > ignore all devices that do not negotiate the feature.
> > - If the first device has not negotiated, it indicates that the multi-devices
> > feature is disabled. Consequently, the driver will ignore any subsequent
> > devices.
> > 
> > ====
> > 
> > Here is the revised version:
> > 
> > To ensure consistency, all devices MUST have the same multi-devices feature
> > status; a mix of enabled and disabled devices is not acceptable. The driver
> > determines whether the multi-devices feature is enabled based on the first
> > device present in the guest: if the first device has negotiated the
> > feature, the driver enables it and ignores any devices that have not; if
> > the first device has not negotiated the feature, the driver treats the
> > feature as disabled and ignores any subsequent devices.
> > 
> > Does this look better to you?
> 
> I can't say I like this, since it is not clear what does "first device
> present" mean. Also, it is not clear what does "ignores" mean.
> 
> 
> I propose instead simply specifying something like this for devices:
> 
> 
> 	All socket devices used with a specific driver MUST be consistent
> 	with respect to offering VIRTIO_VSOCK_F_MULTI_DEVICES.
> 	In other words, either all of the devices offer VIRTIO_VSOCK_F_MULTI_DEVICES or
> 	none of them do.
> 
> 
> And similarly for drivers:
> 
> 	When used with multiple socket devices, a driver MUST be consistent
> 	with respect to negotiating VIRTIO_VSOCK_F_MULTI_DEVICES.
> 	In other words, either the driver negotiates VIRTIO_VSOCK_F_MULTI_DEVICES
> 	with all of the devices, or with none of them.
> 
> 
> There is no need to specify behaviour when spec is violated.

Will do in the next version.
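
To make the proposed rule concrete, here is a minimal sketch of the
"all or none" check. It is not taken from any driver: the helper name and
the per-device feature array are assumptions for illustration, and the bit
number is the one used in this v7 patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number from the v7 patch; the feature name may still change. */
#define VIRTIO_VSOCK_F_MULTI_DEVICES 3

/*
 * Hypothetical helper: features[i] is the feature bitmap offered by the
 * i-th socket device. Returns true if either all of the devices or none
 * of them offer VIRTIO_VSOCK_F_MULTI_DEVICES.
 */
static bool vsock_multi_devices_consistent(const uint64_t *features, size_t n)
{
	const uint64_t mask = UINT64_C(1) << VIRTIO_VSOCK_F_MULTI_DEVICES;

	for (size_t i = 1; i < n; i++) {
		/* Every device must agree with the first one on this bit. */
		if ((features[i] & mask) != (features[0] & mask))
			return false;
	}
	return true;
}
```

The spec text itself does not need to say what happens when the check
fails; the sketch only shows what "consistent" means.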

Thanks,
Xuewei

> > > >  \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > >  
> > > >  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
> > > > @@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> > > >  \begin{lstlisting}
> > > >  struct virtio_vsock_config {
> > > >  	le64 guest_cid;
> > > > +	le16 device_order;
> > > >  };
> > > >  \end{lstlisting}
> > > >  
> > > > @@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
> > > >  \hline
> > > >  \end{tabular}
> > > >  
> > > > +The \field{device_order} is used to identify the default device.
> > > 
> > > no explanation what is the default device.
> > > is it just for the cid?
> > 
> > Yes.
> > 
> > It is allowed to not specify the local CID for a socket. In this case, the
> > driver will use the default device's CID as the local CID for the socket.
> > 
> > The details are listed in the "Receive and Transmit" section, where you
> > left a comment.
> > 
> > > > Up to
> > > > +65,535 devices can be supported due to the size.
> > > 
> > > can be -> are
> > > drop "due to the size".
> > 
> > Will do in the next version.
> > 
> > > > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > > > +
> > > > +The device MUST provide a distinct \field{device_order} if
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
> > > 
> > > distinct to what?
> > 
> > In the scope of the guest VM, the device_order should be unique. This means
> > that the device_order should be distinct for each device.
> > 
> > > > +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
> > > > +
> > > > +The driver MUST treat the device with the lowest \field{device_order} as
> > > > +the default device.
> > > > +
> > > >  \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
> > > >  
> > > >  \begin{enumerate}
> > > >  \item The guest's cid is read from \field{guest_cid}.
> > > >  
> > > > +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
> > > > +order will be read from \field{device_order}.
> > > > +
> > > >  \item Buffers are added to the event virtqueue to receive events from the device.
> > > >  
> > > >  \item Buffers are added to the rx virtqueue to start receiving packets.
> > > > @@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> > > >  
> > > >  \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> > > >  
> > > > -The \field{guest_cid} configuration field MUST be used as the source CID when
> > > > -sending outgoing packets.
> > > > +If the source socket is not bound to any source CID, the driver MUST assign
> > > > +one. If more than one device is present, the driver SHOULD use the default
> > > > +device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
> > > > +the \field{guest_cid} of the only available device.
> > > 
> > > why did you drop requirement about outgoing packets?
> > 
> > The driver prefers to use the CID provided by the user. That is, if the
> > user binds to a source CID, the driver will use it and does not need to do
> > anything. If not, the driver will use one from the configuration.
> > 
> > Thanks,
> > Xuewei
> > 
> > > >  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > > >  unknown \field{type} value.
> > > > -- 
> > > > 2.34.1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-04-12 14:28 [PATCH v7] virtio-vsock: Add support for multi devices Xuewei Niu
  2025-04-12 14:38 ` Xuewei Niu
  2025-05-18 21:52 ` Michael S. Tsirkin
@ 2025-06-13  4:23 ` Jason Wang
  2025-06-13  4:57   ` Xuewei Niu
  2025-06-13  8:31   ` Stefano Garzarella
  2 siblings, 2 replies; 59+ messages in thread
From: Jason Wang @ 2025-06-13  4:23 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: sgarzare, parav, mst, fupan.lfp, virtio-comment, Xuewei Niu

On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> This patch brings a new feature, called "multi devices", to the virtio
> vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> "device_order" field to the config for the virtio vsock.
>
> == Motivation ==
>
> Vsock is a lightweight and widely used data exchange mechanism between host
> and guest. Currently, the virtio-vsock only supports one device, resulting
> in the inability to enable more than one backend.

I wonder which part of the spec forbids more than one device.

> For instance, two devices
> are required: one to transfer data to the VMM via virtio-vsock, and another
> to a user process via vhost-user-vsock.
>
> Apart from that, a side gain is that theoretically the performance might be
> improved since each device has its own queue. But it varies depending on
> the implementation.

It could implement multiqueue anyhow.

Thanks


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  4:23 ` Jason Wang
@ 2025-06-13  4:57   ` Xuewei Niu
  2025-06-13  8:35     ` Stefano Garzarella
  2025-06-13  8:31   ` Stefano Garzarella
  1 sibling, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-13  4:57 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > This patch brings a new feature, called "multi devices", to the virtio
> > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > "device_order" field to the config for the virtio vsock.
> >
> > == Motivation ==
> >
> > Vsock is a lightweight and widely used data exchange mechanism between host
> > and guest. Currently, the virtio-vsock only supports one device, resulting
> > in the inability to enable more than one backend.
> 
> I wonder which part of the spec forbids more than one device.

No. The spec, however, is designed for a single device, and lacks some
specifications for multiple devices.

For example, we should have a mechanism to select a device from all to
communicate with a peer.

This patch introduces a new feature bit "VIRTIO_VSOCK_F_MULTI_DEVICES", and
involves some modifications to the config space, device & driver norms.
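
As a rough illustration of the selection mechanism, the sketch below picks
the default device as the one with the lowest device_order, as the patch
requires. The struct and function names are hypothetical, and the fields
are assumed to be already converted from little-endian config space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the config space proposed in the patch. */
struct vsock_dev {
	uint64_t guest_cid;    /* from le64 guest_cid */
	uint16_t device_order; /* from le16 device_order */
};

/* Returns the device with the lowest device_order, or NULL if n == 0. */
static const struct vsock_dev *vsock_default_dev(const struct vsock_dev *devs,
						 size_t n)
{
	const struct vsock_dev *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!best || devs[i].device_order < best->device_order)
			best = &devs[i];
	}
	return best;
}
```

Since device_order is required to be distinct per device, the result does
not depend on the order in which the devices were probed.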

> > For instance, two devices
> > are required: one to transfer data to the VMM via virtio-vsock, and another
> > to a user process via vhost-user-vsock.
> >
> > Apart from that, a side gain is that theoretically the performance might be
> > improved since each device has its own queue. But it varies depending on
> > the implementation.
> 
> It could implement multiqueue anyhow.
> 
> Thanks

Yes, indeed. As we discussed before, I marked this as a side gain. And
@Stefano said that the community has a plan to implement multiqueue for
vsock. The foundational goal of this patch is to enable vsock to support
multiple backends.

Thanks,
Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  4:23 ` Jason Wang
  2025-06-13  4:57   ` Xuewei Niu
@ 2025-06-13  8:31   ` Stefano Garzarella
  2025-06-13  8:36     ` Xuewei Niu
  1 sibling, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-13  8:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuewei Niu, parav, mst, fupan.lfp, virtio-comment, Xuewei Niu

On Fri, 13 Jun 2025 at 06:23, Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > This patch brings a new feature, called "multi devices", to the virtio
> > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > "device_order" field to the config for the virtio vsock.
> >
> > == Motivation ==
> >
> > Vsock is a lightweight and widely used data exchange mechanism between host
> > and guest. Currently, the virtio-vsock only supports one device, resulting
> > in the inability to enable more than one backend.
>
> I wonder which part of the spec forbids more than one device.

I don't think there is anything that forbids it, but what we don't
have is a kind of default gateway/interface to use when sending
messages.
So, this proposal is mostly about choosing the default device to use when
the user doesn't bind any CID on the source socket.
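
A minimal sketch of that rule, assuming a hypothetical per-socket state
(this is not the layout of any real driver): a bound socket keeps the CID
the user asked for, an unbound one falls back to the default device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state, for illustration only. */
struct vsock_sock_state {
	bool bound;         /* true if the user bound a source CID */
	uint64_t bound_cid; /* valid only when bound is true */
};

/*
 * Pick the source CID for an outgoing packet: the bound CID if there is
 * one, otherwise the guest_cid of the default device.
 */
static uint64_t vsock_pick_src_cid(const struct vsock_sock_state *sk,
				   uint64_t default_guest_cid)
{
	return sk->bound ? sk->bound_cid : default_guest_cid;
}
```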

>
> > For instance, two devices
> > are required: one to transfer data to the VMM via virtio-vsock, and another
> > to a user process via vhost-user-vsock.
> >
> > Apart from that, a side gain is that theoretically the performance might be
> > improved since each device has its own queue. But it varies depending on
> > the implementation.
>
> It could implement multiqueue anyhow.

Yeah, I pointed out multiple times that this should be dropped as a
justification, since multi-queue makes much more sense for that.
IMHO it's confusing to still talk about it in this proposal.

Stefano

>
> Thanks
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  4:57   ` Xuewei Niu
@ 2025-06-13  8:35     ` Stefano Garzarella
  2025-06-13  8:46       ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-13  8:35 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: jasowang, fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > This patch brings a new feature, called "multi devices", to the virtio
> > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > "device_order" field to the config for the virtio vsock.
> > >
> > > == Motivation ==
> > >
> > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > in the inability to enable more than one backend.
> >
> > I wonder which part of the spec forbids more than one device.
>
> No. The spec, however, is designed for a single device, and lacks some
> specifications for multiple devices.
>
> For example, we should have a mechanism to select a device from all to
> communicate with a peer.
>
> This patch introduces a new feature bit "VIRTIO_VSOCK_F_MULTI_DEVICES", and
> involves some modifications to the config space, device & driver norms.

Maybe we should think of a better name to avoid confusion, like "F_DEVICE_ORDER".
I'm not good at naming, so if others have better suggestions, they are welcome.

>
> > > For instance, two devices
> > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > to a user process via vhost-user-vsock.
> > >
> > > Apart from that, a side gain is that theoretically the performance might be
> > > improved since each device has its own queue. But it varies depending on
> > > the implementation.
> >
> > It could implement multiqueue anyhow.
> >
> > Thanks
>
> Yes, indeed. As we discussed before, I marked this as a side gain. And
> @Stefano said that the community has a plan to implement multiqueue for
> vsock.

I'm not sure there is a plan, but yeah, we should do it some day.
As I mentioned in the previous email, I'd remove any reference to performance.

I'd just talk about a mechanism to select a default out device, when
the source socket is not bound to any address, like a default gateway.

Thanks,
Stefano

> The foundational goal of this patch is to enable vsock to support
> multiple backends.
>
> Thanks,
> Xuewei
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  8:31   ` Stefano Garzarella
@ 2025-06-13  8:36     ` Xuewei Niu
  0 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-13  8:36 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, jasowang, mst, niuxuewei.nxw, niuxuewei97, parav,
	virtio-comment

> On Fri, 13 Jun 2025 at 06:23, Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > This patch brings a new feature, called "multi devices", to the virtio
> > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > "device_order" field to the config for the virtio vsock.
> > >
> > > == Motivation ==
> > >
> > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > in the inability to enable more than one backend.
> >
> > I wonder which part of the spec forbids more than one device.
> 
> I don't think there is anything that forbids it, but what we don't
> have is a kind of default gateway/interface to use when sending
> messages.
> So, this proposal is mostly about choosing the default device to use when
> the user doesn't bind any CID on the source socket.
> 
> >
> > > For instance, two devices
> > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > to a user process via vhost-user-vsock.
> > >
> > > Apart from that, a side gain is that theoretically the performance might be
> > > improved since each device has its own queue. But it varies depending on
> > > the implementation.
> >
> > It could implement multiqueue anyhow.
> 
> Yeah, I pointed out multiple times that this should be dropped as a
> justification, since multi-queue makes much more sense for that.
> IMHO it's confusing to still talk about it in this proposal.
> 
> Stefano
> 
> >
> > Thanks
> >

Thanks for the clarification. I'll remove such descriptions in the next
version.

Thanks,
Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  8:35     ` Stefano Garzarella
@ 2025-06-13  8:46       ` Xuewei Niu
  2025-06-16  3:06         ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-13  8:46 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, jasowang, mst, niuxuewei.nxw, niuxuewei97, parav,
	virtio-comment

> On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > "device_order" field to the config for the virtio vsock.
> > > >
> > > > == Motivation ==
> > > >
> > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > in the inability to enable more than one backend.
> > >
> > > I wonder which part of the spec forbids more than one device.
> >
> > No. The spec, however, is designed for a single device, and lacks some
> > specifications for multiple devices.
> >
> > For example, we should have a mechanism to select a device from all to
> > communicate with a peer.
> >
> > This patch introduces a new feature bit "VIRTIO_VSOCK_F_MULTI_DEVICES", and
> > involves some modifications to the config space, device & driver norms.
> 
> Maybe we should think of a better name to avoid confusion, like "F_DEVICE_ORDER".
> I'm not good at naming, so if others have better suggestions, they are welcome.

Naming things is tough for me, too. I'm good with "F_DEVICE_ORDER", and
will update it in the next version. I am still open to other suggestions.

Thanks,
Xuewei

> >
> > > > For instance, two devices
> > > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > > to a user process via vhost-user-vsock.
> > > >
> > > > Apart from that, a side gain is that theoretically the performance might be
> > > > improved since each device has its own queue. But it varies depending on
> > > > the implementation.
> > >
> > > It could implement multiqueue anyhow.
> > >
> > > Thanks
> >
> > Yes, indeed. As we discussed before, I marked this as a side gain. And
> > @Stefano said that the community has a plan to implement multiqueue for
> > vsock.
> 
> I'm not sure there is a plan, but yeah, we should do it some day.
> As I mentioned in the previous email, I'd remove any reference to performance.
> 
> I'd just talk about a mechanism to select a default out device, when
> the source socket is not bound to any address, like a default gateway.
> 
> Thanks,
> Stefano
> 
> > The foundational goal of this patch is to enable vsock to support
> > multiple backends.
> >
> > Thanks,
> > Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-05-19  9:37   ` Xuewei Niu
  2025-06-11 14:51     ` Xuewei Niu
  2025-06-11 18:02     ` Michael S. Tsirkin
@ 2025-06-14 17:11     ` Parav Pandit
  2025-06-16  8:18       ` Xuewei Niu
  2 siblings, 1 reply; 59+ messages in thread
From: Parav Pandit @ 2025-06-14 17:11 UTC (permalink / raw)
  To: Xuewei Niu, mst@redhat.com
  Cc: fupan.lfp@antgroup.com, niuxuewei.nxw@antgroup.com,
	sgarzare@redhat.com, virtio-comment@lists.linux.dev

Hi Xuewei,

> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: Monday, May 19, 2025 3:08 PM
> 
> > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > This patch brings a new feature, called "multi devices", to the
> > > virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature
> > > bit, and a "device_order" field to the config for the virtio vsock.
> > >
> > > > == Motivation ==
> > >
> > > Vsock is a lightweight and widely used data exchange mechanism
> > > between host and guest. 

Even though that is the current use, the specification does not prevent its use between two guests via a host.
So we should not assume that guest <-> host communication is the only case when adding a new feature.

For example, the only MUST requirement in the spec is that src_cid == config.guest_cid.
dst_cid can be anything; it need not be the well-known 0x2 (for the host).
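
As a small sketch of that constraint (illustrative only: the struct below
is a trimmed-down view carrying just the two CIDs, and the function name
is hypothetical), a transmit-side check looks at the source CID alone:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of a packet header: only the CIDs are shown. */
struct vsock_cids {
	uint64_t src_cid;
	uint64_t dst_cid;
};

/*
 * Only the source CID is constrained: it must equal the device's
 * guest_cid. Any destination CID is accepted, not just 2 (the host).
 */
static bool vsock_tx_cids_ok(const struct vsock_cids *hdr, uint64_t guest_cid)
{
	return hdr->src_cid == guest_cid;
}
```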

With this flexibility in the spec, one can connect vsock devices with multiple different backends.

For example,
QEMU can insert one vsock device for VM to HV communication.
A real PCI device can insert one vsock device for VM-to-VM communication, bypassing a full TCP/IP stack.

This means there are two different backends.
And these two devices should not be grouped in the use case you described.

So, as we discussed some time ago in thread [1], for the use case that you described, one needs the concept of a virtio device group.
I urge you to include such a basic construct in the spec; without it, grouping multiple devices seems broken from the beginning.

I also highlighted additional use cases at [2] that will benefit from the same feature.
And introducing a generic feature is more valuable when it is useful to multiple device types.

[1] https://lore.kernel.org/virtio-comment/20250217071804.296893-1-niuxuewei.nxw@antgroup.com/
[2] https://lore.kernel.org/virtio-comment/CY8PR12MB7195FD1684735ECB8D950F6FDCFB2@CY8PR12MB7195.namprd12.prod.outlook.com/

Can you please consider revising the proposal to be more complete?

And I also fully agree with Stefano's suggestion to drop the discussion of the multi-queue performance aspect here, as it's not related at all.
You are addressing device selection functionality and grouping of multiple devices, which cannot be solved by multi-queue
(unless you bring in the concept of CID-to-queue binding).

> > > > Currently, the virtio-vsock only supports one device, resulting
> > > > in the inability to enable more than one backend. For instance,
> > > > two devices are required: one to transfer data to the VMM via
> > > > virtio-vsock, and another to a user process via vhost-user-vsock.
> > >
> > > Apart from that, a side gain is that theoretically the performance
> > > might be improved since each device has its own queue. But it varies
> > > depending on the implementation.
> > >
> > > == Typical Usages ==
> > >
> > > Assuming there are two virtio-vsock devices on the guest, with CIDs
> > > 3 and 4 respectively. And the device with CID 3 is default.
> > >
> > > Connect to the host using the device with CID 3.
> > >
> > > ```c
> > > // use default one (no bind)
> > > fd = socket(AF_VSOCK);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > >
> > > // or bind explicitly
> > > fd = socket(AF_VSOCK);
> > > bind(fd, 3, -1);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > > ```
> > >
> > > Connect to the host using the device with CID 4.
> > >
> > > ```c
> > > // must bind explicitly as the device with CID 4 is not default.
> > > fd = socket(AF_VSOCK);
> > > bind(fd, 4, -1);
> > > connect(fd, 2, 1234);
> > > n = write(fd, buffer);
> > > ```
> > >
> > > The first version of multi-devices implementation is available at [1].
> > >
> > > v6 -> v7:
> > > - Addresses minor review comments from Stefano.
> > >
> > > [1]
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flo
> > > re.kernel.org%2Fvirtualization%2F20240517144607.2595798-1-
> niuxuewei.
> > >
> nxw%40antgroup.com&data=05%7C02%7Cparav%40nvidia.com%7Cb912ba8
> a049a4
> > >
> 6ed8d7808dd96b8d125%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0
> %7C638
> > >
> 832442724462145%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRy
> dWUsIlYi
> > >
> OiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%
> 7C0
> > >
> %7C%7C%7C&sdata=AarOvDq9a3Djl7bCH2vsHGlKfEmHtvcXQGOGvXetNGo%3
> D&reser
> > > ved=0
> > >
> > > Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> > > ---
> > >  device-types/vsock/description.tex | 30
> > > ++++++++++++++++++++++++++++--
> > >  1 file changed, 28 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/device-types/vsock/description.tex
> > > b/device-types/vsock/description.tex
> > > index 7d91d15..392dc76 100644
> > > --- a/device-types/vsock/description.tex
> > > +++ b/device-types/vsock/description.tex
> > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types /
> > > Socket Device / Feature bits}  \item[VIRTIO_VSOCK_F_STREAM (0)]
> stream socket type is supported.
> > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is
> supported.
> > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is
> not implied.
> > > +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is
> supported.
> > >  \end{description}
> > >
> > >  \drivernormative{\subsubsection}{Feature bits}{Device Types /
> > > Socket Device / Feature bits} @@ -34,6 +35,12 @@ \subsection{Feature
> > > bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
> VIRTIO_VSOCK_F_STREAM has also been negotiated.
> > >
> > > +The driver SHOULD ignore devices that do not have
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> > > +
> > > +The driver SHOULD ignore all subsequent devices if a device without
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> > > +
> >
> > all this is really vague. any better way to put it?
> >
> > what are subsequent devices? if the feature has been negotiated where?
> > what does ignore mean? you can not know features without interacting
> > with the device.
> 
> The original idea is: Some devices have enabled the multi-devices feature,
> while others have not, and this situation is unacceptable.
> 
> The driver determines the states based on the first device present in the
> guest.
> 
> There are two possible cases:
> 
> - If the first device has negotiated the multi-devices feature, then the driver
> considers the multi-devices feature as enabled. Then, the driver will ignore
> all devices that do not negotiate the feature.
> - If the first device has not negotiated, it indicates that the multi-devices
> feature is disabled. Consequently, the driver will ignore any subsequent
> devices.
> 
> ====
> 
> Here is the revised version:
> 
> To ensure consistency, all devices MUST have the same multi-devices feature
> status; a mix of enabled and disabled devices is not acceptable. The driver
> determines whether the multi-devices feature is enabled based on the first
> device present in the guest: if the first device has negotiated the feature, the
> driver enables it and ignores any devices that have not; if the first device has
> not negotiated the feature, the driver treats the feature as disabled and
> ignores any subsequent devices.
> 
> Does this look better to you?
> 
> > >  \devicenormative{\subsubsection}{Feature bits}{Device Types /
> > > Socket Device / Feature bits}
> > >
> > >  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM
> feature.
> > > @@ -52,6 +59,7 @@ \subsection{Device configuration
> > > layout}\label{sec:Device Types / Socket Device  \begin{lstlisting}
> > > struct virtio_vsock_config {
> > >  	le64 guest_cid;
> > > +	le16 device_order;
> > >  };
> > >  \end{lstlisting}
> > >
> > > @@ -77,11 +85,27 @@ \subsection{Device configuration
> > > layout}\label{sec:Device Types / Socket Device  \hline
> > > \end{tabular}
> > >
> > > +The \field{device_order} is used to identify the default device.
> >
> > no explanation what is the default device.
> > is it just for the cid?
> 
> Yes.
> 
> It is allowed to not specify the local CID for a socket. In this case, the driver
> will use the default device's CID as the local CID for the socket.
> 
> The details are listed in the "Receive and Transmit" section, where you left a
> comment.
> 
> > > Up to
> > > +65,535 devices can be supported due to the size.
> >
> > can be -> are
> > drop "due to the size".
> 
> Will do in the next version.
> 
> > > +\devicenormative{\subsubsection}{Device configuration
> > > +layout}{Device Types / Socket Device / Device configuration layout}
> > > +
> > > +The device MUST provide a distinct \field{device_order} if
> > > +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
> >
> > distinct to what?
> 
> In the scope of the guest VM, the device_order should be unique. This
> means that the device_order should be distinct for each device.
> 
> > > +\drivernormative{\subsubsection}{Device configuration
> > > +layout}{Device Types / Socket Device / Device configuration layout}
> > > +
> > > +The driver MUST treat the device with the lowest
> > > +\field{device_order} as the default device.
> > > +
> > >  \subsection{Device Initialization}\label{sec:Device Types / Socket
> > > Device / Device Initialization}
> > >
> > >  \begin{enumerate}
> > >  \item The guest's cid is read from \field{guest_cid}.
> > >
> > > +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the
> > > +device's order will be read from \field{device_order}.
> > > +
> > >  \item Buffers are added to the event virtqueue to receive events from
> the device.
> > >
> > >  \item Buffers are added to the rx virtqueue to start receiving packets.
> > > @@ -233,8 +257,10 @@ \subsubsection{Receive and
> > > Transmit}\label{sec:Device Types / Socket Device / De
> > >
> > >  \drivernormative{\paragraph}{Device Operation: Receive and
> > > Transmit}{Device Types / Socket Device / Device Operation / Receive
> > > and Transmit}
> > >
> > > -The \field{guest_cid} configuration field MUST be used as the
> > > source CID when -sending outgoing packets.
> > > +If the source socket is not bound to any source CID, the driver
> > > +MUST assign one. If more than one device is present, the driver
> > > +SHOULD use the default device's \field{guest_cid} configuration.
> > > +Otherwise, the driver SHOULD use the \field{guest_cid} of the only
> available device.
> >
> > why did you drop requirement about outgoing packets?
> 
> The driver prefers to use the CID provided by the user. That is, if the user
> binds to a source CID, the driver will use it and does not need to do anything.
> If not, the driver will use one from the configuration.
> 
> Thanks,
> Xuewei
> 
> > >  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received
> > > with an  unknown \field{type} value.
> > > --
> > > 2.34.1


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-13  8:46       ` Xuewei Niu
@ 2025-06-16  3:06         ` Jason Wang
  2025-06-16  8:29           ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-16  3:06 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: sgarzare, fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > "device_order" field to the config for the virtio vsock.
> > > > >
> > > > > == Motivation ==
> > > > >
> > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > in the inability to enable more than one backend.
> > > >
> > > > I wonder which part of the spec forbids more than one device.
> > >
> > > No. The spec, however, is designed for a single device, and lacks some
> > > specifications for multiple devices.
> > >
> > > For example, we should have a mechanism to select a device from all to
> > > communicate with a peer.

I wonder whether this is:

1) a mechanism that needs to be mandated by the device
2) a policy that the user is allowed to tweak, as with TCP/IP

(Note anyhow the driver can override what the device suggests...)


> > >
> > > This patch introduces a new feature bit "VIRTIO_VSOCK_F_MULTI_DEVICES", and
> > > involves some modifications to the config space, device & driver norms.
> >
> > Maybe we should think a better name to avoid confusion, like "F_DEVICE_ORDER".
> > I'm not good at name, so if others are better suggestion, they are welcome.
>
> Naming things is tough for me, too. I'm good with "F_DEVICE_ORDER", and
> will update it in the next. I am still open to other suggestions.

If we agree on the idea, I agree we need a better name.

Thanks

>
> Thanks,
> Xuewei
>
> > >
> > > > > For instance, two devices
> > > > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > > > to a user process via vhost-user-vsock.
> > > > >
> > > > > Apart from that, a side gain is that theoretically the performance might be
> > > > > improved since each device has its own queue. But it varies depending on
> > > > > the implementation.
> > > >
> > > > It could implement multiqueue anyhow.
> > > >
> > > > Thanks
> > >
> > > Yes, indeed. As we discussed before, I marked this as a side gain. And
> > > @Stefano said that the community has a plan to implement multiqueue for
> > > vsock.
> >
> > I'm not sure there is a plan, but yeah, we should do it some day.
> > As I mention in the previous email, I'd remove any reference to performance.
> >
> > I'd just talk about a mechanism to select a default out device, when
> > the source socket is not bound to any address, like a default gateway.
> >
> > Thanks,
> > Stefano
> >
> > > The foundational goal of this patch is to enable vsock to support
> > > multiple backends.
> > >
> > > Thanks,
> > > Xuewei
>



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-14 17:11     ` Parav Pandit
@ 2025-06-16  8:18       ` Xuewei Niu
  2025-06-16  8:26         ` Parav Pandit
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-16  8:18 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

Hi, Parav.

Thanks for your detailed comments.

> Hi Xuewei,
> 
> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: Monday, May 19, 2025 3:08 PM
> > 
> > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > This patch brings a new feature, called "multi devices", to the
> > > > virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature
> > > > bit, and a "device_order" field to the config for the virtio vsock.
> > > >
> > > > == Motivation ==
> > > >
> > > > Vsock is a lightweight and widely used data exchange mechanism
> > > > between host and guest. 
> 
> Even though it is the current use, the specification does not prevent its usage between two guests via a host.
> So we should not assume such guest <-> host communication as the only case to add new feature.
> 
> For example, in the spec only must requirement is that src_cid == config.guest_cid.
> Dst_cid can be anything, it need not be well known 0x2 (for the host).

Yes.

> With this flexibility in the spec, one can connect vsock devices with multiple different backends.
> 
> For example,
> QEMU can insert one vsock device for VM to HV communication.

It is also able to communicate with other devices on the same host, i.e.
VM-to-VM.

> A real PCI device can insert one vsock device for VM-to-VM communication bypassing a full TCP/IP stack.

AFAIK, all devices are implemented in software. Is it a real PCI HW device?
 
> This means there are two different backends.
> And these two devices should not be grouped in the use case you described.

I think one group is enough for all use cases. CIDs are required to be
globally unique, i.e. across all backends.

Following your example, let me assume there are two VMs:

1. VM0 (two vsock backends)
    1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
    1.2 device1 (cid=4), backend is HV (virtio-vsock).
2. VM1
    2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).

device0 can do VM0-HOST (src_cid=3, dst_cid=2) and VM0-VM1
(src_cid=3, dst_cid=5) communication, while device1 can only do
VM0-HV (src_cid=4, dst_cid=2) communication.

In short, a tuple identifies a connection.

A "refuse to connect" error will be raised if device1 attempts to connect to
device2. "They are not in the same group" is a reasonable explanation.
Am I right?
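
The reachability rule in this example could be sketched roughly as below. This
is only an illustration, not code from the patch or from any implementation:
the backend labels, the device table, find_dev() and can_connect() are all
hypothetical. Only the CIDs, the well-known host CID 2, and the backend of each
device are taken from the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOST_CID 2u /* well-known CID of the host side */

/* Hypothetical backend labels, used only in this sketch. */
enum backend { BACKEND_HOST_KERNEL, BACKEND_HV };

struct vsock_dev {
	uint64_t cid;
	enum backend backend;
};

/* The three devices from the example above. */
static const struct vsock_dev devs[] = {
	{ 3, BACKEND_HOST_KERNEL }, /* VM0 device0 (default) */
	{ 4, BACKEND_HV },          /* VM0 device1 */
	{ 5, BACKEND_HOST_KERNEL }, /* VM1 device2 */
};

/* Look up a device by CID; returns NULL if no device has that CID. */
const struct vsock_dev *find_dev(uint64_t cid)
{
	for (size_t i = 0; i < sizeof(devs) / sizeof(devs[0]); i++)
		if (devs[i].cid == cid)
			return &devs[i];
	return NULL;
}

/* True if a connection from src_cid to dst_cid can be established. */
bool can_connect(uint64_t src_cid, uint64_t dst_cid)
{
	const struct vsock_dev *src = find_dev(src_cid);
	const struct vsock_dev *dst;

	assert(src != NULL); /* src_cid must name a device in the table */

	if (dst_cid == HOST_CID)
		return true; /* answered by the device's own backend */

	/* Another guest is reachable only through the same backend. */
	dst = find_dev(dst_cid);
	return dst != NULL && dst->backend == src->backend;
}
```

With this table, can_connect(3, 5) holds while can_connect(4, 5) does not,
which corresponds to the "refuse to connect" case.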

I am not an expert in networking, so please correct me if I
misunderstood.

> So as we discussed sometime ago in thread [1], for the use case that you described, one needs the concept of a virtio device group.
> I urge you to include such basic construct to the spec, without which the group of multiple devices seems broken beginning.
> 
> I also highlighted additional use cases at [2] that will also benefit from same feature.
> And introducing generic feature is more valuable when useful to multiple device types.
> 
> [1] https://lore.kernel.org/virtio-comment/20250217071804.296893-1-niuxuewei.nxw@antgroup.com/
> [2] https://lore.kernel.org/virtio-comment/CY8PR12MB7195FD1684735ECB8D950F6FDCFB2@CY8PR12MB7195.namprd12.prod.outlook.com/
> 
> Can you please consider revising the proposal to be more complete?

Sure thing. I'll update this part once we reach an agreement.

> And I also fully agree to Stefano suggestion to drop discussing multi-queue performance aspect here as its not related at all.
> You are addressing device selection functionality and grouping multiple devices, which cannot be solved by multi-queue.
> (unless you bring the concept of CID to queue binding).

Yes. I'll remove them in the next version.

Thanks,
Xuewei

> > Currently, the virtio-vsock only supports
> > > > one device, resulting in the inability to enable more than one
> > > > backend. For instance, two devices are required: one to transfer
> > > > data to the VMM via virtio-vsock, and another to a user process via vhost-
> > user-vsock.
> > > >
> > > > Apart from that, a side gain is that theoretically the performance
> > > > might be improved since each device has its own queue. But it varies
> > > > depending on the implementation.
> > > >
> > > > == Typical Usages ==
> > > >
> > > > Assuming there are two virtio-vsock devices on the guest, with CIDs
> > > > 3 and 4 respectively. And the device with CID 3 is default.
> > > >
> > > > Connect to the host using the device with CID 3.
> > > >
> > > > ```c
> > > > // use default one (no bind)
> > > > fd = socket(AF_VSOCK);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > >
> > > > // or bind explicitly
> > > > fd = socket(AF_VSOCK);
> > > > bind(fd, 3, -1);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > > ```
> > > >
> > > > Connect to the host using the device with CID 4.
> > > >
> > > > ```c
> > > > // must bind explicitly as the device with CID 4 is not default.
> > > > fd = socket(AF_VSOCK);
> > > > bind(fd, 4, -1);
> > > > connect(fd, 2, 1234);
> > > > n = write(fd, buffer);
> > > > ```
> > > >
> > > > The first version of multi-devices implementation is available at [1].
> > > >
> > > > v6 -> v7:
> > > > - Addresses minor review comments from Stefano.
> > > >
> > > > [1]
> > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flo
> > > > re.kernel.org%2Fvirtualization%2F20240517144607.2595798-1-
> > niuxuewei.
> > > >
> > nxw%40antgroup.com&data=05%7C02%7Cparav%40nvidia.com%7Cb912ba8
> > a049a4
> > > >
> > 6ed8d7808dd96b8d125%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0
> > %7C638
> > > >
> > 832442724462145%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRy
> > dWUsIlYi
> > > >
> > OiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%
> > 7C0
> > > >
> > %7C%7C%7C&sdata=AarOvDq9a3Djl7bCH2vsHGlKfEmHtvcXQGOGvXetNGo%3
> > D&reser
> > > > ved=0
> > > >
> > > > Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> > > > ---
> > > >  device-types/vsock/description.tex | 30
> > > > ++++++++++++++++++++++++++++--
> > > >  1 file changed, 28 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/device-types/vsock/description.tex
> > > > b/device-types/vsock/description.tex
> > > > index 7d91d15..392dc76 100644
> > > > --- a/device-types/vsock/description.tex
> > > > +++ b/device-types/vsock/description.tex
> > > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types /
> > > > Socket Device / Feature bits}  \item[VIRTIO_VSOCK_F_STREAM (0)]
> > stream socket type is supported.
> > > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is
> > supported.
> > > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is
> > not implied.
> > > > +\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is
> > supported.
> > > >  \end{description}
> > > >
> > > >  \drivernormative{\subsubsection}{Feature bits}{Device Types /
> > > > Socket Device / Feature bits} @@ -34,6 +35,12 @@ \subsection{Feature
> > > > bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
> > VIRTIO_VSOCK_F_STREAM has also been negotiated.
> > > >
> > > > +The driver SHOULD ignore devices that do not have
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
> > > > +
> > > > +The driver SHOULD ignore all subsequent devices if a device without
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES is present.
> > > > +
> > >
> > > all this is really vague. any better way to put it?
> > >
> > > what are subsequent devices? if the feature has been negotiated where?
> > > what does ignore mean? you can not know features without interacting
> > > with the device.
> > 
> > The original idea is: Some devices have enabled the multi-devices feature,
> > while others have not, and this situation is unacceptable.
> > 
> > The driver determines the states based on the first device present in the
> > guest.
> > 
> > There are two possible cases:
> > 
> > - If the first device has negotiated the multi-devices feature, then the driver
> > considers the multi-devices feature as enabled. Then, the driver will ignore
> > all devices that do not negotiate the feature.
> > - If the first device has not negotiated, it indicates that the multi-devices
> > feature is disabled. Consequently, the driver will ignore any subsequent
> > devices.
> > 
> > ====
> > 
> > Here is the revised version:
> > 
> > To ensure consistency, all devices MUST have the same multi-devices feature
> > status; a mix of enabled and disabled devices is not acceptable. The driver
> > determines whether the multi-devices feature is enabled based on the first
> > device present in the guest: if the first device has negotiated the feature, the
> > driver enables it and ignores any devices that have not; if the first device has
> > not negotiated the feature, the driver treats the feature as disabled and
> > ignores any subsequent devices.
> > 
> > Does this look better to you?
> > 
> > > >  \devicenormative{\subsubsection}{Feature bits}{Device Types /
> > > > Socket Device / Feature bits}
> > > >
> > > >  The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM
> > feature.
> > > > @@ -52,6 +59,7 @@ \subsection{Device configuration
> > > > layout}\label{sec:Device Types / Socket Device  \begin{lstlisting}
> > > > struct virtio_vsock_config {
> > > >  	le64 guest_cid;
> > > > +	le16 device_order;
> > > >  };
> > > >  \end{lstlisting}
> > > >
> > > > @@ -77,11 +85,27 @@ \subsection{Device configuration
> > > > layout}\label{sec:Device Types / Socket Device  \hline
> > > > \end{tabular}
> > > >
> > > > +The \field{device_order} is used to identify the default device.
> > >
> > > no explanation what is the default device.
> > > is it just for the cid?
> > 
> > Yes.
> > 
> > It is allowed to not specify the local CID for a socket. In this case, the driver
> > will use the default device's CID as the local CID for the socket.
> > 
> > The details are listed in the "Receive and Transmit" section, where you left a
> > comment.
> > 
> > > > Up to
> > > > +65,535 devices can be supported due to the size.
> > >
> > > can be -> are
> > > drop "due to the size".
> > 
> > Will do in the next version.
> > 
> > > > +\devicenormative{\subsubsection}{Device configuration
> > > > +layout}{Device Types / Socket Device / Device configuration layout}
> > > > +
> > > > +The device MUST provide a distinct \field{device_order} if
> > > > +VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
> > >
> > > distinct to what?
> > 
> > In the scope of the guest VM, the device_order should be unique. This
> > means that the device_order should be distinct for each device.
> > 
> > > > +\drivernormative{\subsubsection}{Device configuration
> > > > +layout}{Device Types / Socket Device / Device configuration layout}
> > > > +
> > > > +The driver MUST treat the device with the lowest
> > > > +\field{device_order} as the default device.
> > > > +
> > > >  \subsection{Device Initialization}\label{sec:Device Types / Socket
> > > > Device / Device Initialization}
> > > >
> > > >  \begin{enumerate}
> > > >  \item The guest's cid is read from \field{guest_cid}.
> > > >
> > > > +\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the
> > > > +device's order will be read from \field{device_order}.
> > > > +
> > > >  \item Buffers are added to the event virtqueue to receive events from
> > the device.
> > > >
> > > >  \item Buffers are added to the rx virtqueue to start receiving packets.
> > > > @@ -233,8 +257,10 @@ \subsubsection{Receive and
> > > > Transmit}\label{sec:Device Types / Socket Device / De
> > > >
> > > >  \drivernormative{\paragraph}{Device Operation: Receive and
> > > > Transmit}{Device Types / Socket Device / Device Operation / Receive
> > > > and Transmit}
> > > >
> > > > -The \field{guest_cid} configuration field MUST be used as the
> > > > source CID when -sending outgoing packets.
> > > > +If the source socket is not bound to any source CID, the driver
> > > > +MUST assign one. If more than one device is present, the driver
> > > > +SHOULD use the default device's \field{guest_cid} configuration.
> > > > +Otherwise, the driver SHOULD use the \field{guest_cid} of the only
> > available device.
> > >
> > > why did you drop requirement about outgoing packets?
> > 
> > The driver prefers to use the CID provided by the user. That is, if the user
> > binds to a source CID, the driver will use it and does not need to do anything.
> > If not, the driver will use one from the configuration.
> > 
> > Thanks,
> > Xuewei
> > 
> > > >  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received
> > > > with an  unknown \field{type} value.
> > > > --
> > > > 2.34.1


* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  8:18       ` Xuewei Niu
@ 2025-06-16  8:26         ` Parav Pandit
  2025-06-16  8:59           ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Parav Pandit @ 2025-06-16  8:26 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev


> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: Monday, June 16, 2025 1:48 PM
> 
> Hi, Parav.
> 
> Thanks for your detailed comments.
> 
> > Hi Xuewei,
> >
> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: Monday, May 19, 2025 3:08 PM
> > >
> > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > This patch brings a new feature, called "multi devices", to the
> > > > > virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > feature bit, and a "device_order" field to the config for the virtio vsock.
> > > > >
> > > > > == Motivation ==
> > > > >
> > > > > Vsock is a lightweight and widely used data exchange mechanism
> > > > > between host and guest.
> >
> > Even though it is the current use, the specification does not prevent its usage
> between two guests via a host.
> > So we should not assume such guest <-> host communication as the only
> case to add new feature.
> >
> > For example, in the spec only must requirement is that src_cid ==
> config.guest_cid.
> > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> 
> Yes.
> 
> > With this flexibility in the spec, one can connect vsock devices with multiple
> different backends.
> >
> > For example,
> > QEMU can insert one vsock device for VM to HV communication.
> 
> It is also able to communicate with other devices on the same host, i.e.
> VM-to-VM.
> 
> > A real PCI device can insert one vsock device for VM-to-VM
> communication bypassing a full TCP/IP stack.
> 
> AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> 
Yes. virtio PCI devices have been implemented in hw or as vDPA for many years now by cloud operators and by NIC vendors.

> > This means there are two different backends.
> > And these two devices should not be grouped in the use case you described.
> 
> I think one group is enough for all use cases. It is required that CIDs are unique
> in global, i.e. all backends.
> 
> As your example, let me assume there are two VMs
> 
> 1. VM0 (two vsock backends)
>     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
>     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> 2. VM1
>     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> 
> The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and VM0-VM1
> (src_cid=3, dst_cid=5) communication, while the device1 is only able to do
> VM0-HV (src_cid=4, dst_cid=2) communication.
> 
In examples 1.1 and 1.2, no devices are grouped.
This patch proposes to group the two devices and pick one of them as the default device.
If device0 and device1 are inserted into VM0 with the feature bit you suggested, the guest thinks that they are part of the same group.

When bind() is not called, host sw does not know which of device0 and device1 to pick when binding the devices.


> In a word, a tuple identifies a connection.
> 
> "Refuse to connect" will be raised if the device1 attempts to connect to the
> device2. "They are not in the same group" is a reasonable explanation.
> Am I right?
> 
During the bind() call, one needs to select the device when the devices come from multiple different backends.

[..]


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  3:06         ` Jason Wang
@ 2025-06-16  8:29           ` Xuewei Niu
  2025-06-16  8:38             ` Stefano Garzarella
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-16  8:29 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > "device_order" field to the config for the virtio vsock.
> > > > > >
> > > > > > == Motivation ==
> > > > > >
> > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > in the inability to enable more than one backend.
> > > > >
> > > > > I wonder which part of the spec forbids more than one device.
> > > >
> > > > No. The spec, however, is designed for a single device, and lacks some
> > > > specifications for multiple devices.
> > > >
> > > > For example, we should have a mechanism to select a device from all to
> > > > communicate with a peer.
> 
> I wonder if this is a:
> 
> 1) mechanism that needs to be mandated by the device

Yes.

> 2) a policy that is allowed to be tweaked by the user as TCP/IP did

Not allowed in the current version.

> (Note anyhow the driver can override what the device suggests...)

> > > > This patch introduces a new feature bit "VIRTIO_VSOCK_F_MULTI_DEVICES", and
> > > > involves some modifications to the config space, device & driver norms.
> > >
> > > Maybe we should think a better name to avoid confusion, like "F_DEVICE_ORDER".
> > > I'm not good at name, so if others are better suggestion, they are welcome.
> >
> > Naming things is tough for me, too. I'm good with "F_DEVICE_ORDER", and
> > will update it in the next. I am still open to other suggestions.
> 
> If we agree on the idea, I agree we need a better name.

Okay. I'll try to come up with a new name in the next version.

Thanks,
Xuewei

> Thanks
> 
> >
> > Thanks,
> > Xuewei
> >
> > > >
> > > > > > For instance, two devices
> > > > > > are required: one to transfer data to the VMM via virtio-vsock, and another
> > > > > > to a user process via vhost-user-vsock.
> > > > > >
> > > > > > Apart from that, a side gain is that theoretically the performance might be
> > > > > > improved since each device has its own queue. But it varies depending on
> > > > > > the implementation.
> > > > >
> > > > > It could implement multiqueue anyhow.
> > > > >
> > > > > Thanks
> > > >
> > > > Yes, indeed. As we discussed before, I marked this as a side gain. And
> > > > @Stefano said that the community has a plan to implement multiqueue for
> > > > vsock.
> > >
> > > I'm not sure there is a plan, but yeah, we should do it some day.
> > > As I mention in the previous email, I'd remove any reference to performance.
> > >
> > > I'd just talk about a mechanism to select a default out device, when
> > > the source socket is not bound to any address, like a default gateway.
> > >
> > > Thanks,
> > > Stefano
> > >
> > > > The foundational goal of this patch is to enable vsock to support
> > > > multiple backends.
> > > >
> > > > Thanks,
> > > > Xuewei
> >


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  8:29           ` Xuewei Niu
@ 2025-06-16  8:38             ` Stefano Garzarella
  2025-06-17  2:52               ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-16  8:38 UTC (permalink / raw)
  To: Xuewei Niu, jasowang; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > >
> > > > > > > == Motivation ==
> > > > > > >
> > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > in the inability to enable more than one backend.
> > > > > >
> > > > > > I wonder which part of the spec forbids more than one device.
> > > > >
> > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > specifications for multiple devices.
> > > > >
> > > > > For example, we should have a mechanism to select a device from all to
> > > > > communicate with a peer.
> >
> > I wonder if this is a:
> >
> > 1) mechanism that needs to be mandated by the device
>
> Yes.
>
> > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
>
> Not allowed in the current version.
>
> > (Note anyhow the driver can override what the device suggests...)

I think we should follow what we described in the spec:
"The virtio socket device is a zero-configuration socket communications device."

So, IMO the guest (driver) should not be allowed to change anything.
E.g., right now it is not allowed to change the CID assigned by the host (device).

Thanks,
Stefano



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  8:26         ` Parav Pandit
@ 2025-06-16  8:59           ` Xuewei Niu
  2025-06-16 10:05             ` Parav Pandit
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-16  8:59 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: Monday, June 16, 2025 1:48 PM
> > 
> > Hi, Parav.
> > 
> > Thanks for your detailed comments.
> > 
> > > Hi Xuewei,
> > >
> > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > Sent: Monday, May 19, 2025 3:08 PM
> > > >
> > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > This patch brings a new feature, called "multi devices", to the
> > > > > > virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > feature bit, and a "device_order" field to the config for the virtio vsock.
> > > > > >
> > > > > > == Motivation ==
> > > > > >
> > > > > > Vsock is a lightweight and widely used data exchange mechanism
> > > > > > between host and guest.
> > >
> > > Even though it is the current use, the specification does not prevent its usage
> > between two guests via a host.
> > > So we should not assume such guest <-> host communication as the only
> > case to add new feature.
> > >
> > > For example, in the spec only must requirement is that src_cid ==
> > config.guest_cid.
> > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > 
> > Yes.
> > 
> > > With this flexibility in the spec, one can connect vsock devices with multiple
> > different backends.
> > >
> > > For example,
> > > QEMU can insert one vsock device for VM to HV communication.
> > 
> > It is also able to communicate with other devices on the same host, i.e.
> > VM-to-VM.
> > 
> > > A real PCI device can insert one vsock device for VM-to-VM
> > communication bypassing a full TCP/IP stack.
> > 
> > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > 
> Yes. virtio PCI devices are implemented as hw or as vdpa for many years now by cloud operators and by NIC vendors.

Thanks for your confirmation.

> > > This means there are two different backends.
> > > And these two devices should not be grouped in the use case you described.
> > 
> > I think one group is enough for all use cases. It is required that CIDs are unique
> > in global, i.e. all backends.
> > 
> > As your example, let me assume there are two VMs
> > 
> > 1. VM0 (two vsock backends)
> >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > 2. VM1
> >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > 
> > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and VM0-VM1
> > (src_cid=3, dst_cid=5) communicatation, while the device1 is only able to do
> > VM0-HV (src_cid=4, dst_cid=2) communicatation.
> > 
> In example 1.1 and 1.2 no devices are grouped.
> Your proposal of this patch wants to group the two devices and pick one of them as default device.

Yes. I just wonder whether it is possible to have more than one group in one guest?

> If device0 and device1 are inserted to the VM0 with the feature bit you suggested, the guest things that they are part of the same group.

In this patch, we don't allow inserting devices without the feature bit if
there are already devices with the feature bit.

As a result, there can be either multiple devices with the feature bit or
just a single device.

> When bind() call is not done, host sw does not know which device to pick up between device0 and device1 when binding the devices.

Are you referring to a vsock application on the host? If yes, the "host sw"
is able to pick a device according to the dst cid. For example, it picks
device0 if `connect(3)` is called.

Please be aware that the "host sw" cannot pick device1, since its backend
is not in the host kernel.
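
For illustration only, the host-side lookup described above might be sketched
as follows. This is a minimal sketch, not code from the patch or from any
existing kernel: `struct host_vsock_dev`, `host_find_dev()` and the table
contents are hypothetical names that mirror the VM0/VM1 example, and it
assumes that only devices backed by the host kernel (vhost-vsock) are visible
to the host's lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a guest device as seen from the host. */
struct host_vsock_dev {
    const char *name;
    uint32_t guest_cid;   /* CID assigned to the guest device */
    int in_host_kernel;   /* 1 for vhost-vsock, 0 for any other backend */
};

/* The example topology: device0 and device1 in VM0, device2 in VM1. */
static const struct host_vsock_dev example_devs[] = {
    { "device0", 3, 1 },
    { "device1", 4, 0 },
    { "device2", 5, 1 },
};

/*
 * Return the device that a host application reaches with connect(dst_cid),
 * or NULL when no host-kernel-backed device owns that CID.
 */
static const struct host_vsock_dev *
host_find_dev(const struct host_vsock_dev *devs, size_t n, uint32_t dst_cid)
{
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].in_host_kernel && devs[i].guest_cid == dst_cid)
            return &devs[i];
    }
    return NULL;
}
```

With this table, `host_find_dev(example_devs, 3, 3)` returns device0, while a
lookup for CID 4 returns NULL because device1 is not backed by the host kernel.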

> > In a word, a tuple identifies a connection.
> > 
> > "Refuse to connect" will be raised if the device1 attempts to connect to the
> > device2. "They are not in the same group" is a reasonable explaination.
> > Am I right?
> > 
> During bind call, one needs to select the device when the devices are coming from multiple different backends.

Yes.

Thanks,
Xuewei


* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  8:59           ` Xuewei Niu
@ 2025-06-16 10:05             ` Parav Pandit
  2025-06-16 10:56               ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Parav Pandit @ 2025-06-16 10:05 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev


> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: Monday, June 16, 2025 2:30 PM
> 
> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: Monday, June 16, 2025 1:48 PM
> > >
> > > Hi, Parav.
> > >
> > > Thanks for your detailed comments.
> > >
> > > > Hi Xuewei,
> > > >
> > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > >
> > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > This patch brings a new feature, called "multi devices", to
> > > > > > > the virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > feature bit, and a "device_order" field to the config for the virtio
> vsock.
> > > > > > >
> > > > > > > == Motivition ==
> > > > > > >
> > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > mechanism between host and guest.
> > > >
> > > > Even though it is the current use, the specification does not
> > > > prevent its usage
> > > between two guests via a host.
> > > > So we should not assume such guest <-> host communication as the
> > > > only
> > > case to add new feature.
> > > >
> > > > For example, in the spec only must requirement is that src_cid ==
> > > config.guest_cid.
> > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > >
> > > Yes.
> > >
> > > > With this flexibility in the spec, one can connect vsock devices
> > > > with multiple
> > > different backends.
> > > >
> > > > For example,
> > > > QEMU can insert one vsock device for VM to HV communication.
> > >
> > > It is also able to communicate with other devices on the same host, i.e.
> > > VM-to-VM.
> > >
> > > > A real PCI device can insert one voscket device for VM-to-VM
> > > communication bypassing a full TCP/IP stack.
> > >
> > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > >
> > Yes. virto PCI devices are implemented as hw or as vdpa for many years now by
> cloud operators and by NIC vendors.
> 
> Thanks for your confirmation.
> 
> > > > This means there are two different backends.
> > > > And these two devices should not be grouped in the use case you
> described.
> > >
> > > I think one group is enough for all use cases. It is required that
> > > CIDs are unique in global, i.e. all backends.
> > >
> > > As your example, let me assume there are two VMs
> > >
> > > 1. VM0 (two vsock backends)
> > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > 2. VM1
> > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > >
> > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > VM0-VM1 (src_cid=3, dst_cid=5) communicatation, while the device1 is
> > > only able to do VM0-HV (src_cid=4, dst_cid=2) communicatation.
> > >
> > In example 1.1 and 1.2 no devices are grouped.
> > Your proposal of this patch wants to group the two devices and pick one of
> them as default device.
> 
> Yes. I just wonder if it is possible to have more than one groups in one guest?
> 
For example
Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
Group_2: two devices by QEMU SW implemented as sw backend.

All 4 devices have the _F bit indicating they can be grouped.
But there is no indication of which group each device belongs to.
And hence the guest VM driver is in the dark on how to forward requests without the bind() call.

> > If device0 and device1 are inserted to the VM0 with the feature bit you
> suggested, the guest things that they are part of the same group.
> 
> In this patch, we don't allow to insert devices without the feature bit if there are
> already devices with the feature bit.
> 
In the above example of two groups, all 4 devices spread across the two groups have the feature bit set.
Yet, they cannot be grouped correctly.
The driver, driving blind, thinks that all 4 devices are part of the same group.
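
The point can be illustrated with a small sketch. It is not taken from the
patch: `struct probed_dev`, `driver_sees_same_group()` and the bit position of
`EXAMPLE_F_MULTI_DEVICES` are hypothetical placeholders. It assumes the
feature bit is the only grouping information the driver can read, so any two
devices that offer the bit look like members of one group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit position; the real value is defined by the patch, not here. */
#define EXAMPLE_F_MULTI_DEVICES (1ULL << 0)

struct probed_dev {
    const char *backend;   /* for the reader only; not visible to the driver */
    uint64_t features;     /* feature bits offered by the device */
};

/* Returns 1 when the driver would treat a and b as members of one group. */
static int driver_sees_same_group(const struct probed_dev *a,
                                  const struct probed_dev *b)
{
    assert(a != NULL && b != NULL);
    return (a->features & EXAMPLE_F_MULTI_DEVICES) &&
           (b->features & EXAMPLE_F_MULTI_DEVICES);
}
```

A PCI HW device and a QEMU SW device that both set the bit compare as "same
group" here, even though they belong to different backends.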

> As a result, there can be either multiple devices with the feature bit or just a
> single device.
> 

> > When bind() call is not done, host sw does not know which device to pick up
> between device0 and device1 when binding the devices.
> 
> Are you referring to an vsock application on the host? If yes, "host sw" is able to
> pick up one device according to the dst cid. For example, pick up
> device0 if `connect(3)` is called.
> 
> Please be aware that the "host sw" can not pick up device1, since its device is not
> in the host kernel.
> 
> > > In a word, a tuple identifies a connection.
> > >
> > > "Refuse to connect" will be raised if the device1 attempts to
> > > connect to the device2. "They are not in the same group" is a reasonable
> explaination.
> > > Am I right?
> > >
> > During bind call, one needs to select the device when the devices are coming
> from multiple different backends.
> 
> Yes.
> 
> Thanks,
> Xuewei


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16 10:05             ` Parav Pandit
@ 2025-06-16 10:56               ` Xuewei Niu
  2025-06-16 11:00                 ` Xuewei Niu
  2025-06-17  6:01                 ` Parav Pandit
  0 siblings, 2 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-16 10:56 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: Monday, June 16, 2025 2:30 PM
> > 
> > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > Sent: Monday, June 16, 2025 1:48 PM
> > > >
> > > > Hi, Parav.
> > > >
> > > > Thanks for your detailed comments.
> > > >
> > > > > Hi Xuewei,
> > > > >
> > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > >
> > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > This patch brings a new feature, called "multi devices", to
> > > > > > > > the virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > feature bit, and a "device_order" field to the config for the virtio
> > vsock.
> > > > > > > >
> > > > > > > > == Motivition ==
> > > > > > > >
> > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > mechanism between host and guest.
> > > > >
> > > > > Even though it is the current use, the specification does not
> > > > > prevent its usage
> > > > between two guests via a host.
> > > > > So we should not assume such guest <-> host communication as the
> > > > > only
> > > > case to add new feature.
> > > > >
> > > > > For example, in the spec only must requirement is that src_cid ==
> > > > config.guest_cid.
> > > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > > >
> > > > Yes.
> > > >
> > > > > With this flexibility in the spec, one can connect vsock devices
> > > > > with multiple
> > > > different backends.
> > > > >
> > > > > For example,
> > > > > QEMU can insert one vsock device for VM to HV communication.
> > > >
> > > > It is also able to communicate with other devices on the same host, i.e.
> > > > VM-to-VM.
> > > >
> > > > > A real PCI device can insert one voscket device for VM-to-VM
> > > > communication bypassing a full TCP/IP stack.
> > > >
> > > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > > >
> > > Yes. virto PCI devices are implemented as hw or as vdpa for many years now by
> > cloud operators and by NIC vendors.
> > 
> > Thanks for your confirmation.
> > 
> > > > > This means there are two different backends.
> > > > > And these two devices should not be grouped in the use case you
> > described.
> > > >
> > > > I think one group is enough for all use cases. It is required that
> > > > CIDs are unique in global, i.e. all backends.
> > > >
> > > > As your example, let me assume there are two VMs
> > > >
> > > > 1. VM0 (two vsock backends)
> > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > 2. VM1
> > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > >
> > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > > VM0-VM1 (src_cid=3, dst_cid=5) communicatation, while the device1 is
> > > > only able to do VM0-HV (src_cid=4, dst_cid=2) communicatation.
> > > >
> > > In example 1.1 and 1.2 no devices are grouped.
> > > Your proposal of this patch wants to group the two devices and pick one of
> > them as default device.
> > 
> > Yes. I just wonder if it is possible to have more than one groups in one guest?
> > 
> For example
> Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> Group_2: two devices by QEMU SW implemented as sw backend.
> 
> All the 4 devices has _F bit indicating they can be grouped.
> But there is no indication that they are part of which group.
> And hence the guest VM driver is in dark on how to forward requests without the bind() call.

I see. Thanks!

My idea is that there is only one default device, no matter how many types
of backends there are. If users intend to use other devices, a `bind()`
call is required.

For example, we set `dev0` as the default device:

1. Do not call `bind()`: use dev0;
2. Call `bind(${dev0_cid})`: use dev1;
...
5. Call `bind(${dev4_cid})`: use dev4;

Even if we introduce the group concept, if we don't call `bind()`, how
does the driver know which group to use? If the driver recognizes the
dst_cid, it can use the group to find the device, but then things get
complicated: the driver needs to know the relationship between the dst_cid
and the group.
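
As a rough sketch of the single-default rule (not code from the patch):
`struct guest_vsock_dev`, `guest_select_dev()` and `CID_UNBOUND` are
hypothetical names, and `CID_UNBOUND` merely stands in for "the application
did not call `bind()`". The sketch assumes exactly one device is marked as
the default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CID_UNBOUND UINT32_MAX   /* stand-in for "bind() was not called" */

struct guest_vsock_dev {
    uint32_t cid;     /* guest CID of this device */
    int is_default;   /* exactly one device is expected to set this */
};

/*
 * Pick the device for a socket: the default device when the socket is
 * unbound, otherwise the device whose CID matches the bound CID.
 * Returns NULL when nothing matches.
 */
static const struct guest_vsock_dev *
guest_select_dev(const struct guest_vsock_dev *devs, size_t n,
                 uint32_t bound_cid)
{
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (bound_cid == CID_UNBOUND) {
            if (devs[i].is_default)
                return &devs[i];
        } else if (devs[i].cid == bound_cid) {
            return &devs[i];
        }
    }
    return NULL;
}
```

With devices of CID 3 (default) and CID 4, an unbound socket and a socket
bound to CID 3 both select the first device, and only a socket bound to CID 4
selects the second one. No group lookup is involved.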

WDYT?

Thanks,
Xuewei

> > > If device0 and device1 are inserted to the VM0 with the feature bit you
> > suggested, the guest things that they are part of the same group.
> > 
> > In this patch, we don't allow to insert devices without the feature bit if there are
> > already devices with the feature bit.
> > 
> In above example of two groups, all the 4 devices spread across two groups has the feature bit set.
> Yet, they cannot be grouped correctly.
> Driver driving blind thinks that all 4 devices are part of the same group.
> 
> > As a result, there can be either multiple devices with the feature bit or just a
> > single device.
> > 
> 
> > > When bind() call is not done, host sw does not know which device to pick up
> > between device0 and device1 when binding the devices.
> > 
> > Are you referring to an vsock application on the host? If yes, "host sw" is able to
> > pick up one device according to the dst cid. For example, pick up
> > device0 if `connect(3)` is called.
> > 
> > Please be aware that the "host sw" can not pick up device1, since its device is not
> > in the host kernel.
> > 
> > > > In a word, a tuple identifies a connection.
> > > >
> > > > "Refuse to connect" will be raised if the device1 attempts to
> > > > connect to the device2. "They are not in the same group" is a reasonable
> > explaination.
> > > > Am I right?
> > > >
> > > During bind call, one needs to select the device when the devices are coming
> > from multiple different backends.
> > 
> > Yes.
> > 
> > Thanks,
> > Xuewei


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16 10:56               ` Xuewei Niu
@ 2025-06-16 11:00                 ` Xuewei Niu
  2025-06-17  6:01                 ` Parav Pandit
  1 sibling, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-16 11:00 UTC (permalink / raw)
  To: niuxuewei97
  Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: Monday, June 16, 2025 2:30 PM
> > > 
> > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > >
> > > > > Hi, Parav.
> > > > >
> > > > > Thanks for your detailed comments.
> > > > >
> > > > > > Hi Xuewei,
> > > > > >
> > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > >
> > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > > This patch brings a new feature, called "multi devices", to
> > > > > > > > > the virtio vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > feature bit, and a "device_order" field to the config for the virtio
> > > vsock.
> > > > > > > > >
> > > > > > > > > == Motivition ==
> > > > > > > > >
> > > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > > mechanism between host and guest.
> > > > > >
> > > > > > Even though it is the current use, the specification does not
> > > > > > prevent its usage
> > > > > between two guests via a host.
> > > > > > So we should not assume such guest <-> host communication as the
> > > > > > only
> > > > > case to add new feature.
> > > > > >
> > > > > > For example, in the spec only must requirement is that src_cid ==
> > > > > config.guest_cid.
> > > > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > > > >
> > > > > Yes.
> > > > >
> > > > > > With this flexibility in the spec, one can connect vsock devices
> > > > > > with multiple
> > > > > different backends.
> > > > > >
> > > > > > For example,
> > > > > > QEMU can insert one vsock device for VM to HV communication.
> > > > >
> > > > > It is also able to communicate with other devices on the same host, i.e.
> > > > > VM-to-VM.
> > > > >
> > > > > > A real PCI device can insert one voscket device for VM-to-VM
> > > > > communication bypassing a full TCP/IP stack.
> > > > >
> > > > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > > > >
> > > > Yes. virto PCI devices are implemented as hw or as vdpa for many years now by
> > > cloud operators and by NIC vendors.
> > > 
> > > Thanks for your confirmation.
> > > 
> > > > > > This means there are two different backends.
> > > > > > And these two devices should not be grouped in the use case you
> > > described.
> > > > >
> > > > > I think one group is enough for all use cases. It is required that
> > > > > CIDs are unique in global, i.e. all backends.
> > > > >
> > > > > As your example, let me assume there are two VMs
> > > > >
> > > > > 1. VM0 (two vsock backends)
> > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > 2. VM1
> > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > > >
> > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > > > VM0-VM1 (src_cid=3, dst_cid=5) communicatation, while the device1 is
> > > > > only able to do VM0-HV (src_cid=4, dst_cid=2) communicatation.
> > > > >
> > > > In example 1.1 and 1.2 no devices are grouped.
> > > > Your proposal of this patch wants to group the two devices and pick one of
> > > them as default device.
> > > 
> > > Yes. I just wonder if it is possible to have more than one groups in one guest?
> > > 
> > For example
> > Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> > Group_2: two devices by QEMU SW implemented as sw backend.
> > 
> > All the 4 devices has _F bit indicating they can be grouped.
> > But there is no indication that they are part of which group.
> > And hence the guest VM driver is in dark on how to forward requests without the bind() call.
> 
> I see. Thanks!
> 
> My idea is that there is only one default device, no matter how many types
> of backends are. If the users intend to use other devices, `bind()` call
> is required.
> 
> For example, we set `dev0` as the default device:
> 
> 1. Do not call `bind()`: use dev0;
> 2. Call `bind(${dev0_cid})`: use dev1;
> ...
> 5. Call `bind(${dev4_cid})`: use dev4;

Sorry for the typo. Item 2 should read:

2. Call `bind(${dev0_cid})`: use dev0;

> Even though we introduce the group concept, if we don't call `bind()`, how
> does driver know which group to use? If the driver recoginizes the dst_cid,
> it can use the group to find the device, then the things will be
> complicated. The driver needs to know the relationship between the dst_cid
> and the group.
> 
> WDYT?
> 
> Thanks,
> Xuewei
> 
> > > > If device0 and device1 are inserted to the VM0 with the feature bit you
> > > suggested, the guest things that they are part of the same group.
> > > 
> > > In this patch, we don't allow to insert devices without the feature bit if there are
> > > already devices with the feature bit.
> > > 
> > In above example of two groups, all the 4 devices spread across two groups has the feature bit set.
> > Yet, they cannot be grouped correctly.
> > Driver driving blind thinks that all 4 devices are part of the same group.
> > 
> > > As a result, there can be either multiple devices with the feature bit or just a
> > > single device.
> > > 
> > 
> > > > When bind() call is not done, host sw does not know which device to pick up
> > > between device0 and device1 when binding the devices.
> > > 
> > > Are you referring to an vsock application on the host? If yes, "host sw" is able to
> > > pick up one device according to the dst cid. For example, pick up
> > > device0 if `connect(3)` is called.
> > > 
> > > Please be aware that the "host sw" can not pick up device1, since its device is not
> > > in the host kernel.
> > > 
> > > > > In a word, a tuple identifies a connection.
> > > > >
> > > > > "Refuse to connect" will be raised if the device1 attempts to
> > > > > connect to the device2. "They are not in the same group" is a reasonable
> > > explaination.
> > > > > Am I right?
> > > > >
> > > > During bind call, one needs to select the device when the devices are coming
> > > from multiple different backends.
> > > 
> > > Yes.
> > > 
> > > Thanks,
> > > Xuewei


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16  8:38             ` Stefano Garzarella
@ 2025-06-17  2:52               ` Jason Wang
  2025-06-17  2:54                 ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-17  2:52 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Xuewei Niu, fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > >
> > > > > > > > == Motivition ==
> > > > > > > >
> > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > in the inability to enable more than one backend.
> > > > > > >
> > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > >
> > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > specifications for multiple devices.
> > > > > >
> > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > communicate with a peer.
> > >
> > > I wonder if this is a:
> > >
> > > 1) mechanism that needs to be mandated by the device
> >
> > Yes.
> >
> > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> >
> > Not allowed in the current version.
> >
> > > (Note anyhow the driver can override what the device suggests...)
>
> I think we should follow what we described in the spec:
> "The virtio socket device is a zero-configuration socket communications device."

We probably need to define "configuration" first.

For example, if it means zero configuration from the user, it does not
conflict with 2): the driver can use its own algorithm to elect a
"default" device.

>
> So, IMO the guest (driver) should not be allowed to change anything.
> E.g. Right now it's not allowed to change the CID assigned by the host (device).

A dumb question: with two CIDs (cid1 and cid2) in the same guest, what
happens if src=cid1 and dst=cid2?

>
> Thanks,
> Stefano
>

Thanks



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-17  2:52               ` Jason Wang
@ 2025-06-17  2:54                 ` Jason Wang
  2025-06-17  7:45                   ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-17  2:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Xuewei Niu, fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Tue, Jun 17, 2025 at 10:52 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > >
> > > > > > > > > == Motivition ==
> > > > > > > > >
> > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > in the inability to enable more than one backend.
> > > > > > > >
> > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > >
> > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > specifications for multiple devices.
> > > > > > >
> > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > communicate with a peer.
> > > >
> > > > I wonder if this is a:
> > > >
> > > > 1) mechanism that needs to be mandated by the device
> > >
> > > Yes.
> > >
> > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > >
> > > Not allowed in the current version.
> > >
> > > > (Note anyhow the driver can override what the device suggests...)
> >
> > I think we should follow what we described in the spec:
> > "The virtio socket device is a zero-configuration socket communications device."
>
> We probably need to define "configuration" first.
>
> For example, if it means zero configuration from the user, it does not
> conflict with 2), the driver can use its own algorithm to elect a
> "default" device.

Another perspective: making decisions in the guest may be even more
helpful for the case where the device is not trusted.

>
> >
> > So, IMO the guest (driver) should not be allowed to change anything.
> > E.g. Right now it's not allowed to change the CID assigned by the host (device).
>
> A dumb question when having two cids (cid1 and cid2) in the same
> guest, what happens if src=cid1 and dst=cid2?
>
> >
> > Thanks,
> > Stefano
> >
>
> Thanks

Thanks



* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-16 10:56               ` Xuewei Niu
  2025-06-16 11:00                 ` Xuewei Niu
@ 2025-06-17  6:01                 ` Parav Pandit
  2025-06-17  7:41                   ` Xuewei Niu
  2025-06-19  3:26                   ` Xuewei Niu
  1 sibling, 2 replies; 59+ messages in thread
From: Parav Pandit @ 2025-06-17  6:01 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev


> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: Monday, June 16, 2025 4:26 PM
> 
> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: Monday, June 16, 2025 2:30 PM
> > >
> > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > >
> > > > > Hi, Parav.
> > > > >
> > > > > Thanks for your detailed comments.
> > > > >
> > > > > > Hi Xuewei,
> > > > > >
> > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > >
> > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > > This patch brings a new feature, called "multi devices",
> > > > > > > > > to the virtio vsock. It introduces a
> "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > feature bit, and a "device_order" field to the config
> > > > > > > > > for the virtio
> > > vsock.
> > > > > > > > >
> > > > > > > > > == Motivition ==
> > > > > > > > >
> > > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > > mechanism between host and guest.
> > > > > >
> > > > > > Even though it is the current use, the specification does not
> > > > > > prevent its usage
> > > > > between two guests via a host.
> > > > > > So we should not assume such guest <-> host communication as
> > > > > > the only
> > > > > case to add new feature.
> > > > > >
> > > > > > For example, in the spec only must requirement is that src_cid
> > > > > > ==
> > > > > config.guest_cid.
> > > > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > > > >
> > > > > Yes.
> > > > >
> > > > > > With this flexibility in the spec, one can connect vsock
> > > > > > devices with multiple
> > > > > different backends.
> > > > > >
> > > > > > For example,
> > > > > > QEMU can insert one vsock device for VM to HV communication.
> > > > >
> > > > > It is also able to communicate with other devices on the same host, i.e.
> > > > > VM-to-VM.
> > > > >
> > > > > > A real PCI device can insert one voscket device for VM-to-VM
> > > > > communication bypassing a full TCP/IP stack.
> > > > >
> > > > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > > > >
> > > > Yes. virto PCI devices are implemented as hw or as vdpa for many
> > > > years now by
> > > cloud operators and by NIC vendors.
> > >
> > > Thanks for your confirmation.
> > >
> > > > > > This means there are two different backends.
> > > > > > And these two devices should not be grouped in the use case
> > > > > > you
> > > described.
> > > > >
> > > > > I think one group is enough for all use cases. It is required
> > > > > that CIDs are unique in global, i.e. all backends.
> > > > >
> > > > > As your example, let me assume there are two VMs
> > > > >
> > > > > 1. VM0 (two vsock backends)
> > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > 2. VM1
> > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > > >
> > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > > > VM0-VM1 (src_cid=3, dst_cid=5) communicatation, while the
> > > > > device1 is only able to do VM0-HV (src_cid=4, dst_cid=2)
> communicatation.
> > > > >
> > > > In example 1.1 and 1.2 no devices are grouped.
> > > > Your proposal of this patch wants to group the two devices and
> > > > pick one of
> > > them as default device.
> > >
> > > Yes. I just wonder if it is possible to have more than one groups in one guest?
> > >
> > For example
> > Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> > Group_2: two devices by QEMU SW implemented as sw backend.
> >
> > All the 4 devices has _F bit indicating they can be grouped.
> > But there is no indication that they are part of which group.
> > And hence the guest VM driver is in dark on how to forward requests without
> the bind() call.
> 
> I see. Thanks!
> 
> My idea is that there is only one default device, no matter how many types of
> backends are. If the users intend to use other devices, `bind()` call is required.
> 
> For example, we set `dev0` as the default device:
> 
> 1. Do not call `bind()`: use dev0;
> 2. Call `bind(${dev0_cid})`: use dev1;
> ...
> 5. Call `bind(${dev4_cid})`: use dev4;
> 
> Even though we introduce the group concept, if we don't call `bind()`, how does
> driver know which group to use? If the driver recoginizes the dst_cid, it can use
> the group to find the device, then the things will be complicated. The driver
> needs to know the relationship between the dst_cid and the group.
>
Picking the right vsock group based on the dst_cid would be needed. This is a vsock-level issue at the driver level.
The driver would need enough hints, or an encoding of the dst_cid, or something else.

So even though we miss vsock level construct, it should be the reason to not group the devices.
As both attempt to solve issue at different level.

> WDYT?
> 
> Thanks,
> Xuewei
> 
> > > > If device0 and device1 are inserted to the VM0 with the feature
> > > > bit you
> > > suggested, the guest things that they are part of the same group.
> > >
> > > In this patch, we don't allow to insert devices without the feature
> > > bit if there are already devices with the feature bit.
> > >
> > In above example of two groups, all the 4 devices spread across two groups has
> the feature bit set.
> > Yet, they cannot be grouped correctly.
> > Driver driving blind thinks that all 4 devices are part of the same group.
> >
> > > As a result, there can be either multiple devices with the feature
> > > bit or just a single device.
> > >
> >
> > > > When bind() call is not done, host sw does not know which device
> > > > to pick up
> > > between device0 and device1 when binding the devices.
> > >
> > > Are you referring to an vsock application on the host? If yes, "host
> > > sw" is able to pick up one device according to the dst cid. For
> > > example, pick up
> > > device0 if `connect(3)` is called.
> > >
> > > Please be aware that the "host sw" can not pick up device1, since
> > > its device is not in the host kernel.
> > >
> > > > > In a word, a tuple identifies a connection.
> > > > >
> > > > > "Refuse to connect" will be raised if the device1 attempts to
> > > > > connect to the device2. "They are not in the same group" is a
> > > > > reasonable
> > > explaination.
> > > > > Am I right?
> > > > >
> > > > During bind call, one needs to select the device when the devices
> > > > are coming
> > > from multiple different backends.
> > >
> > > Yes.
> > >
> > > Thanks,
> > > Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-17  6:01                 ` Parav Pandit
@ 2025-06-17  7:41                   ` Xuewei Niu
  2025-06-19  3:26                   ` Xuewei Niu
  1 sibling, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-17  7:41 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

Resending, because the original did not reach the mailing list due to my mistake.

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: Monday, June 16, 2025 4:26 PM
> > 
> > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > Sent: Monday, June 16, 2025 2:30 PM
> > > >
> > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > > >
> > > > > > Hi, Parav.
> > > > > >
> > > > > > Thanks for your detailed comments.
> > > > > >
> > > > > > > Hi Xuewei,
> > > > > > >
> > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > > >
> > > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > > > This patch brings a new feature, called "multi devices",
> > > > > > > > > > to the virtio vsock. It introduces a
> > "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > > feature bit, and a "device_order" field to the config
> > > > > > > > > > for the virtio
> > > > vsock.
> > > > > > > > > >
> > > > > > > > > > == Motivition ==
> > > > > > > > > >
> > > > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > > > mechanism between host and guest.
> > > > > > >
> > > > > > > Even though it is the current use, the specification does not
> > > > > > > prevent its usage
> > > > > > between two guests via a host.
> > > > > > > So we should not assume such guest <-> host communication as
> > > > > > > the only
> > > > > > case to add new feature.
> > > > > > >
> > > > > > > For example, in the spec only must requirement is that src_cid
> > > > > > > ==
> > > > > > config.guest_cid.
> > > > > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > With this flexibility in the spec, one can connect vsock
> > > > > > > devices with multiple
> > > > > > different backends.
> > > > > > >
> > > > > > > For example,
> > > > > > > QEMU can insert one vsock device for VM to HV communication.
> > > > > >
> > > > > > It is also able to communicate with other devices on the same host, i.e.
> > > > > > VM-to-VM.
> > > > > >
> > > > > > > A real PCI device can insert one voscket device for VM-to-VM
> > > > > > communication bypassing a full TCP/IP stack.
> > > > > >
> > > > > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > > > > >
> > > > > Yes. virto PCI devices are implemented as hw or as vdpa for many
> > > > > years now by
> > > > cloud operators and by NIC vendors.
> > > >
> > > > Thanks for your confirmation.
> > > >
> > > > > > > This means there are two different backends.
> > > > > > > And these two devices should not be grouped in the use case
> > > > > > > you
> > > > described.
> > > > > >
> > > > > > I think one group is enough for all use cases. It is required
> > > > > > that CIDs are unique in global, i.e. all backends.
> > > > > >
> > > > > > As your example, let me assume there are two VMs
> > > > > >
> > > > > > 1. VM0 (two vsock backends)
> > > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > > 2. VM1
> > > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > > > >
> > > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > > > > VM0-VM1 (src_cid=3, dst_cid=5) communicatation, while the
> > > > > > device1 is only able to do VM0-HV (src_cid=4, dst_cid=2)
> > communicatation.
> > > > > >
> > > > > In example 1.1 and 1.2 no devices are grouped.
> > > > > Your proposal of this patch wants to group the two devices and
> > > > > pick one of
> > > > them as default device.
> > > >
> > > > Yes. I just wonder if it is possible to have more than one groups in one guest?
> > > >
> > > For example
> > > Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> > > Group_2: two devices by QEMU SW implemented as sw backend.
> > >
> > > All the 4 devices has _F bit indicating they can be grouped.
> > > But there is no indication that they are part of which group.
> > > And hence the guest VM driver is in dark on how to forward requests without
> > the bind() call.
> > 
> > I see. Thanks!
> > 
> > My idea is that there is only one default device, no matter how many types of
> > backends are. If the users intend to use other devices, `bind()` call is required.
> > 
> > For example, we set `dev0` as the default device:
> > 
> > 1. Do not call `bind()`: use dev0;
> > 2. Call `bind(${dev1_cid})`: use dev1;
> > ...
> > 5. Call `bind(${dev4_cid})`: use dev4;
> > 
> > Even though we introduce the group concept, if we don't call `bind()`, how does
> > driver know which group to use? If the driver recoginizes the dst_cid, it can use
> > the group to find the device, then the things will be complicated. The driver
> > needs to know the relationship between the dst_cid and the group.
> >
> Based on the dst_cid picking the right vscock group would be needed. This is vsock level issue at driver level.
> Driver would need enough hints or encoding or of dst_cid or something else.
>
> So even though we miss vsock level construct, it should be the reason to not group the devices.
> As both attempt to solve issue at different level.

Here is an example from the guest client's side.

The operations that hint at addresses are:

- `connect()` is required, to specify the destination;
- `bind()` is optional, to specify the source (we don't call it in this case).

If we set the destination to the host (CID=2), which is a valid dst_cid for
every existing group, the driver still doesn't have enough information to
pick a device.

In short, even with groups, the driver is still in the dark about how to
forward requests.
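
To make the ambiguity concrete, here is a minimal sketch in C. It is only a
model, not driver code: `struct reach_dev`, its `peers` list,
`count_candidates()` and `vm0_devs` are hypothetical names, and the
reachability data just encodes the VM0 example quoted above (device0 with
cid=3 reaches CIDs 2 and 5, device1 with cid=4 reaches only CID 2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 4

/* Hypothetical model: one guest vsock device and the peers it can reach. */
struct reach_dev {
    uint64_t cid;               /* guest_cid from the device's config space */
    uint64_t peers[MAX_PEERS];  /* destination CIDs reachable via its backend */
    size_t n_peers;
};

/* Assumed data, taken from the VM0 example in this thread. */
static const struct reach_dev vm0_devs[] = {
    { .cid = 3, .peers = { 2, 5 }, .n_peers = 2 },  /* device0 */
    { .cid = 4, .peers = { 2 },    .n_peers = 1 },  /* device1 */
};

/* True if the device's backend can reach dst_cid. */
static bool dev_reaches(const struct reach_dev *dev, uint64_t dst_cid)
{
    for (size_t i = 0; i < dev->n_peers; i++)
        if (dev->peers[i] == dst_cid)
            return true;
    return false;
}

/* Count the devices that could carry a connection to dst_cid. */
static size_t count_candidates(const struct reach_dev *devs, size_t n,
                               uint64_t dst_cid)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (dev_reaches(&devs[i], dst_cid))
            count++;
    return count;
}
```

With this table, `count_candidates(vm0_devs, 2, 2)` returns 2, so dst_cid=2
alone does not identify a device, while `count_candidates(vm0_devs, 2, 5)`
returns 1.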

As I mentioned in another email, I think there should be one default device
for HV-VM communication, used to gather information about the other
devices, e.g. their CIDs, purposes, and other details. This is a job at
the application level. If the apps intend to establish connections with
peers other than the HV, they must call `bind()` explicitly to select a
device, regardless of its backend.

Hey @Stefano, could you please give your opinion on this?

> > WDYT?
> > 
> > Thanks,
> > Xuewei
> > 
> > > > > If device0 and device1 are inserted to the VM0 with the feature
> > > > > bit you
> > > > suggested, the guest things that they are part of the same group.
> > > >
> > > > In this patch, we don't allow to insert devices without the feature
> > > > bit if there are already devices with the feature bit.
> > > >
> > > In above example of two groups, all the 4 devices spread across two groups has
> > the feature bit set.
> > > Yet, they cannot be grouped correctly.
> > > Driver driving blind thinks that all 4 devices are part of the same group.
> > >
> > > > As a result, there can be either multiple devices with the feature
> > > > bit or just a single device.
> > > >
> > >
> > > > > When bind() call is not done, host sw does not know which device
> > > > > to pick up
> > > > between device0 and device1 when binding the devices.
> > > >
> > > > Are you referring to an vsock application on the host? If yes, "host
> > > > sw" is able to pick up one device according to the dst cid. For
> > > > example, pick up
> > > > device0 if `connect(3)` is called.
> > > >
> > > > Please be aware that the "host sw" can not pick up device1, since
> > > > its device is not in the host kernel.
> > > >
> > > > > > In a word, a tuple identifies a connection.
> > > > > >
> > > > > > "Refuse to connect" will be raised if the device1 attempts to
> > > > > > connect to the device2. "They are not in the same group" is a
> > > > > > reasonable
> > > > explaination.
> > > > > > Am I right?
> > > > > >
> > > > > During bind call, one needs to select the device when the devices
> > > > > are coming
> > > > from multiple different backends.
> > > >
> > > > Yes.
> > > >
> > > > Thanks,
> > > > Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-17  2:54                 ` Jason Wang
@ 2025-06-17  7:45                   ` Xuewei Niu
  2025-06-18  0:49                     ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-17  7:45 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

Resending, because the original did not reach the mailing list due to my mistake.

> On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > >
> > > > > > > > > == Motivition ==
> > > > > > > > >
> > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > in the inability to enable more than one backend.
> > > > > > > >
> > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > >
> > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > specifications for multiple devices.
> > > > > > >
> > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > communicate with a peer.
> > > >
> > > > I wonder if this is a:
> > > >
> > > > 1) mechanism that needs to be mandated by the device
> > >
> > > Yes.
> > >
> > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > >
> > > Not allowed in the current version.
> > >
> > > > (Note anyhow the driver can override what the device suggests...)
> >
> > I think we should follow what we described in the spec:
> > "The virtio socket device is a zero-configuration socket communications device."
>
> We probably need to define "configuration" first.
>
> For example, if it means zero configuration from the user, it does not
> conflict with 2), the driver can use its own algorithm to elect a
> "default" device.

IMHO, it can be done, but it is not the current design.

I prefer to set the default device for VM-HV communication to do some init
work. I don't think people have a strong need for this.

> Another perspective, making decisions in guests may be even more
> helpful for the case where the device is not trusted

How does the guest realize the vsock device is not trusted? The guest only
knows the information from its config space, which is provided by the host.

> > So, IMO the guest (driver) should not be allowed to change anything.
> > E.g. Right now it's not allowed to change the CID assigned by the host (device).
>
> A dumb question when having two cids (cid1 and cid2) in the same
> guest, what happens if src=cid1 and dst=cid2?

The packets will be directed to another application, if any, on the same guest.

Thanks,
Xuewei

> > Thanks,
> > Stefano
> >
>
> Thanks

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-17  7:45                   ` Xuewei Niu
@ 2025-06-18  0:49                     ` Jason Wang
  2025-06-18  2:47                       ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-18  0:49 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> Resend, because it isn’t listed in the mailing list due to my mistake.
>
> > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > >
> > > > > > > > > > == Motivition ==
> > > > > > > > > >
> > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > >
> > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > >
> > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > specifications for multiple devices.
> > > > > > > >
> > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > communicate with a peer.
> > > > >
> > > > > I wonder if this is a:
> > > > >
> > > > > 1) mechanism that needs to be mandated by the device
> > > >
> > > > Yes.
> > > >
> > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > >
> > > > Not allowed in the current version.
> > > >
> > > > > (Note anyhow the driver can override what the device suggests...)
> > >
> > > I think we should follow what we described in the spec:
> > > "The virtio socket device is a zero-configuration socket communications device."
> >
> > We probably need to define "configuration" first.
> >
> > For example, if it means zero configuration from the user, it does not
> > conflict with 2), the driver can use its own algorithm to elect a
> > "default" device.
>
> IMHO, it can be done, but it is not the current design.

Well, you need to at least explain the advantages, or why you chose to do this.

>
> I prefer to set the default device for VM-HV communication to do some init
> work. I don't think people have a strong need for this.
>
> > Another perspective, making decisions in guests may be even more
> > helpful for the case where the device is not trusted
>
> How does the guest realize the vsock device is not trusted?

There are various ways to build trust (for example, device attestation),
and more might come in the future.

> The guest only
> knows the information from its config space, which is provided by the host.

The way to build trust is probably beyond the scope of virtio, but it
is something we need to consider.

>
> > > So, IMO the guest (driver) should not be allowed to change anything.
> > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> >
> > A dumb question when having two cids (cid1 and cid2) in the same
> > guest, what happens if src=cid1 and dst=cid2?
>
> The packets will be directed to another application, if any, on the same guest.

Ok, so in your example, vhost-user-vsock should route those packets
back to kernel vsock?

Thanks

>
> Thanks,
> Xuewei
>
> > > Thanks,
> > > Stefano
> > >
> >
> > Thanks
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  0:49                     ` Jason Wang
@ 2025-06-18  2:47                       ` Xuewei Niu
  2025-06-18  4:19                         ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-18  2:47 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > Resend, because it isn’t listed in the mailing list due to my mistake.
> >
> > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > >
> > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > >
> > > > > > > > > > > == Motivition ==
> > > > > > > > > > >
> > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > >
> > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > >
> > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > specifications for multiple devices.
> > > > > > > > >
> > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > communicate with a peer.
> > > > > >
> > > > > > I wonder if this is a:
> > > > > >
> > > > > > 1) mechanism that needs to be mandated by the device
> > > > >
> > > > > Yes.
> > > > >
> > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > >
> > > > > Not allowed in the current version.
> > > > >
> > > > > > (Note anyhow the driver can override what the device suggests...)
> > > >
> > > > I think we should follow what we described in the spec:
> > > > "The virtio socket device is a zero-configuration socket communications device."
> > >
> > > We probably need to define "configuration" first.
> > >
> > > For example, if it means zero configuration from the user, it does not
> > > conflict with 2), the driver can use its own algorithm to elect a
> > > "default" device.
> >
> > IMHO, it can be done, but it is not the current design.
> 
> Well, you need at least explain the advantages or why you choose to do this.

I listed them in the previous message. Maybe that was not clear enough, so
I'll try to explain again.

I think people should select the device with a `bind()` call, which takes a
CID as an argument.

In general, the device is selected by the source CID, which is set through
the `bind()` call.

The default device, which is equivalent to the current single device, exists
to stay compatible with existing applications.

Apart from that, the default device is used to communicate with the
hypervisor for some init work, such as gathering information about the other
vsock devices.

To summarize, users must call `bind()` explicitly to select the desired
device for non-HV-VM communication.
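
The rule above can be modeled in a few lines of C. This is only a sketch
under my own assumptions, not the driver implementation: `struct vsock_dev`,
the `is_default` flag and `select_device()` are hypothetical names, and I
assume exactly one device is marked as the default.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest-side view of one vsock device. */
struct vsock_dev {
    uint64_t cid;     /* guest_cid from the device's config space */
    bool is_default;  /* assumed flag: set on exactly one device */
};

/*
 * Return the device a new socket would use, or NULL if none matches.
 * bound == false models a socket on which bind() was never called;
 * src_cid is ignored in that case.
 */
static const struct vsock_dev *
select_device(const struct vsock_dev *devs, size_t n,
              bool bound, uint64_t src_cid)
{
    for (size_t i = 0; i < n; i++) {
        if (bound ? devs[i].cid == src_cid : devs[i].is_default)
            return &devs[i];
    }
    return NULL;
}
```

For two devices with CIDs 3 (default) and 4, a socket without `bind()`
resolves to CID 3, a socket bound to CID 4 resolves to CID 4, and a bind to
an unknown CID yields NULL.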

> > I prefer to set the default device for VM-HV communication to do some init
> > work. I don't think people have a strong need for this.
> >
> > > Another perspective, making decisions in guests may be even more
> > > helpful for the case where the device is not trusted
> >
> > How does the guest realize the vsock device is not trusted?
> 
> There're various ways to build trust (for example device attestation)
> and more might come in the future.
> 
> > The guest only
> > knows the information from its config space, which is provided by the host.
> 
> The way to build trust is probably beyond the scope of virtio, but it
> is something we need to consider.

I agree. If that comes in the future, I think the driver should have the
ability to make decisions.

> > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > >
> > > A dumb question when having two cids (cid1 and cid2) in the same
> > > guest, what happens if src=cid1 and dst=cid2?
> >
> > The packets will be directed to another application, if any, on the same guest.
> 
> Ok, so in your example, vhost-user-vsock should route those packets
> back to kernel vsock?

It depends. Two cases are possible:

1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
vhost-vsock, so the packets will be routed back as you described.
2. dev0(cid1) is from vhost-user-vsock and dev1(cid2) is from vhost-vsock,
so the packets will be dropped if the vhost-user-vsock device can't find a
device whose CID is cid2.
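
As a rough model of the two cases (a sketch, not how any backend is
implemented): `struct guest_dev`, `enum backend_type` and
`route_intra_guest()` are hypothetical names, and the model assumes that a
backend only knows the CIDs of devices served by the same backend type.

```c
#include <stddef.h>
#include <stdint.h>

enum backend_type { BACKEND_VHOST_VSOCK, BACKEND_VHOST_USER_VSOCK };

enum verdict { PKT_LOOPED_BACK, PKT_DROPPED };

/* Hypothetical description of one vsock device in the guest. */
struct guest_dev {
    uint64_t cid;
    enum backend_type backend;
};

/*
 * Decide what happens to a packet sent from src_cid to dst_cid when both
 * CIDs are expected to live in the same guest.
 */
static enum verdict
route_intra_guest(const struct guest_dev *devs, size_t n,
                  uint64_t src_cid, uint64_t dst_cid)
{
    const struct guest_dev *src = NULL;

    /* Find the device that owns the source CID. */
    for (size_t i = 0; i < n; i++)
        if (devs[i].cid == src_cid)
            src = &devs[i];
    if (src == NULL)
        return PKT_DROPPED;

    /* Assumption: the sender's backend only sees devices of its own type. */
    for (size_t i = 0; i < n; i++)
        if (devs[i].cid == dst_cid && devs[i].backend == src->backend)
            return PKT_LOOPED_BACK;

    return PKT_DROPPED;
}
```

In case 1 both devices share a backend type and the call returns
`PKT_LOOPED_BACK`; in case 2 the backend types differ and it returns
`PKT_DROPPED`.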

Thanks,
Xuewei

> Thanks
> 
> >
> > Thanks,
> > Xuewei
> >
> > > > Thanks,
> > > > Stefano
> > > >
> > >
> > > Thanks
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  2:47                       ` Xuewei Niu
@ 2025-06-18  4:19                         ` Jason Wang
  2025-06-18  5:40                           ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-18  4:19 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > >
> > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >
> > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > >
> > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > >
> > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > >
> > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > >
> > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > specifications for multiple devices.
> > > > > > > > > >
> > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > communicate with a peer.
> > > > > > >
> > > > > > > I wonder if this is a:
> > > > > > >
> > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > >
> > > > > > Not allowed in the current version.
> > > > > >
> > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > >
> > > > > I think we should follow what we described in the spec:
> > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > >
> > > > We probably need to define "configuration" first.
> > > >
> > > > For example, if it means zero configuration from the user, it does not
> > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > "default" device.
> > >
> > > IMHO, it can be done, but it is not the current design.
> >
> > Well, you need at least explain the advantages or why you choose to do this.
>
> I listed in the previous message. Maybe it is not clear enough. I'll try to
> explain it again.
>
> I think people should pick up the device by a `bind()` call, which takes a
> CID as an argument.
>
> Generally, the device is picked up by the source CID, which is achieved
> through a `bind()` call.
>
> The default device, which is equivalent to the current single device, is
> used to be compatible with the existing applications.
>
> Apart from that, the default device is used to communicate with hypervisor
> for some init works, such as gathering information about other vsock
> devices.
>
> To summarize, users must do `bind()` call explicitly to select the desired
> device for non-HV-VM communications.

So if I understand correctly you need a way to select the default when
bind() is not called?

>
> > > I prefer to set the default device for VM-HV communication to do some init
> > > work. I don't think people have a strong need for this.
> > >
> > > > Another perspective, making decisions in guests may be even more
> > > > helpful for the case where the device is not trusted
> > >
> > > How does the guest realize the vsock device is not trusted?
> >
> > There're various ways to build trust (for example device attestation)
> > and more might come in the future.
> >
> > > The guest only
> > > knows the information from its config space, which is provided by the host.
> >
> > The way to build trust is probably beyond the scope of virtio, but it
> > is something we need to consider.
>
> Agree with you. If it comes in the future, I think the driver should have
> the ability to make decisions.


Actually, I meant the way to build trust via virtio is something that
needs to be considered. But now we have other ways to build trust.
That would result in a situation where:

1) the "default" vsock device is not trusted but others might be

or

2) two devices both claim to be the "default"

This means we need a decision from the driver side anyway, so the
device-side order seems to be useless here.

>
> > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > >
> > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > guest, what happens if src=cid1 and dst=cid2?
> > >
> > > The packets will be directed to another application, if any, on the same guest.
> >
> > Ok, so in your example, vhost-user-vsock should route those packets
> > back to kernel vsock?
>
> It depends. Two cases are possible:
>
> 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> vhost-vsock, so the packets will be routed back as you described.
> 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> packets will be dropped if vhost-user-vsock device can't find a device
> whose cid is cid2.

Well, this means the behavior depends on the implementation, which is not good.

Thanks

>
> Thanks,
> Xuewei
>
> > Thanks
> >
> > >
> > > Thanks,
> > > Xuewei
> > >
> > > > > Thanks,
> > > > > Stefano
> > > > >
> > > >
> > > > Thanks
> > >
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  4:19                         ` Jason Wang
@ 2025-06-18  5:40                           ` Xuewei Niu
  2025-06-18  8:36                             ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-18  5:40 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > >
> > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > >
> > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > >
> > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > >
> > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > >
> > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > >
> > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > communicate with a peer.
> > > > > > > >
> > > > > > > > I wonder if this is a:
> > > > > > > >
> > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > >
> > > > > > > Not allowed in the current version.
> > > > > > >
> > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > >
> > > > > > I think we should follow what we described in the spec:
> > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > >
> > > > > We probably need to define "configuration" first.
> > > > >
> > > > > For example, if it means zero configuration from the user, it does not
> > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > "default" device.
> > > >
> > > > IMHO, it can be done, but it is not the current design.
> > >
> > > Well, you need at least explain the advantages or why you choose to do this.
> >
> > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > explain it again.
> >
> > I think people should pick up the device by a `bind()` call, which takes a
> > CID as an argument.
> >
> > Generally, the device is picked up by the source CID, which is achieved
> > through a `bind()` call.
> >
> > The default device, which is equivalent to the current single device, is
> > used to be compatible with the existing applications.
> >
> > Apart from that, the default device is used to communicate with hypervisor
> > for some init works, such as gathering information about other vsock
> > devices.
> >
> > To summarize, users must do `bind()` call explicitly to select the desired
> > device for non-HV-VM communications.
> 
> So if I understand correctly you need a way to select the default when
> bind() is not called?

Yes.

> > > > I prefer to set the default device for VM-HV communication to do some init
> > > > work. I don't think people have a strong need for this.
> > > >
> > > > > Another perspective, making decisions in guests may be even more
> > > > > helpful for the case where the device is not trusted
> > > >
> > > > How does the guest realize the vsock device is not trusted?
> > >
> > > There're various ways to build trust (for example device attestation)
> > > and more might come in the future.
> > >
> > > > The guest only
> > > > knows the information from its config space, which is provided by the host.
> > >
> > > The way to build trust is probably beyond the scope of virtio, but it
> > > is something we need to consider.
> >
> > Agree with you. If it comes in the future, I think the driver should have
> > the ability to make decisions.
> 
> 
> Actually, I meant the way to build trust via virtio is something that
> needs to be considered. But now we have other ways to build trust.
> That would result in a situation:
> 
> 1) "default" vsock device is not trusted but other might

I think this topic might be beyond the scope of this patch.

With the current version, there is only one device supported, which can be
considered as the "default". We don't have a mechanism to say "we don't
trust you", right? That is, it is assumed that we trust the device provided
by the hypervisor.

This patch adds multi-device support under the same assumption. I think
trust is a good point to consider, but perhaps we have to address it in
follow-up patches.

> or
> 
> 2) two device claims that they are all "default"
>
> This means anyhow we need a decision from the driver side so the
> device side order seems to be useless here.

There is only one default device. The device with the lowest device_order
is considered the default. Two devices are not allowed to share the same
device_order.
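
A minimal sketch of that selection rule, assuming the driver keeps an array
of probed devices. `struct vsock_dev`, `pick_default()` and the field widths
are hypothetical names chosen for illustration; they are not taken from the
patch or from any existing driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record; only the fields needed here. */
struct vsock_dev {
    uint64_t guest_cid;     /* CID assigned by the device */
    uint32_t device_order;  /* value read from the config space (width assumed) */
};

/*
 * Return the device with the lowest device_order, or NULL if the
 * array is empty. Assumes device_order values are already unique.
 */
static const struct vsock_dev *pick_default(const struct vsock_dev *devs,
                                            size_t n)
{
    const struct vsock_dev *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || devs[i].device_order < best->device_order)
            best = &devs[i];
    }
    return best;
}
```

With devices (CID 4, order 1) and (CID 3, order 0), the function returns the
device with CID 3, which matches the "CID 3 is default" example in the cover
letter.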

> > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > >
> > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > guest, what happens if src=cid1 and dst=cid2?
> > > >
> > > > The packets will be directed to another application, if any, on the same guest.
> > >
> > > Ok, so in your example, vhost-user-vsock should route those packets
> > > back to kernel vsock?
> >
> > It depends. Two cases are possible:
> >
> > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > vhost-vsock, so the packets will be routed back as you described.
> > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > packets will be dropped if vhost-user-vsock device can't find a device
> > whose cid is cid2.
> 
> Well, this means the behavior depends on the implementation which is not good.

No, it is not. It depends on whether the device can find a proper
target device based on the dst_cid.

In case 2, the two devices are in different namespaces (or sort of address
spaces). If there is no device with cid2 in the namespace of
dev0 (vhost-user-vsock), then the packet will be dropped.

In case 1, the two are in the same namespace, so communication can be
established.

Thanks,
Xuewei

Thanks

> >
> > Thanks,
> > Xuewei
> >
> > > Thanks
> > >
> > > >
> > > > Thanks,
> > > > Xuewei
> > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > > >
> > > > >
> > > > > Thanks
> > > >
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  5:40                           ` Xuewei Niu
@ 2025-06-18  8:36                             ` Jason Wang
  2025-06-18  9:51                               ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-18  8:36 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > >
> > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > >
> > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > >
> > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > communicate with a peer.
> > > > > > > > >
> > > > > > > > > I wonder if this is a:
> > > > > > > > >
> > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > >
> > > > > > > > Not allowed in the current version.
> > > > > > > >
> > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > >
> > > > > > > I think we should follow what we described in the spec:
> > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > >
> > > > > > We probably need to define "configuration" first.
> > > > > >
> > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > "default" device.
> > > > >
> > > > > IMHO, it can be done, but it is not the current design.
> > > >
> > > > Well, you need at least explain the advantages or why you choose to do this.
> > >
> > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > explain it again.
> > >
> > > I think people should pick up the device by a `bind()` call, which takes a
> > > CID as an argument.
> > >
> > > Generally, the device is picked up by the source CID, which is achieved
> > > through a `bind()` call.
> > >
> > > The default device, which is equivalent to the current single device, is
> > > used to be compatible with the existing applications.
> > >
> > > Apart from that, the default device is used to communicate with hypervisor
> > > for some init works, such as gathering information about other vsock
> > > devices.
> > >
> > > To summarize, users must do `bind()` call explicitly to select the desired
> > > device for non-HV-VM communications.
> >
> > So if I understand correctly you need a way to select the default when
> > bind() is not called?
>
> Yes.
>
> > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > work. I don't think people have a strong need for this.
> > > > >
> > > > > > Another perspective, making decisions in guests may be even more
> > > > > > helpful for the case where the device is not trusted
> > > > >
> > > > > How does the guest realize the vsock device is not trusted?
> > > >
> > > > There're various ways to build trust (for example device attestation)
> > > > and more might come in the future.
> > > >
> > > > > The guest only
> > > > > knows the information from its config space, which is provided by the host.
> > > >
> > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > is something we need to consider.
> > >
> > > Agree with you. If it comes in the future, I think the driver should have
> > > the ability to make decisions.
> >
> >
> > Actually, I meant the way to build trust via virtio is something that
> > needs to be considered. But now we have other ways to build trust.
> > That would result in a situation:
> >
> > 1) "default" vsock device is not trusted but other might
>
> I think this topic might be beyond the scope of this patch.
>
> With the current version, there is only one device supported, which can be
> considered as the "default".  We don't have a mechanism to say "we don't
> trust you", right?

No. I meant we don't have it in the virtio core but we already have it
in other layers (for example the transport layer).

> That is, it is assumed that we trust the device provided
> by the hypervisor.
>
> This patch is for multiple devices support, based on the same assumption. I
> think trust is a good point to consider, but perhaps we have to address it
> in the follow-up patches.

My point is not about how to build trust, it's about letting the
driver decide by itself in some cases.

A better way might be something like:

"The device_order is a hint for the driver to select a default vsock
device. Device MAY choose ...."
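
To illustrate what "hint" could mean on the driver side, here is a rough
sketch in which the driver filters devices by its own policy first and only
then falls back to device_order. The `trusted` flag, `struct vsock_hint_dev`
and `elect_default()` are assumptions made up for this example; how trust is
established is out of scope here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record: `trusted` stands for whatever the driver learned
 * through means outside virtio (for example attestation).
 */
struct vsock_hint_dev {
    uint64_t guest_cid;
    uint32_t device_order;  /* hint provided by the device */
    bool trusted;           /* driver-side policy input */
};

/* Pick the lowest device_order among trusted devices; NULL if none. */
static const struct vsock_hint_dev *
elect_default(const struct vsock_hint_dev *devs, size_t n)
{
    const struct vsock_hint_dev *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].trusted)
            continue;  /* the driver overrides the hint here */
        if (best == NULL || devs[i].device_order < best->device_order)
            best = &devs[i];
    }
    return best;
}
```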

>
> > or
> >
> > 2) two device claims that they are all "default"
> >
> > This means anyhow we need a decision from the driver side so the
> > device side order seems to be useless here.
>
> There is only one default device. The device with the lowest device_order
> is considered the default. It is not allowed to have the same device_order.

Who can forbid two devices from having the same device_order? Note that in
various security models, hypervisors are not trusted at all.

>
> > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > >
> > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > >
> > > > > The packets will be directed to another application, if any, on the same guest.
> > > >
> > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > back to kernel vsock?
> > >
> > > It depends. Two cases are possible:
> > >
> > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > vhost-vsock, so the packets will be routed back as you described.
> > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > packets will be dropped if vhost-user-vsock device can't find a device
> > > whose cid is cid2.
> >
> > Well, this means the behavior depends on the implementation which is not good.
>
> No, it is not. It depends on whether the device can find a proper
> target device based on the dst_cid.

This sounds really weird: the two CIDs belong to the same guest. So should
guests expect that the two vsock devices can talk to each other?

>
> In case 2, the two devices are in different namespaces (or sort of address
> spaces). If there is no device with cid2 in the namespace of
> dev0 (vhost-user-vsock), then the packet will be dropped.

Are you talking about the implementation of the device or the spec?


>
> In case 1, the two are in the same, so that the communication can be
> established.
>
> Thanks,
> Xuewei
>
> Thanks

Thanks

>
> > >
> > > Thanks,
> > > Xuewei
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks,
> > > > > Xuewei
> > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > > >
> > > > > >
> > > > > > Thanks
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  8:36                             ` Jason Wang
@ 2025-06-18  9:51                               ` Xuewei Niu
  2025-06-19  1:10                                 ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-18  9:51 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > >
> > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > >
> > > > > > > > > > I wonder if this is a:
> > > > > > > > > >
> > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > >
> > > > > > > > > Not allowed in the current version.
> > > > > > > > >
> > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > >
> > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > >
> > > > > > > We probably need to define "configuration" first.
> > > > > > >
> > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > "default" device.
> > > > > >
> > > > > > IMHO, it can be done, but it is not the current design.
> > > > >
> > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > >
> > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > explain it again.
> > > >
> > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > CID as an argument.
> > > >
> > > > Generally, the device is picked up by the source CID, which is achieved
> > > > through a `bind()` call.
> > > >
> > > > The default device, which is equivalent to the current single device, is
> > > > used to be compatible with the existing applications.
> > > >
> > > > Apart from that, the default device is used to communicate with hypervisor
> > > > for some init works, such as gathering information about other vsock
> > > > devices.
> > > >
> > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > device for non-HV-VM communications.
> > >
> > > So if I understand correctly you need a way to select the default when
> > > bind() is not called?
> >
> > Yes.
> >
> > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > work. I don't think people have a strong need for this.
> > > > > >
> > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > helpful for the case where the device is not trusted
> > > > > >
> > > > > > How does the guest realize the vsock device is not trusted?
> > > > >
> > > > > There're various ways to build trust (for example device attestation)
> > > > > and more might come in the future.
> > > > >
> > > > > > The guest only
> > > > > > knows the information from its config space, which is provided by the host.
> > > > >
> > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > is something we need to consider.
> > > >
> > > > Agree with you. If it comes in the future, I think the driver should have
> > > > the ability to make decisions.
> > >
> > >
> > > Actually, I meant the way to build trust via virtio is something that
> > > needs to be considered. But now we have other ways to build trust.
> > > That would result in a situation:
> > >
> > > 1) "default" vsock device is not trusted but other might
> >
> > I think this topic might be beyond the scope of this patch.
> >
> > With the current version, there is only one device supported, which can be
> > considered as the "default".  We don't have a mechanism to say "we don't
> > trust you", right?
> 
> No. I meant we don't have it in the virtio core but we already have it
> in other layers (for example the transport layer).

Are you referring to the virtio-vsock transport layer?
 
> > That is, it is assumed that we trust the device provided
> > by the hypervisor.
> >
> > This patch is for multiple devices support, based on the same assumption. I
> > think trust is a good point to consider, but perhaps we have to address it
> > in the follow-up patches.
> 
> My point is not about how to build trust, it's about letting the
> driver decide by itself in some cases.
> 
> A better way might be something like:
> 
> "The device_order is a hint for the driver to select a default vsock
> device. Device MAY choose ...."

Okay, I'll do that.

> >
> > > or
> > >
> > > 2) two device claims that they are all "default"
> > >
> > > This means anyhow we need a decision from the driver side so the
> > > device side order seems to be useless here.
> >
> > There is only one default device. The device with the lowest device_order
> > is considered the default. It is not allowed to have the same device_order.
> 
> Who can forbid two same device_order? Note that in various security
> models, hypervisors are not trusted at all.

The driver does. Indeed, the driver is able to deny devices if they violate
the spec.
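
For example, a probe-time check could look roughly like the sketch below.
The registry structure, `vsock_register()` and the fixed limit are
hypothetical and exist only to show the idea of refusing a device whose
device_order collides with one that was already accepted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VSOCK_DEVS 8  /* arbitrary limit for this sketch */

/* Hypothetical registry of device_order values already accepted. */
struct vsock_registry {
    uint32_t orders[MAX_VSOCK_DEVS];
    size_t count;
};

/*
 * Accept a newly probed device unless its device_order collides with
 * one that is already registered. Returns 0 or a negative errno.
 */
static int vsock_register(struct vsock_registry *reg, uint32_t device_order)
{
    if (reg->count == MAX_VSOCK_DEVS)
        return -ENOSPC;

    for (size_t i = 0; i < reg->count; i++) {
        if (reg->orders[i] == device_order)
            return -EEXIST;  /* duplicate order: refuse the device */
    }

    reg->orders[reg->count++] = device_order;
    return 0;
}
```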

> > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > >
> > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > >
> > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > >
> > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > back to kernel vsock?
> > > >
> > > > It depends. Two cases are possible:
> > > >
> > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > vhost-vsock, so the packets will be routed back as you described.
> > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > whose cid is cid2.
> > >
> > > Well, this means the behavior depends on the implementation which is not good.
> >
> > No, it is not. It depends on whether the device can find a proper
> > target device based on the dst_cid.
> 
> This sounds really weird, two cids belong to the same guest. So guests
> should expect that the two vsock devic can talk to each other?

Yes, but it is a little complicated.

Two constraints should be applied:

1. No CID conflicts within the driver;
2. No CID conflicts within the address space.

Here is a diagram illustrating a setup that does not violate these
constraints:

 ┌─ kernel─(as0)────────────────────────────────────────────────────┐
 │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
 │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
 │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
 └───────┼──────────────────────────────────┼───────────────────┼───┘
  ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
  │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
  │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
  │         └───────────┘        │   │         └───────────┘    │   │
  │         ┌───────────┐        │   │         ┌───────────┐    │   │
  │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
  │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
  └──────┼───────────────────────┘   └──────────────────────────────┘
   vhost-user-vsock                                                  
┌────────┼──────────────────────────────────────────────────────────┐
│   ┌────┼──────┐                                                   │
│   │ dev1(cid1)│                                                   │
│   └───────────┘                                                   │
└─userapp─(as1)─────────────────────────────────────────────────────┘

- VM0
    - dev0
        - dst_cid = 2 (well-known cid): connect to host;
        - dst_cid = cid1: connect to dev2 (they are in the same as0);
        - dst_cid = cid2: connect to dev3;
    - dev1
        - dst_cid = 2 (well-known cid): connect to userapp;
        - dst_cid = cid0: failure (no cid0 is available in as1, even though
        cid0 is available in VM0);
        - dst_cid = cid1: connect to dev0;
- VM1
    - dev2
        - dst_cid = 2 (well-known cid): connect to host;
        - dst_cid = cid0: connect to dev0;
        - dst_cid = cid2: connect to dev3;
    - dev3: omitted; the same as dev2.

So, back to your question: my answer is that it depends on the address
space. I hope this helps.
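
The lookup rule can be sketched as follows. This is only a model of the
diagram, not an implementation: the numeric CIDs 10, 11 and 12 stand in for
cid0, cid1 and cid2, and `struct endpoint`, `topo` and `resolve()` are
hypothetical names. It covers a few of the cases listed above (dev0, dev2,
and the failing dev1 -> cid0 case).

```c
#include <stddef.h>
#include <stdint.h>

#define CID_HOST 2  /* well-known host CID */

enum addr_space { AS0, AS1 };  /* as0: kernel, as1: userapp */

/* Hypothetical endpoint description matching the diagram. */
struct endpoint {
    const char *name;
    uint64_t cid;
    enum addr_space as;
};

static const struct endpoint topo[] = {
    { "dev0", 10, AS0 },  /* cid0, VM0 */
    { "dev1", 11, AS1 },  /* cid1, VM0, vhost-user-vsock */
    { "dev2", 11, AS0 },  /* cid1, VM1 */
    { "dev3", 12, AS0 },  /* cid2, VM1 */
};

/*
 * Resolve dst_cid as seen from `src`: only endpoints in the same
 * address space are candidates. Returns NULL when the destination is
 * the host side of that address space or when no endpoint matches;
 * *to_host tells the two cases apart.
 */
static const struct endpoint *resolve(const struct endpoint *src,
                                      uint64_t dst_cid, int *to_host)
{
    *to_host = (dst_cid == CID_HOST);
    if (*to_host)
        return NULL;

    for (size_t i = 0; i < sizeof(topo) / sizeof(topo[0]); i++) {
        if (&topo[i] != src && topo[i].as == src->as &&
            topo[i].cid == dst_cid)
            return &topo[i];
    }
    return NULL;  /* no such CID in this address space: drop */
}
```

Because dev1 and dev2 live in different address spaces, both can carry cid1
without conflict, and a lookup from dev0 for cid1 only ever sees dev2.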

> > In case 2, the two devices are in different namespaces (or sort of address
> > spaces). If there is no device with cid2 in the namespace of
> > dev0 (vhost-user-vsock), then the packet will be dropped.
> 
> Are you talking about the implementation of the device or the spec?

I think it is the spec. I am not good at telling the two apart, so please
correct me if I am wrong.

Thanks,
Xuewei

> > In case 1, the two are in the same, so that the communication can be
> > established.
> >
> > Thanks,
> > Xuewei
> >
> > Thanks
> 
> Thanks
> 
> >
> > > >
> > > > Thanks,
> > > > Xuewei
> > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Xuewei
> > > > > >
> > > > > > > > Thanks,
> > > > > > > > Stefano
> > > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-18  9:51                               ` Xuewei Niu
@ 2025-06-19  1:10                                 ` Jason Wang
  2025-06-19  2:42                                   ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-19  1:10 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > >
> > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > >
> > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > >
> > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > >
> > > > > > > > > > Not allowed in the current version.
> > > > > > > > > >
> > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > >
> > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > >
> > > > > > > > We probably need to define "configuration" first.
> > > > > > > >
> > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > "default" device.
> > > > > > >
> > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > >
> > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > >
> > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > explain it again.
> > > > >
> > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > CID as an argument.
> > > > >
> > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > through a `bind()` call.
> > > > >
> > > > > The default device, which is equivalent to the current single device, is
> > > > > used to be compatible with the existing applications.
> > > > >
> > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > for some init works, such as gathering information about other vsock
> > > > > devices.
> > > > >
> > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > device for non-HV-VM communications.
> > > >
> > > > So if I understand correctly you need a way to select the default when
> > > > bind() is not called?
> > >
> > > Yes.
> > >
> > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > work. I don't think people have a strong need for this.
> > > > > > >
> > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > helpful for the case where the device is not trusted
> > > > > > >
> > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > >
> > > > > > There're various ways to build trust (for example device attestation)
> > > > > > and more might come in the future.
> > > > > >
> > > > > > > The guest only
> > > > > > > knows the information from its config space, which is provided by the host.
> > > > > >
> > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > is something we need to consider.
> > > > >
> > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > the ability to make decisions.
> > > >
> > > >
> > > > Actually, I meant the way to build trust via virtio is something that
> > > > needs to be considered. But now we have other ways to build trust.
> > > > That would result in a situation:
> > > >
> > > > 1) "default" vsock device is not trusted but other might
> > >
> > > I think this topic might be beyond the scope of this patch.
> > >
> > > With the current version, there is only one device supported, which can be
> > > considered as the "default".  We don't have a mechanism to say "we don't
> > > trust you", right?
> >
> > No. I meant we don't have it in the virtio core but we already have it
> > in other layers (for example the transport layer).
>
> Are you referring to the virtio-vsock transport layer?

Yes, for example the PCI layer.

>
> > > That is, it is assumed that we trust the device provided
> > > by the hypervisor.
> > >
> > > This patch is for multiple devices support, based on the same assumption. I
> > > think trust is a good point to consider, but perhaps we have to address it
> > > in the follow-up patches.
> >
> > My point is not about how to build trust, it's about letting the
> > driver decide by itself in some cases.
> >
> > A better way might be something like:
> >
> > "The device_order is a hint for the driver to select a default vsock
> > device. Device MAY choose ...."
>
> Okay, I'll do that.
>
> > >
> > > > or
> > > >
> > > > 2) two device claims that they are all "default"
> > > >
> > > > This means anyhow we need a decision from the driver side so the
> > > > device side order seems to be useless here.
> > >
> > > There is only one default device. The device with the lowest device_order
> > > is considered the default. It is not allowed to have the same device_order.
> >
> > Who can forbid two same device_order? Note that in various security
> > models, hypervisors are not trusted at all.
>
> The driver does. Indeed, the driver is able to deny devices if they violate
> the spec.

Yes, that's the point: either way, the driver needs to make the decision.
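
A minimal sketch of that driver-side decision, assuming each device exposes a
numeric device_order in its config space. The struct, the field width, and the
function name are hypothetical and not taken from the patch. The driver treats
device_order as input only: it refuses a set with duplicate values and
otherwise picks the lowest one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-side view of one probed vsock device. */
struct vsock_dev_info {
    uint64_t guest_cid;     /* CID read from the device config space */
    uint32_t device_order;  /* proposed field; width is an assumption */
};

/*
 * Return the index of the default device (lowest device_order),
 * or -1 if the list is empty or two devices share a device_order.
 */
static int elect_default_device(const struct vsock_dev_info *devs, size_t n)
{
    if (n == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (devs[i].device_order == devs[j].device_order)
                return -1;  /* duplicate order: the driver refuses to elect */
        }
        if (devs[i].device_order < devs[best].device_order)
            best = i;
    }
    return (int)best;
}
```

Whether the driver rejects the whole set or falls back to its own policy on a
duplicate is a design choice; the sketch simply reports the conflict.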

>
> > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > >
> > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > >
> > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > >
> > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > back to kernel vsock?
> > > > >
> > > > > It depends. Two cases are possible:
> > > > >
> > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > whose cid is cid2.
> > > >
> > > > Well, this means the behavior depends on the implementation which is not good.
> > >
> > > No, it is not. It depends on whether the device can find a proper
> > > target device based on the dst_cid.
> >
> > This sounds really weird, two cids belong to the same guest. So guests
> > should expect that the two vsock devices can talk to each other?
>
> Yes, a little bit complicated.
>
> Two constraints should be applied:
>
> 1. No CID conflicts within the driver;
> 2. No CID conflicts within the address space.
>
> Here is a diagram to illustrate the situation where it does not violate the
> constraints:
>
>  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
>  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
>  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
>  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
>  └───────┼──────────────────────────────────┼───────────────────┼───┘
>   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
>   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
>   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
>   │         └───────────┘        │   │         └───────────┘    │   │
>   │         ┌───────────┐        │   │         ┌───────────┐    │   │
>   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
>   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
>   └──────┼───────────────────────┘   └──────────────────────────────┘
>    vhost-user-vsock
> ┌────────┼──────────────────────────────────────────────────────────┐
> │   ┌────┼──────┐                                                   │
> │   │ dev1(cid1)│                                                   │
> │   └───────────┘                                                   │
> └─userapp─(as1)─────────────────────────────────────────────────────┘
>
> - VM0
>     - dev0
>         - dst_cid = 2 (well-known cid): connect to host;
>         - dst_cid = cid1: connect to dev2 (they are in the same as0);
>         - dst_cid = cid2: connect to dev3;
>     - dev1
>         - dst_cid = 2 (well-known cid): connect to userapp;
>         - dst_cid = cid0: failure (no cid0 is available in as1, even though
>         cid1 is available in the VM0);
>         - dst_cid = cid1: connect to dev0;
> - VM1
>     - dev2
>         - dst_cid = 2 (well-known cid): connect to host;
>         - dst_cid = cid0: connect to dev0;
>         - dst_cid = cid2: connect to dev3;
>     - dev3: omitted, same as dev2.
>
> So back to your question, my answer is that it depends on the address
> space. Hope it could be helpful.

This raises an interesting question: if vm0 tries to connect to vm1, how
does it know which device it needs to use, given that vsock lacks concepts
like switching, routing, or address announcement?

Thanks



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  1:10                                 ` Jason Wang
@ 2025-06-19  2:42                                   ` Xuewei Niu
  2025-06-23  8:01                                     ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-19  2:42 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > > >
> > > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > > >
> > > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > > >
> > > > > > > > > > > Yes.
> > > > > > > > > > >
> > > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > > >
> > > > > > > > > > > Not allowed in the current version.
> > > > > > > > > > >
> > > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > > >
> > > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > > >
> > > > > > > > > We probably need to define "configuration" first.
> > > > > > > > >
> > > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > > "default" device.
> > > > > > > >
> > > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > > >
> > > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > > >
> > > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > > explain it again.
> > > > > >
> > > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > > CID as an argument.
> > > > > >
> > > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > > through a `bind()` call.
> > > > > >
> > > > > > The default device, which is equivalent to the current single device, is
> > > > > > used to be compatible with the existing applications.
> > > > > >
> > > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > > for some init works, such as gathering information about other vsock
> > > > > > devices.
> > > > > >
> > > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > > device for non-HV-VM communications.
> > > > >
> > > > > So if I understand correctly you need a way to select the default when
> > > > > bind() is not called?
> > > >
> > > > Yes.
> > > >
> > > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > > work. I don't think people have a strong need for this.
> > > > > > > >
> > > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > > helpful for the case where the device is not trusted
> > > > > > > >
> > > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > > >
> > > > > > > There're various ways to build trust (for example device attestation)
> > > > > > > and more might come in the future.
> > > > > > >
> > > > > > > > The guest only
> > > > > > > > knows the information from its config space, which is provided by the host.
> > > > > > >
> > > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > > is something we need to consider.
> > > > > >
> > > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > > the ability to make decisions.
> > > > >
> > > > >
> > > > > Actually, I meant the way to build trust via virtio is something that
> > > > > needs to be considered. But now we have other ways to build trust.
> > > > > > That would result in a situation:
> > > > >
> > > > > 1) "default" vsock device is not trusted but other might
> > > >
> > > > I think this topic might be beyond the scope of this patch.
> > > >
> > > > With the current version, there is only one device supported, which can be
> > > > considered as the "default".  We don't have a mechanism to say "we don't
> > > > trust you", right?
> > >
> > > No. I meant we don't have it in the virtio core but we already have it
> > > in other layers (for example the transport layer).
> >
> > > Are you referring to the virtio-vsock transport layer?
> 
> Yes, for example the PCI layer.
> 
> >
> > > > That is, it is assumed that we trust the device provided
> > > > by the hypervisor.
> > > >
> > > > This patch is for multiple devices support, based on the same assumption. I
> > > > think trust is a good point to consider, but perhaps we have to address it
> > > > in the follow-up patches.
> > >
> > > My point is not about how to build trust, it's about letting the
> > > driver decide by itself in some cases.
> > >
> > > A better way might be something like:
> > >
> > > "The device_order is a hint for the driver to select a default vsock
> > > device. Device MAY choose ...."
> >
> > Okay, I'll do that.
> >
> > > >
> > > > > or
> > > > >
> > > > > 2) two device claims that they are all "default"
> > > > >
> > > > > This means anyhow we need a decision from the driver side so the
> > > > > device side order seems to be useless here.
> > > >
> > > > There is only one default device. The device with the lowest device_order
> > > > is considered the default. It is not allowed to have the same device_order.
> > >
> > > Who can forbid two same device_order? Note that in various security
> > > models, hypervisors are not trusted at all.
> >
> > The driver does. Indeed, the driver is able to deny devices if they violate
> > the spec.
> 
> Yes, that's the point: either way, the driver needs to make the decision.
> 
> >
> > > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > > >
> > > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > > >
> > > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > > >
> > > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > > back to kernel vsock?
> > > > > >
> > > > > > It depends. Two cases are possible:
> > > > > >
> > > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > > whose cid is cid2.
> > > > >
> > > > > Well, this means the behavior depends on the implementation which is not good.
> > > >
> > > > No, it is not. It depends on whether the device can find a proper
> > > > target device based on the dst_cid.
> > >
> > > This sounds really weird, two cids belong to the same guest. So guests
> > > > should expect that the two vsock devices can talk to each other?
> >
> > Yes, a little bit complicated.
> >
> > Two constraints should be applied:
> >
> > 1. No CID conflicts within the driver;
> > 2. No CID conflicts within the address space.
> >
> > Here is a diagram to illustrate the situation where it does not violate the
> > constraints:
> >
> >  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
> >  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
> >  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
> >  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
> >  └───────┼──────────────────────────────────┼───────────────────┼───┘
> >   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
> >   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
> >   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
> >   │         └───────────┘        │   │         └───────────┘    │   │
> >   │         ┌───────────┐        │   │         ┌───────────┐    │   │
> >   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
> >   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
> >   └──────┼───────────────────────┘   └──────────────────────────────┘
> >    vhost-user-vsock
> > ┌────────┼──────────────────────────────────────────────────────────┐
> > │   ┌────┼──────┐                                                   │
> > │   │ dev1(cid1)│                                                   │
> > │   └───────────┘                                                   │
> > └─userapp─(as1)─────────────────────────────────────────────────────┘
> >
> > - VM0
> >     - dev0
> >         - dst_cid = 2 (well-known cid): connect to host;
> >         - dst_cid = cid1: connect to dev2 (they are in the same as0);
> >         - dst_cid = cid2: connect to dev3;
> >     - dev1
> >         - dst_cid = 2 (well-known cid): connect to userapp;
> >         - dst_cid = cid0: failure (no cid0 is available in as1, even though
> >         cid1 is available in the VM0);
> >         - dst_cid = cid1: connect to dev0;
> > - VM1
> >     - dev2
> >         - dst_cid = 2 (well-known cid): connect to host;
> >         - dst_cid = cid0: connect to dev0;
> >         - dst_cid = cid2: connect to dev3;
> >     - dev3: omitted, same as dev2.
> >
> > So back to your question, my answer is that it depends on the address
> > space. Hope it could be helpful.
> 
> This brings an interesting question, for example if vm0 tries to
> connect to vm1, how does it know which device it needs to use (lacking
> the concept like switch/route/address announcing etc...)?

The HV should maintain a table for that. The guest first communicates
with the HV, through the default device, to learn which device
to use.
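
As a rough sketch of what such a table lookup could look like on the guest
side once the table has been fetched: the entry layout and the function name
below are assumptions for illustration only, and how the table is transferred
over the default device is not defined here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: to reach peer_cid, bind to local_cid. */
struct hv_route_entry {
    uint64_t peer_cid;
    uint64_t local_cid;
};

/*
 * Look up which local device (identified by its CID) reaches peer_cid.
 * Returns 0 and stores the CID in *local_cid on a hit, -1 on a miss.
 */
static int hv_route_lookup(const struct hv_route_entry *table, size_t n,
                           uint64_t peer_cid, uint64_t *local_cid)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].peer_cid == peer_cid) {
            *local_cid = table[i].local_cid;
            return 0;
        }
    }
    return -1;  /* *local_cid is left untouched */
}
```

The application would then `bind()` to the returned CID before calling
`connect()`.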

Thanks,
Xuewei


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-17  6:01                 ` Parav Pandit
  2025-06-17  7:41                   ` Xuewei Niu
@ 2025-06-19  3:26                   ` Xuewei Niu
  2025-06-19  4:40                     ` Parav Pandit
  1 sibling, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-19  3:26 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

Hi Parav,

Could you please take a look at the diagram in [1]?

IIUC, for VM0, there are two groups, and for VM1, there is only one group.
Am I right? If yes, I think the group concept is reasonable, but we don't
need it at this time.

I think the first thing is to figure out how to pick the right group.

The standard socket API doesn't provide a way to access the group information.

The source and destination come from `bind()` and `connect()`, respectively. If
we don't call `bind()`, only the destination is known.

However, the destination alone is not enough to find the group. For example, the
well-known CIDs (e.g. 2) are valid for all groups.
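
To illustrate the selection rule described above, here is a small sketch under
stated assumptions: the struct and function names are hypothetical, and the
device list stands in for whatever the driver really tracks. A bound socket is
matched on its source CID, an unbound one falls back to the default device,
and dst_cid plays no part in the choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one guest vsock device. */
struct guest_vsock_dev {
    uint64_t cid;       /* guest_cid reported by the device */
    bool is_default;    /* assumed true for exactly one device */
};

/*
 * Pick the device for an outgoing connection.
 * bound == true  : bind() was called, so match on the source CID.
 * bound == false : only the destination is known, use the default device.
 * Returns NULL when nothing matches.
 */
static const struct guest_vsock_dev *
select_device(const struct guest_vsock_dev *devs, size_t n,
              bool bound, uint64_t src_cid, uint64_t dst_cid)
{
    /* A well-known CID such as 2 is valid on every device,
     * so the destination cannot disambiguate. */
    (void)dst_cid;

    for (size_t i = 0; i < n; i++) {
        if (bound ? devs[i].cid == src_cid : devs[i].is_default)
            return &devs[i];
    }
    return NULL;
}
```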

[1] https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: Monday, June 16, 2025 4:26 PM
> > 
> > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > Sent: Monday, June 16, 2025 2:30 PM
> > > >
> > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > > >
> > > > > > Hi, Parav.
> > > > > >
> > > > > > Thanks for your detailed comments.
> > > > > >
> > > > > > > Hi Xuewei,
> > > > > > >
> > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > > >
> > > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > > > This patch brings a new feature, called "multi devices",
> > > > > > > > > > to the virtio vsock. It introduces a
> > "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > > feature bit, and a "device_order" field to the config
> > > > > > > > > > for the virtio
> > > > vsock.
> > > > > > > > > >
> > > > > > > > > > == Motivition ==
> > > > > > > > > >
> > > > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > > > mechanism between host and guest.
> > > > > > >
> > > > > > > Even though it is the current use, the specification does not
> > > > > > > prevent its usage
> > > > > > between two guests via a host.
> > > > > > > So we should not assume such guest <-> host communication as
> > > > > > > the only
> > > > > > case to add new feature.
> > > > > > >
> > > > > > > For example, in the spec only must requirement is that src_cid
> > > > > > > ==
> > > > > > config.guest_cid.
> > > > > > > Dst_cid can be anything, it need not be well known 0x2 (for the host).
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > With this flexibility in the spec, one can connect vsock
> > > > > > > devices with multiple
> > > > > > different backends.
> > > > > > >
> > > > > > > For example,
> > > > > > > QEMU can insert one vsock device for VM to HV communication.
> > > > > >
> > > > > > It is also able to communicate with other devices on the same host, i.e.
> > > > > > VM-to-VM.
> > > > > >
> > > > > > > A real PCI device can insert one vsock device for VM-to-VM
> > > > > > communication bypassing a full TCP/IP stack.
> > > > > >
> > > > > > AFAIK, all devices are implemented in software. Is it a real PCI HW device?
> > > > > >
> > > > > Yes. virtio PCI devices are implemented as hw or as vdpa for many
> > > > > years now by
> > > > cloud operators and by NIC vendors.
> > > >
> > > > Thanks for your confirmation.
> > > >
> > > > > > > This means there are two different backends.
> > > > > > > And these two devices should not be grouped in the use case
> > > > > > > you
> > > > described.
> > > > > >
> > > > > > I think one group is enough for all use cases. It is required
> > > > > > that CIDs are unique in global, i.e. all backends.
> > > > > >
> > > > > > As your example, let me assume there are two VMs
> > > > > >
> > > > > > 1. VM0 (two vsock backends)
> > > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > > 2. VM1
> > > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > > > >
> > > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2) and
> > > > > > VM0-VM1 (src_cid=3, dst_cid=5) communication, while the
> > > > > > device1 is only able to do VM0-HV (src_cid=4, dst_cid=2)
> > communication.
> > > > > >
> > > > > In example 1.1 and 1.2 no devices are grouped.
> > > > > Your proposal of this patch wants to group the two devices and
> > > > > pick one of
> > > > them as default device.
> > > >
> > > > Yes. I just wonder if it is possible to have more than one group in one guest?
> > > >
> > > For example
> > > Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> > > Group_2: two devices by QEMU SW implemented as sw backend.
> > >
> > > All 4 devices have the _F bit indicating they can be grouped.
> > > But there is no indication of which group they are part of.
> > > And hence the guest VM driver is in the dark on how to forward requests without
> > the bind() call.
> > 
> > I see. Thanks!
> > 
> > My idea is that there is only one default device, no matter how many types of
> > backends there are. If users intend to use other devices, a `bind()` call is required.
> > 
> > For example, we set `dev0` as the default device:
> > 
> > 1. Do not call `bind()`: use dev0;
> > 2. Call `bind(${dev1_cid})`: use dev1;
> > ...
> > 5. Call `bind(${dev4_cid})`: use dev4;
> > 
> > Even though we introduce the group concept, if we don't call `bind()`, how does
> > driver know which group to use? If the driver recognizes the dst_cid, it can use
> > the group to find the device, then the things will be complicated. The driver
> > needs to know the relationship between the dst_cid and the group.
> >
> Picking the right vsock group based on the dst_cid would be needed. This is a vsock-level issue at the driver level.
> The driver would need enough hints, or an encoding of dst_cid, or something else.
> 
> So even though we miss a vsock-level construct, that should not be the reason to not group the devices,
> as the two attempt to solve issues at different levels.
> 
> > WDYT?
> > 
> > Thanks,
> > Xuewei
> > 
> > > > > If device0 and device1 are inserted to the VM0 with the feature
> > > > > bit you
> > > > suggested, the guest thinks that they are part of the same group.
> > > >
> > > > In this patch, we don't allow to insert devices without the feature
> > > > bit if there are already devices with the feature bit.
> > > >
> > > In the above example of two groups, all 4 devices spread across the two groups have
> > the feature bit set.
> > > Yet, they cannot be grouped correctly.
> > > The driver, driving blind, thinks that all 4 devices are part of the same group.
> > >
> > > > As a result, there can be either multiple devices with the feature
> > > > bit or just a single device.
> > > >
> > >
> > > > > When bind() call is not done, host sw does not know which device
> > > > > to pick up
> > > > between device0 and device1 when binding the devices.
> > > >
> > > > Are you referring to an vsock application on the host? If yes, "host
> > > > sw" is able to pick up one device according to the dst cid. For
> > > > example, pick up
> > > > device0 if `connect(3)` is called.
> > > >
> > > > Please be aware that the "host sw" can not pick up device1, since
> > > > its device is not in the host kernel.
> > > >
> > > > > > In a word, a tuple identifies a connection.
> > > > > >
> > > > > > "Refuse to connect" will be raised if the device1 attempts to
> > > > > > connect to the device2. "They are not in the same group" is a
> > > > > > reasonable
> > > > explanation.
> > > > > > Am I right?
> > > > > >
> > > > > During bind call, one needs to select the device when the devices
> > > > > are coming
> > > > from multiple different backends.
> > > >
> > > > Yes.
> > > >
> > > > Thanks,
> > > > Xuewei


* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  3:26                   ` Xuewei Niu
@ 2025-06-19  4:40                     ` Parav Pandit
  2025-06-19  5:10                       ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Parav Pandit @ 2025-06-19  4:40 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev



> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: 19 June 2025 08:57 AM
> 
> Hi Parav,
> 
> Could you please take a look at the diagram in [1]?
> 
> IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> Am I right? If yes, I think the group concept is reasonable, but we don't need
> it at this time.
> 
> I think the first thing is to figure out how to pick the right group.
> 
> The standard socket API doesn't provide a way to access the group information.
> 
> The source and destination come from `bind()` and `connect()`, respectively. If we
> don't call `bind()`, only the destination is known.
> 
> However, the destination alone is not enough to find the group. For example, the
> well-known CIDs (e.g. 2) are valid for all groups.
> 
> 1:
> https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/

Right. The socket addressing scheme is presently too naïve to select the group. I am not sure when or how you or others plan to address that.
This is a transport-layer problem to solve (transport in the networking sense, not to be confused with the virtio transport such as PCI or MMIO).

However, at the device level, we should have a grouping construct.
Without it, all devices will be part of a single group, and one will not be able to build the group concept later.
So even if you don't need it explicitly now, grouping the devices is what you need when connect() is called.

So I was imagining a relatively simple scheme:
for example, some kind of group id is present at the virtio device level.
Two devices that have the same group id are part of a single group.
An example group id format could be a UUID.

And this is completely optional for devices to implement.
It is generic enough to be usable beyond just the vsock device, in other use cases we discussed in the past.
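
A minimal sketch of that scheme, assuming a 16-byte UUID as the group id. The
struct and function names are hypothetical, and nothing here is defined by the
spec: two devices belong to the same group exactly when their ids compare
equal byte for byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GROUP_ID_LEN 16  /* size of a UUID in bytes */

/* Hypothetical per-device group identifier. */
struct dev_group_id {
    uint8_t uuid[GROUP_ID_LEN];
};

/* Two devices are in the same group iff their UUIDs match. */
static bool same_group(const struct dev_group_id *a,
                       const struct dev_group_id *b)
{
    return memcmp(a->uuid, b->uuid, GROUP_ID_LEN) == 0;
}

/* Count how many devices in ids[] share the group of ids[ref]. */
static size_t group_size(const struct dev_group_id *ids, size_t n, size_t ref)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (same_group(&ids[i], &ids[ref]))
            count++;
    }
    return count;
}
```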

> 
> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: Monday, June 16, 2025 4:26 PM
> > >
> > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > Sent: Monday, June 16, 2025 2:30 PM
> > > > >
> > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > > > >
> > > > > > > Hi, Parav.
> > > > > > >
> > > > > > > Thanks for your detailed comments.
> > > > > > >
> > > > > > > > Hi Xuewei,
> > > > > > > >
> > > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > > > >
> > > > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu wrote:
> > > > > > > > > > > This patch brings a new feature, called "multi
> > > > > > > > > > > devices", to the virtio vsock. It introduces a
> > > "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > > > feature bit, and a "device_order" field to the
> > > > > > > > > > > config for the virtio
> > > > > vsock.
> > > > > > > > > > >
> > > > > > > > > > > == Motivition ==
> > > > > > > > > > >
> > > > > > > > > > > Vsock is a lightweight and widely used data exchange
> > > > > > > > > > > mechanism between host and guest.
> > > > > > > >
> > > > > > > > Even though it is the current use, the specification does
> > > > > > > > not prevent its usage
> > > > > > > between two guests via a host.
> > > > > > > > So we should not assume such guest <-> host communication
> > > > > > > > as the only
> > > > > > > case to add new feature.
> > > > > > > >
> > > > > > > > For example, in the spec only must requirement is that
> > > > > > > > src_cid ==
> > > > > > > config.guest_cid.
> > > > > > > > Dst_cid can be anything, it need not be well known 0x2 (for the
> host).
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > With this flexibility in the spec, one can connect vsock
> > > > > > > > devices with multiple
> > > > > > > different backends.
> > > > > > > >
> > > > > > > > For example,
> > > > > > > > QEMU can insert one vsock device for VM to HV communication.
> > > > > > >
> > > > > > > It is also able to communicate with other devices on the same host,
> i.e.
> > > > > > > VM-to-VM.
> > > > > > >
> > > > > > > > A real PCI device can insert one vsock device for
> > > > > > > > VM-to-VM
> > > > > > > communication bypassing a full TCP/IP stack.
> > > > > > >
> > > > > > > AFAIK, all devices are implemented in software. Is it a real PCI HW
> device?
> > > > > > >
> > > > > > Yes. virtio PCI devices are implemented as hw or as vdpa for
> > > > > > many years now by
> > > > > cloud operators and by NIC vendors.
> > > > >
> > > > > Thanks for your confirmation.
> > > > >
> > > > > > > > This means there are two different backends.
> > > > > > > > And these two devices should not be grouped in the use
> > > > > > > > case you
> > > > > described.
> > > > > > >
> > > > > > > I think one group is enough for all use cases. It is
> > > > > > > required that CIDs are unique in global, i.e. all backends.
> > > > > > >
> > > > > > > As your example, let me assume there are two VMs
> > > > > > >
> > > > > > > 1. VM0 (two vsock backends)
> > > > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-vsock);
> > > > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > > > 2. VM1
> > > > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-vsock).
> > > > > > >
> > > > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2)
> > > > > > > and
> > > > > > > VM0-VM1 (src_cid=3, dst_cid=5) communication, while the
> > > > > > > device1 is only able to do VM0-HV (src_cid=4, dst_cid=2) communication.
> > > > > > >
> > > > > > In example 1.1 and 1.2 no devices are grouped.
> > > > > > Your proposal of this patch wants to group the two devices and
> > > > > > pick one of
> > > > > them as default device.
> > > > >
> > > > > Yes. I just wonder if it is possible to have more than one group in one guest?
> > > > >
> > > > For example
> > > > Group_1: two devices dev0 and dev1, implemented as PCI HW devices.
> > > > Group_2: two devices by QEMU SW implemented as sw backend.
> > > >
> > > > All 4 devices have the _F bit indicating they can be grouped.
> > > > But there is no indication of which group each of them belongs to.
> > > > Hence the guest VM driver is in the dark on how to forward
> > > > requests without the bind() call.
> > >
> > > I see. Thanks!
> > >
> > > My idea is that there is only one default device, no matter how many
> > > types of backends there are. If users intend to use other devices, a
> > > `bind()` call is required.
> > >
> > > For example, we set `dev0` as the default device:
> > >
> > > 1. Do not call `bind()`: use dev0;
> > > 2. Call `bind(${dev1_cid})`: use dev1; ...
> > > 5. Call `bind(${dev4_cid})`: use dev4;
> > >
> > > Even though we introduce the group concept, if we don't call
> > > `bind()`, how does the driver know which group to use? If the driver
> > > recognizes the dst_cid, it can use the group to find the device,
> > > but then things get complicated: the driver needs to know the
> > > relationship between the dst_cid and the group.
> > >
> > Picking the right vsock group based on the dst_cid would be needed. This
> > is a vsock-level issue at the driver level.
> > The driver would need enough hints, or an encoding of dst_cid, or something
> > else.
> >
> > So even though we miss a vsock-level construct, that should not be the
> > reason to not group the devices.
> > Both attempt to solve the issue at different levels.
> >
> > > WDYT?
> > >
> > > Thanks,
> > > Xuewei
> > >
> > > > > > If device0 and device1 are inserted into VM0 with the
> > > > > > feature bit you suggested, the guest thinks that they are
> > > > > > part of the same group.
> > > > >
> > > > > In this patch, we don't allow to insert devices without the
> > > > > feature bit if there are already devices with the feature bit.
> > > > >
> > > > In the above example of two groups, all 4 devices spread across
> > > > the two groups have the feature bit set.
> > > > Yet, they cannot be grouped correctly.
> > > > Driver driving blind thinks that all 4 devices are part of the same group.
> > > >
> > > > > As a result, there can be either multiple devices with the
> > > > > feature bit or just a single device.
> > > > >
> > > >
> > > > > > When bind() call is not done, host sw does not know which
> > > > > > device to pick up
> > > > > between device0 and device1 when binding the devices.
> > > > >
> > > > > Are you referring to a vsock application on the host? If yes,
> > > > > "host sw" is able to pick up one device according to the dst
> > > > > cid. For example, pick up
> > > > > device0 if `connect(3)` is called.
> > > > >
> > > > > Please be aware that the "host sw" can not pick up device1,
> > > > > since its device is not in the host kernel.
> > > > >
> > > > > > > In a word, a tuple identifies a connection.
> > > > > > >
> > > > > > > "Refuse to connect" will be raised if the device1 attempts
> > > > > > > to connect to the device2. "They are not in the same group"
> > > > > > > is a reasonable explanation.
> > > > > > > Am I right?
> > > > > > >
> > > > > > During bind call, one needs to select the device when the
> > > > > > devices are coming
> > > > > from multiple different backends.
> > > > >
> > > > > Yes.
> > > > >
> > > > > Thanks,
> > > > > Xuewei

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  4:40                     ` Parav Pandit
@ 2025-06-19  5:10                       ` Xuewei Niu
  2025-06-19  5:25                         ` Parav Pandit
  2025-06-23  7:53                         ` Stefano Garzarella
  0 siblings, 2 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-19  5:10 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: 19 June 2025 08:57 AM
> > 
> > Hi Parav,
> > 
> > Could you please take a look at the diagram in [1]?
> > 
> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> > Am I right? If yes, I think the group concept is reasonable but we don't need
> > it at this time.
> > 
> > I think the first thing is to figure out how to pick the right group.
> > 
> > Standard socket doesn't provide a way to access the group information.
> > 
> > Source and destination are from `bind()` and `connect()`, respectively. If we
> > don't call `bind()`, only the destination is known.
> > 
> > However, only destination is not enough to find the group. For example, the
> > well-known CIDs (e.g. 2) are valid for all groups.
> > 
> > 1:
> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> 
> Right. The sock addressing scheme is presently too naïve to select the group. Not sure when/how you or others plan to address that.
> This is a transport-layer problem to solve (not to be confused with transport = pci/mmio etc).
> 
> However, at the device level, we should have the construct of grouping.
> Without this construct, all devices will be part of a single group and one will not be able to build the group concept later.
> So even if you don't need it explicitly now, grouping the devices is what you need when connect() is called.
> 
> So I was imagining a relatively simple scheme:
> For example, at the virtio device level, some kind of group id is present.
> Two devices which have the same group id are part of a single group.
> An example group id format can be a UUID.
> 
> And this is completely optional for devices to implement.
> It is generic enough and usable beyond just the vsock device, in other use cases we discussed in the past.

Fair enough.

@Stefano, could you please take a look at this? I'd love to have some input
from you.

A brief summary of the idea is: The config space will be extended to
include a group id. The devices with the same group id are considered to be
in the same group.
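
To make the summary concrete, here is a minimal sketch in C of how a driver
might compare group ids. It assumes a 16-byte, UUID-sized id and that a device
may omit it (the field was described as optional above). The names
`struct vsock_dev_sketch`, `GROUP_ID_LEN` and `vsock_same_group()` are
hypothetical and are not taken from the spec or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GROUP_ID_LEN 16 /* assumption: a UUID-sized group id */

/* Hypothetical per-device view; NOT the config layout from the spec. */
struct vsock_dev_sketch {
	uint64_t guest_cid;             /* CID read from the device config */
	bool has_group_id;              /* grouping is optional per device */
	uint8_t group_id[GROUP_ID_LEN]; /* valid only if has_group_id */
};

/*
 * Two devices are in the same group only if both expose a group id
 * and the ids are byte-for-byte equal.
 */
static bool vsock_same_group(const struct vsock_dev_sketch *a,
			     const struct vsock_dev_sketch *b)
{
	assert(a != NULL && b != NULL);

	if (!a->has_group_id || !b->has_group_id)
		return false;

	return memcmp(a->group_id, b->group_id, GROUP_ID_LEN) == 0;
}
```

With such a check, the two PCI HW devices and the two QEMU devices from the
earlier example would end up in separate groups, provided each pair carries
its own id.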

Thanks,
Xuewei
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  5:10                       ` Xuewei Niu
@ 2025-06-19  5:25                         ` Parav Pandit
  2025-06-22 13:54                           ` Xuewei Niu
  2025-06-23  7:53                         ` Stefano Garzarella
  1 sibling, 1 reply; 59+ messages in thread
From: Parav Pandit @ 2025-06-19  5:25 UTC (permalink / raw)
  To: Xuewei Niu
  Cc: fupan.lfp@antgroup.com, mst@redhat.com,
	niuxuewei.nxw@antgroup.com, sgarzare@redhat.com,
	virtio-comment@lists.linux.dev


> From: Xuewei Niu <niuxuewei97@gmail.com>
> Sent: 19 June 2025 10:41 AM
> 
> > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > Sent: 19 June 2025 08:57 AM
> > >
> > > Hi Parav,
> > >
> > > Could you please take a look at the diagram in [1]?
> > >
> > > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> > > Am I right? If yes, I think the group concept is reasonable but we
> > > don't need at this time.
> > >
> > > I think the first thing is to figure out how to pick the right group.
> > >
> > > Standard socket doesn't provide a way to access the group information.
> > >
> > > Source and destination are from `bind()` and `connect()`,
> > > respectively. If we don't call `bind()`, only the destination is known.
> > >
> > > However, only destination is not enough to find the group. For
> > > example, the well-known CIDs (e.g. 2) are valid for all groups.
> > >
> > > 1:
> > > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >
> > Right. Sock addressing scheme is naïve presently to select the group. Not
> sure when/how you or others plan to do.
> > This is transport layer problem to solve (not to confuse with transport =
> pci/mmio etc).
> >
> > However, at device level, we should have the construct of grouping.
> > Without this construct, all devices will be part of single group and one will
> not be able to build the group concept later.
> > So even if you don't need it explicitly now, grouping the device is what you
> need when connect() is called.
> >
> > So I was imagining a relatively simple scheme:
> > For example, virtio device level, some kind of group id is present.
> > So two devices which has same group id, are part of single group.
> > An example group id format can be a UUID.
> >
> > And this is completely optional for devices to implement.
> > Generic enough and usable beyond just vsock device in other use cases we
> discussed in past.
> 
> Fair enough.
> 
> @Stefano, could you please take a look at this? I'd love to have some input
> from you.
> 
> A brief summary of the idea is: The config space will be extended to include a
> group id. 
UUIDs are long, even though they are read-only.
Also, the config space is readable only after the feature bits are negotiated.

I have not thought enough about whether the driver needs to know the group id early, when creating the 'struct virtio_device' with two legs of virtio_pci_dev * in it.
This needs some more thought.

> The devices with the same group id are considered to be in the
> same group.
>
Sounds good.
 
> Thanks,
> Xuewei
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  5:25                         ` Parav Pandit
@ 2025-06-22 13:54                           ` Xuewei Niu
  0 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-22 13:54 UTC (permalink / raw)
  To: parav; +Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, sgarzare,
	virtio-comment

> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > Sent: 19 June 2025 10:41 AM
> > 
> > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > Sent: 19 June 2025 08:57 AM
> > > >
> > > > Hi Parav,
> > > >
> > > > Could you please take a look at the diagram in [1]?
> > > >
> > > > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> > > > Am I right? If yes, I think the group concept is reasonable but we
> > > > don't need at this time.
> > > >
> > > > I think the first thing is to figure out how to pick the right group.
> > > >
> > > > Standard socket doesn't provide a way to access the group information.
> > > >
> > > > Source and destination are from `bind()` and `connect()`,
> > > > respectively. If we don't call `bind()`, only the destination is known.
> > > >
> > > > However, only destination is not enough to find the group. For
> > > > example, the well-known CIDs (e.g. 2) are valid for all groups.
> > > >
> > > > 1:
> > > > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> > >
> > > Right. Sock addressing scheme is naïve presently to select the group. Not
> > sure when/how you or others plan to do.
> > > This is transport layer problem to solve (not to confuse with transport =
> > pci/mmio etc).
> > >
> > > However, at device level, we should have the construct of grouping.
> > > Without this construct, all devices will be part of single group and one will
> > not be able to build the group concept later.
> > > So even if you don't need it explicitly now, grouping the device is what you
> > need when connect() is called.
> > >
> > > So I was imagining a relatively simple scheme:
> > > For example, virtio device level, some kind of group id is present.
> > > So two devices which has same group id, are part of single group.
> > > An example group id format can be a UUID.
> > >
> > > And this is completely optional for devices to implement.
> > > Generic enough and usable beyond just vsock device in other use cases we
> > discussed in past.
> > 
> > Fair enough.
> > 
> > @Stefano, could you please take a look at this? I'd love to have some input
> > from you.
> > 
> > A brief summary of the idea is: The config space will be extended to include a
> > group id. 
> UUIDs are long even though they are read only.
> And config space is readable only after feature bits are negotiated.
> 
> I didn't think enough if the driver needs to know early enough when creating the 'struct virtio_device' with two legs of virtio_pci_dev * in it.
> Need some more thoughts on it.

I think it is okay to do so. The group id is used in the socket layer. By
that time, the driver will be able to access the config space.
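
As a rough illustration of why socket-layer use is late enough, the device
lookup could look like the sketch below. It runs at bind()/connect() time over
devices that have already been probed, so their config fields (the CID today,
a group id later) are already cached. It only encodes the default-device rule
discussed earlier in this thread; `struct vsock_dev_sel` and
`vsock_pick_device()` are hypothetical names, not existing kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached per-device state, filled in at probe time. */
struct vsock_dev_sel {
	uint64_t guest_cid; /* CID read from the device config */
	bool is_default;    /* exactly one device is assumed to be default */
};

/*
 * Pick the device for an outgoing connection.
 * have_src is false when the application did not call bind(); in that
 * case the default device is used. Otherwise the device whose CID
 * matches the bound source CID is used. Returns NULL if nothing matches.
 */
static const struct vsock_dev_sel *
vsock_pick_device(const struct vsock_dev_sel *devs, size_t n,
		  bool have_src, uint64_t src_cid)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (have_src ? devs[i].guest_cid == src_cid
			     : devs[i].is_default)
			return &devs[i];
	}
	return NULL;
}
```

A group id would add one more filter on top of this lookup; how the group is
chosen when bind() is not called is still the open question above.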

Thanks,
Xuewei

> > The devices with the same group id are considered to be in the
> > same group.
> >
> Sounds good.
>  
> > Thanks,
> > Xuewei
> > 
> > > >
> > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > Sent: Monday, June 16, 2025 4:26 PM
> > > > > >
> > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > Sent: Monday, June 16, 2025 2:30 PM
> > > > > > > >
> > > > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > > > Sent: Monday, June 16, 2025 1:48 PM
> > > > > > > > > >
> > > > > > > > > > Hi, Parav.
> > > > > > > > > >
> > > > > > > > > > Thanks for your detailed comments.
> > > > > > > > > >
> > > > > > > > > > > Hi Xuewei,
> > > > > > > > > > >
> > > > > > > > > > > > From: Xuewei Niu <niuxuewei97@gmail.com>
> > > > > > > > > > > > Sent: Monday, May 19, 2025 3:08 PM
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:28:25PM +0800, Xuewei Niu
> > wrote:
> > > > > > > > > > > > > > This patch brings a new feature, called "multi
> > > > > > > > > > > > > > devices", to the virtio vsock. It introduces a
> > > > > > "VIRTIO_VSOCK_F_MULTI_DEVICES"
> > > > > > > > > > > > > > feature bit, and a "device_order" field to the
> > > > > > > > > > > > > > config for the virtio
> > > > > > > > vsock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > == Motivation ==
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Vsock is a lightweight and widely used data
> > > > > > > > > > > > > > exchange mechanism between host and guest.
> > > > > > > > > > >
> > > > > > > > > > > Even though it is the current use, the specification
> > > > > > > > > > > does not prevent its usage
> > > > > > > > > > between two guests via a host.
> > > > > > > > > > > So we should not assume such guest <-> host
> > > > > > > > > > > communication as the only
> > > > > > > > > > case to add new feature.
> > > > > > > > > > >
> > > > > > > > > > > For example, in the spec only must requirement is that
> > > > > > > > > > > src_cid ==
> > > > > > > > > > config.guest_cid.
> > > > > > > > > > > Dst_cid can be anything, it need not be well known 0x2
> > > > > > > > > > > (for the
> > > > host).
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > > With this flexibility in the spec, one can connect
> > > > > > > > > > > vsock devices with multiple
> > > > > > > > > > different backends.
> > > > > > > > > > >
> > > > > > > > > > > For example,
> > > > > > > > > > > QEMU can insert one vsock device for VM to HV
> > communication.
> > > > > > > > > >
> > > > > > > > > > It is also able to communicate with other devices on the
> > > > > > > > > > same host,
> > > > i.e.
> > > > > > > > > > VM-to-VM.
> > > > > > > > > >
> > > > > > > > > > > A real PCI device can insert one vsock device for
> > > > > > > > > > > VM-to-VM communication bypassing a full TCP/IP stack.
> > > > > > > > > >
> > > > > > > > > > AFAIK, all devices are implemented in software. Is it a
> > > > > > > > > > real PCI HW
> > > > device?
> > > > > > > > > >
> > > > > > > > > > Yes. virtio PCI devices have been implemented as hw or
> > > > > > > > > > as vdpa for many years now by cloud operators and by
> > > > > > > > > > NIC vendors.
> > > > > > > >
> > > > > > > > Thanks for your confirmation.
> > > > > > > >
> > > > > > > > > > > This means there are two different backends.
> > > > > > > > > > > And these two devices should not be grouped in the use
> > > > > > > > > > > case you
> > > > > > > > described.
> > > > > > > > > >
> > > > > > > > > > I think one group is enough for all use cases. It is
> > > > > > > > > > required that CIDs are unique in global, i.e. all backends.
> > > > > > > > > >
> > > > > > > > > > As your example, let me assume there are two VMs
> > > > > > > > > >
> > > > > > > > > > 1. VM0 (two vsock backends)
> > > > > > > > > >     1.1 device0 (cid=3, default), backend is host kernel (vhost-
> > vsock);
> > > > > > > > > >     1.2 device1 (cid=4), backend is HV (virtio-vsock).
> > > > > > > > > > 2. VM1
> > > > > > > > > >     2.1 device2 (cid=5, default), backend is host kernel (vhost-
> > vsock).
> > > > > > > > > >
> > > > > > > > > > The device0 is able to do VM0-HOST (src_cid=3, dst_cid=2)
> > > > > > > > > > and VM0-VM1 (src_cid=3, dst_cid=5) communication, while
> > > > > > > > > > the device1 is only able to do VM0-HV (src_cid=4,
> > > > > > > > > > dst_cid=2) communication.
> > > > > > > > > >
> > > > > > > > > In example 1.1 and 1.2 no devices are grouped.
> > > > > > > > > Your proposal of this patch wants to group the two devices
> > > > > > > > > and pick one of
> > > > > > > > them as default device.
> > > > > > > >
> > > > > > > > Yes. I just wonder if it is possible to have more than one
> > > > > > > > groups in one
> > > > guest?
> > > > > > > >
> > > > > > > For example
> > > > > > > Group_1: two devices dev0 and dev1, implemented as PCI HW
> > devices.
> > > > > > > Group_2: two devices by QEMU SW implemented as sw backend.
> > > > > > >
> > > > > > > All the 4 devices has _F bit indicating they can be grouped.
> > > > > > > But there is no indication that they are part of which group.
> > > > > > > And hence the guest VM driver is in dark on how to forward
> > > > > > > requests without
> > > > > > the bind() call.
> > > > > >
> > > > > > I see. Thanks!
> > > > > >
> > > > > > My idea is that there is only one default device, no matter how
> > > > > > many types of backends are. If the users intend to use other
> > > > > > devices, `bind()` call
> > > > is required.
> > > > > >
> > > > > > For example, we set `dev0` as the default device:
> > > > > >
> > > > > > 1. Do not call `bind()`: use dev0;
> > > > > > 2. Call `bind(${dev1_cid})`: use dev1;
> > > > > > ...
> > > > > > 5. Call `bind(${dev4_cid})`: use dev4;
> > > > > >
> > > > > > Even if we introduce the group concept, if we don't call
> > > > > > `bind()`, how does the driver know which group to use? If the
> > > > > > driver recognizes the dst_cid, it can use the group to find the
> > > > > > device, but then things get complicated: the driver needs to
> > > > > > know the relationship between the dst_cid and the group.
> > > > > >
> > > > > Based on the dst_cid picking the right vscock group would be
> > > > > needed. This
> > > > is vsock level issue at driver level.
> > > > > Driver would need enough hints or encoding or of dst_cid or
> > > > > something
> > > > else.
> > > > >
> > > > > So even though we miss vsock level construct, it should be the
> > > > > reason to not
> > > > group the devices.
> > > > > As both attempt to solve issue at different level.
> > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Thanks,
> > > > > > Xuewei
> > > > > >
> > > > > > > > > If device0 and device1 are inserted to the VM0 with the
> > > > > > > > > feature bit you suggested, the guest thinks that they are
> > > > > > > > > part of the same group.
> > > > > > > >
> > > > > > > > In this patch, we don't allow to insert devices without the
> > > > > > > > feature bit if there are already devices with the feature bit.
> > > > > > > >
> > > > > > > In above example of two groups, all the 4 devices spread
> > > > > > > across two groups has
> > > > > > the feature bit set.
> > > > > > > Yet, they cannot be grouped correctly.
> > > > > > > Driver driving blind thinks that all 4 devices are part of the same
> > group.
> > > > > > >
> > > > > > > > As a result, there can be either multiple devices with the
> > > > > > > > feature bit or just a single device.
> > > > > > > >
> > > > > > >
> > > > > > > > > When bind() call is not done, host sw does not know which
> > > > > > > > > device to pick up
> > > > > > > > between device0 and device1 when binding the devices.
> > > > > > > >
> > > > > > > > Are you referring to an vsock application on the host? If
> > > > > > > > yes, "host sw" is able to pick up one device according to
> > > > > > > > the dst cid. For example, pick up
> > > > > > > > device0 if `connect(3)` is called.
> > > > > > > >
> > > > > > > > Please be aware that the "host sw" can not pick up device1,
> > > > > > > > since its device is not in the host kernel.
> > > > > > > >
> > > > > > > > > > In a word, a tuple identifies a connection.
> > > > > > > > > >
> > > > > > > > > > "Refuse to connect" will be raised if the device1
> > > > > > > > > > attempts to connect to the device2. "They are not in the
> > > > > > > > > > same group" is a reasonable explanation.
> > > > > > > > > > Am I right?
> > > > > > > > > >
> > > > > > > > > During bind call, one needs to select the device when the
> > > > > > > > > devices are coming
> > > > > > > > from multiple different backends.
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Xuewei

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  5:10                       ` Xuewei Niu
  2025-06-19  5:25                         ` Parav Pandit
@ 2025-06-23  7:53                         ` Stefano Garzarella
  2025-06-23  8:48                           ` Xuewei Niu
  1 sibling, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-23  7:53 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: parav, fupan.lfp, mst, niuxuewei.nxw, virtio-comment

On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
>> > From: Xuewei Niu <niuxuewei97@gmail.com>
>> > Sent: 19 June 2025 08:57 AM
>> >
>> > Hi Parav,
>> >
>> > Could you please take a look at the diagram in [1]?
>> >
>> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
>> > Am I right? If yes, I think the group concept is reasonable but we don't need
>> > at this time.
>> >
>> > I think the first thing is to figure out how to pick the right group.
>> >
>> > Standard socket doesn't provide a way to access the group information.
>> >
>> > Source and destination are from `bind()` and `connect()`, respectively. If we
>> > don't call `bind()`, only the destination is known.
>> >
>> > However, only destination is not enough to find the group. For example, the
>> > well-known CIDs (e.g. 2) are valid for all groups.
>> >
>> > 1:
>> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>>
>> Right. Sock addressing scheme is naïve presently to select the group. 
>> Not sure when/how you or others plan to do.
>> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).


>>
>> However, at device level, we should have the construct of grouping.
>> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
>> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
>>
>> So I was imagining a relatively simple scheme:
>> For example, virtio device level, some kind of group id is present.
>> So two devices which has same group id, are part of single group.
>> An example group id format can be a UUID.
>>
>> And this is completely optional for devices to implement.
>> Generic enough and usable beyond just vsock device in other use cases 
>> we discussed in past.
>
>Fair enough.
>
>@Stefano, could you please take a look at this? I'd love to have some 
>input
>from you.
>
>A brief summary of the idea is: The config space will be extended to
>include a group id. The devices with the same group id are considered 
>to be
>in the same group.

Thanks for the summary, but please avoid top posting, otherwise it is
very hard to follow the discussion :-(
https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying

I like the idea of groups. What is not clear to me is how groups will
allow the driver to select the default output device when the source
socket is not bound to any source CID. But if it's already been
discussed, please go ahead and I'll check the next version.

Just a note, AF_VSOCK is supposed to be very similar to AF_UNIX. It's a
point-to-point connection; we don't have any transport layer like TCP.
What we call "transport" in the AF_VSOCK world is usually the
driver/device used to send data (e.g. vmci, virtio, vhost, hyperv).

Thanks,
Stefano


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-19  2:42                                   ` Xuewei Niu
@ 2025-06-23  8:01                                     ` Jason Wang
  2025-06-23  9:47                                       ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-23  8:01 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Thu, Jun 19, 2025 at 10:42 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > > > >
> > > > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > > > >
> > > > > > > > > > > > Yes.
> > > > > > > > > > > >
> > > > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > > > >
> > > > > > > > > > > > Not allowed in the current version.
> > > > > > > > > > > >
> > > > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > > > >
> > > > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > > > >
> > > > > > > > > > We probably need to define "configuration" first.
> > > > > > > > > >
> > > > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > > > "default" device.
> > > > > > > > >
> > > > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > > > >
> > > > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > > > >
> > > > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > > > explain it again.
> > > > > > >
> > > > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > > > CID as an argument.
> > > > > > >
> > > > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > > > through a `bind()` call.
> > > > > > >
> > > > > > > The default device, which is equivalent to the current single device, is
> > > > > > > used to be compatible with the existing applications.
> > > > > > >
> > > > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > > > for some init works, such as gathering information about other vsock
> > > > > > > devices.
> > > > > > >
> > > > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > > > device for non-HV-VM communications.
> > > > > >
> > > > > > So if I understand correctly you need a way to select the default when
> > > > > > bind() is not called?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > > > work. I don't think people have a strong need for this.
> > > > > > > > >
> > > > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > > > helpful for the case where the device is not trusted
> > > > > > > > >
> > > > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > > > >
> > > > > > > > There're various ways to build trust (for example device attestation)
> > > > > > > > and more might come in the future.
> > > > > > > >
> > > > > > > > > The guest only
> > > > > > > > > knows the information from its config space, which is provided by the host.
> > > > > > > >
> > > > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > > > is something we need to consider.
> > > > > > >
> > > > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > > > the ability to make decisions.
> > > > > >
> > > > > >
> > > > > > Actually, I meant the way to build trust via virtio is something that
> > > > > > needs to be considered. But now we have other ways to build trust.
> > > > > > That would result in a situation:
> > > > > >
> > > > > > 1) "default" vsock device is not trusted but other might
> > > > >
> > > > > I think this topic might be beyond the scope of this patch.
> > > > >
> > > > > With the current version, there is only one device supported, which can be
> > > > > considered as the "default".  We don't have a mechanism to say "we don't
> > > > > trust you", right?
> > > >
> > > > No. I meant we don't have it in the virtio core but we already have it
> > > > in other layers (for example the transport layer).
> > >
> > > Are you referring to the virtio-vsock transport layer?
> >
> > Yes, for example the PCI layer.
> >
> > >
> > > > > That is, it is assumed that we trust the device provided
> > > > > by the hypervisor.
> > > > >
> > > > > This patch is for multiple devices support, based on the same assumption. I
> > > > > think trust is a good point to consider, but perhaps we have to address it
> > > > > in the follow-up patches.
> > > >
> > > > My point is not about how to build trust, it's about letting the
> > > > driver decide by itself in some cases.
> > > >
> > > > A better way might be something like:
> > > >
> > > > "The device_order is a hint for the driver to select a default vsock
> > > > device. Device MAY choose ...."
> > >
> > > Okay, I'll do that.
> > >
> > > > >
> > > > > > or
> > > > > >
> > > > > > 2) two device claims that they are all "default"
> > > > > >
> > > > > > This means anyhow we need a decision from the driver side so the
> > > > > > device side order seems to be useless here.
> > > > >
> > > > > There is only one default device. The device with the lowest device_order
> > > > > is considered the default. It is not allowed to have the same device_order.
> > > >
> > > > Who can forbid two same device_order? Note that in various security
> > > > models, hypervisors are not trusted at all.
> > >
> > > The driver does. Indeed, the driver is able to deny devices if they violate
> > > the spec.
> >
> > Yes, that's the point, anyhow driver need to do the decision.
> >
> > >
> > > > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > > > >
> > > > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > > > >
> > > > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > > > >
> > > > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > > > back to kernel vsock?
> > > > > > >
> > > > > > > It depends. Two cases are possible:
> > > > > > >
> > > > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > > > whose cid is cid2.
> > > > > >
> > > > > > Well, this means the behavior depends on the implementation which is not good.
> > > > >
> > > > > No, it is not. It depends on whether the device can find a proper
> > > > > target device based on the dst_cid.
> > > >
> > > > This sounds really weird: two cids belong to the same guest. So
> > > > guests should expect that the two vsock devices can talk to each
> > > > other?
> > >
> > > Yes, a little bit complicated.
> > >
> > > Two constraints should be applied:
> > >
> > > 1. No CID conflicts within the driver;
> > > 2. No CID conflicts within the address space.
> > >
> > > Here is a diagram to illustrate the situation where it does not violate the
> > > constraints:
> > >
> > >  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
> > >  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
> > >  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
> > >  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
> > >  └───────┼──────────────────────────────────┼───────────────────┼───┘
> > >   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
> > >   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
> > >   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
> > >   │         └───────────┘        │   │         └───────────┘    │   │
> > >   │         ┌───────────┐        │   │         ┌───────────┐    │   │
> > >   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
> > >   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
> > >   └──────┼───────────────────────┘   └──────────────────────────────┘
> > >    vhost-user-vsock
> > > ┌────────┼──────────────────────────────────────────────────────────┐
> > > │   ┌────┼──────┐                                                   │
> > > │   │ dev1(cid1)│                                                   │
> > > │   └───────────┘                                                   │
> > > └─userapp─(as1)─────────────────────────────────────────────────────┘
> > >
> > > - VM0
> > >     - dev0
> > >         - dst_cid = 2 (well-known cid): connect to host;
> > >         - dst_cid = cid1: connect to dev2 (they are in the same as0);
> > >         - dst_cid = cid2: connect to dev3;
> > >     - dev1
> > >         - dst_cid = 2 (well-known cid): connect to userapp;
> > >         - dst_cid = cid0: failure (no cid0 is available in as1, even though
> > >         cid1 is available in the VM0);
> > >         - dst_cid = cid1: connect to dev0;
> > > - VM1
> > >     - dev2
> > >         - dst_cid = 2 (well-known cid): connect to host;
> > >         - dst_cid = cid0: connect to dev0;
> > >         - dst_cid = cid2: connect to dev3;
> > >     - dev3: skip the same as dev2.
> > >
> > > So back to your question, my answer is that it depends on the address
> > > space. Hope it could be helpful.
> >
> > This brings an interesting question, for example if vm0 tries to
> > connect to vm1, how does it know which device it needs to use (lacking
> > the concept like switch/route/address announcing etc...)?
>
> The HV should maintain a table for that. The guest first communicates
> with the HV, through the default device, to learn which device to use.

It's still not clear to me how things work. For example, say guest1 has
two CIDs, 4 ("default") and 5, and guest2 has one CID, 6. Do you mean
guest1 needs to ask the host which device is connected to CID 6? Or can
any device in guest1 actually be used to connect to guest2?

Thanks

>
> Thanks,
> Xuewei
>


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23  7:53                         ` Stefano Garzarella
@ 2025-06-23  8:48                           ` Xuewei Niu
  2025-06-23  9:16                             ` Stefano Garzarella
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23  8:48 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, virtio-comment

> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> >> > Sent: 19 June 2025 08:57 AM
> >> >
> >> > Hi Parav,
> >> >
> >> > Could you please take a look at the diagram in [1]?
> >> >
> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> >> > at this time.
> >> >
> >> > I think the first thing is to figure out how to pick the right group.
> >> >
> >> > Standard socket doesn't provide a way to access the group information.
> >> >
> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> >> > don't call `bind()`, only the destination is known.
> >> >
> >> > However, only destination is not enough to find the group. For example, the
> >> > well-known CIDs (e.g. 2) are valid for all groups.
> >> >
> >> > 1:
> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >>
> >> Right. Sock addressing scheme is naïve presently to select the group. 
> >> Not sure when/how you or others plan to do.
> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> 
> 
> >>
> >> However, at device level, we should have the construct of grouping.
> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> >>
> >> So I was imagining a relatively simple scheme:
> >> For example, virtio device level, some kind of group id is present.
> >> So two devices which has same group id, are part of single group.
> >> An example group id format can be a UUID.
> >>
> >> And this is completely optional for devices to implement.
> >> Generic enough and usable beyond just vsock device in other use cases 
> >> we discussed in past.
> >
> >Fair enough.
> >
> >@Stefano, could you please take a look at this? I'd love to have some 
> >input
> >from you.
> >
> >A brief summary of the idea is: The config space will be extended to
> >include a group id. The devices with the same group id are considered 
> >to be
> >in the same group.
> 
> Thanks for the summary, but please avoid top posting, otherwise it is
> very hard to follow the discussion :-(
> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying

Sorry, I'll avoid it in the future.

> I like the idea of groups. What is not clear to me is how groups will
> allow the driver to select the default output device when the source
> socket is not bound to any source CID.

Well, we did discuss it, but we need your input.

I said in [1] that, with the standard socket API, the driver can't pick a
group. Parav [2] suggested that the group, as a basic concept, should be
present even if we are unable to use it.

IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to
make use of the concept. But that is a very big change and would be
incompatible with the existing apps.
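
As a rough illustration of the compatibility problem (a sketch only,
assuming Linux's `struct sockaddr_vm` from <linux/vm_sockets.h> and a
16-byte UUID as the group id, which is just the example format Parav
mentioned; HYPOTHETICAL_GROUP_ID_LEN and both helper names are made up
here and not part of any API), the padding left in today's address looks
far too small to carry a group id:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Assumption: the group id would be a 16-byte UUID. */
#define HYPOTHETICAL_GROUP_ID_LEN 16

/* Number of unused padding bytes at the end of struct sockaddr_vm. */
size_t vsock_addr_spare_bytes(void)
{
	return sizeof(((struct sockaddr_vm *)0)->svm_zero);
}

/* Returns 1 if a group id could fit in the existing padding, 0 if not. */
int group_id_fits_in_sockaddr_vm(void)
{
	/* The padding exists to keep the address the size of a sockaddr. */
	assert(sizeof(struct sockaddr_vm) == sizeof(struct sockaddr));
	return vsock_addr_spare_bytes() >= HYPOTHETICAL_GROUP_ID_LEN;
}
```

Under these assumptions the helper returns 0, so a "{group_id}-{cid}"
address would need a new or larger address structure rather than reuse
of the spare bytes.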

I think it might be beyond the scope of this patch, and would make the
vsock more complex. The current conclusion is that we will keep the concept
of grouping as a placeholder, but we will not use it.

1: https://lore.kernel.org/virtio-comment/20250613083633.1087589-1-niuxuewei.nxw@antgroup.com/T/#mcfa6ca71da3147930e3d4edcf3c1ef097f808d4d
2: https://lore.kernel.org/virtio-comment/20250613083633.1087589-1-niuxuewei.nxw@antgroup.com/T/#m3e97c051721f053da4d15e0c531c6f6c8ff38f60

> But if it's already been discussed,
> please go ahead and I'll check the next version.
>
> Just a note, AF_VSOCK is supposed to be very similar to AF_UNIX. It's a
> point-to-point connection; we don't have any transport layer like TCP.
> What we call "transport" in the AF_VSOCK world is usually the
> driver/device used to send data (e.g. vmci, virtio, vhost, hyperv).

Thanks for the clarification. Sometimes I get lost among the many "transports".

Thanks,
Xuewei

> Thanks,
> Stefano

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23  8:48                           ` Xuewei Niu
@ 2025-06-23  9:16                             ` Stefano Garzarella
  2025-06-23 10:35                               ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-23  9:16 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
>> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
>> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
>> >> > Sent: 19 June 2025 08:57 AM
>> >> >
>> >> > Hi Parav,
>> >> >
>> >> > Could you please take a look at the diagram in [1]?
>> >> >
>> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
>> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
>> >> > at this time.
>> >> >
>> >> > I think the first thing is to figure out how to pick the right group.
>> >> >
>> >> > Standard socket doesn't provide a way to access the group information.
>> >> >
>> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
>> >> > don't call `bind()`, only the destination is known.
>> >> >
>> >> > However, only destination is not enough to find the group. For example, the
>> >> > well-known CIDs (e.g. 2) are valid for all groups.
>> >> >
>> >> > 1:
>> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >>
>> >> Right. Sock addressing scheme is naïve presently to select the group.
>> >> Not sure when/how you or others plan to do.
>> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
>>
>>
>> >>
>> >> However, at device level, we should have the construct of grouping.
>> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
>> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
>> >>
>> >> So I was imagining a relatively simple scheme:
>> >> For example, virtio device level, some kind of group id is present.
>> >> So two devices which has same group id, are part of single group.
>> >> An example group id format can be a UUID.
>> >>
>> >> And this is completely optional for devices to implement.
>> >> Generic enough and usable beyond just vsock device in other use cases
>> >> we discussed in past.
>> >
>> >Fair enough.
>> >
>> >@Stefano, could you please take a look at this? I'd love to have some
>> >input
>> >from you.
>> >
>> >A brief summary of the idea is: The config space will be extended to
>> >include a group id. The devices with the same group id are considered
>> >to be
>> >in the same group.
>>
>> Thanks for the summary, but please avoid top posting, otherwise it is
>> very hard to follow the discussion :-(
>> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>
>Sorry, I'll avoid it in the future.
>
>> I like the idea of groups. What is not clear to me is how groups will
>> allow the driver to select the default output device when the source
>> socket is not bound to any source CID.
>
>Well, we did discuss, but we need your input.
>
>I said in the thread [1] based on the standard socket API, the driver can't
>pick a group. Parav [2] suggested that the group, as a basic concept,
>should be present even if we are unable to use it.
>
>IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
>the concept. But it is a very big change, leading to incompatibility with
>the existing apps.

I still don't understand how the group_id will work :-( or how it will
allow the driver to pick the default device.

This is the `sockaddr_vm`, so we should be careful about extending it:

struct sockaddr_vm {
	__kernel_sa_family_t svm_family;
	unsigned short svm_reserved1;
	unsigned int svm_port;
	unsigned int svm_cid;

#define VMADDR_FLAG_TO_HOST 0x01

	__u8 svm_flags;
	unsigned char svm_zero[sizeof(struct sockaddr) -
			       sizeof(sa_family_t) -
			       sizeof(unsigned short) -
			       sizeof(unsigned int) -
			       sizeof(unsigned int) -
			       sizeof(__u8)];
};
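
To get a feel for how little room is left, the padding can be computed
directly. The sketch below mirrors the layout above in a hypothetical
`struct sockaddr_vm_mirror` (so that it does not depend on the installed
kernel headers already having `svm_flags`) and reports the size of
`svm_zero`. The names `sockaddr_vm_mirror` and `svm_spare_bytes` are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical mirror of the struct sockaddr_vm layout quoted above. */
struct sockaddr_vm_mirror {
	sa_family_t svm_family;
	unsigned short svm_reserved1;
	unsigned int svm_port;
	unsigned int svm_cid;
	uint8_t svm_flags;
	unsigned char svm_zero[sizeof(struct sockaddr) -
			       sizeof(sa_family_t) -
			       sizeof(unsigned short) -
			       sizeof(unsigned int) -
			       sizeof(unsigned int) -
			       sizeof(uint8_t)];
};

/* The address must stay exactly as large as a generic struct sockaddr. */
static_assert(sizeof(struct sockaddr_vm_mirror) == sizeof(struct sockaddr),
	      "sockaddr_vm must not grow beyond struct sockaddr");

/* Bytes still unused in the address, i.e. the room left for extensions. */
size_t svm_spare_bytes(void)
{
	return sizeof(((struct sockaddr_vm_mirror *)0)->svm_zero);
}
```

On a Linux/glibc target, where `struct sockaddr` is 16 bytes, this leaves
3 spare bytes, which is far less than something like a 16-byte UUID group
id would need.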


>
>I think it might be beyond the scope of this patch, and would make the
>vsock more complex. The current conclusion is that we will keep the concept
>of grouping as a placeholder, but we will not use it.

IMO we should first clarify what we want to support.
As I already suggested some months ago, IMHO supporting any number of
vsock devices for a VM is not really needed for your goal, and I can't
see other use cases where a virtio-net device can't be used. Just a
reminder: vsock is not a network device, it is more a P2P device where we
want to keep the configuration in the guest as simple as possible (we
don't want to run ARP, DHCP, etc.).
Until now vsock was mostly used for guest-host communication, but
recently it was extended to communicate with sibling VMs.

IIUC your use case, we just need to support different types of vsock
devices attached to the VM. By "type" I mean the type of address handled.
I think we can define 3 types based on the CIDs we have:
- hypervisor: VMADDR_CID_HYPERVISOR(0)
- host: VMADDR_CID_HOST(2)
- sibling: CID >= 3

The vhost-vsock device handles only VMADDR_CID_HOST, so the guest cannot
reach any other CID through it.
The vhost-user-vsock recently started to support sibling VMs.
The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).

So, IMHO we should define new features or config flags that a device can
expose depending on which addresses it is able to handle.
Also, to avoid overcomplicating vsock, we should allow only
one device for each type (a single device can support multiple types,
but only one device can be registered for a type).
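
A minimal sketch of what "one device per type" could look like on the
driver side, assuming each device exposes a bitmask of the types it
handles. Nothing here is in the spec: the `VSOCK_TYPE_*` bits, the
`vsock_registry` struct and the function names are hypothetical. The CID
values are the well-known ones from <linux/vm_sockets.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Well-known CIDs, same values as in <linux/vm_sockets.h>. */
#define CID_HYPERVISOR 0u
#define CID_LOCAL      1u
#define CID_HOST       2u
#define CID_ANY        UINT32_MAX	/* VMADDR_CID_ANY is -1U */

/* Hypothetical type bits a device could expose (not in the spec). */
enum vsock_type {
	VSOCK_TYPE_HYPERVISOR = 1u << 0,
	VSOCK_TYPE_HOST       = 1u << 1,
	VSOCK_TYPE_SIBLING    = 1u << 2,
};
#define VSOCK_NTYPES 3

struct vsock_registry {
	int owner[VSOCK_NTYPES];	/* device id per type, -1 if unclaimed */
};

void registry_init(struct vsock_registry *r)
{
	for (size_t i = 0; i < VSOCK_NTYPES; i++)
		r->owner[i] = -1;
}

/* Map a destination CID to its type bit; 0 if it is none of the three. */
unsigned int cid_to_type(uint32_t cid)
{
	if (cid == CID_HYPERVISOR)
		return VSOCK_TYPE_HYPERVISOR;
	if (cid == CID_HOST)
		return VSOCK_TYPE_HOST;
	if (cid == CID_LOCAL || cid == CID_ANY)
		return 0;
	return VSOCK_TYPE_SIBLING;	/* CID >= 3 */
}

/*
 * Register a device for every type in 'types'. Fails, changing nothing,
 * if any of those types is already claimed by another device.
 */
bool registry_add(struct vsock_registry *r, int dev_id, unsigned int types)
{
	if (types == 0 || types >= (1u << VSOCK_NTYPES))
		return false;
	for (size_t i = 0; i < VSOCK_NTYPES; i++)
		if ((types & (1u << i)) && r->owner[i] != -1)
			return false;
	for (size_t i = 0; i < VSOCK_NTYPES; i++)
		if (types & (1u << i))
			r->owner[i] = dev_id;
	return true;
}

/* Device that should carry traffic to dst_cid, or -1 if there is none. */
int registry_lookup(const struct vsock_registry *r, uint32_t dst_cid)
{
	unsigned int type = cid_to_type(dst_cid);

	for (size_t i = 0; i < VSOCK_NTYPES; i++)
		if (type == (1u << i))
			return r->owner[i];
	return -1;
}
```

With this shape the destination CID alone is enough to pick the device,
so no bind() and no default device election would be needed.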

If we don't want this limitation, I think we need to overcomplicate
vsock and define some kind of discovery algorithm that the guest driver
should run to understand which CIDs are reachable through each device.
But I'm not sure we really want this; if this is a use case, the guest
should use a virtio-net device, which already supports this pretty well.
So before going in this direction, we should better define why we want to
complicate vsock instead of using a net device.

Thanks,
Stefano



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23  8:01                                     ` Jason Wang
@ 2025-06-23  9:47                                       ` Xuewei Niu
  2025-06-24  0:51                                         ` Jason Wang
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23  9:47 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Thu, Jun 19, 2025 at 10:42 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not allowed in the current version.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > > > > >
> > > > > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > > > > >
> > > > > > > > > > > We probably need to define "configuration" first.
> > > > > > > > > > >
> > > > > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > > > > "default" device.
> > > > > > > > > >
> > > > > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > > > > >
> > > > > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > > > > >
> > > > > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > > > > explain it again.
> > > > > > > >
> > > > > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > > > > CID as an argument.
> > > > > > > >
> > > > > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > > > > through a `bind()` call.
> > > > > > > >
> > > > > > > > The default device, which is equivalent to the current single device, is
> > > > > > > > used to be compatible with the existing applications.
> > > > > > > >
> > > > > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > > > > for some init works, such as gathering information about other vsock
> > > > > > > > devices.
> > > > > > > >
> > > > > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > > > > device for non-HV-VM communications.
> > > > > > >
> > > > > > > So if I understand correctly you need a way to select the default when
> > > > > > > bind() is not called?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > > > > work. I don't think people have a strong need for this.
> > > > > > > > > >
> > > > > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > > > > helpful for the case where the device is not trusted
> > > > > > > > > >
> > > > > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > > > > >
> > > > > > > > > There're various ways to build trust (for example device attestation)
> > > > > > > > > and more might come in the future.
> > > > > > > > >
> > > > > > > > > > The guest only
> > > > > > > > > > knows the information from its config space, which is provided by the host.
> > > > > > > > >
> > > > > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > > > > is something we need to consider.
> > > > > > > >
> > > > > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > > > > the ability to make decisions.
> > > > > > >
> > > > > > >
> > > > > > > Actually, I meant the way to build trust via virtio is something that
> > > > > > > needs to be considered. But now we have other ways to build trust.
> > > > > > > That would result a situlation:
> > > > > > >
> > > > > > > 1) "default" vsock device is not trusted but other might
> > > > > >
> > > > > > I think this topic might be beyond the scope of this patch.
> > > > > >
> > > > > > With the current version, there is only one device supported, which can be
> > > > > > considered as the "default".  We don't have a mechanism to say "we don't
> > > > > > trust you", right?
> > > > >
> > > > > No. I meant we don't have it in the virtio core but we already have it
> > > > > in other layers (for example the transport layer).
> > > >
> > > > Are you referring to virito-vsock transport layer?
> > >
> > > Yes, for example the PCI layer.
> > >
> > > >
> > > > > > That is, it is assumed that we trust the device provided
> > > > > > by the hypervisor.
> > > > > >
> > > > > > This patch is for multiple devices support, based on the same assumption. I
> > > > > > think trust is a good point to consider, but perhaps we have to address it
> > > > > > in the follow-up patches.
> > > > >
> > > > > My point is not about how to build trust, it's about letting the
> > > > > driver decide by itself in some cases.
> > > > >
> > > > > A better way might be something like:
> > > > >
> > > > > "The device_order is a hint for the driver to select a default vsock
> > > > > device. Device MAY choose ...."
> > > >
> > > > Okay, I'll do that.
> > > >
> > > > > >
> > > > > > > or
> > > > > > >
> > > > > > > 2) two device claims that they are all "default"
> > > > > > >
> > > > > > > This means anyhow we need a decision from the driver side so the
> > > > > > > device side order seems to be useless here.
> > > > > >
> > > > > > There is only one default device. The device with the lowest device_order
> > > > > > is considered the default. It is not allowed to have the same device_order.
> > > > >
> > > > > Who can forbid two same device_order? Note that in various security
> > > > > models, hypervisors are not trusted at all.
> > > >
> > > > The driver does. Indeed, the driver is able to deny devices if they violate
> > > > the spec.
> > >
> > > Yes, that's the point, anyhow driver need to do the decision.
> > >
> > > >
> > > > > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > > > > >
> > > > > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > > > > >
> > > > > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > > > > >
> > > > > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > > > > back to kernel vsock?
> > > > > > > >
> > > > > > > > It depends. Two cases are possible:
> > > > > > > >
> > > > > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > > > > whose cid is cid2.
> > > > > > >
> > > > > > > Well, this means the behavior depends on the implementation which is not good.
> > > > > >
> > > > > > No, it is not. It depends on whether the device can find a proper
> > > > > > target device based on the dst_cid.
> > > > >
> > > > > This sounds really weird, two cids belong to the same guest. So guests
> > > > > should expect that the two vsock devic can talk to each other?
> > > >
> > > > Yes, a little bit complicated.
> > > >
> > > > Two constraints should be applied:
> > > >
> > > > 1. No CID conflicts within the driver;
> > > > 2. No CID conflicts within the address space.
> > > >
> > > > Here is a diagram to illustrate the situation where it does not violate the
> > > > constraints:
> > > >
> > > >  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
> > > >  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
> > > >  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
> > > >  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
> > > >  └───────┼──────────────────────────────────┼───────────────────┼───┘
> > > >   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
> > > >   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
> > > >   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
> > > >   │         └───────────┘        │   │         └───────────┘    │   │
> > > >   │         ┌───────────┐        │   │         ┌───────────┐    │   │
> > > >   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
> > > >   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
> > > >   └──────┼───────────────────────┘   └──────────────────────────────┘
> > > >    vhost-user-vsock
> > > > ┌────────┼──────────────────────────────────────────────────────────┐
> > > > │   ┌────┼──────┐                                                   │
> > > > │   │ dev1(cid1)│                                                   │
> > > > │   └───────────┘                                                   │
> > > > └─userapp─(as1)─────────────────────────────────────────────────────┘
> > > >
> > > > - VM0
> > > >     - dev0
> > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > >         - dst_cid = cid1: connect to dev2 (they are in the same as0);
> > > >         - dst_cid = cid2: connect to dev3;
> > > >     - dev1
> > > >         - dst_cid = 2 (well-known cid): connect to userapp;
> > > >         - dst_cid = cid0: failure (no cid0 is available in as1, even though
> > > >         cid1 is available in the VM0);
> > > >         - dst_cid = cid1: connect to dev0;
> > > > - VM1
> > > >     - dev2
> > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > >         - dst_cid = cid0: connect to dev0;
> > > >         - dst_cid = cid2: connect to dev3;
> > > >     - dev3: skip the same as dev2.
> > > >
> > > > So back to your question, my answer is that it depends on the address
> > > > space. Hope it could be helpful.
> > >
> > > This brings an interesting question, for example if vm0 tries to
> > > connect to vm1, how does it know which device it needs to use (lacking
> > > the concept like switch/route/address announcing etc...)?
> >
> > The HV should maintain a table for that. The guest things firstly
> > communicate with the HV, through the default device, to know which device
> > to use.
> 
> It's still not clear to me how things work. For example, we had a
> guest1 with two cid 4 ("default"),5 another guest2 with one cid 6. You
> meant guest1 needs to ask the host to know about which device is
> connected to 6? Or actually any device in guest1 can be used to
> connected to guest2?

I think this is handled at the application level, like microservices.
Here are some diagrams to illustrate the process:

Step1: app0 asks the host: "I want to connect to app1 inside VM1";
Step2: host replies: "Please use source cid=5, and destination cid=6"

+────────+         +───────────────────────────────+    +──────────────────────────────+
│  host  │         │+───────────────+          VM0 │    │                         VM1  │
│service │◀─step1──┤│dev0(cid=4,def)│──┐           │    │                              │
+────────+         │+───────────────+  │           │    │                              │
                   │+───────────────+  │  +───────+│    │+───────────────+     +──────+│
                   ││  dev1(cid=5)  │  └──│ app0  ││    ││dev1(cid=6,def)│     │ app1 ││
                   │+───────────────+     +───────+│    │+───────────────+     +──────+│
                   +───────────────────────────────+    +──────────────────────────────+

Step3: app0 binds (5, -1), and connects to (6, {PORT}) to establish a
connection.

+────────+         +───────────────────────────────+    +──────────────────────────────+
│  host  │         │+───────────────+          VM0 │    │                         VM1  │
│service │         ││dev0(cid=4,def)│              │    │                              │
+────────+         │+───────────────+              │    │                              │
                   │+──────────────bind(5, -1)────+│    │+───────────────+     +──────+│
                 ┌─┤│  dev1(cid=5)  │◀────│ app0  ││    ││dev1(cid=6,def)│─────▶ app1 ││
                 │ │+───────────────+     +───────+│    │+───────────────+     +──────+│
                 │ +───────────────────────────────+    +────────▲─────────────────────+
                 │                                               │                      
                 └───────────────connect(6,PORT)─────────────────┘                      
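
Step 3 can be sketched in C as follows. This assumes the multi-device
semantics proposed in this patch, where bind() to a source CID selects
the device; on a current single-device guest, binding to a CID the guest
does not own would most likely just fail. The helper names `vsock_addr`
and `vsock_connect_from` are made up for the example.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill a vsock address; port may be VMADDR_PORT_ANY for the source. */
void vsock_addr(struct sockaddr_vm *sa, unsigned int cid, unsigned int port)
{
	memset(sa, 0, sizeof(*sa));
	sa->svm_family = AF_VSOCK;
	sa->svm_cid = cid;
	sa->svm_port = port;
}

/*
 * Bind to the source CID suggested by the host service, then connect to
 * the destination. Returns a connected fd, or -1 with errno set.
 */
int vsock_connect_from(unsigned int src_cid, unsigned int dst_cid,
		       unsigned int dst_port)
{
	struct sockaddr_vm src, dst;
	int fd, saved;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	vsock_addr(&src, src_cid, VMADDR_PORT_ANY);
	vsock_addr(&dst, dst_cid, dst_port);

	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		saved = errno;	/* close() may clobber errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

For the diagram above, app0 would call `vsock_connect_from(5, 6, port)`
with whatever port app1 listens on.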

This is a little bit complicated, but it is dynamic. Another way is to
assign a fixed CID for app1, so that app0 can always connect to it
without asking the host service.

Thanks,
Xuewei

> Thanks
> 
> >
> > Thanks,
> > Xuewei
> >


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23  9:16                             ` Stefano Garzarella
@ 2025-06-23 10:35                               ` Xuewei Niu
  2025-06-23 11:01                                 ` Xuewei Niu
  2025-06-23 11:15                                 ` Stefano Garzarella
  0 siblings, 2 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23 10:35 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, virtio-comment

> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> >> >> > Sent: 19 June 2025 08:57 AM
> >> >> >
> >> >> > Hi Parav,
> >> >> >
> >> >> > Could you please take a look at the diagram in [1]?
> >> >> >
> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> >> >> > at this time.
> >> >> >
> >> >> > I think the first thing is to figure out how to pick the right group.
> >> >> >
> >> >> > Standard socket doesn't provide a way to access the group information.
> >> >> >
> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> >> >> > don't call `bind()`, only the destination is known.
> >> >> >
> >> >> > However, only destination is not enough to find the group. For example, the
> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
> >> >> >
> >> >> > 1:
> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >>
> >> >> Right. Sock addressing scheme is naïve presently to select the group.
> >> >> Not sure when/how you or others plan to do.
> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> >>
> >>
> >> >>
> >> >> However, at device level, we should have the construct of grouping.
> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> >> >>
> >> >> So I was imagining a relatively simple scheme:
> >> >> For example, virtio device level, some kind of group id is present.
> >> >> So two devices which has same group id, are part of single group.
> >> >> An example group id format can be a UUID.
> >> >>
> >> >> And this is completely optional for devices to implement.
> >> >> Generic enough and usable beyond just vsock device in other use cases
> >> >> we discussed in past.
> >> >
> >> >Fair enough.
> >> >
> >> >@Stefano, could you please take a look at this? I'd love to have some
> >> >input
> >> >from you.
> >> >
> >> >A brief summary of the idea is: The config space will be extended to
> >> >include a group id. The devices with the same group id are considered
> >> >to be
> >> >in the same group.
> >>
> >> Thanks for the summary, but please avoid top posting, otherwise is very
> >> hard to follow the discussion :-(
> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
> >
> >Sorry, I'll avoid it in the future.
> >
> >> I like the idea of groups. What is not clear to me, is how groups will
> >> allow the driver to select the default output device when the source
> >> socket is not bind to any source CID.
> >
> >Well, we did discuss, but we need your input.
> >
> >I said in the thread [1] based on the standard socket API, the driver can't
> >pick a group. Parav [2] suggested that the group, as a basic concept,
> >should be present even if we are unable to use it.
> >
> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
> >the concept. But it is a very big change, leading to incompatibility with
> >the existing apps.
> 
> I still don't understand how the group_id will work :-( and how will 
> allow the driver to pick the default device.

FYI, the following tries to explain what a group is and why it is needed,
but I agree with your idea of "types".

I posted a diagram in the thread [1]. I think the group is something like a
"namespace". I think it is reasonable, but my concerns are compatibility
and complexity.

1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/

> This is the `sockaddr_vm`, so we should be careful of extending it:
> 
> struct sockaddr_vm {
> 	__kernel_sa_family_t svm_family;
> 	unsigned short svm_reserved1;
> 	unsigned int svm_port;
> 	unsigned int svm_cid;
> 
> #define VMADDR_FLAG_TO_HOST 0x01
> 
> 	__u8 svm_flags;
> 	unsigned char svm_zero[sizeof(struct sockaddr) -
> 			       sizeof(sa_family_t) -
> 			       sizeof(unsigned short) -
> 			       sizeof(unsigned int) -
> 			       sizeof(unsigned int) -
> 			       sizeof(__u8)];
> };
>
> >I think it might be beyond the scope of this patch, and would make the
> >vsock more complex. The current conclusion is that we will keep the concept
> >of grouping as a placeholder, but we will not use it.
> 
> IMO we should first clarify better what we want to support.
> As I already suggested some months ago, IMHO supporting any number of 
> vsock devices for a VM it's not really needed for your goal and I can't 
> see other use cases where a virtio-net device can't be use. Just a 
> reminder, vsock is not a network device, is more a P2P device where we 
> want to keep the configuration in the guest as simpler as possible (we 
> don't want to run ARP, DHCP, etc.).

Agreed.

> Till now vsock was more used just for guest-host communication, but 
> recently it was extended to communicate with sibling VMs.
> 
> IIUC your use case, we just need to support different type of vsock 
> devices attached to the VM. With "type" I mean type of address handled.
> I think we can define 3 types based on the CID we have:
> - hypervisor: VMADDR_CID_HYPERVISOR(0)
> - host: VMADDR_CID_HOST(2)
> - sibling: CID >=3
> 
> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow 
> to reach from the guest any other CIDs.
> The vhost-user-vsock recently started to support sibling VMs.
> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
> 
> So, IMHO we should define new features or config flags that a device can 
> expose depending on which address is able to handle.
> Also, in order to avoid to overcomplicate vsock, we should allow only 
> one device for each type (a single device should support multiple types, 
> but only one device can be registered for a type).

I think only one device can be registered for sibling and hypervisor, but
it should be possible to have multiple devices for host. Otherwise, the
goals of this patch will not be achieved.

Then, we can get rid of the group concept.

Big thanks for your constructive ideas :)

Thanks,
Xuewei

> If we don't want this limitation, I think we need to overcomplicate 
> vsock and define some kind of discovery algorithm that the guest driver 
> should run to understand which CIDs are reachable for each device.
> But I'm not sure we really want this; if this is a use case, the guest 
> should use a virtio-net device that already support this pretty well, so 
> before going on this direction, we should define better why we want to 
> complicate vsock, instead of using a net device.
> 
> Thanks,
> Stefano


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 10:35                               ` Xuewei Niu
@ 2025-06-23 11:01                                 ` Xuewei Niu
  2025-06-23 11:15                                 ` Stefano Garzarella
  1 sibling, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23 11:01 UTC (permalink / raw)
  To: niuxuewei97
  Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

> > On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
> > >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> > >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> > >> >> > Sent: 19 June 2025 08:57 AM
> > >> >> >
> > >> >> > Hi Parav,
> > >> >> >
> > >> >> > Could you please take a look at the diagram in [1]?
> > >> >> >
> > >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> > >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> > >> >> > at this time.
> > >> >> >
> > >> >> > I think the first thing is to figure out how to pick the right group.
> > >> >> >
> > >> >> > Standard socket doesn't provide a way to access the group information.
> > >> >> >
> > >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> > >> >> > don't call `bind()`, only the destination is known.
> > >> >> >
> > >> >> > However, only destination is not enough to find the group. For example, the
> > >> >> > well-known CIDs (e.g. 2) are valid for all groups.
> > >> >> >
> > >> >> > 1:
> > >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> > >> >>
> > >> >> Right. Sock addressing scheme is naïve presently to select the group.
> > >> >> Not sure when/how you or others plan to do.
> > >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> > >>
> > >>
> > >> >>
> > >> >> However, at device level, we should have the construct of grouping.
> > >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> > >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> > >> >>
> > >> >> So I was imagining a relatively simple scheme:
> > >> >> For example, virtio device level, some kind of group id is present.
> > >> >> So two devices which has same group id, are part of single group.
> > >> >> An example group id format can be a UUID.
> > >> >>
> > >> >> And this is completely optional for devices to implement.
> > >> >> Generic enough and usable beyond just vsock device in other use cases
> > >> >> we discussed in past.
> > >> >
> > >> >Fair enough.
> > >> >
> > >> >@Stefano, could you please take a look at this? I'd love to have some
> > >> >input
> > >> >from you.
> > >> >
> > >> >A brief summary of the idea is: The config space will be extended to
> > >> >include a group id. The devices with the same group id are considered
> > >> >to be
> > >> >in the same group.
> > >>
> > >> Thanks for the summary, but please avoid top posting, otherwise is very
> > >> hard to follow the discussion :-(
> > >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
> > >
> > >Sorry, I'll avoid it in the future.
> > >
> > >> I like the idea of groups. What is not clear to me, is how groups will
> > >> allow the driver to select the default output device when the source
> > >> socket is not bind to any source CID.
> > >
> > >Well, we did discuss, but we need your input.
> > >
> > >I said in the thread [1] based on the standard socket API, the driver can't
> > >pick a group. Parav [2] suggested that the group, as a basic concept,
> > >should be present even if we are unable to use it.
> > >
> > >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
> > >the concept. But it is a very big change, leading to incompatibility with
> > >the existing apps.
> > 
> > I still don't understand how the group_id will work :-( and how will 
> > allow the driver to pick the default device.
> 
> FYI, here is trying to explain what is group and why it is needed, but I
> agree with you about the idea of "types".
> 
> I posted a diagram in the thread [1]. I think the group is something like a
> "namespace". I think it is reasonable, but my concerns are compatibility
> and complexity.
> 
> 1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> 
> > This is the `sockaddr_vm`, so we should be careful of extending it:
> > 
> > struct sockaddr_vm {
> > 	__kernel_sa_family_t svm_family;
> > 	unsigned short svm_reserved1;
> > 	unsigned int svm_port;
> > 	unsigned int svm_cid;
> > 
> > #define VMADDR_FLAG_TO_HOST 0x01
> > 
> > 	__u8 svm_flags;
> > 	unsigned char svm_zero[sizeof(struct sockaddr) -
> > 			       sizeof(sa_family_t) -
> > 			       sizeof(unsigned short) -
> > 			       sizeof(unsigned int) -
> > 			       sizeof(unsigned int) -
> > 			       sizeof(__u8)];
> > };
> >
> > >I think it might be beyond the scope of this patch, and would make the
> > >vsock more complex. The current conclusion is that we will keep the concept
> > >of grouping as a placeholder, but we will not use it.
> > 
> > IMO we should first clarify better what we want to support.
> > As I already suggested some months ago, IMHO supporting any number of 
> > vsock devices for a VM it's not really needed for your goal and I can't 
> > see other use cases where a virtio-net device can't be use. Just a 
> > reminder, vsock is not a network device, is more a P2P device where we 
> > want to keep the configuration in the guest as simpler as possible (we 
> > don't want to run ARP, DHCP, etc.).
> 
> Agree that.
> 
> > Till now vsock was more used just for guest-host communication, but 
> > recently it was extended to communicate with sibling VMs.
> > 
> > IIUC your use case, we just need to support different type of vsock 
> > devices attached to the VM. With "type" I mean type of address handled.
> > I think we can define 3 types based on the CID we have:
> > - hypervisor: VMADDR_CID_HYPERVISOR(0)
> > - host: VMADDR_CID_HOST(2)
> > - sibling: CID >=3
> > 
> > The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow 
> > to reach from the guest any other CIDs.
> > The vhost-user-vsock recently started to support sibling VMs.
> > The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
> > 
> > So, IMHO we should define new features or config flags that a device can 
> > expose depending on which address is able to handle.
> > Also, in order to avoid to overcomplicate vsock, we should allow only 
> > one device for each type (a single device should support multiple types, 
> > but only one device can be registered for a type).
> 
> I think only one device can be registered for sibling and hypervisor. And
> it is possible to have multiple devices for host. Otherwise, the goals of
> this patch will be not achieved.
> 
> Then, we can get rid of the group concept.
> 
> Big thanks for your constructive ideas :)
> 
> Thanks,
> Xuewei

Well, I think we can impose some limitations to make things simpler:

1. The default device is used for VM-VM (sibling) traffic.
2. If there is an exclusive device for HV-VM, then use it. Otherwise, use
the default device.
3. Other devices are allowed for VM-Host traffic only.
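
To make the three rules concrete, here is a minimal sketch of how a driver
might pick the device. All names (`struct vsock_dev`, `pick_device`,
`CID_UNBOUND`, the `hv_exclusive` flag) are hypothetical and exist only for
illustration. It also assumes that a socket with no bound source CID falls
back to the default device for VM-Host traffic, which the rules do not
state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants and types, for illustration only. */
#define CID_HYPERVISOR 0u
#define CID_HOST       2u
#define CID_UNBOUND    0xFFFFFFFFu /* socket has no bound source CID */

struct vsock_dev {
	unsigned int cid;  /* guest CID exposed by this device */
	bool is_default;   /* the single default device */
	bool hv_exclusive; /* dedicated to HV-VM traffic */
};

static const struct vsock_dev *find_default(const struct vsock_dev *devs,
					    size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].is_default)
			return &devs[i];
	return NULL;
}

/* Returns the device to transmit on, or NULL if no rule allows it. */
const struct vsock_dev *pick_device(const struct vsock_dev *devs, size_t n,
				    unsigned int src_cid, unsigned int dst_cid)
{
	const struct vsock_dev *def;

	assert(devs != NULL || n == 0);
	def = find_default(devs, n);

	if (dst_cid == CID_HYPERVISOR) {
		/* Rule 2: prefer an exclusive HV-VM device, else the default. */
		for (size_t i = 0; i < n; i++)
			if (devs[i].hv_exclusive)
				return &devs[i];
		return def;
	}
	if (dst_cid == CID_HOST) {
		/* Rule 3: a non-default device needs an explicit bind(). */
		if (src_cid == CID_UNBOUND)
			return def;
		for (size_t i = 0; i < n; i++)
			if (devs[i].cid == src_cid && !devs[i].hv_exclusive)
				return &devs[i];
		return NULL;
	}
	if (dst_cid >= 3) {
		/* Rule 1: sibling traffic only goes through the default device. */
		if (def && (src_cid == CID_UNBOUND || src_cid == def->cid))
			return def;
	}
	return NULL;
}
```

With devices {cid=3, default} and {cid=4}, an unbound socket connecting to
a sibling gets the cid=3 device, and a socket bound to CID 4 connecting to
CID 2 gets the cid=4 device.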

Thanks,
Xuewei

> > If we don't want this limitation, I think we need to overcomplicate 
> > vsock and define some kind of discovery algorithm that the guest driver 
> > should run to understand which CIDs are reachable for each device.
> > But I'm not sure we really want this; if this is a use case, the guest 
> > should use a virtio-net device that already support this pretty well, so 
> > before going on this direction, we should define better why we want to 
> > complicate vsock, instead of using a net device.
> > 
> > Thanks,
> > Stefano

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 10:35                               ` Xuewei Niu
  2025-06-23 11:01                                 ` Xuewei Niu
@ 2025-06-23 11:15                                 ` Stefano Garzarella
  2025-06-23 12:14                                   ` Xuewei Niu
  1 sibling, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-23 11:15 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
>> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
>> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
>> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
>> >> >> > Sent: 19 June 2025 08:57 AM
>> >> >> >
>> >> >> > Hi Parav,
>> >> >> >
>> >> >> > Could you please take a look at the diagram in [1]?
>> >> >> >
>> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
>> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
>> >> >> > at this time.
>> >> >> >
>> >> >> > I think the first thing is to figure out how to pick the right group.
>> >> >> >
>> >> >> > Standard socket doesn't provide a way to access the group information.
>> >> >> >
>> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
>> >> >> > don't call `bind()`, only the destination is known.
>> >> >> >
>> >> >> > However, only destination is not enough to find the group. For example, the
>> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
>> >> >> >
>> >> >> > 1:
>> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >> >>
>> >> >> Right. Sock addressing scheme is naïve presently to select the group.
>> >> >> Not sure when/how you or others plan to do.
>> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
>> >>
>> >>
>> >> >>
>> >> >> However, at device level, we should have the construct of grouping.
>> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
>> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
>> >> >>
>> >> >> So I was imagining a relatively simple scheme:
>> >> >> For example, virtio device level, some kind of group id is present.
>> >> >> So two devices which has same group id, are part of single group.
>> >> >> An example group id format can be a UUID.
>> >> >>
>> >> >> And this is completely optional for devices to implement.
>> >> >> Generic enough and usable beyond just vsock device in other use cases
>> >> >> we discussed in past.
>> >> >
>> >> >Fair enough.
>> >> >
>> >> >@Stefano, could you please take a look at this? I'd love to have some
>> >> >input
>> >> >from you.
>> >> >
>> >> >A brief summary of the idea is: The config space will be extended to
>> >> >include a group id. The devices with the same group id are considered
>> >> >to be
>> >> >in the same group.
>> >>
>> >> Thanks for the summary, but please avoid top posting, otherwise is very
>> >> hard to follow the discussion :-(
>> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>> >
>> >Sorry, I'll avoid it in the future.
>> >
>> >> I like the idea of groups. What is not clear to me, is how groups will
>> >> allow the driver to select the default output device when the source
>> >> socket is not bind to any source CID.
>> >
>> >Well, we did discuss, but we need your input.
>> >
>> >I said in the thread [1] based on the standard socket API, the driver can't
>> >pick a group. Parav [2] suggested that the group, as a basic concept,
>> >should be present even if we are unable to use it.
>> >
>> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
>> >the concept. But it is a very big change, leading to incompatibility with
>> >the existing apps.
>>
>> I still don't understand how the group_id will work :-( and how will
>> allow the driver to pick the default device.
>
>FYI, here is trying to explain what is group and why it is needed, but I
>agree with you about the idea of "types".
>
>I posted a diagram in the thread [1]. I think the group is something like a
>"namespace". I think it is reasonable, but my concerns are compatibility
>and complexity.
>
>1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>
>> This is the `sockaddr_vm`, so we should be careful of extending it:
>>
>> struct sockaddr_vm {
>> 	__kernel_sa_family_t svm_family;
>> 	unsigned short svm_reserved1;
>> 	unsigned int svm_port;
>> 	unsigned int svm_cid;
>>
>> #define VMADDR_FLAG_TO_HOST 0x01
>>
>> 	__u8 svm_flags;
>> 	unsigned char svm_zero[sizeof(struct sockaddr) -
>> 			       sizeof(sa_family_t) -
>> 			       sizeof(unsigned short) -
>> 			       sizeof(unsigned int) -
>> 			       sizeof(unsigned int) -
>> 			       sizeof(__u8)];
>> };
>>
>> >I think it might be beyond the scope of this patch, and would make the
>> >vsock more complex. The current conclusion is that we will keep the concept
>> >of grouping as a placeholder, but we will not use it.
>>
>> IMO we should first clarify better what we want to support.
>> As I already suggested some months ago, IMHO supporting any number of
>> vsock devices for a VM it's not really needed for your goal and I can't
>> see other use cases where a virtio-net device can't be use. Just a
>> reminder, vsock is not a network device, is more a P2P device where we
>> want to keep the configuration in the guest as simpler as possible (we
>> don't want to run ARP, DHCP, etc.).
>
>Agree that.
>
>> Till now vsock was more used just for guest-host communication, but
>> recently it was extended to communicate with sibling VMs.
>>
>> IIUC your use case, we just need to support different type of vsock
>> devices attached to the VM. With "type" I mean type of address handled.
>> I think we can define 3 types based on the CID we have:
>> - hypervisor: VMADDR_CID_HYPERVISOR(0)
>> - host: VMADDR_CID_HOST(2)
>> - sibling: CID >=3
>>
>> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
>> to reach from the guest any other CIDs.
>> The vhost-user-vsock recently started to support sibling VMs.
>> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
>>
>> So, IMHO we should define new features or config flags that a device can
>> expose depending on which address is able to handle.
>> Also, in order to avoid to overcomplicate vsock, we should allow only
>> one device for each type (a single device should support multiple types,
>> but only one device can be registered for a type).
>
>I think only one device can be registered for sibling and hypervisor. And
>it is possible to have multiple devices for host. Otherwise, the goals of
>this patch will be not achieved.

Why?

IMO, as I already wrote, the libkrun service should use CID=0.

How will you handle multiple devices for host? How can the guest know
which device to use to reach HOST(2)?

IMO it is easier to have multiple devices for sibling (e.g. the device
can advertise which dest CIDs it supports), but not for host.
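
For reference, the "one device per type" rule quoted above could be
sketched roughly as follows. The flag names, `struct type_table`, and both
helper functions are hypothetical; nothing like them exists in the spec or
in the driver today. The sketch only shows that a lookup by destination CID
stays unambiguous when each type has at most one owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TYPES 3

/* Hypothetical capability flags a device could advertise. */
enum vsock_type {
	VSOCK_TYPE_HYPERVISOR = 1 << 0, /* dst CID 0 */
	VSOCK_TYPE_HOST       = 1 << 1, /* dst CID 2 */
	VSOCK_TYPE_SIBLING    = 1 << 2, /* dst CID >= 3 */
};

/* Owner device id per type bit; -1 means no device registered. */
struct type_table {
	int owner[NUM_TYPES];
};

/* Map a destination CID to its type, or 0 if none of the three applies. */
int cid_to_type(unsigned int dst_cid)
{
	if (dst_cid == 0)
		return VSOCK_TYPE_HYPERVISOR;
	if (dst_cid == 2)
		return VSOCK_TYPE_HOST;
	if (dst_cid >= 3)
		return VSOCK_TYPE_SIBLING;
	return 0; /* CID 1 is not covered by the three types */
}

/* Register dev_id for every type in 'types'; fail if any is already owned. */
bool register_device(struct type_table *t, int dev_id, int types)
{
	assert(t != NULL);
	for (int bit = 0; bit < NUM_TYPES; bit++)
		if ((types & (1 << bit)) && t->owner[bit] != -1)
			return false;
	for (int bit = 0; bit < NUM_TYPES; bit++)
		if (types & (1 << bit))
			t->owner[bit] = dev_id;
	return true;
}

/* Return the device id that handles dst_cid, or -1 if there is none. */
int lookup_device(const struct type_table *t, unsigned int dst_cid)
{
	int type = cid_to_type(dst_cid);

	assert(t != NULL);
	for (int bit = 0; bit < NUM_TYPES; bit++)
		if (type == (1 << bit))
			return t->owner[bit];
	return -1;
}
```

A single device may register several types at once, but a second device
asking for an already owned type is rejected, so the guest never has to
choose between two candidates for the same destination.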

>
>Then, we can get rid of the group concept.
>
>Big thanks for your constructive ideas :)

You're welcome :-)

Stefano


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 11:15                                 ` Stefano Garzarella
@ 2025-06-23 12:14                                   ` Xuewei Niu
  2025-06-23 12:51                                     ` Stefano Garzarella
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23 12:14 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, virtio-comment

> On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
> >> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
> >> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> >> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> >> >> >> > Sent: 19 June 2025 08:57 AM
> >> >> >> >
> >> >> >> > Hi Parav,
> >> >> >> >
> >> >> >> > Could you please take a look at the diagram in [1]?
> >> >> >> >
> >> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> >> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> >> >> >> > at this time.
> >> >> >> >
> >> >> >> > I think the first thing is to figure out how to pick the right group.
> >> >> >> >
> >> >> >> > Standard socket doesn't provide a way to access the group information.
> >> >> >> >
> >> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> >> >> >> > don't call `bind()`, only the destination is known.
> >> >> >> >
> >> >> >> > However, only destination is not enough to find the group. For example, the
> >> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
> >> >> >> >
> >> >> >> > 1:
> >> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >> >>
> >> >> >> Right. Sock addressing scheme is naïve presently to select the group.
> >> >> >> Not sure when/how you or others plan to do.
> >> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> >> >>
> >> >>
> >> >> >>
> >> >> >> However, at device level, we should have the construct of grouping.
> >> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> >> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> >> >> >>
> >> >> >> So I was imagining a relatively simple scheme:
> >> >> >> For example, virtio device level, some kind of group id is present.
> >> >> >> So two devices which has same group id, are part of single group.
> >> >> >> An example group id format can be a UUID.
> >> >> >>
> >> >> >> And this is completely optional for devices to implement.
> >> >> >> Generic enough and usable beyond just vsock device in other use cases
> >> >> >> we discussed in past.
> >> >> >
> >> >> >Fair enough.
> >> >> >
> >> >> >@Stefano, could you please take a look at this? I'd love to have some
> >> >> >input
> >> >> >from you.
> >> >> >
> >> >> >A brief summary of the idea is: The config space will be extended to
> >> >> >include a group id. The devices with the same group id are considered
> >> >> >to be
> >> >> >in the same group.
> >> >>
> >> >> Thanks for the summary, but please avoid top posting, otherwise is very
> >> >> hard to follow the discussion :-(
> >> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
> >> >
> >> >Sorry, I'll avoid it in the future.
> >> >
> >> >> I like the idea of groups. What is not clear to me, is how groups will
> >> >> allow the driver to select the default output device when the source
> >> >> socket is not bind to any source CID.
> >> >
> >> >Well, we did discuss, but we need your input.
> >> >
> >> >I said in the thread [1] based on the standard socket API, the driver can't
> >> >pick a group. Parav [2] suggested that the group, as a basic concept,
> >> >should be present even if we are unable to use it.
> >> >
> >> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
> >> >the concept. But it is a very big change, leading to incompatibility with
> >> >the existing apps.
> >>
> >> I still don't understand how the group_id will work :-( and how will
> >> allow the driver to pick the default device.
> >
> >FYI, here is trying to explain what is group and why it is needed, but I
> >agree with you about the idea of "types".
> >
> >I posted a diagram in the thread [1]. I think the group is something like a
> >"namespace". I think it is reasonable, but my concerns are compatibility
> >and complexity.
> >
> >1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >
> >> This is the `sockaddr_vm`, so we should be careful of extending it:
> >>
> >> struct sockaddr_vm {
> >> 	__kernel_sa_family_t svm_family;
> >> 	unsigned short svm_reserved1;
> >> 	unsigned int svm_port;
> >> 	unsigned int svm_cid;
> >>
> >> #define VMADDR_FLAG_TO_HOST 0x01
> >>
> >> 	__u8 svm_flags;
> >> 	unsigned char svm_zero[sizeof(struct sockaddr) -
> >> 			       sizeof(sa_family_t) -
> >> 			       sizeof(unsigned short) -
> >> 			       sizeof(unsigned int) -
> >> 			       sizeof(unsigned int) -
> >> 			       sizeof(__u8)];
> >> };
> >>
> >> >I think it might be beyond the scope of this patch, and would make the
> >> >vsock more complex. The current conclusion is that we will keep the concept
> >> >of grouping as a placeholder, but we will not use it.
> >>
> >> IMO we should first clarify better what we want to support.
> >> As I already suggested some months ago, IMHO supporting any number of
> >> vsock devices for a VM it's not really needed for your goal and I can't
> >> see other use cases where a virtio-net device can't be use. Just a
> >> reminder, vsock is not a network device, is more a P2P device where we
> >> want to keep the configuration in the guest as simpler as possible (we
> >> don't want to run ARP, DHCP, etc.).
> >
> >Agree that.
> >
> >> Till now vsock was more used just for guest-host communication, but
> >> recently it was extended to communicate with sibling VMs.
> >>
> >> IIUC your use case, we just need to support different type of vsock
> >> devices attached to the VM. With "type" I mean type of address handled.
> >> I think we can define 3 types based on the CID we have:
> >> - hypervisor: VMADDR_CID_HYPERVISOR(0)
> >> - host: VMADDR_CID_HOST(2)
> >> - sibling: CID >=3
> >>
> >> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
> >> to reach from the guest any other CIDs.
> >> The vhost-user-vsock recently started to support sibling VMs.
> >> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
> >>
> >> So, IMHO we should define new features or config flags that a device can
> >> expose depending on which address is able to handle.
> >> Also, in order to avoid to overcomplicate vsock, we should allow only
> >> one device for each type (a single device should support multiple types,
> >> but only one device can be registered for a type).
> >
> >I think only one device can be registered for sibling and hypervisor. And
> >it is possible to have multiple devices for host. Otherwise, the goals of
> >this patch will be not achieved.
> 
> Why?
> 
> IMO, as I already wrote, the libkrun service should use CID=0.

As we discussed before, if there is only one backend for the libkrun
service, we can use cid=0. I totally agree with you. So let us put this
case aside for now.

> How you will handle multiple devices for host? How can the guest know 
> which device to use to reach HOST(2)?

Call `bind()` explicitly, and the device with the matching CID will be
picked.
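
As a rough sketch of what the application side would look like, using the
existing Linux `AF_VSOCK` socket API: the helper names `vsock_addr` and
`connect_host_via` are made up for this example, and the idea that the
bound source CID selects one of several devices is the behaviour proposed
in this thread, not what current kernels do.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* sockaddr_vm is padded to the size of struct sockaddr. */
static_assert(sizeof(struct sockaddr_vm) == sizeof(struct sockaddr),
	      "sockaddr_vm must match sockaddr in size");

/* Build a zeroed vsock address for the given CID and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa;

	memset(&sa, 0, sizeof(sa));
	sa.svm_family = AF_VSOCK;
	sa.svm_cid = cid;
	sa.svm_port = port;
	return sa;
}

/*
 * Bind to src_cid (any local port), then connect to the host (CID 2).
 * Returns a connected fd, or -1 on error.
 */
int connect_host_via(unsigned int src_cid, unsigned int dst_port)
{
	struct sockaddr_vm src = vsock_addr(src_cid, VMADDR_PORT_ANY);
	struct sockaddr_vm dst = vsock_addr(VMADDR_CID_HOST, dst_port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For example, `connect_host_via(4, 1234)` would go out through the device
with CID 4, while skipping the `bind()` step would leave the choice to the
default device.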

> IMO is easier to have the multiple device for sibling (e.g. the device 
> can advertise which dest CID its supports), but not for host.

I think one device for sibling is enough. For example, dev1 is enough for
communication with dev3 and dev4. So we don't need dev2 for sibling in the
first VM, right?

+──kernel(vhost-vsock)──────────────────────────────────────────────+    
│+──────────────+ +──────────────+ +──────────────+ +──────────────+│    
││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││    
│+──────────────+ +──────────────+ +──────────────+ +──────────────+│    
+────────▲────────────────▲────────────────▲────────────────▲───────+    
         │                │                │                │            
         │                │          ┌─────┘                │            
         │                │          │                      │            
+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
│+──────────────+    │    │ │                    │ │                    │
││  dev2(cid2)  │────┼────┘ │                    │ │                    │
│+──────────────+    │      │                    │ │                    │
+────────────────────+      +────────────────────+ +────────────────────+

Things are different for the host devices:

1. dev1 is used for sibling traffic (dev1 <-> dev4), and for it the host
(src_cid=cid1, dst_cid=2) means the real host (not a userapp), which is
not shown in the diagram.
2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1.
3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.

+──kernel(vhost-vsock)──────────────────────────────────────────────+    
│+──────────────+                                   +──────────────+│    
││  dev1(cid1)  │                                   │  dev4(cid4)  ││    
│+──────────────+                                   +──────────────+│    
+────────▲──────────────────────────────────────────────────▲───────+    
         │                                                  │            
+────────┴───────────+     +────────────────────+           │            
│+──────────────+ VM0│     │+──────────────+    │           │            
││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
│+──────────────+  vhost-user-vsock─────────────+  │                    │
│+──────────────+    │     │+──────────────+    │  │                    │
││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
│+──────────────+    │     │+──────────────+    │  +────────────────────+
+────────────────────+     │            userapp2│                        
                           +────────────────────+                        

Thanks,
Xuewei

> >
> >Then, we can get rid of the group concept.
> >
> >Big thanks for your constructive ideas :)
> 
> You're welcome :-)
> 
> Stefano

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 12:14                                   ` Xuewei Niu
@ 2025-06-23 12:51                                     ` Stefano Garzarella
  2025-06-23 15:51                                       ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-06-23 12:51 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, Jun 23, 2025 at 08:14:07PM +0800, Xuewei Niu wrote:
>> On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
>> >> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
>> >> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
>> >> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
>> >> >> >> > Sent: 19 June 2025 08:57 AM
>> >> >> >> >
>> >> >> >> > Hi Parav,
>> >> >> >> >
>> >> >> >> > Could you please take a look at the diagram in [1]?
>> >> >> >> >
>> >> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
>> >> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
>> >> >> >> > at this time.
>> >> >> >> >
>> >> >> >> > I think the first thing is to figure out how to pick the right group.
>> >> >> >> >
>> >> >> >> > Standard socket doesn't provide a way to access the group information.
>> >> >> >> >
>> >> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
>> >> >> >> > don't call `bind()`, only the destination is known.
>> >> >> >> >
>> >> >> >> > However, only destination is not enough to find the group. For example, the
>> >> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
>> >> >> >> >
>> >> >> >> > 1:
>> >> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >> >> >>
>> >> >> >> Right. Sock addressing scheme is naïve presently to select the group.
>> >> >> >> Not sure when/how you or others plan to do.
>> >> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
>> >> >>
>> >> >>
>> >> >> >>
>> >> >> >> However, at device level, we should have the construct of grouping.
>> >> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
>> >> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
>> >> >> >>
>> >> >> >> So I was imagining a relatively simple scheme:
>> >> >> >> For example, virtio device level, some kind of group id is present.
>> >> >> >> So two devices which has same group id, are part of single group.
>> >> >> >> An example group id format can be a UUID.
>> >> >> >>
>> >> >> >> And this is completely optional for devices to implement.
>> >> >> >> Generic enough and usable beyond just vsock device in other use cases
>> >> >> >> we discussed in past.
>> >> >> >
>> >> >> >Fair enough.
>> >> >> >
>> >> >> >@Stefano, could you please take a look at this? I'd love to have some
>> >> >> >input
>> >> >> >from you.
>> >> >> >
>> >> >> >A brief summary of the idea is: The config space will be extended to
>> >> >> >include a group id. The devices with the same group id are considered
>> >> >> >to be
>> >> >> >in the same group.
>> >> >>
>> >> >> Thanks for the summary, but please avoid top posting, otherwise is very
>> >> >> hard to follow the discussion :-(
>> >> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>> >> >
>> >> >Sorry, I'll avoid it in the future.
>> >> >
>> >> >> I like the idea of groups. What is not clear to me, is how groups will
>> >> >> allow the driver to select the default output device when the source
>> >> >> socket is not bind to any source CID.
>> >> >
>> >> >Well, we did discuss, but we need your input.
>> >> >
>> >> >I said in the thread [1] based on the standard socket API, the driver can't
>> >> >pick a group. Parav [2] suggested that the group, as a basic concept,
>> >> >should be present even if we are unable to use it.
>> >> >
>> >> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
>> >> >the concept. But it is a very big change, leading to incompatibility with
>> >> >the existing apps.
>> >>
>> >> I still don't understand how the group_id will work :-( and how will
>> >> allow the driver to pick the default device.
>> >
>> >FYI, here is trying to explain what is group and why it is needed, but I
>> >agree with you about the idea of "types".
>> >
>> >I posted a diagram in the thread [1]. I think the group is something like a
>> >"namespace". I think it is reasonable, but my concerns are compatibility
>> >and complexity.
>> >
>> >1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >
>> >> This is the `sockaddr_vm`, so we should be careful of extending it:
>> >>
>> >> struct sockaddr_vm {
>> >> 	__kernel_sa_family_t svm_family;
>> >> 	unsigned short svm_reserved1;
>> >> 	unsigned int svm_port;
>> >> 	unsigned int svm_cid;
>> >>
>> >> #define VMADDR_FLAG_TO_HOST 0x01
>> >>
>> >> 	__u8 svm_flags;
>> >> 	unsigned char svm_zero[sizeof(struct sockaddr) -
>> >> 			       sizeof(sa_family_t) -
>> >> 			       sizeof(unsigned short) -
>> >> 			       sizeof(unsigned int) -
>> >> 			       sizeof(unsigned int) -
>> >> 			       sizeof(__u8)];
>> >> };
>> >>
>> >> >I think it might be beyond the scope of this patch, and would make the
>> >> >vsock more complex. The current conclusion is that we will keep the concept
>> >> >of grouping as a placeholder, but we will not use it.
>> >>
>> >> IMO we should first clarify better what we want to support.
>> >> As I already suggested some months ago, IMHO supporting any number of
>> >> vsock devices for a VM it's not really needed for your goal and I can't
>> >> see other use cases where a virtio-net device can't be use. Just a
>> >> reminder, vsock is not a network device, is more a P2P device where we
>> >> want to keep the configuration in the guest as simpler as possible (we
>> >> don't want to run ARP, DHCP, etc.).
>> >
>> >Agree that.
>> >
>> >> Till now vsock was more used just for guest-host communication, but
>> >> recently it was extended to communicate with sibling VMs.
>> >>
>> >> IIUC your use case, we just need to support different type of vsock
>> >> devices attached to the VM. With "type" I mean type of address handled.
>> >> I think we can define 3 types based on the CID we have:
>> >> - hypervisor: VMADDR_CID_HYPERVISOR(0)
>> >> - host: VMADDR_CID_HOST(2)
>> >> - sibling: CID >=3
>> >>
>> >> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
>> >> to reach from the guest any other CIDs.
>> >> The vhost-user-vsock recently started to support sibling VMs.
>> >> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
>> >>
>> >> So, IMHO we should define new features or config flags that a device can
>> >> expose depending on which address is able to handle.
>> >> Also, in order to avoid to overcomplicate vsock, we should allow only
>> >> one device for each type (a single device should support multiple types,
>> >> but only one device can be registered for a type).
>> >
>> >I think only one device can be registered for sibling and hypervisor. And
>> >it is possible to have multiple devices for host. Otherwise, the goals of
>> >this patch will be not achieved.
>>
>> Why?
>>
>> IMO, as I already wrote, the libkrun service should use CID=0.
>
>As we before discussed, if there is only one backend for the libkrun
>service, we can use cid=0. I totally agree with you. So let us put this
>case aside first.
>
>> How you will handle multiple devices for host? How can the guest know
>> which device to use to reach HOST(2)?
>
>Do `bind()` explicitly, and a device with matching cid will be picked up.

Okay, but why do you need 2 devices to communicate with the same CID, the
host in this case (CID=2)?
IMO using the source CID to multiplex a socket at the destination is not
great. But I can be wrong.

>
>> IMO is easier to have the multiple device for sibling (e.g. the device
>> can advertise which dest CID its supports), but not for host.
>
>I think one device for sibling is enough.

I also think one should be enough. It might make more sense to have
multiple devices in this case, where each device can handle a pool of
CIDs, but then the hard part will be figuring out how to allocate the
CIDs, etc. So yes, I agree that having one device even in this case is
the easiest thing.

>For example, dev1 is enough for
>communication with dev3 and dev4. So we don't need dev2 for sibling in the
>first VM, right?

Just a note, vhost-vsock does not allow any sibling communication.
vhost-user-vsock can do it, but we don't want to add that support to
vhost-vsock, to avoid overcomplicating it (again, it would become like a
network switch, requiring firewalls, etc.)

>
>+──kernel(vhost-vsock)──────────────────────────────────────────────+
>│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
>│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>+────────▲────────────────▲────────────────▲────────────────▲───────+
>         │                │                │                │
>         │                │          ┌─────┘                │
>         │                │          │                      │
>+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
>│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
>││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
>│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
>│+──────────────+    │    │ │                    │ │                    │
>││  dev2(cid2)  │────┼────┘ │                    │ │                    │
>│+──────────────+    │      │                    │ │                    │
>+────────────────────+      +────────────────────+ +────────────────────+
>
>The things go different for devices for host:
>
>1. For dev1, it is used for sibling (dev1 <-> dev4), and the host
>(src_cid=cid1, dst_cid=2) means the real host (not a userapp), where
>doesn't show in the diagram.

This is not going to happen (see above).

>2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
>3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
>
>+──kernel(vhost-vsock)──────────────────────────────────────────────+
>│+──────────────+                                   +──────────────+│
>││  dev1(cid1)  │                                   │  dev4(cid4)  ││
>│+──────────────+                                   +──────────────+│
>+────────▲──────────────────────────────────────────────────▲───────+
>         │                                                  │
>+────────┴───────────+     +────────────────────+           │
>│+──────────────+ VM0│     │+──────────────+    │           │
>││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
>│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
>│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
>││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
>│+──────────────+  vhost-user-vsock─────────────+  │                    │
>│+──────────────+    │     │+──────────────+    │  │                    │
>││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
>│+──────────────+    │     │+──────────────+    │  +────────────────────+
>+────────────────────+     │            userapp2│
>                           +────────────────────+
>

I'm really confused by `cid1`, `cid2`, etc. Can they be any number >= 3?
I'd suggest using real values (e.g. cid=42).

So what dest CID is VM0 supposed to use to talk to userapp1 and
userapp2? CID=2 in both cases, right?

Why do you need 2 vhost-user-vsock devices?
Can you just have a single one and have the applications
connect/listen on different ports? (That is the purpose of the port:
multiplexing applications at the destination.)

Thanks,
Stefano



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 12:51                                     ` Stefano Garzarella
@ 2025-06-23 15:51                                       ` Xuewei Niu
  2025-07-01 10:31                                         ` Stefano Garzarella
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-06-23 15:51 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, virtio-comment

> On Mon, Jun 23, 2025 at 08:14:07PM +0800, Xuewei Niu wrote:
> >> On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
> >> >> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
> >> >> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> >> >> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> >> >> >> >> > Sent: 19 June 2025 08:57 AM
> >> >> >> >> >
> >> >> >> >> > Hi Parav,
> >> >> >> >> >
> >> >> >> >> > Could you please take a look at the diagram in [1]?
> >> >> >> >> >
> >> >> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> >> >> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> >> >> >> >> > at this time.
> >> >> >> >> >
> >> >> >> >> > I think the first thing is to figure out how to pick the right group.
> >> >> >> >> >
> >> >> >> >> > Standard socket doesn't provide a way to access the group information.
> >> >> >> >> >
> >> >> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> >> >> >> >> > don't call `bind()`, only the destination is known.
> >> >> >> >> >
> >> >> >> >> > However, only destination is not enough to find the group. For example, the
> >> >> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
> >> >> >> >> >
> >> >> >> >> > 1:
> >> >> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >> >> >>
> >> >> >> >> Right. Sock addressing scheme is naïve presently to select the group.
> >> >> >> >> Not sure when/how you or others plan to do.
> >> >> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> >> >> >>
> >> >> >>
> >> >> >> >>
> >> >> >> >> However, at device level, we should have the construct of grouping.
> >> >> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> >> >> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> >> >> >> >>
> >> >> >> >> So I was imagining a relatively simple scheme:
> >> >> >> >> For example, virtio device level, some kind of group id is present.
> >> >> >> >> So two devices which has same group id, are part of single group.
> >> >> >> >> An example group id format can be a UUID.
> >> >> >> >>
> >> >> >> >> And this is completely optional for devices to implement.
> >> >> >> >> Generic enough and usable beyond just vsock device in other use cases
> >> >> >> >> we discussed in past.
> >> >> >> >
> >> >> >> >Fair enough.
> >> >> >> >
> >> >> >> >@Stefano, could you please take a look at this? I'd love to have some
> >> >> >> >input
> >> >> >> >from you.
> >> >> >> >
> >> >> >> >A brief summary of the idea is: The config space will be extended to
> >> >> >> >include a group id. The devices with the same group id are considered
> >> >> >> >to be
> >> >> >> >in the same group.
> >> >> >>
> >> >> >> Thanks for the summary, but please avoid top posting, otherwise is very
> >> >> >> hard to follow the discussion :-(
> >> >> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
> >> >> >
> >> >> >Sorry, I'll avoid it in the future.
> >> >> >
> >> >> >> I like the idea of groups. What is not clear to me, is how groups will
> >> >> >> allow the driver to select the default output device when the source
> >> >> >> socket is not bind to any source CID.
> >> >> >
> >> >> >Well, we did discuss, but we need your input.
> >> >> >
> >> >> >I said in the thread [1] based on the standard socket API, the driver can't
> >> >> >pick a group. Parav [2] suggested that the group, as a basic concept,
> >> >> >should be present even if we are unable to use it.
> >> >> >
> >> >> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
> >> >> >the concept. But it is a very big change, leading to incompatibility with
> >> >> >the existing apps.
> >> >>
> >> >> I still don't understand how the group_id will work :-( and how will
> >> >> allow the driver to pick the default device.
> >> >
> >> >FYI, here is trying to explain what is group and why it is needed, but I
> >> >agree with you about the idea of "types".
> >> >
> >> >I posted a diagram in the thread [1]. I think the group is something like a
> >> >"namespace". I think it is reasonable, but my concerns are compatibility
> >> >and complexity.
> >> >
> >> >1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >
> >> >> This is the `sockaddr_vm`, so we should be careful of extending it:
> >> >>
> >> >> struct sockaddr_vm {
> >> >> 	__kernel_sa_family_t svm_family;
> >> >> 	unsigned short svm_reserved1;
> >> >> 	unsigned int svm_port;
> >> >> 	unsigned int svm_cid;
> >> >>
> >> >> #define VMADDR_FLAG_TO_HOST 0x01
> >> >>
> >> >> 	__u8 svm_flags;
> >> >> 	unsigned char svm_zero[sizeof(struct sockaddr) -
> >> >> 			       sizeof(sa_family_t) -
> >> >> 			       sizeof(unsigned short) -
> >> >> 			       sizeof(unsigned int) -
> >> >> 			       sizeof(unsigned int) -
> >> >> 			       sizeof(__u8)];
> >> >> };
> >> >>
> >> >> >I think it might be beyond the scope of this patch, and would make the
> >> >> >vsock more complex. The current conclusion is that we will keep the concept
> >> >> >of grouping as a placeholder, but we will not use it.
> >> >>
> >> >> IMO we should first clarify better what we want to support.
> >> >> As I already suggested some months ago, IMHO supporting any number of
> >> >> vsock devices for a VM it's not really needed for your goal and I can't
> >> >> see other use cases where a virtio-net device can't be use. Just a
> >> >> reminder, vsock is not a network device, is more a P2P device where we
> >> >> want to keep the configuration in the guest as simpler as possible (we
> >> >> don't want to run ARP, DHCP, etc.).
> >> >
> >> >Agree that.
> >> >
> >> >> Till now vsock was more used just for guest-host communication, but
> >> >> recently it was extended to communicate with sibling VMs.
> >> >>
> >> >> IIUC your use case, we just need to support different type of vsock
> >> >> devices attached to the VM. With "type" I mean type of address handled.
> >> >> I think we can define 3 types based on the CID we have:
> >> >> - hypervisor: VMADDR_CID_HYPERVISOR(0)
> >> >> - host: VMADDR_CID_HOST(2)
> >> >> - sibling: CID >=3
> >> >>
> >> >> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
> >> >> to reach from the guest any other CIDs.
> >> >> The vhost-user-vsock recently started to support sibling VMs.
> >> >> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
> >> >>
> >> >> So, IMHO we should define new features or config flags that a device can
> >> >> expose depending on which address is able to handle.
> >> >> Also, in order to avoid to overcomplicate vsock, we should allow only
> >> >> one device for each type (a single device should support multiple types,
> >> >> but only one device can be registered for a type).
> >> >
> >> >I think only one device can be registered for sibling and hypervisor. And
> >> >it is possible to have multiple devices for host. Otherwise, the goals of
> >> >this patch will be not achieved.
> >>
> >> Why?
> >>
> >> IMO, as I already wrote, the libkrun service should use CID=0.
> >
> >As we before discussed, if there is only one backend for the libkrun
> >service, we can use cid=0. I totally agree with you. So let us put this
> >case aside first.
> >
> >> How you will handle multiple devices for host? How can the guest know
> >> which device to use to reach HOST(2)?
> >
> >Do `bind()` explicitly, and a device with matching cid will be picked up.
> 
> Okay, but why do you need 2 devices to communicate with the same CID, the
> host in this case (CID=2)?
> IMO using the source CID to multiplex a socket at the destination is not
> great. But I may be wrong.

I mean multiple sockets with multiple `bind()` calls, not multiplexing:

- socket1: bind(3, -1), connect(2, 10000);
- socket2: bind(4, -1), connect(2, 10001);
- ...
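
For illustration only (not part of the patch), the sockets listed above
could be set up roughly as follows. This is a sketch that assumes the
proposed semantics, where bind() to a source CID selects the device, and
that "-1" stands for VMADDR_PORT_ANY. `vsock_addr` and `connect_via` are
made-up helper names.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Build a vsock address for the given CID and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Open a stream socket, bind it to src_cid with any local port, and
 * connect it to the host (CID 2) on dst_port.
 * Returns the fd, or -1 on failure.
 */
int connect_via(unsigned int src_cid, unsigned int dst_port)
{
	struct sockaddr_vm src = vsock_addr(src_cid, VMADDR_PORT_ANY);
	struct sockaddr_vm dst = vsock_addr(VMADDR_CID_HOST, dst_port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With this, socket1 would be `connect_via(3, 10000)` and socket2 would be
`connect_via(4, 10001)`. Each call creates its own socket, so nothing is
multiplexed over a single connection.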

> >> IMO is easier to have the multiple device for sibling (e.g. the device
> >> can advertise which dest CID its supports), but not for host.
> >
> >I think one device for sibling is enough.
> 
> I also think one should be enough, but IMO I think it might make more 
> sense to have multiple devices in this case, where each device can 
> handle a pool of CIDs, then the hard part will be figuring out how to 
> allocate the CIDs, etc. so yes, I agree that having one device even in 
> this case is the easiest thing.

Yeah, so let us skip this for now ;)

> >For example, dev1 is enough for
> >communication with dev3 and dev4. So we don't need dev2 for sibling in the
> >first VM, right?
> 
> Just a note, vhost-vsock is not allowing any sibling communication.
> vhost-user-vsock can do it, but we don't want to bring any support in 
> vhost-vsock to not overcomplicate it (again it will become like a 
> network switch, requiring firewalls, etc.)

Even without adding complicated mechanisms, I think vhost-vsock should be
able to work with siblings, and I don't see any difference between
vhost-vsock and vhost-user-vsock in this respect. (I am just curious.)

> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
> >││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
> >+────────▲────────────────▲────────────────▲────────────────▲───────+
> >         │                │                │                │
> >         │                │          ┌─────┘                │
> >         │                │          │                      │
> >+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
> >│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
> >││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
> >│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
> >│+──────────────+    │    │ │                    │ │                    │
> >││  dev2(cid2)  │────┼────┘ │                    │ │                    │
> >│+──────────────+    │      │                    │ │                    │
> >+────────────────────+      +────────────────────+ +────────────────────+
> >
> >Things work differently for the host-facing devices:
> >
> >1. dev1 is used for sibling communication (dev1 <-> dev4), and its host
> >(src_cid=cid1, dst_cid=2) is the real host (not a userapp), which is
> >not shown in the diagram.
> 
> This is not going to happen (see above).
> 
> >2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
> >3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
> >
> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
> >│+──────────────+                                   +──────────────+│
> >││  dev1(cid1)  │                                   │  dev4(cid4)  ││
> >│+──────────────+                                   +──────────────+│
> >+────────▲──────────────────────────────────────────────────▲───────+
> >         │                                                  │
> >+────────┴───────────+     +────────────────────+           │
> >│+──────────────+ VM0│     │+──────────────+    │           │
> >││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
> >│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
> >│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
> >││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
> >│+──────────────+  vhost-user-vsock─────────────+  │                    │
> >│+──────────────+    │     │+──────────────+    │  │                    │
> >││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
> >│+──────────────+    │     │+──────────────+    │  +────────────────────+
> >+────────────────────+     │            userapp2│
> >                           +────────────────────+
> >
> 
> I'm really confused by `cid1`, `cid2`, etc. Can they be any number >= 3?
> I'd suggest using real values (e.g. cid=42).

Okay. I'll update it with real values ;)

Given the question above, I have still drawn the devices inside the kernel.

+──kernel(vhost-vsock)──────────────────────────────────────────────────+
│+───────────────+                                  +───────────────+   │
││  dev1(cid=3)  │                                  │  dev4(cid=6)  │   │
│+───────────────+                                  +───────────────+   │
+────────▲──────────────────────────────────────────────────▲───────────+
         │                                                  │            
+────────┴───────────+     +────────────────────+           │            
│+───────────────+VM0│     │+──────────────+    │           │            
││dev1(cid=3,def)│   │┌────▶│ dev2(cid=4)  │    │  +────────┴───────────+
│+───────────────+   ││    │+──────────────+    │  │+───────────────+VM2│
│+───────────────+   ││    │            userapp1│  ││dev4(cid=6,def)│   │
││  dev2(cid=4)  │───┼┘    +────────────────────+  │+───────────────+   │
│+───────────────+ vhost-user-vsock─────────────+  │                    │
│+───────────────+   │     │+──────────────+    │  │                    │
││  dev3(cid=5)  │───┼─────▶│ dev3(cid=5)  │    │  │                    │
│+───────────────+   │     │+──────────────+    │  +────────────────────+
+────────────────────+     │            userapp2│                        
                           +────────────────────+                        

> So what dest CID is VM0 supposed to use to talk to userapp1 and
> userapp2? CID=2 in both cases, right?

Yes. VM0 uses at least two sockets, bound to source cid=4 and cid=5
respectively.
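
As a rough model of how the driver could map a socket to a device under
this proposal: a bound source CID picks the device with that CID, and an
unbound socket falls back to the default device, i.e. the one with the
lowest device_order. This is only a sketch. `struct vsock_dev`,
`pick_device` and `SRC_CID_UNBOUND` are hypothetical names, not taken from
the patch or from the Linux driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker meaning "bind() was not called on this socket". */
#define SRC_CID_UNBOUND 0xffffffffu

struct vsock_dev {
	unsigned int cid;          /* CID assigned by the device */
	unsigned int device_order; /* lowest value marks the default device */
};

/*
 * Pick the device for a socket: match on the bound source CID, or use
 * the device with the lowest device_order when the socket is unbound.
 * Returns NULL if a bound CID matches no device.
 */
const struct vsock_dev *pick_device(const struct vsock_dev *devs, size_t n,
				    unsigned int src_cid)
{
	const struct vsock_dev *def = NULL;

	for (size_t i = 0; i < n; i++) {
		if (src_cid != SRC_CID_UNBOUND && devs[i].cid == src_cid)
			return &devs[i];
		if (!def || devs[i].device_order < def->device_order)
			def = &devs[i];
	}
	return src_cid == SRC_CID_UNBOUND ? def : NULL;
}
```

For VM0 in the diagram (cid=3 as default, plus cid=4 and cid=5), a socket
bound to cid=4 would go out through dev2 towards userapp1, one bound to
cid=5 through dev3 towards userapp2, and an unbound socket through dev1.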

> Why do you need 2 vhost-user-vsock devices?

The benefit of vhost-user is "shared memory", which reduces the need for
data copying. For the sake of performance, the virtqueues can be shared
with multiple user apps.

I haven't forgotten the "CID=0" point. Just for the sake of explanation,
I'll use TSI as an example.

We can treat the TSI backend as a proxy. Thanks to vhost-user-vsock, the
data is copied only once, from guest user space to the proxy. When we have
two subnets, which is a common case, we might want two proxies to forward
the data.

                 +────────────────+    +───────────────────────────────+
 .─────────.     │  tsi backend1  │    │+───────────────+    ┏━━━━━━━━┓│
(    NW1    )◀───│   (userapp1)   │◀───┤│  vsock dev1   ◀────┃subnet1 ┃│
 `─────────'     +────────────────+    │+───────────────+    ┗━━━━━━━━┛│
                                       │                               │
                 +────────────────+    │+───────────────+    ┏━━━━━━━━┓│
 .─────────.     │  tsi backend2  │◀───┤│  vsock dev2   ◀────┃subnet2 ┃│
(    NW2    )◀───│   (userapp2)   │    │+───────────────+    ┗━━━━━━━━┛│
 `─────────'     +────────────────+    +───────────────────────────────+

> Can you just have a single one and have the applications
> connect/listen on different ports? (That is the purpose of the port:
> multiplexing applications at the destination.)

In terms of functionality, I think it is possible. But it loses the benefit
of vhost-user-vsock.

Thanks,
Xuewei

> Thanks,
> Stefano


* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23  9:47                                       ` Xuewei Niu
@ 2025-06-24  0:51                                         ` Jason Wang
  2025-06-24  3:33                                           ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Jason Wang @ 2025-06-24  0:51 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, sgarzare, virtio-comment

On Mon, Jun 23, 2025 at 5:47 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Thu, Jun 19, 2025 at 10:42 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > >
> > > > On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Not allowed in the current version.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > > > > > >
> > > > > > > > > > > > We probably need to define "configuration" first.
> > > > > > > > > > > >
> > > > > > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > > > > > "default" device.
> > > > > > > > > > >
> > > > > > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > > > > > >
> > > > > > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > > > > > >
> > > > > > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > > > > > explain it again.
> > > > > > > > >
> > > > > > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > > > > > CID as an argument.
> > > > > > > > >
> > > > > > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > > > > > through a `bind()` call.
> > > > > > > > >
> > > > > > > > > The default device, which is equivalent to the current single device, is
> > > > > > > > > used to be compatible with the existing applications.
> > > > > > > > >
> > > > > > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > > > > > for some init works, such as gathering information about other vsock
> > > > > > > > > devices.
> > > > > > > > >
> > > > > > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > > > > > device for non-HV-VM communications.
> > > > > > > >
> > > > > > > > So if I understand correctly you need a way to select the default when
> > > > > > > > bind() is not called?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > > > > > work. I don't think people have a strong need for this.
> > > > > > > > > > >
> > > > > > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > > > > > helpful for the case where the device is not trusted
> > > > > > > > > > >
> > > > > > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > > > > > >
> > > > > > > > > > There're various ways to build trust (for example device attestation)
> > > > > > > > > > and more might come in the future.
> > > > > > > > > >
> > > > > > > > > > > The guest only
> > > > > > > > > > > knows the information from its config space, which is provided by the host.
> > > > > > > > > >
> > > > > > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > > > > > is something we need to consider.
> > > > > > > > >
> > > > > > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > > > > > the ability to make decisions.
> > > > > > > >
> > > > > > > >
> > > > > > > > Actually, I meant the way to build trust via virtio is something that
> > > > > > > > needs to be considered. But now we have other ways to build trust.
> > > > > > > > > That would result in a situation where:
> > > > > > > >
> > > > > > > > 1) "default" vsock device is not trusted but other might
> > > > > > >
> > > > > > > I think this topic might be beyond the scope of this patch.
> > > > > > >
> > > > > > > With the current version, there is only one device supported, which can be
> > > > > > > considered as the "default".  We don't have a mechanism to say "we don't
> > > > > > > trust you", right?
> > > > > >
> > > > > > No. I meant we don't have it in the virtio core but we already have it
> > > > > > in other layers (for example the transport layer).
> > > > >
> > > > > Are you referring to the virtio-vsock transport layer?
> > > >
> > > > Yes, for example the PCI layer.
> > > >
> > > > >
> > > > > > > That is, it is assumed that we trust the device provided
> > > > > > > by the hypervisor.
> > > > > > >
> > > > > > > This patch is for multiple devices support, based on the same assumption. I
> > > > > > > think trust is a good point to consider, but perhaps we have to address it
> > > > > > > in the follow-up patches.
> > > > > >
> > > > > > My point is not about how to build trust, it's about letting the
> > > > > > driver decide by itself in some cases.
> > > > > >
> > > > > > A better way might be something like:
> > > > > >
> > > > > > "The device_order is a hint for the driver to select a default vsock
> > > > > > device. Device MAY choose ...."
> > > > >
> > > > > Okay, I'll do that.
> > > > >
> > > > > > >
> > > > > > > > or
> > > > > > > >
> > > > > > > > 2) two device claims that they are all "default"
> > > > > > > >
> > > > > > > > This means anyhow we need a decision from the driver side so the
> > > > > > > > device side order seems to be useless here.
> > > > > > >
> > > > > > > There is only one default device. The device with the lowest device_order
> > > > > > > is considered the default. It is not allowed to have the same device_order.
> > > > > >
> > > > > > Who can forbid two same device_order? Note that in various security
> > > > > > models, hypervisors are not trusted at all.
> > > > >
> > > > > The driver does. Indeed, the driver is able to deny devices if they violate
> > > > > the spec.
> > > >
> > > > Yes, that's the point, anyhow driver need to do the decision.
> > > >
> > > > >
> > > > > > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > > > > > >
> > > > > > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > > > > > >
> > > > > > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > > > > > >
> > > > > > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > > > > > back to kernel vsock?
> > > > > > > > >
> > > > > > > > > It depends. Two cases are possible:
> > > > > > > > >
> > > > > > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > > > > > whose cid is cid2.
> > > > > > > >
> > > > > > > > Well, this means the behavior depends on the implementation which is not good.
> > > > > > >
> > > > > > > No, it is not. It depends on whether the device can find a proper
> > > > > > > target device based on the dst_cid.
> > > > > >
> > > > > > This sounds really weird, two cids belong to the same guest. So guests
> > > > > > should expect that the two vsock devices can talk to each other?
> > > > >
> > > > > Yes, a little bit complicated.
> > > > >
> > > > > Two constraints should be applied:
> > > > >
> > > > > 1. No CID conflicts within the driver;
> > > > > 2. No CID conflicts within the address space.
> > > > >
> > > > > Here is a diagram to illustrate the situation where it does not violate the
> > > > > constraints:
> > > > >
> > > > >  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
> > > > >  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
> > > > >  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
> > > > >  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
> > > > >  └───────┼──────────────────────────────────┼───────────────────┼───┘
> > > > >   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
> > > > >   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
> > > > >   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
> > > > >   │         └───────────┘        │   │         └───────────┘    │   │
> > > > >   │         ┌───────────┐        │   │         ┌───────────┐    │   │
> > > > >   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
> > > > >   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
> > > > >   └──────┼───────────────────────┘   └──────────────────────────────┘
> > > > >    vhost-user-vsock
> > > > > ┌────────┼──────────────────────────────────────────────────────────┐
> > > > > │   ┌────┼──────┐                                                   │
> > > > > │   │ dev1(cid1)│                                                   │
> > > > > │   └───────────┘                                                   │
> > > > > └─userapp─(as1)─────────────────────────────────────────────────────┘
> > > > >
> > > > > - VM0
> > > > >     - dev0
> > > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > > >         - dst_cid = cid1: connect to dev2 (they are in the same as0);
> > > > >         - dst_cid = cid2: connect to dev3;
> > > > >     - dev1
> > > > >         - dst_cid = 2 (well-known cid): connect to userapp;
> > > > >         - dst_cid = cid0: failure (no cid0 is available in as1, even though
> > > > >         cid1 is available in the VM0);
> > > > >         - dst_cid = cid1: connect to dev0;
> > > > > - VM1
> > > > >     - dev2
> > > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > > >         - dst_cid = cid0: connect to dev0;
> > > > >         - dst_cid = cid2: connect to dev3;
> > > > >     - dev3: skip the same as dev2.
> > > > >
> > > > > So back to your question, my answer is that it depends on the address
> > > > > space. Hope it could be helpful.
> > > >
> > > > This brings an interesting question, for example if vm0 tries to
> > > > connect to vm1, how does it know which device it needs to use (lacking
> > > > the concept like switch/route/address announcing etc...)?
> > >
> > > The HV should maintain a table for that. The guest first
> > > communicates with the HV, through the default device, to learn which
> > > device to use.
> >
> > It's still not clear to me how things work. For example, we had a
> > guest1 with two cid 4 ("default"),5 another guest2 with one cid 6. You
> > meant guest1 needs to ask the host to know about which device is
> > connected to 6? Or actually any device in guest1 can be used to
> > connected to guest2?
>
> I think it is application-level, like micro services? Here are some
> diagrams to illustrate the process:
>
> Step1: app0 asks the host: "I want to connect to app1 inside VM1";
> Step2: host replies: "Please use source cid=5, and destination cid=6"

This sounds like ARP anyhow.
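
To make the comparison concrete: Step1 and Step2 amount to a table lookup
on the host side, keyed by the peer the application wants to reach. A
minimal sketch, assuming the host service keeps such a table.
`struct route_entry`, `resolve_peer` and the peer naming are made up for
illustration and are not proposed for the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical host-side table. */
struct route_entry {
	const char *peer;     /* application-level name, e.g. "app1@VM1" */
	unsigned int src_cid; /* CID the guest should pass to bind() */
	unsigned int dst_cid; /* CID the guest should pass to connect() */
};

/* Return the entry for peer, or NULL if the host service does not know it. */
const struct route_entry *resolve_peer(const struct route_entry *table,
				       size_t n, const char *peer)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i].peer, peer) == 0)
			return &table[i];
	return NULL;
}
```

With a single row { "app1@VM1", 5, 6 }, the lookup returns the answer
given in Step2: source cid=5, destination cid=6.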

>
> +────────+         +───────────────────────────────+    +──────────────────────────────+
> │  host  │         │+───────────────+          VM0 │    │                         VM1  │
> │service │◀─step1──┤│dev0(cid=4,def)│──┐           │    │                              │
> +────────+         │+───────────────+  │           │    │                              │
>                    │+───────────────+  │  +───────+│    │+───────────────+     +──────+│
>                    ││  dev1(cid=5)  │  └──│ app0  ││    ││dev1(cid=6,def)│     │ app1 ││
>                    │+───────────────+     +───────+│    │+───────────────+     +──────+│
>                    +───────────────────────────────+    +──────────────────────────────+
>
> Step3: app0 binds (5, -1), and connects to (6, {PORT}) to establish a
> connection.
>
> +────────+         +───────────────────────────────+    +──────────────────────────────+
> │  host  │         │+───────────────+          VM0 │    │                         VM1  │
> │service │         ││dev0(cid=4,def)│              │    │                              │
> +────────+         │+───────────────+              │    │                              │
>                    │+──────────────bind(5, -1)────+│    │+───────────────+     +──────+│
>                  ┌─┤│  dev1(cid=5)  │◀────│ app0  ││    ││dev1(cid=6,def)│─────▶ app1 ││
>                  │ │+───────────────+     +───────+│    │+───────────────+     +──────+│
>                  │ +───────────────────────────────+    +────────▲─────────────────────+
>                  │                                               │
>                  └───────────────connect(6,PORT)─────────────────┘
>
> This is a little bit complicated, but it is dynamic. Another way is to
> assign a fixed CID for app1,

Who did the assignment here?

> so app0 can always connect to it without
> needing to ask for the host service.

Basically, I wonder if the above needs to be part of the spec or not
and why. If not, we should not bother here.

Thanks

>
> Thanks,
> Xuewei
>
> > Thanks
> >
> > >
> > > Thanks,
> > > Xuewei
> > >
>

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-24  0:51                                         ` Jason Wang
@ 2025-06-24  3:33                                           ` Xuewei Niu
  0 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-06-24  3:33 UTC (permalink / raw)
  To: jasowang
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, sgarzare,
	virtio-comment

> On Mon, Jun 23, 2025 at 5:47 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> >
> > > On Thu, Jun 19, 2025 at 10:42 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > >
> > > > > On Wed, Jun 18, 2025 at 5:51 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > >
> > > > > > > On Wed, Jun 18, 2025 at 1:40 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jun 18, 2025 at 10:47 AM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 17, 2025 at 3:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Resend, because it isn’t listed in the mailing list due to my mistake.
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 16, 2025 at 4:38 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, 16 Jun 2025 at 10:29, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Jun 13, 2025 at 4:46 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, 13 Jun 2025 at 06:57, Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sat, Apr 12, 2025 at 10:39 PM Xuewei Niu <niuxuewei97@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > This patch brings a new feature, called "multi devices", to the virtio
> > > > > > > > > > > > > > > > > > > > > vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > > > > > > > > > > > > > > > > > > > "device_order" field to the config for the virtio vsock.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > == Motivition ==
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Vsock is a lightweight and widely used data exchange mechanism between host
> > > > > > > > > > > > > > > > > > > > > and guest. Currently, the virtio-vsock only supports one device, resulting
> > > > > > > > > > > > > > > > > > > > > in the inability to enable more than one backend.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I wonder which part of the spec forbids more than one device.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > No. The spec, however, is designed for a single device, and lacks some
> > > > > > > > > > > > > > > > > > > specifications for multiple devices.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > For example, we should have a mechanism to select a device from all to
> > > > > > > > > > > > > > > > > > > communicate with a peer.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I wonder if this is a:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1) mechanism that needs to be mandated by the device
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2) a policy that is allowed to be tweaked by the user as TCP/IP did
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Not allowed in the current version.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > (Note anyhow the driver can override what the device suggests...)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think we should follow what we described in the spec:
> > > > > > > > > > > > > > "The virtio socket device is a zero-configuration socket communications device."
> > > > > > > > > > > > >
> > > > > > > > > > > > > We probably need to define "configuration" first.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For example, if it means zero configuration from the user, it does not
> > > > > > > > > > > > > conflict with 2), the driver can use its own algorithm to elect a
> > > > > > > > > > > > > "default" device.
> > > > > > > > > > > >
> > > > > > > > > > > > IMHO, it can be done, but it is not the current design.
> > > > > > > > > > >
> > > > > > > > > > > Well, you need at least explain the advantages or why you choose to do this.
> > > > > > > > > >
> > > > > > > > > > I listed in the previous message. Maybe it is not clear enough. I'll try to
> > > > > > > > > > explain it again.
> > > > > > > > > >
> > > > > > > > > > I think people should pick up the device by a `bind()` call, which takes a
> > > > > > > > > > CID as an argument.
> > > > > > > > > >
> > > > > > > > > > Generally, the device is picked up by the source CID, which is achieved
> > > > > > > > > > through a `bind()` call.
> > > > > > > > > >
> > > > > > > > > > The default device, which is equivalent to the current single device, is
> > > > > > > > > > used to be compatible with the existing applications.
> > > > > > > > > >
> > > > > > > > > > Apart from that, the default device is used to communicate with hypervisor
> > > > > > > > > > for some init works, such as gathering information about other vsock
> > > > > > > > > > devices.
> > > > > > > > > >
> > > > > > > > > > To summarize, users must do `bind()` call explicitly to select the desired
> > > > > > > > > > device for non-HV-VM communications.
> > > > > > > > >
> > > > > > > > > So if I understand correctly you need a way to select the default when
> > > > > > > > > bind() is not called?
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > > > > > I prefer to set the default device for VM-HV communication to do some init
> > > > > > > > > > > > work. I don't think people have a strong need for this.
> > > > > > > > > > > >
> > > > > > > > > > > > > Another perspective, making decisions in guests may be even more
> > > > > > > > > > > > > helpful for the case where the device is not trusted
> > > > > > > > > > > >
> > > > > > > > > > > > How does the guest realize the vsock device is not trusted?
> > > > > > > > > > >
> > > > > > > > > > > There're various ways to build trust (for example device attestation)
> > > > > > > > > > > and more might come in the future.
> > > > > > > > > > >
> > > > > > > > > > > > The guest only
> > > > > > > > > > > > knows the information from its config space, which is provided by the host.
> > > > > > > > > > >
> > > > > > > > > > > The way to build trust is probably beyond the scope of virtio, but it
> > > > > > > > > > > is something we need to consider.
> > > > > > > > > >
> > > > > > > > > > Agree with you. If it comes in the future, I think the driver should have
> > > > > > > > > > the ability to make decisions.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Actually, I meant the way to build trust via virtio is something that
> > > > > > > > > needs to be considered. But now we have other ways to build trust.
> > > > > > > > > > That would result in a situation:
> > > > > > > > >
> > > > > > > > > 1) "default" vsock device is not trusted but other might
> > > > > > > >
> > > > > > > > I think this topic might be beyond the scope of this patch.
> > > > > > > >
> > > > > > > > With the current version, there is only one device supported, which can be
> > > > > > > > considered as the "default".  We don't have a mechanism to say "we don't
> > > > > > > > trust you", right?
> > > > > > >
> > > > > > > No. I meant we don't have it in the virtio core but we already have it
> > > > > > > in other layers (for example the transport layer).
> > > > > >
> > > > > > Are you referring to the virtio-vsock transport layer?
> > > > >
> > > > > Yes, for example the PCI layer.
> > > > >
> > > > > >
> > > > > > > > That is, it is assumed that we trust the device provided
> > > > > > > > by the hypervisor.
> > > > > > > >
> > > > > > > > This patch is for multiple devices support, based on the same assumption. I
> > > > > > > > think trust is a good point to consider, but perhaps we have to address it
> > > > > > > > in the follow-up patches.
> > > > > > >
> > > > > > > My point is not about how to build trust, it's about letting the
> > > > > > > driver decide by itself in some cases.
> > > > > > >
> > > > > > > A better way might be something like:
> > > > > > >
> > > > > > > "The device_order is a hint for the driver to select a default vsock
> > > > > > > device. Device MAY choose ...."
> > > > > >
> > > > > > Okay, I'll do that.
> > > > > >
> > > > > > > >
> > > > > > > > > or
> > > > > > > > >
> > > > > > > > > 2) two device claims that they are all "default"
> > > > > > > > >
> > > > > > > > > This means anyhow we need a decision from the driver side so the
> > > > > > > > > device side order seems to be useless here.
> > > > > > > >
> > > > > > > > There is only one default device. The device with the lowest device_order
> > > > > > > > is considered the default. It is not allowed to have the same device_order.
> > > > > > >
> > > > > > > Who can forbid two same device_order? Note that in various security
> > > > > > > models, hypervisors are not trusted at all.
> > > > > >
> > > > > > The driver does. Indeed, the driver is able to deny devices if they violate
> > > > > > the spec.
> > > > >
> > > > > Yes, that's the point, anyhow driver need to do the decision.
> > > > >
> > > > > >
> > > > > > > > > > > > > > So, IMO the guest (driver) should not be allowed to change anything.
> > > > > > > > > > > > > > E.g. Right now it's not allowed to change the CID assigned by the host (device).
> > > > > > > > > > > > >
> > > > > > > > > > > > > A dumb question when having two cids (cid1 and cid2) in the same
> > > > > > > > > > > > > guest, what happens if src=cid1 and dst=cid2?
> > > > > > > > > > > >
> > > > > > > > > > > > The packets will be directed to another application, if any, on the same guest.
> > > > > > > > > > >
> > > > > > > > > > > Ok, so in your example, vhost-user-vsock should route those packets
> > > > > > > > > > > back to kernel vsock?
> > > > > > > > > >
> > > > > > > > > > It depends. Two cases are possible:
> > > > > > > > > >
> > > > > > > > > > 1. dev0(cid1) and dev1(cid2) are from the same type of backend, e.g.
> > > > > > > > > > vhost-vsock, so the packets will be routed back as you described.
> > > > > > > > > > 2. dev0(cid1) from vhost-user-vsock, and dev1(cid2) from vhost-vsock, the
> > > > > > > > > > packets will be dropped if vhost-user-vsock device can't find a device
> > > > > > > > > > whose cid is cid2.
> > > > > > > > >
> > > > > > > > > Well, this means the behavior depends on the implementation which is not good.
> > > > > > > >
> > > > > > > > No, it is not. It depends on whether the device can find a proper
> > > > > > > > target device based on the dst_cid.
> > > > > > >
> > > > > > > This sounds really weird, two cids belong to the same guest. So guests
> > > > > > > should expect that the two vsock devices can talk to each other?
> > > > > >
> > > > > > Yes, a little bit complicated.
> > > > > >
> > > > > > Two constraints should be applied:
> > > > > >
> > > > > > 1. No CID conflicts within the driver;
> > > > > > 2. No CID conflicts within the address space.
> > > > > >
> > > > > > Here is a diagram to illustrate the situation where it does not violate the
> > > > > > constraints:
> > > > > >
> > > > > >  ┌─ kernel─(as0)────────────────────────────────────────────────────┐
> > > > > >  │   ┌───────────┐                      ┌───────────┐  ┌───────────┐│
> > > > > >  │   │ dev0(cid0)│                      │ dev2(cid1)│  │ dev3(cid2)││
> > > > > >  │   └───┬───────┘                      └───┬───────┘  └────────┬──┘│
> > > > > >  └───────┼──────────────────────────────────┼───────────────────┼───┘
> > > > > >   ┌──────┼───────────────────────┐   ┌──────┼───────────────────┼───┐
> > > > > >   │      │  ┌───────────┐        │   │      │  ┌───────────┐    │   │
> > > > > >   │      └──► dev0(cid0)│        │   │      └──► dev2(cid1)│    │   │
> > > > > >   │         └───────────┘        │   │         └───────────┘    │   │
> > > > > >   │         ┌───────────┐        │   │         ┌───────────┐    │   │
> > > > > >   │      ┌──► dev1(cid1)│        │   │         │ dev3(cid2)◄────┘   │
> > > > > >   │      │  └───────────┘     VM0│   │         └───────────┘     VM1│
> > > > > >   └──────┼───────────────────────┘   └──────────────────────────────┘
> > > > > >    vhost-user-vsock
> > > > > > ┌────────┼──────────────────────────────────────────────────────────┐
> > > > > > │   ┌────┼──────┐                                                   │
> > > > > > │   │ dev1(cid1)│                                                   │
> > > > > > │   └───────────┘                                                   │
> > > > > > └─userapp─(as1)─────────────────────────────────────────────────────┘
> > > > > >
> > > > > > - VM0
> > > > > >     - dev0
> > > > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > > > >         - dst_cid = cid1: connect to dev2 (they are in the same as0);
> > > > > >         - dst_cid = cid2: connect to dev3;
> > > > > >     - dev1
> > > > > >         - dst_cid = 2 (well-known cid): connect to userapp;
> > > > > >         - dst_cid = cid0: failure (no cid0 is available in as1, even though
> > > > > >         cid1 is available in the VM0);
> > > > > >         - dst_cid = cid1: connect to dev0;
> > > > > > - VM1
> > > > > >     - dev2
> > > > > >         - dst_cid = 2 (well-known cid): connect to host;
> > > > > >         - dst_cid = cid0: connect to dev0;
> > > > > >         - dst_cid = cid2: connect to dev3;
> > > > > >     - dev3: skip the same as dev2.
> > > > > >
> > > > > > So back to your question, my answer is that it depends on the address
> > > > > > space. Hope it could be helpful.
> > > > >
> > > > > This brings an interesting question, for example if vm0 tries to
> > > > > connect to vm1, how does it know which device it needs to use (lacking
> > > > > the concept like switch/route/address announcing etc...)?
> > > >
> > > > > The HV should maintain a table for that. The guest first
> > > > > communicates with the HV, through the default device, to know which device
> > > > > to use.
> > >
> > > It's still not clear to me how things work. For example, we had a
> > > guest1 with two cid 4 ("default"),5 another guest2 with one cid 6. You
> > > meant guest1 needs to ask the host to know about which device is
> > > connected to 6? Or actually any device in guest1 can be used to
> > > connected to guest2?
> >
> > I think it is application-level, like micro services? Here are some
> > diagrams to illustrate the process:
> >
> > Step1: app0 asks the host: "I want to connect to app1 inside VM1";
> > Step2: host replies: "Please use source cid=5, and destination cid=6"
> 
> This sounds like arp anyhow.

Not really.

As Stefano mentioned in [1][2], vsock is designed for p2p communication. I
am just quoting what he said:

"Just a note, AF_VSOCK is supposed to be very similar to AF_UNIX. It's a
point-to-point connection, we don't have any transport layer like TCP.
What we call "transport" in AF_VSOCK world, is usually the driver/device
used to send data (e.g. vmci, virtio, vhost, hyperv)."

"Just a reminder, vsock is not a network device, it is more a P2P device
where we want to keep the configuration in the guest as simple as possible
(we don't want to run ARP, DHCP, etc.)."

1: https://lore.kernel.org/virtio-comment/ntpwwmwwaow4lfdjubgdwsvvpzpkuc52iz3vjztbx4iphvpnem@gmd2bsqckonm/
2: https://lore.kernel.org/virtio-comment/ncciiv3udhy6mylzobtt5jnp3xwthtfvuwonb7i5c5hkpcyfc2@utxcie3jxof2/

> > +────────+         +───────────────────────────────+    +──────────────────────────────+
> > │  host  │         │+───────────────+          VM0 │    │                         VM1  │
> > │service │◀─step1──┤│dev0(cid=4,def)│──┐           │    │                              │
> > +────────+         │+───────────────+  │           │    │                              │
> >                    │+───────────────+  │  +───────+│    │+───────────────+     +──────+│
> >                    ││  dev1(cid=5)  │  └──│ app0  ││    ││dev1(cid=6,def)│     │ app1 ││
> >                    │+───────────────+     +───────+│    │+───────────────+     +──────+│
> >                    +───────────────────────────────+    +──────────────────────────────+
> >
> > Step3: app0 binds (5, -1), and connects to (6, {PORT}) to establish a
> > connection.
> >
> > +────────+         +───────────────────────────────+    +──────────────────────────────+
> > │  host  │         │+───────────────+          VM0 │    │                         VM1  │
> > │service │         ││dev0(cid=4,def)│              │    │                              │
> > +────────+         │+───────────────+              │    │                              │
> >                    │+──────────────bind(5, -1)────+│    │+───────────────+     +──────+│
> >                  ┌─┤│  dev1(cid=5)  │◀────│ app0  ││    ││dev1(cid=6,def)│─────▶ app1 ││
> >                  │ │+───────────────+     +───────+│    │+───────────────+     +──────+│
> >                  │ +───────────────────────────────+    +────────▲─────────────────────+
> >                  │                                               │
> >                  └───────────────connect(6,PORT)─────────────────┘
> >
> > This is a little bit complicated, but it is dynamic. Another way is to
> > assign a fixed CID for app1,
> 
> Who did the assignment here?

Discovery must happen through an out-of-band mechanism, for example
configuration files; vsock has no built-in discovery mechanism.

Even with a single device, we already have to discover the CID of a
sibling. This feature only adds more devices; the discovery mechanism
remains the same.
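
To make step 3 concrete, here is a minimal sketch of the bind-then-connect
sequence with the standard socket API. It assumes the behaviour proposed in
this patch, where binding to a source CID selects the device. The helper
names vsock_addr() and vsock_connect_from() are made up for illustration,
and the CID/port values are whatever the out-of-band discovery provides.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Build a vsock address for the given CID and port (hypothetical helper). */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Connect to (dst_cid, dst_port) from the local device whose CID is
 * src_cid (hypothetical helper). Returns a connected fd, or -1 on error.
 */
int vsock_connect_from(unsigned int src_cid, unsigned int dst_cid,
		       unsigned int dst_port)
{
	/* VMADDR_PORT_ANY is the "-1" port used in the diagrams above. */
	struct sockaddr_vm src = vsock_addr(src_cid, VMADDR_PORT_ANY);
	struct sockaddr_vm dst = vsock_addr(dst_cid, dst_port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* Assumption: under the proposal, bind() picks the device by CID. */
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With the values from the diagram, app0 would call
vsock_connect_from(5, 6, port), with the port agreed on out of band.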

> > so app0 can always connect to it without
> > needing to ask for the host service.
> 
> Basically, I wonder if the above needs to be part of the spec or not
> and why. If not, we should not bother here.

I think it does not need to be part of the spec, as I mentioned above.

Thanks,
Xuewei

> Thanks
> 
> >
> > Thanks,
> > Xuewei
> >
> > > Thanks
> > >
> > > >
> > > > Thanks,
> > > > Xuewei
> > > >
> >

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-06-23 15:51                                       ` Xuewei Niu
@ 2025-07-01 10:31                                         ` Stefano Garzarella
  2025-07-02  6:05                                           ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-07-01 10:31 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Mon, Jun 23, 2025 at 11:51:39PM +0800, Xuewei Niu wrote:
>> On Mon, Jun 23, 2025 at 08:14:07PM +0800, Xuewei Niu wrote:
>> >> On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
>> >> >> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
>> >> >> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
>> >> >> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
>> >> >> >> >> > Sent: 19 June 2025 08:57 AM
>> >> >> >> >> >
>> >> >> >> >> > Hi Parav,
>> >> >> >> >> >
>> >> >> >> >> > Could you please take a look at the diagram in [1]?
>> >> >> >> >> >
>> >> >> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
>> >> >> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
>> >> >> >> >> > at this time.
>> >> >> >> >> >
>> >> >> >> >> > I think the first thing is to figure out how to pick the right group.
>> >> >> >> >> >
>> >> >> >> >> > Standard socket doesn't provide a way to access the group information.
>> >> >> >> >> >
>> >> >> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
>> >> >> >> >> > don't call `bind()`, only the destination is known.
>> >> >> >> >> >
>> >> >> >> >> > However, only destination is not enough to find the group. For example, the
>> >> >> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
>> >> >> >> >> >
>> >> >> >> >> > 1:
>> >> >> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >> >> >> >>
>> >> >> >> >> Right. Sock addressing scheme is naïve presently to select the group.
>> >> >> >> >> Not sure when/how you or others plan to do.
>> >> >> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
>> >> >> >>
>> >> >> >>
>> >> >> >> >>
>> >> >> >> >> However, at device level, we should have the construct of grouping.
>> >> >> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
>> >> >> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
>> >> >> >> >>
>> >> >> >> >> So I was imagining a relatively simple scheme:
>> >> >> >> >> For example, virtio device level, some kind of group id is present.
>> >> >> >> >> So two devices which has same group id, are part of single group.
>> >> >> >> >> An example group id format can be a UUID.
>> >> >> >> >>
>> >> >> >> >> And this is completely optional for devices to implement.
>> >> >> >> >> Generic enough and usable beyond just vsock device in other use cases
>> >> >> >> >> we discussed in past.
>> >> >> >> >
>> >> >> >> >Fair enough.
>> >> >> >> >
>> >> >> >> >@Stefano, could you please take a look at this? I'd love to have some
>> >> >> >> >input
>> >> >> >> >from you.
>> >> >> >> >
>> >> >> >> >A brief summary of the idea is: The config space will be extended to
>> >> >> >> >include a group id. The devices with the same group id are considered
>> >> >> >> >to be
>> >> >> >> >in the same group.
>> >> >> >>
>> >> >> >> Thanks for the summary, but please avoid top posting, otherwise is very
>> >> >> >> hard to follow the discussion :-(
>> >> >> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>> >> >> >
>> >> >> >Sorry, I'll avoid it in the future.
>> >> >> >
>> >> >> >> I like the idea of groups. What is not clear to me, is how groups will
>> >> >> >> allow the driver to select the default output device when the source
>> >> >> >> socket is not bind to any source CID.
>> >> >> >
>> >> >> >Well, we did discuss, but we need your input.
>> >> >> >
>> >> >> >I said in the thread [1] based on the standard socket API, the driver can't
>> >> >> >pick a group. Parav [2] suggested that the group, as a basic concept,
>> >> >> >should be present even if we are unable to use it.
>> >> >> >
>> >> >> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
>> >> >> >the concept. But it is a very big change, leading to incompatibility with
>> >> >> >the existing apps.
>> >> >>
>> >> >> I still don't understand how the group_id will work :-( and how will
>> >> >> allow the driver to pick the default device.
>> >> >
>> >> >FYI, here is trying to explain what is group and why it is needed, but I
>> >> >agree with you about the idea of "types".
>> >> >
>> >> >I posted a diagram in the thread [1]. I think the group is something like a
>> >> >"namespace". I think it is reasonable, but my concerns are compatibility
>> >> >and complexity.
>> >> >
>> >> >1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
>> >> >
>> >> >> This is the `sockaddr_vm`, so we should be careful of extending it:
>> >> >>
>> >> >> struct sockaddr_vm {
>> >> >> 	__kernel_sa_family_t svm_family;
>> >> >> 	unsigned short svm_reserved1;
>> >> >> 	unsigned int svm_port;
>> >> >> 	unsigned int svm_cid;
>> >> >>
>> >> >> #define VMADDR_FLAG_TO_HOST 0x01
>> >> >>
>> >> >> 	__u8 svm_flags;
>> >> >> 	unsigned char svm_zero[sizeof(struct sockaddr) -
>> >> >> 			       sizeof(sa_family_t) -
>> >> >> 			       sizeof(unsigned short) -
>> >> >> 			       sizeof(unsigned int) -
>> >> >> 			       sizeof(unsigned int) -
>> >> >> 			       sizeof(__u8)];
>> >> >> };
>> >> >>
>> >> >> >I think it might be beyond the scope of this patch, and would make the
>> >> >> >vsock more complex. The current conclusion is that we will keep the concept
>> >> >> >of grouping as a placeholder, but we will not use it.
>> >> >>
>> >> >> IMO we should first clarify better what we want to support.
>> >> >> As I already suggested some months ago, IMHO supporting any number of
>> >> >> vsock devices for a VM it's not really needed for your goal and I can't
>> >> >> see other use cases where a virtio-net device can't be use. Just a
>> >> >> reminder, vsock is not a network device, is more a P2P device where we
>> >> >> want to keep the configuration in the guest as simpler as possible (we
>> >> >> don't want to run ARP, DHCP, etc.).
>> >> >
>> >> >Agree that.
>> >> >
>> >> >> Till now vsock was more used just for guest-host communication, but
>> >> >> recently it was extended to communicate with sibling VMs.
>> >> >>
>> >> >> IIUC your use case, we just need to support different type of vsock
>> >> >> devices attached to the VM. With "type" I mean type of address handled.
>> >> >> I think we can define 3 types based on the CID we have:
>> >> >> - hypervisor: VMADDR_CID_HYPERVISOR(0)
>> >> >> - host: VMADDR_CID_HOST(2)
>> >> >> - sibling: CID >=3
>> >> >>
>> >> >> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
>> >> >> to reach from the guest any other CIDs.
>> >> >> The vhost-user-vsock recently started to support sibling VMs.
>> >> >> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
>> >> >>
>> >> >> So, IMHO we should define new features or config flags that a device can
>> >> >> expose depending on which address is able to handle.
>> >> >> Also, in order to avoid to overcomplicate vsock, we should allow only
>> >> >> one device for each type (a single device should support multiple types,
>> >> >> but only one device can be registered for a type).
>> >> >
>> >> >I think only one device can be registered for sibling and hypervisor. And
>> >> >it is possible to have multiple devices for host. Otherwise, the goals of
>> >> >this patch will be not achieved.
>> >>
>> >> Why?
>> >>
>> >> IMO, as I already wrote, the libkrun service should use CID=0.
>> >
>> >As we before discussed, if there is only one backend for the libkrun
>> >service, we can use cid=0. I totally agree with you. So let us put this
>> >case aside first.
>> >
>> >> How you will handle multiple devices for host? How can the guest know
>> >> which device to use to reach HOST(2)?
>> >
>> >Do `bind()` explicitly, and a device with matching cid will be picked up.
>>
>> Okay, but why you need 2 devices to communicate with the same CID, host
>> in this case (CID=2)?
>> IMO use the source CID to multiplex a socket at destination is not
>> great. But I can be wrong.
>
>I mean multiple sockets with multiple `bind()` calls, not multiplexing:
>
>- socket1: bind(3, -1), connect(2, 10000);
>- socket2: bind(4, -1), connect(2, 10001);
>- ...

Yep, of course, I meant exactly that.
In this case we are doing multiplexing based on the source address, 
which IMHO is odd.
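
For reference, the selection rule under discussion can be sketched as
below. This is only an illustration of the proposal as described in this
thread: bind() picks the device by source CID, otherwise the device with
the lowest device_order is the default. struct vsock_dev, pick_default()
and pick_device() are hypothetical names, and the width of device_order is
an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state kept by the driver. */
struct vsock_dev {
	uint64_t cid;		/* guest_cid from the device config space */
	uint32_t device_order;	/* proposed field; width assumed here */
};

/* Default device: the one with the lowest device_order, NULL if none. */
const struct vsock_dev *pick_default(const struct vsock_dev *devs, size_t n)
{
	const struct vsock_dev *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || devs[i].device_order < best->device_order)
			best = &devs[i];
	return best;
}

/*
 * Select the device for an outgoing connection. If the socket was bound
 * to a specific CID, only a device with that CID matches; otherwise fall
 * back to the default device.
 */
const struct vsock_dev *pick_device(const struct vsock_dev *devs, size_t n,
				    bool bound, uint64_t src_cid)
{
	if (!bound)
		return pick_default(devs, n);

	for (size_t i = 0; i < n; i++)
		if (devs[i].cid == src_cid)
			return &devs[i];
	return NULL;	/* no local device owns this source CID */
}
```

With two devices (CID 3 as default, CID 4), socket1 above maps to the
first device and socket2 to the second, although both connect to CID 2.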

>
>> >> IMO is easier to have the multiple device for sibling (e.g. the device
>> >> can advertise which dest CID its supports), but not for host.
>> >
>> >I think one device for sibling is enough.
>>
>> I also think one should be enough, but IMO I think it might make more
>> sense to have multiple devices in this case, where each device can
>> handle a pool of CIDs, then the hard part will be figuring out how to
>> allocate the CIDs, etc. so yes, I agree that having one device even in
>> this case is the easiest thing.
>
>Yeah, so let us skip this for now ;)

Agree.

>
>> >For example, dev1 is enough for
>> >communication with dev3 and dev4. So we don't need dev2 for sibling in the
>> >first VM, right?
>>
>> Just a note, vhost-vsock is not allowing any sibling communication.
>> vhost-user-vsock can do it, but we don't want to bring any support in
>> vhost-vsock to not overcomplicate it (again it will become like a
>> network switch, requiring firewalls, etc.)
>
>Even if we don't impose some complicated mechanisms, I think the
>vhost-vsock should work with sibling, and I don't see any difference
>between vhost-vsock and vhost-user-vsock. (I am just curious.)

The difference is how to prevent communication between VMs of
different users.

With vhost-user-vsock, you can decide which VMs to connect to the
vhost-user backend. With vhost-vsock, all of them are attached to the
host net stack, so we would need to add some kind of firewall, etc.,
which would complicate our simple stack a lot. So I'm not sure we want that.

>
>> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>> >││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
>> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>> >+────────▲────────────────▲────────────────▲────────────────▲───────+
>> >         │                │                │                │
>> >         │                │          ┌─────┘                │
>> >         │                │          │                      │
>> >+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
>> >│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
>> >││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
>> >│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
>> >│+──────────────+    │    │ │                    │ │                    │
>> >││  dev2(cid2)  │────┼────┘ │                    │ │                    │
>> >│+──────────────+    │      │                    │ │                    │
>> >+────────────────────+      +────────────────────+ +────────────────────+
>> >
>> >The things go different for devices for host:
>> >
>> >1. For dev1, it is used for sibling (dev1 <-> dev4), and the host
>> >(src_cid=cid1, dst_cid=2) means the real host (not a userapp), where
>> >doesn't show in the diagram.
>>
>> This is not going to happen (see above).
>>
>> >2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
>> >3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
>> >
>> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>> >│+──────────────+                                   +──────────────+│
>> >││  dev1(cid1)  │                                   │  dev4(cid4)  ││
>> >│+──────────────+                                   +──────────────+│
>> >+────────▲──────────────────────────────────────────────────▲───────+
>> >         │                                                  │
>> >+────────┴───────────+     +────────────────────+           │
>> >│+──────────────+ VM0│     │+──────────────+    │           │
>> >││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
>> >│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
>> >│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
>> >││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
>> >│+──────────────+  vhost-user-vsock─────────────+  │                    │
>> >│+──────────────+    │     │+──────────────+    │  │                    │
>> >││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
>> >│+──────────────+    │     │+──────────────+    │  +────────────────────+
>> >+────────────────────+     │            userapp2│
>> >                           +────────────────────+
>> >
>>
>> I'm really confused with `cid1`, `cid2`, etc. Are they any number >= 3?
>> I'd suggest to use real value (e.g. cid=42).
>
>Okay. I'll update it with real value ;)
>
>Based on the above question, I still put the devices into the kernel.
>
>+──kernel(vhost-vsock)──────────────────────────────────────────────────+
>│+───────────────+                                  +───────────────+   │
>││  dev1(cid=3)  │                                  │  dev4(cid=6)  │   │
>│+───────────────+                                  +───────────────+   │
>+────────▲──────────────────────────────────────────────────▲───────────+
>         │                                                  │
>+────────┴───────────+     +────────────────────+           │
>│+───────────────+VM0│     │+──────────────+    │           │
>││dev1(cid=3,def)│   │┌────▶│ dev2(cid=4)  │    │  +────────┴───────────+
>│+───────────────+   ││    │+──────────────+    │  │+───────────────+VM2│
>│+───────────────+   ││    │            userapp1│  ││dev4(cid=6,def)│   │
>││  dev2(cid=4)  │───┼┘    +────────────────────+  │+───────────────+   │
>│+───────────────+ vhost-user-vsock─────────────+  │                    │
>│+───────────────+   │     │+──────────────+    │  │                    │
>││  dev3(cid=5)  │───┼─────▶│ dev3(cid=5)  │    │  │                    │
>│+───────────────+   │     │+──────────────+    │  +────────────────────+
>+────────────────────+     │            userapp2│
>                           +────────────────────+
>
>> So what dest CID the VM0 is supposed to use to talk with userapp1 and
>> userapp2? In both cases CID=2, right?
>
>Yes. There are at least two sockets with source cid=4 and cid=5 
>respectively.

As I said, this is odd IMHO.
We are using the source address to multiplex the destination app.
We should use the destination address for that, no?

>
>> Why you need 2 vhost-user-vsock devices?
>
>The benefit of vhost-user is "shared memory", which reduces the need for
>data copying. It is possible to share virtqueues to multiple user apps, for
>the sake of performance.
>
>I don't forget the "CID=0" thing. Just as an explanation, I'll use the
>example of TSI.
>
>We can treat the TSI backend as a proxy. Thanks to vhost-user-vsock, the
>data will be copied once from the guest user space to the proxy. When we
>have two subnets, which is a common case, we might want to have two proxies
>to forward the data.

Okay, I see it now, but is it really a use case for vsock?
In this way the destination address (CID, port) is completely useless,
since it's never used, so why use vsock for this use case?

I have an idea, but I don't know if it is feasible.
CID=0 is pretty much unsupported for now by virtio-vsock, but maybe we 
could leverage it for this use case.

If we have multiple devices, but each practically allows only one 
application to be reached, then these devices can be reached by CID=0 
and port=x, where each device exposes in its configuration space the 
port to which it responds.

Thus, in the guest, connect(0, 10001) will go to the device that exposes 
port 10001, and so on.

Honestly, I don't know if I really like that proposal. Anyway, if we go
back to yours instead, where the guest has to call bind() to choose
the device to use, that's fine, but the one thing we must have IMO is
a way to set the default device, as we were doing.

>
>                 +────────────────+    +───────────────────────────────+
> .─────────.     │  tsi backend1  │    │+───────────────+    ┏━━━━━━━━┓│
>(    NW1    )◀───│   (userapp1)   │◀───┤│  vsock dev1   ◀────┃subnet1 ┃│
> `─────────'     +────────────────+    │+───────────────+    ┗━━━━━━━━┛│
>                                       │                               │
>                 +────────────────+    │+───────────────+    ┏━━━━━━━━┓│
> .─────────.     │  tsi backend2  │◀───┤│  vsock dev2   ◀────┃subnet2 ┃│
>(    NW2    )◀───│   (userapp2)   │    │+───────────────+    ┗━━━━━━━━┛│
> `─────────'     +────────────────+    +───────────────────────────────+
>
>> Can you just have a single one and have the application
>> connecting/listing on different port? (which is the sense of the port,
>> multiplexing application on the destination)
>
>In terms of functionality, I think it is possible. But it loses the benefit
>of vhost-user-vsock.

Can you elaborate a bit?

BTW in rust-vmm/vhost-device we use a single vhost-user device and 
multiplex connections between multiple applications in the host: 
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-07-01 10:31                                         ` Stefano Garzarella
@ 2025-07-02  6:05                                           ` Xuewei Niu
  2025-07-10 10:19                                             ` Stefano Garzarella
  0 siblings, 1 reply; 59+ messages in thread
From: Xuewei Niu @ 2025-07-02  6:05 UTC (permalink / raw)
  To: sgarzare
  Cc: fupan.lfp, mst, niuxuewei.nxw, niuxuewei97, parav, virtio-comment

> On Mon, Jun 23, 2025 at 11:51:39PM +0800, Xuewei Niu wrote:
> >> On Mon, Jun 23, 2025 at 08:14:07PM +0800, Xuewei Niu wrote:
> >> >> On Mon, Jun 23, 2025 at 06:35:59PM +0800, Xuewei Niu wrote:
> >> >> >> On Mon, Jun 23, 2025 at 04:48:33PM +0800, Xuewei Niu wrote:
> >> >> >> >> On Thu, Jun 19, 2025 at 01:10:33PM +0800, Xuewei Niu wrote:
> >> >> >> >> >> > From: Xuewei Niu <niuxuewei97@gmail.com>
> >> >> >> >> >> > Sent: 19 June 2025 08:57 AM
> >> >> >> >> >> >
> >> >> >> >> >> > Hi Parav,
> >> >> >> >> >> >
> >> >> >> >> >> > Could you please take a look at the diagram in [1]?
> >> >> >> >> >> >
> >> >> >> >> >> > IIUC, for VM0, there are two groups, and for VM1, there is only one group.
> >> >> >> >> >> > Am I right? If yes, I think the group concept is reasonable but we don't need
> >> >> >> >> >> > at this time.
> >> >> >> >> >> >
> >> >> >> >> >> > I think the first thing is to figure out how to pick the right group.
> >> >> >> >> >> >
> >> >> >> >> >> > Standard socket doesn't provide a way to access the group information.
> >> >> >> >> >> >
> >> >> >> >> >> > Source and destination are from `bind()` and `connect()`, respectively. If we
> >> >> >> >> >> > don't call `bind()`, only the destination is known.
> >> >> >> >> >> >
> >> >> >> >> >> > However, only destination is not enough to find the group. For example, the
> >> >> >> >> >> > well-known CIDs (e.g. 2) are valid for all groups.
> >> >> >> >> >> >
> >> >> >> >> >> > 1:
> >> >> >> >> >> > https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >> >> >> >>
> >> >> >> >> >> Right. Sock addressing scheme is naïve presently to select the group.
> >> >> >> >> >> Not sure when/how you or others plan to do.
> >> >> >> >> >> This is transport layer problem to solve (not to confuse with transport = pci/mmio etc).
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> However, at device level, we should have the construct of grouping.
> >> >> >> >> >> Without this construct, all devices will be part of single group and one will not be able to build the group concept later.
> >> >> >> >> >> So even if you don't need it explicitly now, grouping the device is what you need when connect() is called.
> >> >> >> >> >>
> >> >> >> >> >> So I was imagining a relatively simple scheme:
> >> >> >> >> >> For example, virtio device level, some kind of group id is present.
> >> >> >> >> >> So two devices which has same group id, are part of single group.
> >> >> >> >> >> An example group id format can be a UUID.
> >> >> >> >> >>
> >> >> >> >> >> And this is completely optional for devices to implement.
> >> >> >> >> >> Generic enough and usable beyond just vsock device in other use cases
> >> >> >> >> >> we discussed in past.
> >> >> >> >> >
> >> >> >> >> >Fair enough.
> >> >> >> >> >
> >> >> >> >> >@Stefano, could you please take a look at this? I'd love to have some
> >> >> >> >> >input
> >> >> >> >> >from you.
> >> >> >> >> >
> >> >> >> >> >A brief summary of the idea is: The config space will be extended to
> >> >> >> >> >include a group id. The devices with the same group id are considered
> >> >> >> >> >to be
> >> >> >> >> >in the same group.
> >> >> >> >>
> >> >> >> >> Thanks for the summary, but please avoid top posting, otherwise is very
> >> >> >> >> hard to follow the discussion :-(
> >> >> >> >> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
> >> >> >> >
> >> >> >> >Sorry, I'll avoid it in the future.
> >> >> >> >
> >> >> >> >> I like the idea of groups. What is not clear to me, is how groups will
> >> >> >> >> allow the driver to select the default output device when the source
> >> >> >> >> socket is not bind to any source CID.
> >> >> >> >
> >> >> >> >Well, we did discuss, but we need your input.
> >> >> >> >
> >> >> >> >I said in the thread [1] based on the standard socket API, the driver can't
> >> >> >> >pick a group. Parav [2] suggested that the group, as a basic concept,
> >> >> >> >should be present even if we are unable to use it.
> >> >> >> >
> >> >> >> >IMHO, we might use "{group_id}-{cid}" as the vsock addressing scheme to use
> >> >> >> >the concept. But it is a very big change, leading to incompatibility with
> >> >> >> >the existing apps.
> >> >> >>
> >> >> >> I still don't understand how the group_id will work :-( and how will
> >> >> >> allow the driver to pick the default device.
> >> >> >
> >> >> >FYI, here is trying to explain what is group and why it is needed, but I
> >> >> >agree with you about the idea of "types".
> >> >> >
> >> >> >I posted a diagram in the thread [1]. I think the group is something like a
> >> >> >"namespace". I think it is reasonable, but my concerns are compatibility
> >> >> >and complexity.
> >> >> >
> >> >> >1: https://lore.kernel.org/virtio-comment/20250618095139.1412138-1-niuxuewei.nxw@antgroup.com/
> >> >> >
> >> >> >> This is the `sockaddr_vm`, so we should be careful of extending it:
> >> >> >>
> >> >> >> struct sockaddr_vm {
> >> >> >> 	__kernel_sa_family_t svm_family;
> >> >> >> 	unsigned short svm_reserved1;
> >> >> >> 	unsigned int svm_port;
> >> >> >> 	unsigned int svm_cid;
> >> >> >>
> >> >> >> #define VMADDR_FLAG_TO_HOST 0x01
> >> >> >>
> >> >> >> 	__u8 svm_flags;
> >> >> >> 	unsigned char svm_zero[sizeof(struct sockaddr) -
> >> >> >> 			       sizeof(sa_family_t) -
> >> >> >> 			       sizeof(unsigned short) -
> >> >> >> 			       sizeof(unsigned int) -
> >> >> >> 			       sizeof(unsigned int) -
> >> >> >> 			       sizeof(__u8)];
> >> >> >> };
> >> >> >>
> >> >> >> >I think it might be beyond the scope of this patch, and would make the
> >> >> >> >vsock more complex. The current conclusion is that we will keep the concept
> >> >> >> >of grouping as a placeholder, but we will not use it.
> >> >> >>
> >> >> >> IMO we should first clarify better what we want to support.
> >> >> >> As I already suggested some months ago, IMHO supporting any number of
> >> >> >> vsock devices for a VM it's not really needed for your goal and I can't
> >> >> >> see other use cases where a virtio-net device can't be use. Just a
> >> >> >> reminder, vsock is not a network device, is more a P2P device where we
> >> >> >> want to keep the configuration in the guest as simpler as possible (we
> >> >> >> don't want to run ARP, DHCP, etc.).
> >> >> >
> >> >> >Agree that.
> >> >> >
> >> >> >> Till now vsock was more used just for guest-host communication, but
> >> >> >> recently it was extended to communicate with sibling VMs.
> >> >> >>
> >> >> >> IIUC your use case, we just need to support different type of vsock
> >> >> >> devices attached to the VM. With "type" I mean type of address handled.
> >> >> >> I think we can define 3 types based on the CID we have:
> >> >> >> - hypervisor: VMADDR_CID_HYPERVISOR(0)
> >> >> >> - host: VMADDR_CID_HOST(2)
> >> >> >> - sibling: CID >=3
> >> >> >>
> >> >> >> The vhost-vsock device handles only VMADDR_CID_HOST, so it doesn't allow
> >> >> >> to reach from the guest any other CIDs.
> >> >> >> The vhost-user-vsock recently started to support sibling VMs.
> >> >> >> The tcp-over-vsock of libkrun should use VMADDR_CID_HYPERVISOR(0).
> >> >> >>
> >> >> >> So, IMHO we should define new features or config flags that a device can
> >> >> >> expose depending on which address is able to handle.
> >> >> >> Also, in order to avoid to overcomplicate vsock, we should allow only
> >> >> >> one device for each type (a single device should support multiple types,
> >> >> >> but only one device can be registered for a type).
> >> >> >
> >> >> >I think only one device can be registered for sibling and hypervisor. And
> >> >> >it is possible to have multiple devices for host. Otherwise, the goals of
> >> >> >this patch will be not achieved.
> >> >>
> >> >> Why?
> >> >>
> >> >> IMO, as I already wrote, the libkrun service should use CID=0.
> >> >
> >> >As we before discussed, if there is only one backend for the libkrun
> >> >service, we can use cid=0. I totally agree with you. So let us put this
> >> >case aside first.
> >> >
> >> >> How you will handle multiple devices for host? How can the guest know
> >> >> which device to use to reach HOST(2)?
> >> >
> >> >Do `bind()` explicitly, and a device with matching cid will be picked up.
> >>
> >> Okay, but why you need 2 devices to communicate with the same CID, host
> >> in this case (CID=2)?
> >> IMO use the source CID to multiplex a socket at destination is not
> >> great. But I can be wrong.
> >
> >I mean multiple sockets with multiple `bind()` calls, not multiplexing:
> >
> >- socket1: bind(3, -1), connect(2, 10000);
> >- socket2: bind(4, -1), connect(2, 10001);
> >- ...
> 
> Yep, of course, I meant exactly that.
> In this case we are doing multiplexing based on the source address, 
> which IMHO is odd.

Please see below.

> >> >> IMO is easier to have the multiple device for sibling (e.g. the device
> >> >> can advertise which dest CID its supports), but not for host.
> >> >
> >> >I think one device for sibling is enough.
> >>
> >> I also think one should be enough, but IMO I think it might make more
> >> sense to have multiple devices in this case, where each device can
> >> handle a pool of CIDs, then the hard part will be figuring out how to
> >> allocate the CIDs, etc. so yes, I agree that having one device even in
> >> this case is the easiest thing.
> >
> >Yeah, so let us skip this for now ;)
> 
> Agree.
> 
> > >
> >> >For example, dev1 is enough for
> >> >communication with dev3 and dev4. So we don't need dev2 for sibling in the
> >> >first VM, right?
> >>
> >> Just a note, vhost-vsock is not allowing any sibling communication.
> >> vhost-user-vsock can do it, but we don't want to bring any support in
> >> vhost-vsock to not overcomplicate it (again it will become like a
> >> network switch, requiring firewalls, etc.)
> >
> >Even if we don't impose some complicated mechanisms, I think the
> >vhost-vsock should work with sibling, and I don't see any difference
> >between vhost-vsock and vhost-user-vsock. (I am just curious.)
> 
> The difference is how to prevent a communication between VMs of
> different users.
> 
> With vhost-user-vsock, you can decide which VMs to connect to the 
> vhost-user backend, in vhost-vsock all of them will be attached to the 
> host net stack, so we need to add some kind of firewall, etc. and it 
> will complicate a lot our simple stack. So I'm not sure we want that.

I see. vhost-user-vsock provides a kind of "namespace" capability, am
I right?

> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
> >> >││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
> >> >+────────▲────────────────▲────────────────▲────────────────▲───────+
> >> >         │                │                │                │
> >> >         │                │          ┌─────┘                │
> >> >         │                │          │                      │
> >> >+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
> >> >│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
> >> >││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
> >> >│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
> >> >│+──────────────+    │    │ │                    │ │                    │
> >> >││  dev2(cid2)  │────┼────┘ │                    │ │                    │
> >> >│+──────────────+    │      │                    │ │                    │
> >> >+────────────────────+      +────────────────────+ +────────────────────+
> >> >
> >> >The things go different for devices for host:
> >> >
> >> >1. For dev1, it is used for sibling (dev1 <-> dev4), and the host
> >> >(src_cid=cid1, dst_cid=2) means the real host (not a userapp), where
> >> >doesn't show in the diagram.
> >>
> >> This is not going to happen (see above).
> >>
> >> >2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
> >> >3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
> >> >
> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
> >> >│+──────────────+                                   +──────────────+│
> >> >││  dev1(cid1)  │                                   │  dev4(cid4)  ││
> >> >│+──────────────+                                   +──────────────+│
> >> >+────────▲──────────────────────────────────────────────────▲───────+
> >> >         │                                                  │
> >> >+────────┴───────────+     +────────────────────+           │
> >> >│+──────────────+ VM0│     │+──────────────+    │           │
> >> >││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
> >> >│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
> >> >│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
> >> >││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
> >> >│+──────────────+  vhost-user-vsock─────────────+  │                    │
> >> >│+──────────────+    │     │+──────────────+    │  │                    │
> >> >││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
> >> >│+──────────────+    │     │+──────────────+    │  +────────────────────+
> >> >+────────────────────+     │            userapp2│
> >> >                           +────────────────────+
> >> >
> >>
> >> I'm really confused with `cid1`, `cid2`, etc. Are they any number >= 3?
> >> I'd suggest to use real value (e.g. cid=42).
> >
> >Okay. I'll update it with real value ;)
> >
> >Based on the above question, I still put the devices into the kernel.
> >
> >+──kernel(vhost-vsock)──────────────────────────────────────────────────+
> >│+───────────────+                                  +───────────────+   │
> >││  dev1(cid=3)  │                                  │  dev4(cid=6)  │   │
> >│+───────────────+                                  +───────────────+   │
> >+────────▲──────────────────────────────────────────────────▲───────────+
> >         │                                                  │
> >+────────┴───────────+     +────────────────────+           │
> >│+───────────────+VM0│     │+──────────────+    │           │
> >││dev1(cid=3,def)│   │┌────▶│ dev2(cid=4)  │    │  +────────┴───────────+
> >│+───────────────+   ││    │+──────────────+    │  │+───────────────+VM2│
> >│+───────────────+   ││    │            userapp1│  ││dev4(cid=6,def)│   │
> >││  dev2(cid=4)  │───┼┘    +────────────────────+  │+───────────────+   │
> >│+───────────────+ vhost-user-vsock─────────────+  │                    │
> >│+───────────────+   │     │+──────────────+    │  │                    │
> >││  dev3(cid=5)  │───┼─────▶│ dev3(cid=5)  │    │  │                    │
> >│+───────────────+   │     │+──────────────+    │  +────────────────────+
> >+────────────────────+     │            userapp2│
> >                           +────────────────────+
> >
> >> So what dest CID the VM0 is supposed to use to talk with userapp1 and
> >> userapp2? In both cases CID=2, right?
> >
> >Yes. There are at least two sockets with source cid=4 and cid=5 
> >respectively.
> 
> As I said, this is odd IMHO.
> We are using the source address to multiplex the destination app.
> We should use the destination address for that, no?

Well, in my design, one device is used mainly for the host, so that we can
do multiplexing as we usually do. The other devices are dedicated to specific
uses. For example, I want a dedicated vsock device for the TSI backend, as I
mentioned before. In that case, we can't do multiplexing.

IMHO, our systems are functional without support for multiple devices,
right? So I introduced the default device to make sure that we don't break
current usage after adding the support.

At the same time, we really want more devices for specific use cases,
where we have to deal with more complicated usage and configuration.

To recap: I think the case I mentioned here is one of those specific use
cases. In most cases, we can do multiplexing based on the destination address.

Does that look good to you?

> >> Why you need 2 vhost-user-vsock devices?
> >
> >The benefit of vhost-user is "shared memory", which reduces the need for
> >data copying. It is possible to share virtqueues to multiple user apps, for
> >the sake of performance.
> >
> >I don't forget the "CID=0" thing. Just as an explanation, I'll use the
> >example of TSI.
> >
> >We can treat the TSI backend as a proxy. Thanks to vhost-user-vsock, the
> >data will be copied once from the guest user space to the proxy. When we
> >have two subnets, which is a common case, we might want to have two proxies
> >to forward the data.
> 
> Okay, I see it now, but it's really the use case of vsock?

We are using Kata Containers to launch a pod. Thanks to the Istio Ambient [1]
service mesh, which is available now, we don't need to set network rules in
the pod. Networking inside the VM is pretty simple: we don't need a
network stack, all we need to do is forward the data to the Istio
Ambient host daemonset. In this case, vsock and TSI are the best choices.

If we use the network, we have to do a lot of copying to achieve that:

guest userspace
    -> guest network stack
        -> pod net namespace network stack
            -> host network stack
                -> host userspace (Istio Ambient proxy)

If we use TSI, the path is:

guest userspace
    -> guest vsock
        -> host tsi backend (Istio Ambient proxy)

We are trying to enable vsock to function as a data plane, instead of only
doing control stuff (control plane).

Of course, we also expect vsock to stay as simple as possible, and we
try not to bother users who don't need these advanced features.

1: https://istio.io/latest/blog/2022/introducing-ambient-mesh/

> In this way the destination address (CID, port) is completely useless, 
> since it's never used, so why using vsock for this use case?

What if the userapp provides two services on different ports? Then the
port is needed.

> I have an idea, but I don't know if it is feasible.
> CID=0 is pretty much unsupported for now by virtio-vsock, but maybe we 
> could leverage it for this use case.
> 
> If we have multiple devices, but each practically allows only one 
> application to be reached,

See above.

> then these devices can be reached by CID=0 

Why CID=0 only?

> and port=x, where each device exposes in its configuration space the 
> port to which it responds.
> 
> Thus, in the guest, connect(0, 10001) will go to the device that exposes 
> port 10001, and so on.

Your proposal

pro:

1) no case needs to call `bind()`.

cons:

1) since the config space will not be read after the device is set up, the
driver can't update the exposed port dynamically.
2) the driver has to maintain the mapping between the port and the device.
3) the guest apps still have to know the mapping to get the right port.
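
To make con 2) concrete, here is a rough sketch of the port-to-device mapping
the driver would have to keep for the CID=0 idea. This is only my illustration:
`struct port_dev`, `exposed_port` and `lookup_by_port()` are hypothetical names,
not taken from the spec or from any existing driver.

```c
#include <stddef.h>
#include <stdint.h>

#define CID_HYPERVISOR 0u /* same value as VMADDR_CID_HYPERVISOR */

/* Hypothetical: one entry per device, with the port read from its config space. */
struct port_dev {
	const char *name;
	uint32_t exposed_port;
};

/*
 * Return the device that should carry a connect(dst_cid, dst_port), or NULL
 * when the destination is not CID 0 or no device exposes that port.
 */
const struct port_dev *lookup_by_port(const struct port_dev *devs, size_t n,
				      uint32_t dst_cid, uint32_t dst_port)
{
	if (dst_cid != CID_HYPERVISOR)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (devs[i].exposed_port == dst_port)
			return &devs[i];
	}
	return NULL;
}
```

With two devices exposing ports 10000 and 10001, connect(0, 10001) would
resolve to the second one, and any port that no device exposes would fail.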

===

My proposal

pro:

1) it is transparent to the driver.
2) most cases (expected 95%) don't need to make a change (they don't have
to know the mapping, and don't need to call `bind()`).

cons:

1) specific cases (expected 5%) need to know which device to use, and must
call `bind()`.
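
For those specific cases, the guest code could look roughly like the sketch
below, using the existing AF_VSOCK socket API. `vsock_addr()` and
`vsock_connect_from()` are helper names I made up for this mail. Selecting a
device by the bound source CID is the behaviour proposed here, not something
current kernels do.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* sockaddr_vm is padded (svm_zero) to the size of a generic sockaddr. */
static_assert(sizeof(struct sockaddr_vm) == sizeof(struct sockaddr),
	      "sockaddr_vm must match sockaddr in size");

/* Build a vsock address from a CID and a port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Bind to src_cid with any local port, then connect to the host (CID 2).
 * Returns the connected fd, or -1 on error.
 */
int vsock_connect_from(unsigned int src_cid, unsigned int dst_port)
{
	struct sockaddr_vm src = vsock_addr(src_cid, VMADDR_PORT_ANY);
	struct sockaddr_vm dst = vsock_addr(VMADDR_CID_HOST, dst_port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For example, vsock_connect_from(4, 10001) corresponds to the
"bind(4, -1), connect(2, 10001)" socket discussed earlier in the thread.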

===

I have updated some limitations to make things clearer:

1) the default device is allowed to communicate with host and sibling (just
like with a single device now).
2) other devices are allowed to communicate with host (CID=2) only.
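
As a rough sketch of how a driver could apply these two rules when picking a
device (my illustration only; `struct guest_dev`, `is_default` and
`select_dev()` are hypothetical names, not an existing implementation):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CID_ANY  0xffffffffu /* unbound socket, same value as VMADDR_CID_ANY */
#define CID_HOST 2u          /* same value as VMADDR_CID_HOST */

/* Hypothetical guest-side view of one virtio-vsock device. */
struct guest_dev {
	uint32_t cid;
	bool is_default;
};

/*
 * Pick the device for a socket: the default device when the socket is not
 * bound, otherwise the device whose CID matches the bound source CID.
 * Non-default devices may only talk to the host (CID 2).
 * Returns NULL when no device qualifies.
 */
const struct guest_dev *select_dev(const struct guest_dev *devs, size_t n,
				   uint32_t src_cid, uint32_t dst_cid)
{
	const struct guest_dev *dev = NULL;

	for (size_t i = 0; i < n; i++) {
		bool match = (src_cid == CID_ANY) ? devs[i].is_default
						  : devs[i].cid == src_cid;
		if (match) {
			dev = &devs[i];
			break;
		}
	}
	if (!dev)
		return NULL;
	if (!dev->is_default && dst_cid != CID_HOST)
		return NULL;
	return dev;
}
```

With devices cid=3 (default), cid=4 and cid=5, an unbound socket connecting to
CID 2 goes through cid=3, a socket bound to cid=4 reaches the host through
that device, and a socket bound to cid=4 trying to reach a sibling is rejected.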

> I don't know if I really like that proposal honestly, anyway, if we go 
> back with yours instead, where the guest has to do the bind to choose 
> the device to use, that's fine, but the one thing we have to have IMO is 
> a way to set the default device, as we were doing.

Exactly!

> >
> >                 +────────────────+    +───────────────────────────────+
> > .─────────.     │  tsi backend1  │    │+───────────────+    ┏━━━━━━━━┓│
> >(    NW1    )◀───│   (userapp1)   │◀───┤│  vsock dev1   ◀────┃subnet1 ┃│
> > `─────────'     +────────────────+    │+───────────────+    ┗━━━━━━━━┛│
> >                                       │                               │
> >                 +────────────────+    │+───────────────+    ┏━━━━━━━━┓│
> > .─────────.     │  tsi backend2  │◀───┤│  vsock dev2   ◀────┃subnet2 ┃│
> >(    NW2    )◀───│   (userapp2)   │    │+───────────────+    ┗━━━━━━━━┛│
> > `─────────'     +────────────────+    +───────────────────────────────+
> >
> >> Can you just have a single one and have the application
> >> connecting/listing on different port? (which is the sense of the port,
> >> multiplexing application on the destination)
> >
> >In terms of functionality, I think it is possible. But it loses the benefit
> >of vhost-user-vsock.
> 
> Can you elaborate a bit?

In this case, we can have one TSI backend with multiple ports. The backend
then forwards data to different proxies, which means the data is copied one
additional time between the backend and the proxy.

guest userspace
    -> guest vsock
        -> host tsi backend
            -> host proxy (Istio Ambient proxy)

If we can combine the TSI backend and the proxy, we can avoid that extra copy.

Thanks,
Xuewei

> BTW in rust-vmm/vhost-device we use a single vhost-user device and 
> multiplex connections between multiple applications in the host: 
> https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
> 
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-07-02  6:05                                           ` Xuewei Niu
@ 2025-07-10 10:19                                             ` Stefano Garzarella
  2025-07-11  4:40                                               ` Xuewei Niu
  0 siblings, 1 reply; 59+ messages in thread
From: Stefano Garzarella @ 2025-07-10 10:19 UTC (permalink / raw)
  To: Xuewei Niu; +Cc: fupan.lfp, mst, niuxuewei.nxw, parav, virtio-comment

On Wed, Jul 02, 2025 at 02:05:16PM +0800, Xuewei Niu wrote:

[...]

>> >> >>
>> >> >> Why?
>> >> >>
>> >> >> IMO, as I already wrote, the libkrun service should use CID=0.
>> >> >
>> >> >As we before discussed, if there is only one backend for the libkrun
>> >> >service, we can use cid=0. I totally agree with you. So let us put this
>> >> >case aside first.
>> >> >
>> >> >> How you will handle multiple devices for host? How can the guest know
>> >> >> which device to use to reach HOST(2)?
>> >> >
>> >> >Do `bind()` explicitly, and a device with matching cid will be picked up.
>> >>
>> >> Okay, but why you need 2 devices to communicate with the same CID, host
>> >> in this case (CID=2)?
>> >> IMO use the source CID to multiplex a socket at destination is not
>> >> great. But I can be wrong.
>> >
>> >I mean multiple sockets with multiple `bind()` calls, not multiplexing:
>> >
>> >- socket1: bind(3, -1), connect(2, 10000);
>> >- socket2: bind(4, -1), connect(2, 10001);
>> >- ...
>>
>> Yep, of course, I meant exactly that.
>> In this case we are doing multiplexing based on the source address,
>> which IMHO is odd.
>
>Please see below.
>
>> >> >> IMO is easier to have the multiple device for sibling (e.g. the device
>> >> >> can advertise which dest CID its supports), but not for host.
>> >> >
>> >> >I think one device for sibling is enough.
>> >>
>> >> I also think one should be enough, but IMO I think it might make more
>> >> sense to have multiple devices in this case, where each device can
>> >> handle a pool of CIDs, then the hard part will be figuring out how to
>> >> allocate the CIDs, etc. so yes, I agree that having one device even in
>> >> this case is the easiest thing.
>> >
>> >Yeah, so let us skip this for now ;)
>>
>> Agree.
>>
>> > >
>> >> >For example, dev1 is enough for
>> >> >communication with dev3 and dev4. So we don't need dev2 for sibling in the
>> >> >first VM, right?
>> >>
>> >> Just a note, vhost-vsock is not allowing any sibling communication.
>> >> vhost-user-vsock can do it, but we don't want to bring any support in
>> >> vhost-vsock to not overcomplicate it (again it will become like a
>> >> network switch, requiring firewalls, etc.)
>> >
>> >Even if we don't impose some complicated mechanisms, I think the
>> >vhost-vsock should work with sibling, and I don't see any difference
>> >between vhost-vsock and vhost-user-vsock. (I am just curious.)
>>
>> The difference is how to prevent a communication between VMs of
>> different users.
>>
>> With vhost-user-vsock, you can decide which VMs to connect to the
>> vhost-user backend, in vhost-vsock all of them will be attached to the
>> host net stack, so we need to add some kind of firewall, etc. and it
>> will complicate a lot our simple stack. So I'm not sure we want that.
>
>I see it. The vhost-user-vsock provides kind of "namespace" capability, am
>I right?

yep, kind of.

>
>> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>> >> >││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
>> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>> >> >+────────▲────────────────▲────────────────▲────────────────▲───────+
>> >> >         │                │                │                │
>> >> >         │                │          ┌─────┘                │
>> >> >         │                │          │                      │
>> >> >+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
>> >> >│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
>> >> >││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
>> >> >│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
>> >> >│+──────────────+    │    │ │                    │ │                    │
>> >> >││  dev2(cid2)  │────┼────┘ │                    │ │                    │
>> >> >│+──────────────+    │      │                    │ │                    │
>> >> >+────────────────────+      +────────────────────+ +────────────────────+
>> >> >
>> >> >Things are different for devices for the host:
>> >> >
>> >> >1. For dev1, it is used for sibling (dev1 <-> dev4), and the host
>> >> >(src_cid=cid1, dst_cid=2) means the real host (not a userapp), which
>> >> >isn't shown in the diagram.
>> >>
>> >> This is not going to happen (see above).
>> >>
>> >> >2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
>> >> >3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
>> >> >
>> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>> >> >│+──────────────+                                   +──────────────+│
>> >> >││  dev1(cid1)  │                                   │  dev4(cid4)  ││
>> >> >│+──────────────+                                   +──────────────+│
>> >> >+────────▲──────────────────────────────────────────────────▲───────+
>> >> >         │                                                  │
>> >> >+────────┴───────────+     +────────────────────+           │
>> >> >│+──────────────+ VM0│     │+──────────────+    │           │
>> >> >││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
>> >> >│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
>> >> >│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
>> >> >││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
>> >> >│+──────────────+  vhost-user-vsock─────────────+  │                    │
>> >> >│+──────────────+    │     │+──────────────+    │  │                    │
>> >> >││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
>> >> >│+──────────────+    │     │+──────────────+    │  +────────────────────+
>> >> >+────────────────────+     │            userapp2│
>> >> >                           +────────────────────+
>> >> >
>> >>
>> >> I'm really confused with `cid1`, `cid2`, etc. Are they any number >= 3?
>> >> I'd suggest to use real value (e.g. cid=42).
>> >
>> >Okay. I'll update it with real value ;)
>> >
>> >Based on the above question, I still put the devices into the kernel.
>> >
>> >+──kernel(vhost-vsock)──────────────────────────────────────────────────+
>> >│+───────────────+                                  +───────────────+   │
>> >││  dev1(cid=3)  │                                  │  dev4(cid=6)  │   │
>> >│+───────────────+                                  +───────────────+   │
>> >+────────▲──────────────────────────────────────────────────▲───────────+
>> >         │                                                  │
>> >+────────┴───────────+     +────────────────────+           │
>> >│+───────────────+VM0│     │+──────────────+    │           │
>> >││dev1(cid=3,def)│   │┌────▶│ dev2(cid=4)  │    │  +────────┴───────────+
>> >│+───────────────+   ││    │+──────────────+    │  │+───────────────+VM2│
>> >│+───────────────+   ││    │            userapp1│  ││dev4(cid=6,def)│   │
>> >││  dev2(cid=4)  │───┼┘    +────────────────────+  │+───────────────+   │
>> >│+───────────────+ vhost-user-vsock─────────────+  │                    │
>> >│+───────────────+   │     │+──────────────+    │  │                    │
>> >││  dev3(cid=5)  │───┼─────▶│ dev3(cid=5)  │    │  │                    │
>> >│+───────────────+   │     │+──────────────+    │  +────────────────────+
>> >+────────────────────+     │            userapp2│
>> >                           +────────────────────+
>> >
>> >> So what dest CID the VM0 is supposed to use to talk with userapp1 and
>> >> userapp2? In both cases CID=2, right?
>> >
>> >Yes. There are at least two sockets with source cid=4 and cid=5
>> >respectively.
>>
>> As I said, this is odd IMHO.
>> We are using the source address to multiplex the destination app.
>> We should use the destination address for that, no?
>
>Well. In my design, we use one device mainly for the host, so that we can
>do multiplexing as we usually do. Other devices are dedicated for specific
>uses, for example, I want to use a dedicated vsock for TSI backend as I
>mentioned before. In this case, we can't do multiplexing.

I meant multiplexing in the VM. Why can't we do that?

>
>IMHO, our systems are functional without the support of multiple devices,
>right? So I introduce the default device to make sure that we don't break
>the current usage after adding the support.
>
>At the same time, we really want to have more devices for specific use
>cases, where we have to deal with the more complicated usage and config.
>
>To recap: I think the case I mentioned here is one of specific use cases.
>For most cases, we can do multiplexing based on the destination address.
>
>Looks good to you?

Yep, in order to not over-complicate vsock, I think the "default" device 
is the simplest way.

>
>> >> Why do you need 2 vhost-user-vsock devices?
>> >
>> >The benefit of vhost-user is "shared memory", which reduces the need for
>> >data copying. It is possible to share virtqueues to multiple user apps, for
>> >the sake of performance.
>> >
>> >I don't forget the "CID=0" thing. Just as an explanation, I'll use the
>> >example of TSI.
>> >
>> >We can treat the TSI backend as a proxy. Thanks to vhost-user-vsock, the
>> >data will be copied once from the guest user space to the proxy. When we
>> >have two subnets, which is a common case, we might want to have two proxies
>> >to forward the data.
>>
>> Okay, I see it now, but is it really a use case for vsock?
>
>We are using Kata Containers to launch a pod. Thanks to Istio Ambient [1]
>service mesh, which is available now, we don't need to set network rules in
>the pod. Networking inside the VM is pretty simple: we don't need a
>network stack, all we need to do is to forward the data to the Istio
>Ambient host daemonset. In this case, vsock and TSI are the best choices.
>
>If we use the network, we have to do a lot of copying to achieve that:
>
>guest userspace
>    -> guest network stack
>        -> pod net namespace network stack
>            -> host network stack
>                -> host userspace (Istio Ambient proxy)
>
>If we use TSI, the path is:
>
>guest userspace
>    -> guest vsock
>        -> host tsi backend (Istio Ambient proxy)
>
>We are trying to enable vsock to function as a data plane, instead of only
>doing control stuff (control plane).
>
>Of course, we also expect that the vsock keeps as simple as possible, and
>try to not bother users who don't need these advanced features.
>
>1: https://istio.io/latest/blog/2022/introducing-ambient-mesh/
>
>> In this way the destination address (CID, port) is completely useless,
>> since it's never used, so why using vsock for this use case?
>
>What if the userapp provides two services on different ports, so port is
>needed?

I see.

>
>> I have an idea, but I don't know if it is feasible.
>> CID=0 is pretty much unsupported for now by virtio-vsock, but maybe we
>> could leverage it for this use case.
>>
>> If we have multiple devices, but each practically allows only one
>> application to be reached,
>
>See above.
>
>> then these devices can be reached by CID=0
>
>Why CID=0 only?
>
>> and port=x, where each device exposes in its configuration space the
>> port to which it responds.
>>
>> Thus, in the guest, connect(0, 10001) will go to the device that exposes
>> port 10001, and so on.
>
>Your proposal
>
>pro:
>
>1) all cases don't need to call `bind()`.
>
>cons:
>
>1) since the config space will not be read after the device is set up, the
>driver can't update the exposed port dynamically.
>2) the driver has to maintain the mapping between the port and the device.
>3) the guest apps still have to know the mapping to get the right port.

Yep, I wrote that I don't like it too ;-)

>
>===
>
>My proposal
>
>pro:
>
>1) it is transparent to the driver.

The driver needs to be updated to support these changes, no?

>2) most cases (expected 95%) don't need to make a change (they don't have
>to know the mapping, and don't need to call `bind()`).
>
>cons:
>
>1) specific cases (expected 5%) need to know which device to use, and do
>`bind()` call.
>
>===
>
>I update some limitations, and try to make things clearer:
>
>1) the default device is allowed to communicate with host and sibling (just
>like with a single device now).
>2) other devices are allowed to communicate with host (CID=2) only.
>
>> I don't know if I really like that proposal honestly, anyway, if we go
>> back with yours instead, where the guest has to do the bind to choose
>> the device to use, that's fine, but the one thing we have to have IMO is
>> a way to set the default device, as we were doing.
>
>Exactly!

BTW, yep let's do the default approach for now if others are fine, also 
because I'm a bit lost xD

>
>> >
>> >                 +────────────────+    +───────────────────────────────+
>> > .─────────.     │  tsi backend1  │    │+───────────────+    ┏━━━━━━━━┓│
>> >(    NW1    )◀───│   (userapp1)   │◀───┤│  vsock dev1   ◀────┃subnet1 ┃│
>> > `─────────'     +────────────────+    │+───────────────+    ┗━━━━━━━━┛│
>> >                                       │                               │
>> >                 +────────────────+    │+───────────────+    ┏━━━━━━━━┓│
>> > .─────────.     │  tsi backend2  │◀───┤│  vsock dev2   ◀────┃subnet2 ┃│
>> >(    NW2    )◀───│   (userapp2)   │    │+───────────────+    ┗━━━━━━━━┛│
>> > `─────────'     +────────────────+    +───────────────────────────────+
>> >
>> >> Can you just have a single one and have the application
>> >> connecting/listening on different ports? (which is the sense of the port,
>> >> multiplexing application on the destination)
>> >
>> >In terms of functionality, I think it is possible. But it loses the benefit
>> >of vhost-user-vsock.
>>
>> Can you elaborate a bit?
>
>In this case, we can have one TSI backend with multiple ports. Then, the
>backend forwards data to different proxies, which means we actually copy
>the data an additional time between the backend and the proxy.
>
>guest userspace
>    -> guest vsock
>        -> host tsi backend
>            -> host proxy (Istio Ambient proxy)
>
>If we can combine TSI backend and proxy, we can reduce the data copying.

Okay, my suggestion for the next version is to explain very well (in the
commit description I guess) why we decided to go this way (e.g. not
overcomplicating vsock, backward compatibility, limitations, etc.), but
yeah, it seems the least complicated approach that should work for you
and keep vsock simple.

The main thing we should explain well is that vsock is not a net
device, so if the host attaches 3 devices, they all have the same
destination address from the guest point of view (CID=2); for that
reason we can only use the source address to multiplex them.
(e.g. for net, if you attached 3 devices, they will have 3 different
dest MAC/IP addresses to reach the host).

Thanks,
Stefano



* Re: [PATCH v7] virtio-vsock: Add support for multi devices
  2025-07-10 10:19                                             ` Stefano Garzarella
@ 2025-07-11  4:40                                               ` Xuewei Niu
  0 siblings, 0 replies; 59+ messages in thread
From: Xuewei Niu @ 2025-07-11  4:40 UTC (permalink / raw)
  To: Stefano Garzarella, Xuewei Niu; +Cc: fupan.lfp, mst, parav, virtio-comment

Resending: I've changed my address to my company email, which was not
subscribed to the virtio-comment mailing list, so the previous message was
rejected by the system. Sorry for the inconvenience.

On 2025/7/10 18:19, Stefano Garzarella wrote:
> On Wed, Jul 02, 2025 at 02:05:16PM +0800, Xuewei Niu wrote:
> 
> [...]
> 
>>> >> >>
>>> >> >> Why?
>>> >> >>
>>> >> >> IMO, as I already wrote, the libkrun service should use CID=0.
>>> >> >
>>> >> >As we before discussed, if there is only one backend for the libkrun
>>> >> >service, we can use cid=0. I totally agree with you. So let us put this
>>> >> >case aside first.
>>> >> >
>>> >> >> How you will handle multiple devices for host? How can the guest know
>>> >> >> which device to use to reach HOST(2)?
>>> >> >
>>> >> >Do `bind()` explicitly, and a device with matching cid will be picked up.
>>> >>
>>> >> Okay, but why you need 2 devices to communicate with the same CID, host
>>> >> in this case (CID=2)?
>>> >> IMO use the source CID to multiplex a socket at destination is not
>>> >> great. But I can be wrong.
>>> >
>>> >I mean multiple sockets with multiple `bind()` calls, not multiplexing:
>>> >
>>> >- socket1: bind(3, -1), connect(2, 10000);
>>> >- socket2: bind(4, -1), connect(2, 10001);
>>> >- ...
>>>
>>> Yep, of course, I meant exactly that.
>>> In this case we are doing multiplexing based on the source address,
>>> which IMHO is odd.
>>
>> Please see below.
>>
>>> >> >> IMO it is easier to have multiple devices for sibling (e.g. the device
>>> >> >> can advertise which dest CIDs it supports), but not for host.
>>> >> >
>>> >> >I think one device for sibling is enough.
>>> >>
>>> >> I also think one should be enough, but IMO I think it might make more
>>> >> sense to have multiple devices in this case, where each device can
>>> >> handle a pool of CIDs, then the hard part will be figuring out how to
>>> >> allocate the CIDs, etc. so yes, I agree that having one device even in
>>> >> this case is the easiest thing.
>>> >
>>> >Yeah, so let us skip this for now ;)
>>>
>>> Agree.
>>>
>>> > >
>>> >> >For example, dev1 is enough for
>>> >> >communication with dev3 and dev4. So we don't need dev2 for sibling in the
>>> >> >first VM, right?
>>> >>
>>> >> Just a note, vhost-vsock is not allowing any sibling communication.
>>> >> vhost-user-vsock can do it, but we don't want to bring any support in
>>> >> vhost-vsock to not overcomplicate it (again it will become like a
>>> >> network switch, requiring firewalls, etc.)
>>> >
>>> >Even if we don't impose some complicated mechanisms, I think the
>>> >vhost-vsock should work with sibling, and I don't see any difference
>>> >between vhost-vsock and vhost-user-vsock. (I am just curious.)
>>>
>>> The difference is how to prevent communication between VMs of
>>> different users.
>>>
>>> With vhost-user-vsock, you can decide which VMs to connect to the
>>> vhost-user backend, in vhost-vsock all of them will be attached to the
>>> host net stack, so we need to add some kind of firewall, etc. and it
>>> will complicate a lot our simple stack. So I'm not sure we want that.
>>
>> I see it. The vhost-user-vsock provides kind of "namespace" capability, am
>> I right?
> 
> yep, kind of.
> 
>>
>>> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>>> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>>> >> >││  dev1(cid1)  │ │  dev2(cid2)  │ │  dev3(cid3)  │ │  dev4(cid4)  ││
>>> >> >│+──────────────+ +──────────────+ +──────────────+ +──────────────+│
>>> >> >+────────▲────────────────▲────────────────▲────────────────▲───────+
>>> >> >         │                │                │                │
>>> >> >         │                │          ┌─────┘                │
>>> >> >         │                │          │                      │
>>> >> >+────────┴───────────+    │ +────────┴───────────+ +────────┴───────────+
>>> >> >│+──────────────+ VM0│    │ │+──────────────+ VM1│ │+──────────────+ VM2│
>>> >> >││dev1(cid1,def)│    │    │ ││dev3(cid3,def)│    │ ││dev4(cid4,def)│    │
>>> >> >│+──────────────+    │    │ │+──────────────+    │ │+──────────────+    │
>>> >> >│+──────────────+    │    │ │                    │ │                    │
>>> >> >││  dev2(cid2)  │────┼────┘ │                    │ │                    │
>>> >> >│+──────────────+    │      │                    │ │                    │
>>> >> >+────────────────────+      +────────────────────+ +────────────────────+
>>> >> >
>>> >> >Things are different for devices for the host:
>>> >> >
>>> >> >1. For dev1, it is used for sibling (dev1 <-> dev4), and the host
>>> >> >(src_cid=cid1, dst_cid=2) means the real host (not a userapp), which
>>> >> >isn't shown in the diagram.
>>> >>
>>> >> This is not going to happen (see above).
>>> >>
>>> >> >2. For dev2 (src_cid=cid2), the host (dst_cid=2) is userapp1;
>>> >> >3. For dev3 (src_cid=cid3), the host (dst_cid=2) is userapp2.
>>> >> >
>>> >> >+──kernel(vhost-vsock)──────────────────────────────────────────────+
>>> >> >│+──────────────+                                   +──────────────+│
>>> >> >││  dev1(cid1)  │                                   │  dev4(cid4)  ││
>>> >> >│+──────────────+                                   +──────────────+│
>>> >> >+────────▲──────────────────────────────────────────────────▲───────+
>>> >> >         │                                                  │
>>> >> >+────────┴───────────+     +────────────────────+           │
>>> >> >│+──────────────+ VM0│     │+──────────────+    │           │
>>> >> >││dev1(cid1,def)│    │┌────▶│  dev2(cid2)  │    │  +────────┴───────────+
>>> >> >│+──────────────+    ││    │+──────────────+    │  │+──────────────+ VM2│
>>> >> >│+──────────────+    ││    │            userapp1│  ││dev4(cid4,def)│    │
>>> >> >││  dev2(cid2)  │────┼┘    +────────────────────+  │+──────────────+    │
>>> >> >│+──────────────+  vhost-user-vsock─────────────+  │                    │
>>> >> >│+──────────────+    │     │+──────────────+    │  │                    │
>>> >> >││  dev3(cid3)  │────┼─────▶│  dev3(cid3)  │    │  │                    │
>>> >> >│+──────────────+    │     │+──────────────+    │  +────────────────────+
>>> >> >+────────────────────+     │            userapp2│
>>> >> >                           +────────────────────+
>>> >> >
>>> >>
>>> >> I'm really confused with `cid1`, `cid2`, etc. Are they any number >= 3?
>>> >> I'd suggest to use real value (e.g. cid=42).
>>> >
>>> >Okay. I'll update it with real value ;)
>>> >
>>> >Based on the above question, I still put the devices into the kernel.
>>> >
>>> >+──kernel(vhost-vsock)──────────────────────────────────────────────────+
>>> >│+───────────────+                                  +───────────────+   │
>>> >││  dev1(cid=3)  │                                  │  dev4(cid=6)  │   │
>>> >│+───────────────+                                  +───────────────+   │
>>> >+────────▲──────────────────────────────────────────────────▲───────────+
>>> >         │                                                  │
>>> >+────────┴───────────+     +────────────────────+           │
>>> >│+───────────────+VM0│     │+──────────────+    │           │
>>> >││dev1(cid=3,def)│   │┌────▶│ dev2(cid=4)  │    │  +────────┴───────────+
>>> >│+───────────────+   ││    │+──────────────+    │  │+───────────────+VM2│
>>> >│+───────────────+   ││    │            userapp1│  ││dev4(cid=6,def)│   │
>>> >││  dev2(cid=4)  │───┼┘    +────────────────────+  │+───────────────+   │
>>> >│+───────────────+ vhost-user-vsock─────────────+  │                    │
>>> >│+───────────────+   │     │+──────────────+    │  │                    │
>>> >││  dev3(cid=5)  │───┼─────▶│ dev3(cid=5)  │    │  │                    │
>>> >│+───────────────+   │     │+──────────────+    │  +────────────────────+
>>> >+────────────────────+     │            userapp2│
>>> >                           +────────────────────+
>>> >
>>> >> So what dest CID the VM0 is supposed to use to talk with userapp1 and
>>> >> userapp2? In both cases CID=2, right?
>>> >
>>> >Yes. There are at least two sockets with source cid=4 and cid=5
>>> >respectively.
>>>
>>> As I said, this is odd IMHO.
>>> We are using the source address to multiplex the destination app.
>>> We should use the destination address for that, no?
>>
>> Well. In my design, we use one device mainly for the host, so that we can
>> do multiplexing as we usually do. Other devices are dedicated for specific
>> uses, for example, I want to use a dedicated vsock for TSI backend as I
>> mentioned before. In this case, we can't do multiplexing.
> 
> I meant multiplexing in the VM. Why can't we do that?

If there is only one device, which means we use the same source device, then
we can multiplex.

In the case of multiple devices, we can multiplex connections as long as
they are using the same source device.

For example

<use device cid=3>
- connection1: src (3, 10000) -> dst (2, 10000)
- connection2: src (3, 10001) -> dst (2, 10001)

<use device cid=4>
- connection3: src (4, 10000) -> dst (2, 10002)
- connection4: src (4, 10001) -> dst (2, 10003)

We can multiplex connections 1 and 2, and we can also do so for connections
3 and 4. But we can't for connections 2 and 3, as their source devices are
not the same.
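
To make the example concrete, here is a minimal userspace sketch of how a
guest app could open connection 3. It assumes the behavior proposed in this
thread, where the source CID passed to `bind()` selects the device; that is
not what a single-device guest does today. `make_vm_addr()` and
`vsock_connect_from()` are hypothetical helper names; only
`struct sockaddr_vm`, `AF_VSOCK` and `VMADDR_CID_HOST` come from the Linux
headers.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build an AF_VSOCK address for (cid, port). */
struct sockaddr_vm make_vm_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Hypothetical helper: open a stream socket with source (src_cid, src_port)
 * and connect it to the host, i.e. (VMADDR_CID_HOST, dst_port).
 * ASSUMPTION: with multi-device support, the bound source CID picks the
 * device that carries the connection.
 * Returns the connected fd, or -1 on error.
 */
int vsock_connect_from(unsigned int src_cid, unsigned int src_port,
		       unsigned int dst_port)
{
	struct sockaddr_vm src = make_vm_addr(src_cid, src_port);
	struct sockaddr_vm dst = make_vm_addr(VMADDR_CID_HOST, dst_port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* Bind first so the source CID is fixed before connecting. */
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With these helpers, connection 3 would be `vsock_connect_from(4, 10000, 10002)`
and connection 4 would be `vsock_connect_from(4, 10001, 10003)`; both would go
through the device with cid=4.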

> 
>>
>> IMHO, our systems are functional without the support of multiple devices,
>> right? So I introduce the default device to make sure that we don't break
>> the current usage after adding the support.
>>
>> At the same time, we really want to have more devices for specific use
>> cases, where we have to deal with the more complicated usage and config.
>>
>> To recap: I think the case I mentioned here is one of specific use cases.
>> For most cases, we can do multiplexing based on the destination address.
>>
>> Looks good to you?
> 
> Yep, in order to not over-complicate vsock, I think the "default" device is the simplest way.
> 
>>
>>> >> Why do you need 2 vhost-user-vsock devices?
>>> >
>>> >The benefit of vhost-user is "shared memory", which reduces the need for
>>> >data copying. It is possible to share virtqueues to multiple user apps, for
>>> >the sake of performance.
>>> >
>>> >I don't forget the "CID=0" thing. Just as an explanation, I'll use the
>>> >example of TSI.
>>> >
>>> >We can treat the TSI backend as a proxy. Thanks to vhost-user-vsock, the
>>> >data will be copied once from the guest user space to the proxy. When we
>>> >have two subnets, which is a common case, we might want to have two proxies
>>> >to forward the data.
>>>
>>> Okay, I see it now, but is it really a use case for vsock?
>>
>> We are using Kata Containers to launch a pod. Thanks to Istio Ambient [1]
>> service mesh, which is available now, we don't need to set network rules in
>> the pod. Networking inside the VM is pretty simple: we don't need a
>> network stack, all we need to do is to forward the data to the Istio
>> Ambient host daemonset. In this case, vsock and TSI are the best choices.
>>
>> If we use the network, we have to do a lot of copying to achieve that:
>>
>> guest userspace
>>    -> guest network stack
>>        -> pod net namespace network stack
>>            -> host network stack
>>                -> host userspace (Istio Ambient proxy)
>>
>> If we use TSI, the path is:
>>
>> guest userspace
>>    -> guest vsock
>>        -> host tsi backend (Istio Ambient proxy)
>>
>> We are trying to enable vsock to function as a data plane, instead of only
>> doing control stuff (control plane).
>>
>> Of course, we also expect that the vsock keeps as simple as possible, and
>> try to not bother users who don't need these advanced features.
>>
>> 1: https://istio.io/latest/blog/2022/introducing-ambient-mesh/
>>
>>> In this way the destination address (CID, port) is completely useless,
>>> since it's never used, so why using vsock for this use case?
>>
>> What if the userapp provides two services on different ports, so port is
>> needed?
> 
> I see.
> 
>>
>>> I have an idea, but I don't know if it is feasible.
>>> CID=0 is pretty much unsupported for now by virtio-vsock, but maybe we
>>> could leverage it for this use case.
>>>
>>> If we have multiple devices, but each practically allows only one
>>> application to be reached,
>>
>> See above.
>>
>>> then these devices can be reached by CID=0
>>
>> Why CID=0 only?
>>
>>> and port=x, where each device exposes in its configuration space the
>>> port to which it responds.
>>>
>>> Thus, in the guest, connect(0, 10001) will go to the device that exposes
>>> port 10001, and so on.
>>
>> Your proposal
>>
>> pro:
>>
>> 1) all cases don't need to call `bind()`.
>>
>> cons:
>>
>> 1) since the config space will not be read after the device is set up, the
>> driver can't update the exposed port dynamically.
>> 2) the driver has to maintain the mapping between the port and the device.
>> 3) the guest apps still have to know the mapping to get the right port.
> 
> Yep, I wrote that I don't like it too ;-)
> 
>>
>> ===
>>
>> My proposal
>>
>> pro:
>>
>> 1) it is transparent to the driver.
> 
> The driver needs to be updated to support these changes, no?

Yes. I don't think any proposal can avoid having the driver manage those
devices. By transparency I mean that the driver doesn't need to know the
mapping between ports and devices, which would be difficult to maintain.
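
For illustration only, the lookup the driver needs could be as small as the
sketch below: it picks a device by the socket's bound source CID and falls
back to the default device for unbound sockets, with no port-to-device
table. `struct vsock_dev`, `select_dev()`, `CID_ANY` and the `is_default`
flag are hypothetical names used for this sketch; how the default device is
actually identified is up to the spec change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "socket was not bound to a CID". */
#define CID_ANY UINT32_MAX

/* Hypothetical per-device record kept by a guest driver. */
struct vsock_dev {
	uint32_t cid;     /* guest CID exposed by this device */
	int is_default;   /* non-zero for the default device */
};

/*
 * Select the device for a socket whose source CID is src_cid.
 * - Unbound sockets (CID_ANY) use the default device.
 * - Bound sockets use the device whose CID matches.
 * Returns NULL if no device matches.
 */
const struct vsock_dev *select_dev(const struct vsock_dev *devs, size_t n,
				   uint32_t src_cid)
{
	for (size_t i = 0; i < n; i++) {
		if (src_cid == CID_ANY) {
			if (devs[i].is_default)
				return &devs[i];
		} else if (devs[i].cid == src_cid) {
			return &devs[i];
		}
	}
	return NULL;
}
```

With devices cid=3 (default), cid=4 and cid=5, an unbound socket would resolve
to the cid=3 device, while a socket bound to CID 4 would resolve to the cid=4
device. The destination port never enters the decision.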

> 
>> 2) most cases (expected 95%) don't need to make a change (they don't have
>> to know the mapping, and don't need to call `bind()`).
>>
>> cons:
>>
>> 1) specific cases (expected 5%) need to know which device to use, and do
>> `bind()` call.
>>
>> ===
>>
>> I update some limitations, and try to make things clearer:
>>
>> 1) the default device is allowed to communicate with host and sibling (just
>> like with a single device now).
>> 2) other devices are allowed to communicate with host (CID=2) only.
>>
>>> I don't know if I really like that proposal honestly, anyway, if we go
>>> back with yours instead, where the guest has to do the bind to choose
>>> the device to use, that's fine, but the one thing we have to have IMO is
>>> a way to set the default device, as we were doing.
>>
>> Exactly!
> 
> BTW, yep let's do the default approach for now if others are fine, also because I'm a bit lost xD
> 
>>
>>> >
>>> >                 +────────────────+    +───────────────────────────────+
>>> > .─────────.     │  tsi backend1  │    │+───────────────+    ┏━━━━━━━━┓│
>>> >(    NW1    )◀───│   (userapp1)   │◀───┤│  vsock dev1   ◀────┃subnet1 ┃│
>>> > `─────────'     +────────────────+    │+───────────────+    ┗━━━━━━━━┛│
>>> >                                       │                               │
>>> >                 +────────────────+    │+───────────────+    ┏━━━━━━━━┓│
>>> > .─────────.     │  tsi backend2  │◀───┤│  vsock dev2   ◀────┃subnet2 ┃│
>>> >(    NW2    )◀───│   (userapp2)   │    │+───────────────+    ┗━━━━━━━━┛│
>>> > `─────────'     +────────────────+    +───────────────────────────────+
>>> >
>>> >> Can you just have a single one and have the application
>>> >> connecting/listening on different ports? (which is the sense of the port,
>>> >> multiplexing application on the destination)
>>> >
>>> >In terms of functionality, I think it is possible. But it loses the benefit
>>> >of vhost-user-vsock.
>>>
>>> Can you elaborate a bit?
>>
>> In this case, we can have one TSI backend with multiple ports. Then, the
>> backend forwards data to different proxies, which means we actually copy
>> the data an additional time between the backend and the proxy.
>>
>> guest userspace
>>    -> guest vsock
>>        -> host tsi backend
>>            -> host proxy (Istio Ambient proxy)
>>
>> If we can combine TSI backend and proxy, we can reduce the data copying.
> 
> Okay, my suggestion for the next version is to explain very well (in the commit description I guess) why we decided to go this way (e.g. not overcomplicating vsock, backward compatibility, limitations, etc.), but yeah, it seems the least complicated approach that should work for you and keep vsock simple.
> 
> The main thing we should explain well is that vsock is not a net device, so if the host attaches 3 devices, they all have the same destination address from the guest point of view (CID=2); for that reason we can only use the source address to multiplex them.
> (e.g. for net, if you attached 3 devices, they will have 3 different dest MAC/IP addresses to reach the host).

I'll summarize the things we discussed in this thread, and update it in v8.

Thanks,
Xuewei

> 
> Thanks,
> Stefano



Thread overview: 59+ messages
2025-04-12 14:28 [PATCH v7] virtio-vsock: Add support for multi devices Xuewei Niu
2025-04-12 14:38 ` Xuewei Niu
2025-05-18 21:52 ` Michael S. Tsirkin
2025-05-19  9:37   ` Xuewei Niu
2025-06-11 14:51     ` Xuewei Niu
2025-06-11 14:53       ` Parav Pandit
2025-06-11 18:02     ` Michael S. Tsirkin
2025-06-12  3:28       ` Xuewei Niu
2025-06-14 17:11     ` Parav Pandit
2025-06-16  8:18       ` Xuewei Niu
2025-06-16  8:26         ` Parav Pandit
2025-06-16  8:59           ` Xuewei Niu
2025-06-16 10:05             ` Parav Pandit
2025-06-16 10:56               ` Xuewei Niu
2025-06-16 11:00                 ` Xuewei Niu
2025-06-17  6:01                 ` Parav Pandit
2025-06-17  7:41                   ` Xuewei Niu
2025-06-19  3:26                   ` Xuewei Niu
2025-06-19  4:40                     ` Parav Pandit
2025-06-19  5:10                       ` Xuewei Niu
2025-06-19  5:25                         ` Parav Pandit
2025-06-22 13:54                           ` Xuewei Niu
2025-06-23  7:53                         ` Stefano Garzarella
2025-06-23  8:48                           ` Xuewei Niu
2025-06-23  9:16                             ` Stefano Garzarella
2025-06-23 10:35                               ` Xuewei Niu
2025-06-23 11:01                                 ` Xuewei Niu
2025-06-23 11:15                                 ` Stefano Garzarella
2025-06-23 12:14                                   ` Xuewei Niu
2025-06-23 12:51                                     ` Stefano Garzarella
2025-06-23 15:51                                       ` Xuewei Niu
2025-07-01 10:31                                         ` Stefano Garzarella
2025-07-02  6:05                                           ` Xuewei Niu
2025-07-10 10:19                                             ` Stefano Garzarella
2025-07-11  4:40                                               ` Xuewei Niu
2025-06-13  4:23 ` Jason Wang
2025-06-13  4:57   ` Xuewei Niu
2025-06-13  8:35     ` Stefano Garzarella
2025-06-13  8:46       ` Xuewei Niu
2025-06-16  3:06         ` Jason Wang
2025-06-16  8:29           ` Xuewei Niu
2025-06-16  8:38             ` Stefano Garzarella
2025-06-17  2:52               ` Jason Wang
2025-06-17  2:54                 ` Jason Wang
2025-06-17  7:45                   ` Xuewei Niu
2025-06-18  0:49                     ` Jason Wang
2025-06-18  2:47                       ` Xuewei Niu
2025-06-18  4:19                         ` Jason Wang
2025-06-18  5:40                           ` Xuewei Niu
2025-06-18  8:36                             ` Jason Wang
2025-06-18  9:51                               ` Xuewei Niu
2025-06-19  1:10                                 ` Jason Wang
2025-06-19  2:42                                   ` Xuewei Niu
2025-06-23  8:01                                     ` Jason Wang
2025-06-23  9:47                                       ` Xuewei Niu
2025-06-24  0:51                                         ` Jason Wang
2025-06-24  3:33                                           ` Xuewei Niu
2025-06-13  8:31   ` Stefano Garzarella
2025-06-13  8:36     ` Xuewei Niu
