* [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices
@ 2024-05-17 14:46 Xuewei Niu
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
# Motivition
Vsock is a lightweight and widely used data exchange mechanism between host
and guest. Kata Containers, a secure container runtime, leverages the
capability to exchange control data between the shim and the kata-agent.
The Linux kernel only supports one vsock device for virtio-vsock transport,
resulting in the following limitations:
* Poor performance isolation: All vsock connections share the same
virtqueue.
* Cannot enable more than one backend: Virtio-vsock, vhost-vsock, and
vhost-user-vsock cannot be enabled simultaneously on the transport.
We’d like to transfer networking data, such as TSI (Transparent Socket
Impersonation), over vsock via the vhost-user protocol to reduce overhead.
However, by default, the vsock device is occupied by the kata-agent.
# Usages
Principle: **Supporting virtio-vsock multi-devices while also being
compatible with existing ones.**
## Connection from Guest to Host
There are two valuable questions to take about:
1. How to be compatible with the existing usages?
2. How do we specify a virtio-vsock device?
### Question 1
Before we delve into question 1, I'd like to provide a piece of pseudocode
as an example of one of the existing use cases from the guest's
perspective.
Assuming there is one virtio-vsock device with CID 4. One of existing
usages to connect to host is shown as following.
```
fd = socket(AF_VSOCK);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
The result is that a connection is established from the guest (4, ?) to the
host (2, 1234), where "?" denotes a random port.
In the context of multi-devices, there are more than two devices. If the
users don’t specify one CID explicitly, the kernel becomes confused about
which device to use. The new implementation should be compatible with the
old one.
We expanded the virtio-vsock specification to address this issue. The
specification now includes a new field called "order".
```
struct virtio_vsock_config {
__le64 guest_cid;
__le64 order;
} _attribute_((packed));
```
In the phase of virtio-vsock driver probing, the guest kernel reads from
VMM to get the order of each device. **We stipulate that the device with the
smallest order is regarded as the default device**(this mechanism functions
as a 'default gateway' in networking).
Assuming there are three virtio-vsock devices: device1 (CID=3), device2
(CID=4), and device3 (CID=5). The arrangement of the list is as follows
from the perspective of the guest kernel:
```
virtio_vsock_list =
virtio_vsock { cid: 4, order: 0 } -> virtio_vsock { cid: 3, order: 1 } -> virtio_vsock { cid: 5, order: 10 }
```
At this time, the guest kernel realizes that the device2 (CID=4) is the
default device. Execute the same code as before.
```
fd = socket(AF_VSOCK);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
A connection will be established from the guest (4, ?) to the host (2, 1234).
### Question 2
Now, the user wants to specify a device instead of the default one. An
explicit binding operation is required to be performed.
Use the device (CID=3), where “-1” represents any port, the kernel will
search an available port automatically.
```
fd = socket(AF_VSOCK);
bind(fd, 3, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
Use the device (CID=4).
```
fd = socket(AF_VSOCK);
bind(fd, 4, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
## Connection from Host to Guest
Connection from host to guest is quite similar to the existing usages. The
device’s CID is specified by the bind operation.
Listen at the device (CID=3)’s port 10000.
```
fd = socket(AF_VSOCK);
bind(fd, 3, 10000);
listen(fd);
new_fd = accept(fd, &host_cid, &host_port);
n = write(fd, buffer);
```
Listen at the device (CID=4)’s port 10000.
```
fd = socket(AF_VSOCK);
bind(fd, 4, 10000);
listen(fd);
new_fd = accept(fd, &host_cid, &host_port);
n = write(fd, buffer);
```
# Use Cases
We've completed a POC with Kata Containers, Ztunnel, which is a
purpose-built per-node proxy for Istio ambient mesh, and TSI. Please refer
to the following link for more details.
Link: https://bit.ly/4bdPJbU
Xuewei Niu (5):
vsock/virtio: Extend virtio-vsock spec with an "order" field
vsock/virtio: Add support for multi-devices
vsock/virtio: can_msgzerocopy adapts to multi-devices
vsock: seqpacket_allow adapts to multi-devices
vsock: Add an ioctl request to get all CIDs
include/linux/virtio_vsock.h | 2 +-
include/net/af_vsock.h | 25 ++-
include/uapi/linux/virtio_vsock.h | 1 +
include/uapi/linux/vm_sockets.h | 14 ++
net/vmw_vsock/af_vsock.c | 116 +++++++++--
net/vmw_vsock/virtio_transport.c | 255 ++++++++++++++++++------
net/vmw_vsock/virtio_transport_common.c | 16 +-
net/vmw_vsock/vsock_loopback.c | 4 +-
8 files changed, 352 insertions(+), 81 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
@ 2024-05-17 14:46 ` Xuewei Niu
2024-05-23 7:43 ` Alyssa Ross
2024-05-23 10:43 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices Xuewei Niu
` (5 subsequent siblings)
6 siblings, 2 replies; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
The "order" field determines the location of the device in the linked list,
the device with CID 4, having a smallest order, is in the first place, and
so forth.
Rules:
* It doesn’t have to be continuous;
* It cannot exist conflicts;
* It is optional for the mode of a single device, but is required for the
mode of multiple devices.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
include/uapi/linux/virtio_vsock.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 64738838bee5..b62ec7d2ab1e 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -43,6 +43,7 @@
struct virtio_vsock_config {
__le64 guest_cid;
+ __le64 order;
} __attribute__((packed));
enum virtio_vsock_event_id {
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
@ 2024-05-17 14:46 ` Xuewei Niu
2024-05-23 10:43 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices Xuewei Niu
` (4 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
The maximum number of devices is limited by `MAX_VSOCK_NUM`.
Extends `vsock_transport` struct with 4 methods to support multi-devices:
* `get_virtio_vsock()`: It receives a CID, and returns a struct of virtio
vsock. This method is designed to select a vsock device by its CID.
* `get_default_cid()`: It receives nothing, returns the default CID of the
first vsock device registered to the kernel.
* `get_local_cids()`: It returns a vector of vsock devices' CIDs.
* `compare_order()`: It receives two different CIDs, named "left" and
"right" respectively. It returns "-1" while the "left" is behind the
"right". Otherwise, return "1".
`get_local_cid()` is retained, but returns "-1" if the transport supports
multi-devices.
Replaces the single instance of `virtio_vsock` with a list, named
`virtio_vsock_list`. The devices are inserted into the list when probing.
The kernel will deny devices from being registered if there are conflicts
existing in CIDs or orders.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
include/net/af_vsock.h | 16 ++
include/uapi/linux/vm_sockets.h | 6 +
net/vmw_vsock/af_vsock.c | 82 ++++++--
net/vmw_vsock/virtio_transport.c | 246 ++++++++++++++++++------
net/vmw_vsock/virtio_transport_common.c | 10 +-
5 files changed, 293 insertions(+), 67 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 535701efc1e5..0151296a0bc5 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -174,6 +174,22 @@ struct vsock_transport {
/* Addressing. */
u32 (*get_local_cid)(void);
+ /* Held rcu read lock by the caller. */
+ struct virtio_vsock *(*get_virtio_vsock)(unsigned int cid);
+ unsigned int (*get_default_cid)(void);
+ /* Get an list containing all the CIDs of registered vsock. Return
+ * the length of the list.
+ *
+ * Held rcu read lock by the caller.
+ */
+ int (*get_local_cids)(unsigned int *local_cids);
+ /* Compare the order of two devices. Given the guest CIDs of two
+ * different devices, returns -1 while the left is behind the right.
+ * Otherwise, return 1.
+ *
+ * Held rcu read lock by the caller.
+ */
+ int (*compare_order)(unsigned int left, unsigned int right);
/* Read a single skb */
int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index ed07181d4eff..36ca5023293a 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -189,6 +189,12 @@ struct sockaddr_vm {
sizeof(__u8)];
};
+/* The maximum number of vsock devices. Each vsock device has an exclusive
+ * context id.
+ */
+
+#define MAX_VSOCK_NUM 16
+
#define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
/* MSG_ZEROCOPY notifications are encoded in the standard error format,
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 54ba7316f808..da06ddc940cd 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -234,19 +234,45 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
{
- struct vsock_sock *vsk;
+ struct vsock_sock *vsk, *any_vsk = NULL;
+ rcu_read_lock();
list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+ /* The highest priority: full match. */
if (vsock_addr_equals_addr(addr, &vsk->local_addr))
- return sk_vsock(vsk);
+ goto out;
- if (addr->svm_port == vsk->local_addr.svm_port &&
- (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
- addr->svm_cid == VMADDR_CID_ANY))
- return sk_vsock(vsk);
+ /* Port match */
+ if (addr->svm_port == vsk->local_addr.svm_port) {
+ /* The second priority: local cid is VMADDR_CID_ANY. */
+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY)
+ goto out;
+
+ /* The third priority: local cid isn't VMADDR_CID_ANY. */
+ if (addr->svm_cid == VMADDR_CID_ANY) {
+ if (!any_vsk) {
+ any_vsk = vsk;
+ continue;
+ }
+ // Use the device with smaller order
+ if (vsk->transport->compare_order(any_vsk->local_addr.svm_cid,
+ vsk->local_addr.svm_cid) < 0)
+ any_vsk = vsk;
+ }
+ }
+ }
+ rcu_read_unlock();
+
+ if (any_vsk) {
+ pr_debug("matched a any_vsk at %p\n", any_vsk);
+ return sk_vsock(any_vsk);
}
return NULL;
+
+out:
+ rcu_read_unlock();
+ return sk_vsock(vsk);
}
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
@@ -408,7 +434,11 @@ static bool vsock_use_local_transport(unsigned int remote_cid)
return true;
if (transport_g2h) {
- return remote_cid == transport_g2h->get_local_cid();
+ if (transport_g2h->get_virtio_vsock)
+ return transport_g2h->get_virtio_vsock(remote_cid) !=
+ NULL;
+ else
+ return remote_cid == transport_g2h->get_local_cid();
} else {
return remote_cid == VMADDR_CID_HOST;
}
@@ -516,9 +546,26 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
}
EXPORT_SYMBOL_GPL(vsock_assign_transport);
+bool transport_g2h_verify_cid(unsigned int cid)
+{
+ /* transports that support multi devices */
+ rcu_read_lock();
+ if (transport_g2h->get_virtio_vsock &&
+ (cid == VMADDR_CID_ANY || transport_g2h->get_virtio_vsock(cid))) {
+ rcu_read_unlock();
+ return true;
+ }
+ rcu_read_unlock();
+ /* other transports */
+ if (cid == transport_g2h->get_local_cid())
+ return true;
+
+ return false;
+}
+
bool vsock_find_cid(unsigned int cid)
{
- if (transport_g2h && cid == transport_g2h->get_local_cid())
+ if (transport_g2h && transport_g2h_verify_cid(cid))
return true;
if (transport_h2g && cid == VMADDR_CID_HOST)
@@ -697,7 +744,9 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
/* Now bind to the provided address or select appropriate values if
* none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
* like AF_INET prevents binding to a non-local IP address (in most
- * cases), we only allow binding to a local CID.
+ * cases), we only allow binding to a local CID. In the cases of
+ * multi-devices, only CIDs of vsock devices registered in the kernel
+ * are allowed.
*/
if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
return -EADDRNOTAVAIL;
@@ -825,7 +874,6 @@ static void __vsock_release(struct sock *sk, int level)
__vsock_release(pending, SINGLE_DEPTH_NESTING);
sock_put(pending);
}
-
release_sock(sk);
sock_put(sk);
}
@@ -1181,7 +1229,12 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
*/
if (remote_addr->svm_cid == VMADDR_CID_ANY)
- remote_addr->svm_cid = transport->get_local_cid();
+ if (transport->get_default_cid)
+ remote_addr->svm_cid =
+ transport->get_default_cid();
+ else
+ remote_addr->svm_cid =
+ transport->get_local_cid();
if (!vsock_addr_bound(remote_addr)) {
err = -EINVAL;
@@ -1191,7 +1244,12 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
remote_addr = &vsk->remote_addr;
if (remote_addr->svm_cid == VMADDR_CID_ANY)
- remote_addr->svm_cid = transport->get_local_cid();
+ if (transport->get_default_cid)
+ remote_addr->svm_cid =
+ transport->get_default_cid();
+ else
+ remote_addr->svm_cid =
+ transport->get_local_cid();
/* XXX Should connect() or this function ensure remote_addr is
* bound?
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index ee5d306a96d0..93d25aeafb83 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -22,8 +22,8 @@
#include <net/af_vsock.h>
static struct workqueue_struct *virtio_vsock_workqueue;
-static struct virtio_vsock __rcu *the_virtio_vsock;
-static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static LIST_HEAD(virtio_vsock_list); /* vsock multi-devices */
+static DEFINE_MUTEX(virtio_vsock_list_mutex); /* protects virtio_vsock_list */
static struct virtio_transport virtio_transport; /* forward declaration */
struct virtio_vsock {
@@ -62,6 +62,7 @@ struct virtio_vsock {
struct virtio_vsock_event event_list[8];
u32 guest_cid;
+ u32 order;
bool seqpacket_allow;
/* These fields are used only in tx path in function
@@ -74,24 +75,70 @@ struct virtio_vsock {
*/
struct scatterlist *out_sgs[MAX_SKB_FRAGS + 1];
struct scatterlist out_bufs[MAX_SKB_FRAGS + 1];
+
+ struct list_head node;
+ struct rcu_head rcu;
};
static u32 virtio_transport_get_local_cid(void)
+{
+ return VMADDR_CID_ANY;
+}
+
+/* Held rcu read lock by the caller. */
+static struct virtio_vsock *virtio_transport_get_virtio_vsock(unsigned int cid)
{
struct virtio_vsock *vsock;
- u32 ret;
+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
+ if (vsock->guest_cid == cid)
+ return vsock;
+ }
+ return NULL;
+}
- rcu_read_lock();
- vsock = rcu_dereference(the_virtio_vsock);
- if (!vsock) {
- ret = VMADDR_CID_ANY;
- goto out_rcu;
+static unsigned int virtio_transport_get_default_cid(void)
+{
+ struct virtio_vsock *vsock;
+
+ vsock = list_first_or_null_rcu(&virtio_vsock_list, struct virtio_vsock,
+ node);
+ if (!vsock)
+ return VMADDR_CID_ANY;
+
+ return vsock->guest_cid;
+}
+
+/* Held rcu read lock by the caller. */
+static int virtio_transport_get_local_cids(u32 *cids)
+{
+ int count = 0;
+ struct virtio_vsock *vsock;
+
+ if (!cids)
+ return -EFAULT;
+
+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
+ cids[count++] = vsock->guest_cid;
}
+ return count;
+}
- ret = vsock->guest_cid;
-out_rcu:
- rcu_read_unlock();
- return ret;
+/* Held rcu read lock by the caller. */
+static int virtio_transport_compare_order(unsigned int left, unsigned int right)
+{
+ struct virtio_vsock *vsock;
+
+ if (left == right)
+ return 0;
+
+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
+ if (right == vsock->guest_cid)
+ return -1;
+ if (left == vsock->guest_cid)
+ return 1;
+ }
+
+ return 0;
}
static void
@@ -201,12 +248,16 @@ virtio_transport_send_pkt(struct sk_buff *skb)
struct virtio_vsock_hdr *hdr;
struct virtio_vsock *vsock;
int len = skb->len;
+ unsigned int src_cid;
hdr = virtio_vsock_hdr(skb);
+ src_cid = le64_to_cpu(hdr->src_cid);
rcu_read_lock();
- vsock = rcu_dereference(the_virtio_vsock);
+ vsock = virtio_transport_get_virtio_vsock(src_cid);
if (!vsock) {
+ pr_debug("pkt sending has been failed, as vsock with cid %u not found\n",
+ src_cid);
kfree_skb(skb);
len = -ENODEV;
goto out_rcu;
@@ -236,13 +287,17 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
int cnt = 0, ret;
rcu_read_lock();
- vsock = rcu_dereference(the_virtio_vsock);
- if (!vsock) {
+ if (list_empty(&virtio_vsock_list)) {
ret = -ENODEV;
goto out_rcu;
}
- cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
+ cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
+ /* Skbs for the vsk belong to one virtio_vsock */
+ if (cnt)
+ break;
+ }
if (cnt) {
struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
@@ -385,6 +440,16 @@ static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
vsock->guest_cid = le64_to_cpu(guest_cid);
}
+static void virtio_vsock_update_order(struct virtio_vsock *vsock)
+{
+ struct virtio_device *vdev = vsock->vdev;
+ __le64 order;
+
+ vdev->config->get(vdev, offsetof(struct virtio_vsock_config, order),
+ &order, sizeof(order));
+ vsock->order = le64_to_cpu(order);
+}
+
/* event_lock must be held */
static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
struct virtio_vsock_event *event)
@@ -492,13 +557,17 @@ static bool virtio_transport_msgzerocopy_allow(void)
return true;
}
-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
+static bool virtio_transport_seqpacket_allow(u32 src_cid, u32 remote_cid);
static struct virtio_transport virtio_transport = {
.transport = {
.module = THIS_MODULE,
.get_local_cid = virtio_transport_get_local_cid,
+ .get_virtio_vsock = virtio_transport_get_virtio_vsock,
+ .get_default_cid = virtio_transport_get_default_cid,
+ .get_local_cids = virtio_transport_get_local_cids,
+ .compare_order = virtio_transport_compare_order,
.init = virtio_transport_do_socket_init,
.destruct = virtio_transport_destruct,
@@ -617,7 +686,38 @@ static void virtio_transport_rx_work(struct work_struct *work)
static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
{
struct virtio_device *vdev = vsock->vdev;
- static const char * const names[] = {
+ struct virtio_vsock *_vsock = NULL;
+ int i;
+ unsigned int guest_cid, order;
+
+ virtio_vsock_update_guest_cid(vsock);
+ virtio_vsock_update_order(vsock);
+ guest_cid = vsock->guest_cid;
+ order = vsock->order;
+
+ i = 0;
+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
+ if (++i > MAX_VSOCK_NUM) {
+ pr_debug("vsock num reaches limit %d\n", MAX_VSOCK_NUM);
+ return -ENOMEM;
+ }
+
+ /* Check for guest_cid conflicts */
+ if (unlikely(guest_cid == _vsock->guest_cid)) {
+ pr_debug("conflict exists in vsock's guest_cid %u\n",
+ guest_cid);
+ return -EBUSY;
+ }
+
+ /* Check for order conflicts */
+ if (unlikely(order == _vsock->order)) {
+ pr_debug("conflict exists in vsock's order %u\n",
+ order);
+ return -EBUSY;
+ }
+ }
+
+ static const char *const names[] = {
"rx",
"tx",
"event",
@@ -634,8 +734,6 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
if (ret < 0)
return ret;
- virtio_vsock_update_guest_cid(vsock);
-
virtio_device_ready(vdev);
return 0;
@@ -716,21 +814,14 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
static int virtio_vsock_probe(struct virtio_device *vdev)
{
- struct virtio_vsock *vsock = NULL;
- int ret;
- int i;
+ struct virtio_vsock *vsock, *_vsock, *first_vsock;
+ int ret, i;
+ unsigned int guest_cid, order;
- ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
+ ret = mutex_lock_interruptible(&virtio_vsock_list_mutex);
if (ret)
return ret;
- /* Only one virtio-vsock device per guest is supported */
- if (rcu_dereference_protected(the_virtio_vsock,
- lockdep_is_held(&the_virtio_vsock_mutex))) {
- ret = -EBUSY;
- goto out;
- }
-
vsock = kzalloc(sizeof(*vsock), GFP_KERNEL);
if (!vsock) {
ret = -ENOMEM;
@@ -764,28 +855,56 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
for (i = 0; i < ARRAY_SIZE(vsock->out_sgs); i++)
vsock->out_sgs[i] = &vsock->out_bufs[i];
- rcu_assign_pointer(the_virtio_vsock, vsock);
+ order = vsock->order;
+ guest_cid = vsock->guest_cid;
+ first_vsock =
+ list_first_entry(&virtio_vsock_list, struct virtio_vsock, node);
+ /* Insert virtio-vsock device into a proper location. */
+ if (list_empty(&virtio_vsock_list) || first_vsock->order > order) {
+ list_add_rcu(&vsock->node, &virtio_vsock_list);
+ } else {
+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
+ struct virtio_vsock *next = container_of(_vsock->node.next,
+ struct virtio_vsock, node);
+ if (&next->node != &virtio_vsock_list &&
+ next->order < order)
+ continue;
+ list_add_rcu(&vsock->node, &_vsock->node);
+ break;
+ }
+ }
+
+ pr_debug("virtio_vsock@%p registered (cid = %u, order = %u)\n", vsock, guest_cid, order);
+
virtio_vsock_vqs_start(vsock);
- mutex_unlock(&the_virtio_vsock_mutex);
+ mutex_unlock(&virtio_vsock_list_mutex);
return 0;
out:
kfree(vsock);
- mutex_unlock(&the_virtio_vsock_mutex);
+ mutex_unlock(&virtio_vsock_list_mutex);
return ret;
}
static void virtio_vsock_remove(struct virtio_device *vdev)
{
- struct virtio_vsock *vsock = vdev->priv;
+ struct virtio_vsock *vsock, *_vsock;
- mutex_lock(&the_virtio_vsock_mutex);
+ vsock = vdev->priv;
+
+ mutex_lock(&virtio_vsock_list_mutex);
vdev->priv = NULL;
- rcu_assign_pointer(the_virtio_vsock, NULL);
- synchronize_rcu();
+ /* Remove virtio-vsock device from the list. */
+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
+ if (vsock == _vsock) {
+ list_del_rcu(&vsock->node);
+ synchronize_rcu();
+ break;
+ }
+ }
virtio_vsock_vqs_del(vsock);
@@ -797,7 +916,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
flush_work(&vsock->event_work);
flush_work(&vsock->send_pkt_work);
- mutex_unlock(&the_virtio_vsock_mutex);
+ mutex_unlock(&virtio_vsock_list_mutex);
kfree(vsock);
}
@@ -805,43 +924,62 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
#ifdef CONFIG_PM_SLEEP
static int virtio_vsock_freeze(struct virtio_device *vdev)
{
- struct virtio_vsock *vsock = vdev->priv;
+ struct virtio_vsock *vsock, *_vsock;
- mutex_lock(&the_virtio_vsock_mutex);
+ vsock = vdev->priv;
- rcu_assign_pointer(the_virtio_vsock, NULL);
- synchronize_rcu();
+ mutex_lock(&virtio_vsock_list_mutex);
+
+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
+ if (vsock == _vsock) {
+ list_del_rcu(&vsock->node);
+ synchronize_rcu();
+ break;
+ }
+ }
virtio_vsock_vqs_del(vsock);
- mutex_unlock(&the_virtio_vsock_mutex);
+ mutex_unlock(&virtio_vsock_list_mutex);
return 0;
}
static int virtio_vsock_restore(struct virtio_device *vdev)
{
- struct virtio_vsock *vsock = vdev->priv;
+ struct virtio_vsock *vsock, *_vsock, *first_vsock;
int ret;
+ unsigned int order;
- mutex_lock(&the_virtio_vsock_mutex);
+ vsock = vdev->priv;
- /* Only one virtio-vsock device per guest is supported */
- if (rcu_dereference_protected(the_virtio_vsock,
- lockdep_is_held(&the_virtio_vsock_mutex))) {
- ret = -EBUSY;
- goto out;
- }
+ mutex_lock(&virtio_vsock_list_mutex);
ret = virtio_vsock_vqs_init(vsock);
if (ret < 0)
goto out;
- rcu_assign_pointer(the_virtio_vsock, vsock);
+ order = vsock->order;
+ first_vsock =
+ list_first_entry(&virtio_vsock_list, struct virtio_vsock, node);
+ /* Insert virtio-vsock device into a proper location. */
+ if (list_empty(&virtio_vsock_list) || first_vsock->order > order) {
+ list_add_rcu(&vsock->node, &virtio_vsock_list);
+ } else {
+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
+ struct virtio_vsock *next = container_of(_vsock->node.next,
+ struct virtio_vsock, node);
+ if (&next->node != &virtio_vsock_list &&
+ next->order < order)
+ continue;
+ list_add_rcu(&vsock->node, &_vsock->node);
+ break;
+ }
+ }
virtio_vsock_vqs_start(vsock);
out:
- mutex_unlock(&the_virtio_vsock_mutex);
+ mutex_unlock(&virtio_vsock_list_mutex);
return ret;
}
#endif /* CONFIG_PM_SLEEP */
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 16ff976a86e3..bed75a41419e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -340,7 +340,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
if (unlikely(!t_ops))
return -EFAULT;
- src_cid = t_ops->transport.get_local_cid();
+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY) {
+ if (t_ops->transport.get_default_cid)
+ src_cid = t_ops->transport.get_default_cid();
+ else
+ src_cid = t_ops->transport.get_local_cid();
+ } else {
+ src_cid = vsk->local_addr.svm_cid;
+ }
+
src_port = vsk->local_addr.svm_port;
if (!info->remote_cid) {
dst_cid = vsk->remote_addr.svm_cid;
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
2024-05-17 14:46 ` [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices Xuewei Niu
@ 2024-05-17 14:46 ` Xuewei Niu
2024-05-23 10:44 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 4/5] vsock: seqpacket_allow " Xuewei Niu
` (3 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
Adds a new argument, named "cid", to let them know which `virtio_vsock` to
be selected.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
include/linux/virtio_vsock.h | 2 +-
net/vmw_vsock/virtio_transport.c | 5 ++---
net/vmw_vsock/virtio_transport_common.c | 6 +++---
3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c82089dee0c8..21bfd5e0c2e7 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -168,7 +168,7 @@ struct virtio_transport {
* extra checks and can perform zerocopy transmission by
* default.
*/
- bool (*can_msgzerocopy)(int bufs_num);
+ bool (*can_msgzerocopy)(u32 cid, int bufs_num);
};
ssize_t
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 93d25aeafb83..998b22e5ce36 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -521,14 +521,13 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}
-static bool virtio_transport_can_msgzerocopy(int bufs_num)
+static bool virtio_transport_can_msgzerocopy(u32 cid, int bufs_num)
{
struct virtio_vsock *vsock;
bool res = false;
rcu_read_lock();
-
- vsock = rcu_dereference(the_virtio_vsock);
+ vsock = virtio_transport_get_virtio_vsock(cid);
if (vsock) {
struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index bed75a41419e..e7315d7b9af1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -39,7 +39,7 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
struct virtio_vsock_pkt_info *info,
- size_t pkt_len)
+ size_t pkt_len, unsigned int cid)
{
struct iov_iter *iov_iter;
@@ -62,7 +62,7 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
/* +1 is for packet header. */
- return t_ops->can_msgzerocopy(pages_to_send + 1);
+ return t_ops->can_msgzerocopy(cid, pages_to_send + 1);
}
return true;
@@ -375,7 +375,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
info->msg->msg_flags &= ~MSG_ZEROCOPY;
if (info->msg->msg_flags & MSG_ZEROCOPY)
- can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
+ can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len, src_cid);
if (can_zcopy)
max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 4/5] vsock: seqpacket_allow adapts to multi-devices
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
` (2 preceding siblings ...)
2024-05-17 14:46 ` [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices Xuewei Niu
@ 2024-05-17 14:46 ` Xuewei Niu
2024-05-23 10:44 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs Xuewei Niu
` (2 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
Adds a new argument, named "src_cid", to let them know which `virtio_vsock`
to be selected.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
include/net/af_vsock.h | 2 +-
net/vmw_vsock/af_vsock.c | 15 +++++++++++++--
net/vmw_vsock/virtio_transport.c | 4 ++--
net/vmw_vsock/vsock_loopback.c | 4 ++--
4 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0151296a0bc5..25f7dc3d602d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -143,7 +143,7 @@ struct vsock_transport {
int flags);
int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
size_t len);
- bool (*seqpacket_allow)(u32 remote_cid);
+ bool (*seqpacket_allow)(u32 src_cid, u32 remote_cid);
u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index da06ddc940cd..3b34be802bf2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -470,10 +470,12 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
{
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
- unsigned int remote_cid = vsk->remote_addr.svm_cid;
+ unsigned int src_cid, remote_cid;
__u8 remote_flags;
int ret;
+ remote_cid = vsk->remote_addr.svm_cid;
+
/* If the packet is coming with the source and destination CIDs higher
* than VMADDR_CID_HOST, then a vsock channel where all the packets are
* forwarded to the host should be established. Then the host will
@@ -527,8 +529,17 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
return -ENODEV;
if (sk->sk_type == SOCK_SEQPACKET) {
+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY) {
+ if (new_transport->get_default_cid)
+ src_cid = new_transport->get_default_cid();
+ else
+ src_cid = new_transport->get_local_cid();
+ } else {
+ src_cid = vsk->local_addr.svm_cid;
+ }
+
if (!new_transport->seqpacket_allow ||
- !new_transport->seqpacket_allow(remote_cid)) {
+ !new_transport->seqpacket_allow(src_cid, remote_cid)) {
module_put(new_transport->module);
return -ESOCKTNOSUPPORT;
}
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 998b22e5ce36..0bddcbd906a2 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -615,14 +615,14 @@ static struct virtio_transport virtio_transport = {
.can_msgzerocopy = virtio_transport_can_msgzerocopy,
};
-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
+static bool virtio_transport_seqpacket_allow(u32 src_cid, u32 remote_cid)
{
struct virtio_vsock *vsock;
bool seqpacket_allow;
seqpacket_allow = false;
rcu_read_lock();
- vsock = rcu_dereference(the_virtio_vsock);
+ vsock = virtio_transport_get_virtio_vsock(src_cid);
if (vsock)
seqpacket_allow = vsock->seqpacket_allow;
rcu_read_unlock();
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 6dea6119f5b2..b94358f5bb2c 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
return 0;
}
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_seqpacket_allow(u32 src_cid, u32 remote_cid);
static bool vsock_loopback_msgzerocopy_allow(void)
{
return true;
@@ -104,7 +104,7 @@ static struct virtio_transport loopback_transport = {
.send_pkt = vsock_loopback_send_pkt,
};
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
+static bool vsock_loopback_seqpacket_allow(u32 src_cid, u32 remote_cid)
{
return true;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
` (3 preceding siblings ...)
2024-05-17 14:46 ` [RFC PATCH 4/5] vsock: seqpacket_allow " Xuewei Niu
@ 2024-05-17 14:46 ` Xuewei Niu
2024-05-23 10:45 ` Stefano Garzarella
2024-05-23 10:46 ` [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Stefano Garzarella
2024-05-23 14:54 ` Michael S. Tsirkin
6 siblings, 1 reply; 15+ messages in thread
From: Xuewei Niu @ 2024-05-17 14:46 UTC (permalink / raw)
To: stefanha, sgarzare
Cc: mst, davem, kvm, virtualization, netdev, linux-kernel, Xuewei Niu
The new request is called `IOCTL_VM_SOCKETS_GET_LOCAL_CIDS`. And the old
one, `IOCTL_VM_SOCKETS_GET_LOCAL_CID` is retained.
For the transport that supports multi-devices:
* `IOCTL_VM_SOCKETS_GET_LOCAL_CID` returns "-1";
* `IOCTL_VM_SOCKETS_GET_LOCAL_CIDS` returns a vector of CIDS. The usage is
shown as following.
```
struct vsock_local_cids local_cids;
if ((ret = ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CIDS, &local_cids))) {
perror("failed to get cids");
exit(1);
}
for (i = 0; i<local_cids.nr; i++) {
if (i == (local_cids.nr - 1))
printf("%u", local_cids.data[i]);
else
printf("%u,", local_cids.data[i]);
}
```
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
---
include/net/af_vsock.h | 7 +++++++
include/uapi/linux/vm_sockets.h | 8 ++++++++
net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 25f7dc3d602d..2febc816e388 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -264,4 +264,11 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
{
return t->msgzerocopy_allow && t->msgzerocopy_allow();
}
+
+/**** IOCTL ****/
+/* Type of return value of IOCTL_VM_SOCKETS_GET_LOCAL_CIDS. */
+struct vsock_local_cids {
+ int nr;
+ unsigned int data[MAX_VSOCK_NUM];
+};
#endif /* __AF_VSOCK_H__ */
diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index 36ca5023293a..01f73fb7af5a 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -195,8 +195,16 @@ struct sockaddr_vm {
#define MAX_VSOCK_NUM 16
+/* Return actual context id if the transport not support vsock
+ * multi-devices. Otherwise, return `-1U`.
+ */
+
#define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
+/* Only available in transports that support multiple devices. */
+
+#define IOCTL_VM_SOCKETS_GET_LOCAL_CIDS _IOR(7, 0xba, struct vsock_local_cids)
+
/* MSG_ZEROCOPY notifications are encoded in the standard error format,
* sock_extended_err. See Documentation/networking/msg_zerocopy.rst in
* kernel source tree for more details.
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 3b34be802bf2..2ea2ff52f15b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2454,6 +2454,7 @@ static long vsock_dev_do_ioctl(struct file *filp,
u32 __user *p = ptr;
u32 cid = VMADDR_CID_ANY;
int retval = 0;
+ struct vsock_local_cids local_cids;
switch (cmd) {
case IOCTL_VM_SOCKETS_GET_LOCAL_CID:
@@ -2469,6 +2470,24 @@ static long vsock_dev_do_ioctl(struct file *filp,
retval = -EFAULT;
break;
+ case IOCTL_VM_SOCKETS_GET_LOCAL_CIDS:
+ if (!transport_g2h || !transport_g2h->get_local_cids)
+ goto fault;
+
+ rcu_read_lock();
+ local_cids.nr = transport_g2h->get_local_cids(local_cids.data);
+ rcu_read_unlock();
+
+ if (local_cids.nr < 0 ||
+ copy_to_user(p, &local_cids, sizeof(local_cids)))
+ goto fault;
+
+ break;
+
+fault:
+ retval = -EFAULT;
+ break;
+
default:
retval = -ENOIOCTLCMD;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
@ 2024-05-23 7:43 ` Alyssa Ross
2024-05-23 10:43 ` Stefano Garzarella
1 sibling, 0 replies; 15+ messages in thread
From: Alyssa Ross @ 2024-05-23 7:43 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, sgarzare, mst, davem, kvm, virtualization, netdev,
linux-kernel, Xuewei Niu, virtio-comment
[-- Attachment #1: Type: text/plain, Size: 1292 bytes --]
(CCing virtio-comment, since this proposes adding a field to a struct
that is standardized[1] in the VIRTIO spec, so changes to the Linux
implementation should presumably be coordinated with changes to the
spec.)
[1]: https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4780004
On Fri, May 17, 2024 at 10:46:03PM +0800, Xuewei Niu wrote:
> The "order" field determines the location of the device in the linked list,
> the device with CID 4, having a smallest order, is in the first place, and
> so forth.
>
> Rules:
>
> * It doesn’t have to be continuous;
> * It cannot exist conflicts;
> * It is optional for the mode of a single device, but is required for the
> mode of multiple devices.
>
> Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
> ---
> include/uapi/linux/virtio_vsock.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 64738838bee5..b62ec7d2ab1e 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -43,6 +43,7 @@
>
> struct virtio_vsock_config {
> __le64 guest_cid;
> + __le64 order;
> } __attribute__((packed));
>
> enum virtio_vsock_event_id {
> --
> 2.34.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
2024-05-23 7:43 ` Alyssa Ross
@ 2024-05-23 10:43 ` Stefano Garzarella
1 sibling, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:43 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
As Alyssa suggested, we should discuss spec changes in the virtio ML.
BTW as long as this is an RFC, it's fine. Just be sure, though, to
remember to merge the change in the specification first versus the
patches in Linux.
So I recommend that you don't send a non-RFC set into Linux until you
have agreed on the changes to the specification.
On Fri, May 17, 2024 at 10:46:03PM GMT, Xuewei Niu wrote:
>The "order" field determines the location of the device in the linked list,
>the device with CID 4, having a smallest order, is in the first place, and
>so forth.
Do we really need an order, or would it suffice to just indicate the
device to be used by default? (as the default gateway in networking)
>
>Rules:
>
>* It doesn’t have to be continuous;
>* It cannot exist conflicts;
>* It is optional for the mode of a single device, but is required for the
> mode of multiple devices.
We should also add a feature to support this new field.
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>---
> include/uapi/linux/virtio_vsock.h | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index 64738838bee5..b62ec7d2ab1e 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -43,6 +43,7 @@
>
> struct virtio_vsock_config {
> __le64 guest_cid;
>+ __le64 order;
Do we really need 64 bits for the order?
> } __attribute__((packed));
>
> enum virtio_vsock_event_id {
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices
2024-05-17 14:46 ` [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices Xuewei Niu
@ 2024-05-23 10:43 ` Stefano Garzarella
0 siblings, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:43 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
On Fri, May 17, 2024 at 10:46:04PM GMT, Xuewei Niu wrote:
>The maximum number of devices is limited by `MAX_VSOCK_NUM`.
>
>Extends `vsock_transport` struct with 4 methods to support multi-devices:
>
>* `get_virtio_vsock()`: It receives a CID, and returns a struct of virtio
> vsock. This method is designed to select a vsock device by its CID.
>* `get_default_cid()`: It receives nothing, returns the default CID of the
> first vsock device registered to the kernel.
>* `get_local_cids()`: It returns a vector of vsock devices' CIDs.
>* `compare_order()`: It receives two different CIDs, named "left" and
> "right" respectively. It returns "-1" while the "left" is behind the
> "right". Otherwise, return "1".
>
>`get_local_cid()` is retained, but returns "-1" if the transport supports
>multi-devices.
>
>Replaces the single instance of `virtio_vsock` with a list, named
>`virtio_vsock_list`. The devices are inserted into the list when probing.
>
>The kernel will deny devices from being registered if there are conflicts
>existing in CIDs or orders.
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>---
> include/net/af_vsock.h | 16 ++
> include/uapi/linux/vm_sockets.h | 6 +
> net/vmw_vsock/af_vsock.c | 82 ++++++--
> net/vmw_vsock/virtio_transport.c | 246 ++++++++++++++++++------
> net/vmw_vsock/virtio_transport_common.c | 10 +-
> 5 files changed, 293 insertions(+), 67 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 535701efc1e5..0151296a0bc5 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -174,6 +174,22 @@ struct vsock_transport {
>
> /* Addressing. */
> u32 (*get_local_cid)(void);
>+ /* Held rcu read lock by the caller. */
We should also explain why the rcu is needed.
>+ struct virtio_vsock *(*get_virtio_vsock)(unsigned int cid);
af_vsock supports several transports (i.e. HyperV, VMCI, VIRTIO/VHOST,
loobpack), so we need to be generic here.
In addition, the pointer returned by this function is never used, so
why we need this?
>+ unsigned int (*get_default_cid)(void);
>+ /* Get an list containing all the CIDs of registered vsock. Return
>+ * the length of the list.
>+ *
>+ * Held rcu read lock by the caller.
>+ */
>+ int (*get_local_cids)(unsigned int *local_cids);
Why int? get_local_cid() returns an u32, we should do the same.
In addition, can we remove get_local_cid() and implement
get_local_cids() for all the transports?
>+ /* Compare the order of two devices. Given the guest CIDs of two
>+ * different devices, returns -1 while the left is behind the right.
>+ * Otherwise, return 1.
>+ *
>+ * Held rcu read lock by the caller.
>+ */
>+ int (*compare_order)(unsigned int left, unsigned int right);
Please check better the type for CIDs all over the place.
>
> /* Read a single skb */
> int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
>diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
>index ed07181d4eff..36ca5023293a 100644
>--- a/include/uapi/linux/vm_sockets.h
>+++ b/include/uapi/linux/vm_sockets.h
>@@ -189,6 +189,12 @@ struct sockaddr_vm {
> sizeof(__u8)];
> };
>
>+/* The maximum number of vsock devices. Each vsock device has an exclusive
>+ * context id.
>+ */
>+
>+#define MAX_VSOCK_NUM 16
This is used internally in AF_VSOCK, I don't think we should expose it
in the UAPI.
>+
> #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
>
> /* MSG_ZEROCOPY notifications are encoded in the standard error format,
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 54ba7316f808..da06ddc940cd 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -234,19 +234,45 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
>
> static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> {
>- struct vsock_sock *vsk;
>+ struct vsock_sock *vsk, *any_vsk = NULL;
>
>+ rcu_read_lock();
Why the rcu is needed?
> list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table)
> {
>+ /* The highest priority: full match. */
> if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>- return sk_vsock(vsk);
>+ goto out;
>
>- if (addr->svm_port == vsk->local_addr.svm_port &&
>- (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
>- addr->svm_cid == VMADDR_CID_ANY))
>- return sk_vsock(vsk);
>+ /* Port match */
>+ if (addr->svm_port == vsk->local_addr.svm_port) {
>+ /* The second priority: local cid is VMADDR_CID_ANY. */
>+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY)
>+ goto out;
>+
>+ /* The third priority: local cid isn't VMADDR_CID_ANY. */
>+ if (addr->svm_cid == VMADDR_CID_ANY) {
>+ if (!any_vsk) {
>+ any_vsk = vsk;
>+ continue;
>+ }
>+ // Use the device with smaller order
>+ if (vsk->transport->compare_order(any_vsk->local_addr.svm_cid,
>+ vsk->local_addr.svm_cid) < 0)
>+ any_vsk = vsk;
>+ }
>+ }
>+ }
>+ rcu_read_unlock();
>+
>+ if (any_vsk) {
>+ pr_debug("matched a any_vsk at %p\n", any_vsk);
>+ return sk_vsock(any_vsk);
> }
>
> return NULL;
>+
>+out:
>+ rcu_read_unlock();
>+ return sk_vsock(vsk);
> }
>
> static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>@@ -408,7 +434,11 @@ static bool vsock_use_local_transport(unsigned int remote_cid)
> return true;
>
> if (transport_g2h) {
>- return remote_cid == transport_g2h->get_local_cid();
>+ if (transport_g2h->get_virtio_vsock)
>+ return transport_g2h->get_virtio_vsock(remote_cid) !=
>+ NULL;
>+ else
>+ return remote_cid == transport_g2h->get_local_cid();
> } else {
> return remote_cid == VMADDR_CID_HOST;
> }
>@@ -516,9 +546,26 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> }
> EXPORT_SYMBOL_GPL(vsock_assign_transport);
>
>+bool transport_g2h_verify_cid(unsigned int cid)
>+{
>+ /* transports that support multi devices */
>+ rcu_read_lock();
>+ if (transport_g2h->get_virtio_vsock &&
>+ (cid == VMADDR_CID_ANY || transport_g2h->get_virtio_vsock(cid))) {
>+ rcu_read_unlock();
>+ return true;
>+ }
>+ rcu_read_unlock();
>+ /* other transports */
>+ if (cid == transport_g2h->get_local_cid())
>+ return true;
>+
>+ return false;
>+}
>+
> bool vsock_find_cid(unsigned int cid)
> {
>- if (transport_g2h && cid == transport_g2h->get_local_cid())
>+ if (transport_g2h && transport_g2h_verify_cid(cid))
> return true;
>
> if (transport_h2g && cid == VMADDR_CID_HOST)
>@@ -697,7 +744,9 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> /* Now bind to the provided address or select appropriate values if
> * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
> * like AF_INET prevents binding to a non-local IP address (in most
>- * cases), we only allow binding to a local CID.
>+ * cases), we only allow binding to a local CID. In the cases of
>+ * multi-devices, only CIDs of vsock devices registered in the kernel
>+ * are allowed.
> */
> if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
> return -EADDRNOTAVAIL;
>@@ -825,7 +874,6 @@ static void __vsock_release(struct sock *sk, int level)
> __vsock_release(pending, SINGLE_DEPTH_NESTING);
> sock_put(pending);
> }
>-
Unrelated change.
> release_sock(sk);
> sock_put(sk);
> }
>@@ -1181,7 +1229,12 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> */
>
> if (remote_addr->svm_cid == VMADDR_CID_ANY)
>- remote_addr->svm_cid = transport->get_local_cid();
>+ if (transport->get_default_cid)
>+ remote_addr->svm_cid =
>+ transport->get_default_cid();
>+ else
>+ remote_addr->svm_cid =
>+ transport->get_local_cid();
>
> if (!vsock_addr_bound(remote_addr)) {
> err = -EINVAL;
>@@ -1191,7 +1244,12 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> remote_addr = &vsk->remote_addr;
>
> if (remote_addr->svm_cid == VMADDR_CID_ANY)
>- remote_addr->svm_cid = transport->get_local_cid();
>+ if (transport->get_default_cid)
>+ remote_addr->svm_cid =
>+ transport->get_default_cid();
>+ else
>+ remote_addr->svm_cid =
>+ transport->get_local_cid();
>
> /* XXX Should connect() or this function ensure remote_addr is
> * bound?
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index ee5d306a96d0..93d25aeafb83 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -22,8 +22,8 @@
> #include <net/af_vsock.h>
>
> static struct workqueue_struct *virtio_vsock_workqueue;
>-static struct virtio_vsock __rcu *the_virtio_vsock;
>-static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>+static LIST_HEAD(virtio_vsock_list); /* vsock multi-devices */
>+static DEFINE_MUTEX(virtio_vsock_list_mutex); /* protects virtio_vsock_list */
> static struct virtio_transport virtio_transport; /* forward declaration */
>
> struct virtio_vsock {
>@@ -62,6 +62,7 @@ struct virtio_vsock {
> struct virtio_vsock_event event_list[8];
>
> u32 guest_cid;
>+ u32 order;
> bool seqpacket_allow;
>
> /* These fields are used only in tx path in function
>@@ -74,24 +75,70 @@ struct virtio_vsock {
> */
> struct scatterlist *out_sgs[MAX_SKB_FRAGS + 1];
> struct scatterlist out_bufs[MAX_SKB_FRAGS + 1];
>+
>+ struct list_head node;
>+ struct rcu_head rcu;
> };
>
> static u32 virtio_transport_get_local_cid(void)
>+{
>+ return VMADDR_CID_ANY;
>+}
>+
>+/* Held rcu read lock by the caller. */
>+static struct virtio_vsock *virtio_transport_get_virtio_vsock(unsigned int cid)
> {
> struct virtio_vsock *vsock;
>- u32 ret;
>+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
>+ if (vsock->guest_cid == cid)
>+ return vsock;
>+ }
>+ return NULL;
>+}
>
>- rcu_read_lock();
>- vsock = rcu_dereference(the_virtio_vsock);
>- if (!vsock) {
>- ret = VMADDR_CID_ANY;
>- goto out_rcu;
>+static unsigned int virtio_transport_get_default_cid(void)
>+{
>+ struct virtio_vsock *vsock;
>+
>+ vsock = list_first_or_null_rcu(&virtio_vsock_list, struct virtio_vsock,
>+ node);
>+ if (!vsock)
>+ return VMADDR_CID_ANY;
>+
>+ return vsock->guest_cid;
>+}
>+
>+/* Held rcu read lock by the caller. */
>+static int virtio_transport_get_local_cids(u32 *cids)
>+{
>+ int count = 0;
>+ struct virtio_vsock *vsock;
>+
>+ if (!cids)
>+ return -EFAULT;
>+
>+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
>+ cids[count++] = vsock->guest_cid;
> }
>+ return count;
>+}
>
>- ret = vsock->guest_cid;
>-out_rcu:
>- rcu_read_unlock();
>- return ret;
>+/* Held rcu read lock by the caller. */
>+static int virtio_transport_compare_order(unsigned int left, unsigned int right)
>+{
>+ struct virtio_vsock *vsock;
>+
>+ if (left == right)
>+ return 0;
>+
>+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
>+ if (right == vsock->guest_cid)
>+ return -1;
>+ if (left == vsock->guest_cid)
>+ return 1;
>+ }
>+
>+ return 0;
> }
>
> static void
>@@ -201,12 +248,16 @@ virtio_transport_send_pkt(struct sk_buff *skb)
> struct virtio_vsock_hdr *hdr;
> struct virtio_vsock *vsock;
> int len = skb->len;
>+ unsigned int src_cid;
>
> hdr = virtio_vsock_hdr(skb);
>+ src_cid = le64_to_cpu(hdr->src_cid);
>
> rcu_read_lock();
>- vsock = rcu_dereference(the_virtio_vsock);
>+ vsock = virtio_transport_get_virtio_vsock(src_cid);
> if (!vsock) {
>+ pr_debug("pkt sending has been failed, as vsock with cid %u not found\n",
>+ src_cid);
> kfree_skb(skb);
> len = -ENODEV;
> goto out_rcu;
>@@ -236,13 +287,17 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> int cnt = 0, ret;
>
> rcu_read_lock();
>- vsock = rcu_dereference(the_virtio_vsock);
>- if (!vsock) {
>+ if (list_empty(&virtio_vsock_list)) {
> ret = -ENODEV;
> goto out_rcu;
> }
>
>- cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
>+ list_for_each_entry(vsock, &virtio_vsock_list, node) {
>+ cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
>+ /* Skbs for the vsk belong to one virtio_vsock */
>+ if (cnt)
>+ break;
>+ }
>
> if (cnt) {
> struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
>@@ -385,6 +440,16 @@ static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
> vsock->guest_cid = le64_to_cpu(guest_cid);
> }
>
>+static void virtio_vsock_update_order(struct virtio_vsock *vsock)
>+{
>+ struct virtio_device *vdev = vsock->vdev;
>+ __le64 order;
>+
>+ vdev->config->get(vdev, offsetof(struct virtio_vsock_config, order),
>+ &order, sizeof(order));
>+ vsock->order = le64_to_cpu(order);
>+}
>+
> /* event_lock must be held */
> static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
> struct virtio_vsock_event *event)
>@@ -492,13 +557,17 @@ static bool virtio_transport_msgzerocopy_allow(void)
> return true;
> }
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>+static bool virtio_transport_seqpacket_allow(u32 src_cid, u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> .transport = {
> .module = THIS_MODULE,
>
> .get_local_cid = virtio_transport_get_local_cid,
>+ .get_virtio_vsock = virtio_transport_get_virtio_vsock,
>+ .get_default_cid = virtio_transport_get_default_cid,
>+ .get_local_cids = virtio_transport_get_local_cids,
>+ .compare_order = virtio_transport_compare_order,
>
> .init = virtio_transport_do_socket_init,
> .destruct = virtio_transport_destruct,
>@@ -617,7 +686,38 @@ static void virtio_transport_rx_work(struct work_struct *work)
> static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
> {
> struct virtio_device *vdev = vsock->vdev;
>- static const char * const names[] = {
>+ struct virtio_vsock *_vsock = NULL;
>+ int i;
>+ unsigned int guest_cid, order;
>+
>+ virtio_vsock_update_guest_cid(vsock);
>+ virtio_vsock_update_order(vsock);
>+ guest_cid = vsock->guest_cid;
>+ order = vsock->order;
>+
>+ i = 0;
>+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
>+ if (++i > MAX_VSOCK_NUM) {
>+ pr_debug("vsock num reaches limit %d\n", MAX_VSOCK_NUM);
>+ return -ENOMEM;
>+ }
>+
>+ /* Check for guest_cid conflicts */
>+ if (unlikely(guest_cid == _vsock->guest_cid)) {
>+ pr_debug("conflict exists in vsock's guest_cid %u\n",
>+ guest_cid);
>+ return -EBUSY;
>+ }
>+
>+ /* Check for order conflicts */
>+ if (unlikely(order == _vsock->order)) {
>+ pr_debug("conflict exists in vsock's order %u\n",
>+ order);
>+ return -EBUSY;
>+ }
>+ }
>+
>+ static const char *const names[] = {
> "rx",
> "tx",
> "event",
>@@ -634,8 +734,6 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
> if (ret < 0)
> return ret;
>
>- virtio_vsock_update_guest_cid(vsock);
>-
> virtio_device_ready(vdev);
>
> return 0;
>@@ -716,21 +814,14 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
>
> static int virtio_vsock_probe(struct virtio_device *vdev)
> {
>- struct virtio_vsock *vsock = NULL;
>- int ret;
>- int i;
>+ struct virtio_vsock *vsock, *_vsock, *first_vsock;
>+ int ret, i;
>+ unsigned int guest_cid, order;
>
>- ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
>+ ret = mutex_lock_interruptible(&virtio_vsock_list_mutex);
> if (ret)
> return ret;
>
>- /* Only one virtio-vsock device per guest is supported */
>- if (rcu_dereference_protected(the_virtio_vsock,
>- lockdep_is_held(&the_virtio_vsock_mutex))) {
>- ret = -EBUSY;
>- goto out;
>- }
>-
> vsock = kzalloc(sizeof(*vsock), GFP_KERNEL);
> if (!vsock) {
> ret = -ENOMEM;
>@@ -764,28 +855,56 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> for (i = 0; i < ARRAY_SIZE(vsock->out_sgs); i++)
> vsock->out_sgs[i] = &vsock->out_bufs[i];
>
>- rcu_assign_pointer(the_virtio_vsock, vsock);
>+ order = vsock->order;
>+ guest_cid = vsock->guest_cid;
>+ first_vsock =
>+ list_first_entry(&virtio_vsock_list, struct virtio_vsock, node);
>+ /* Insert virtio-vsock device into a proper location. */
>+ if (list_empty(&virtio_vsock_list) || first_vsock->order > order) {
>+ list_add_rcu(&vsock->node, &virtio_vsock_list);
>+ } else {
>+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
>+ struct virtio_vsock *next = container_of(_vsock->node.next,
>+ struct virtio_vsock, node);
>+ if (&next->node != &virtio_vsock_list &&
>+ next->order < order)
>+ continue;
>+ list_add_rcu(&vsock->node, &_vsock->node);
>+ break;
>+ }
>+ }
>+
>+ pr_debug("virtio_vsock@%p registered (cid = %u, order = %u)\n", vsock, guest_cid, order);
>+
> virtio_vsock_vqs_start(vsock);
>
>- mutex_unlock(&the_virtio_vsock_mutex);
>+ mutex_unlock(&virtio_vsock_list_mutex);
>
> return 0;
>
> out:
> kfree(vsock);
>- mutex_unlock(&the_virtio_vsock_mutex);
>+ mutex_unlock(&virtio_vsock_list_mutex);
> return ret;
> }
>
> static void virtio_vsock_remove(struct virtio_device *vdev)
> {
>- struct virtio_vsock *vsock = vdev->priv;
>+ struct virtio_vsock *vsock, *_vsock;
>
>- mutex_lock(&the_virtio_vsock_mutex);
>+ vsock = vdev->priv;
>+
>+ mutex_lock(&virtio_vsock_list_mutex);
>
> vdev->priv = NULL;
>- rcu_assign_pointer(the_virtio_vsock, NULL);
>- synchronize_rcu();
>+ /* Remove virtio-vsock device from the list. */
>+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
>+ if (vsock == _vsock) {
>+ list_del_rcu(&vsock->node);
>+ synchronize_rcu();
>+ break;
>+ }
>+ }
>
> virtio_vsock_vqs_del(vsock);
>
>@@ -797,7 +916,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> flush_work(&vsock->event_work);
> flush_work(&vsock->send_pkt_work);
>
>- mutex_unlock(&the_virtio_vsock_mutex);
>+ mutex_unlock(&virtio_vsock_list_mutex);
>
> kfree(vsock);
> }
>@@ -805,43 +924,62 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> #ifdef CONFIG_PM_SLEEP
> static int virtio_vsock_freeze(struct virtio_device *vdev)
> {
>- struct virtio_vsock *vsock = vdev->priv;
>+ struct virtio_vsock *vsock, *_vsock;
>
>- mutex_lock(&the_virtio_vsock_mutex);
>+ vsock = vdev->priv;
>
>- rcu_assign_pointer(the_virtio_vsock, NULL);
>- synchronize_rcu();
>+ mutex_lock(&virtio_vsock_list_mutex);
>+
>+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
>+ if (vsock == _vsock) {
>+ list_del_rcu(&vsock->node);
>+ synchronize_rcu();
>+ break;
>+ }
>+ }
>
> virtio_vsock_vqs_del(vsock);
>
>- mutex_unlock(&the_virtio_vsock_mutex);
>+ mutex_unlock(&virtio_vsock_list_mutex);
>
> return 0;
> }
>
> static int virtio_vsock_restore(struct virtio_device *vdev)
> {
>- struct virtio_vsock *vsock = vdev->priv;
>+ struct virtio_vsock *vsock, *_vsock, *first_vsock;
> int ret;
>+ unsigned int order;
>
>- mutex_lock(&the_virtio_vsock_mutex);
>+ vsock = vdev->priv;
>
>- /* Only one virtio-vsock device per guest is supported */
>- if (rcu_dereference_protected(the_virtio_vsock,
>- lockdep_is_held(&the_virtio_vsock_mutex))) {
>- ret = -EBUSY;
>- goto out;
>- }
>+ mutex_lock(&virtio_vsock_list_mutex);
>
> ret = virtio_vsock_vqs_init(vsock);
> if (ret < 0)
> goto out;
>
>- rcu_assign_pointer(the_virtio_vsock, vsock);
>+ order = vsock->order;
>+ first_vsock =
>+ list_first_entry(&virtio_vsock_list, struct virtio_vsock, node);
>+ /* Insert virtio-vsock device into a proper location. */
>+ if (list_empty(&virtio_vsock_list) || first_vsock->order > order) {
>+ list_add_rcu(&vsock->node, &virtio_vsock_list);
>+ } else {
>+ list_for_each_entry(_vsock, &virtio_vsock_list, node) {
>+ struct virtio_vsock *next = container_of(_vsock->node.next,
>+ struct virtio_vsock, node);
>+ if (&next->node != &virtio_vsock_list &&
>+ next->order < order)
>+ continue;
>+ list_add_rcu(&vsock->node, &_vsock->node);
>+ break;
>+ }
>+ }
> virtio_vsock_vqs_start(vsock);
>
> out:
>- mutex_unlock(&the_virtio_vsock_mutex);
>+ mutex_unlock(&virtio_vsock_list_mutex);
> return ret;
> }
> #endif /* CONFIG_PM_SLEEP */
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 16ff976a86e3..bed75a41419e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -340,7 +340,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> if (unlikely(!t_ops))
> return -EFAULT;
>
>- src_cid = t_ops->transport.get_local_cid();
>+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY) {
>+ if (t_ops->transport.get_default_cid)
>+ src_cid = t_ops->transport.get_default_cid();
>+ else
>+ src_cid = t_ops->transport.get_local_cid();
Can get_local_cid() be used for this case instead of adding
get_default_cid()?
What happen if get_default_cid() returns VMADDR_CID_ANY?
Should we fallback to get_local_cid()?
>+ } else {
>+ src_cid = vsk->local_addr.svm_cid;
>+ }
>+
> src_port = vsk->local_addr.svm_port;
> if (!info->remote_cid) {
> dst_cid = vsk->remote_addr.svm_cid;
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices
2024-05-17 14:46 ` [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices Xuewei Niu
@ 2024-05-23 10:44 ` Stefano Garzarella
0 siblings, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:44 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
On Fri, May 17, 2024 at 10:46:05PM GMT, Xuewei Niu wrote:
>Adds a new argument, named "cid", to let them know which `virtio_vsock` to
>be selected.
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>---
> include/linux/virtio_vsock.h | 2 +-
> net/vmw_vsock/virtio_transport.c | 5 ++---
> net/vmw_vsock/virtio_transport_common.c | 6 +++---
> 3 files changed, 6 insertions(+), 7 deletions(-)
Every commit in linux must be working to support bisection. So these
changes should be made before adding multi-device support.
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index c82089dee0c8..21bfd5e0c2e7 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -168,7 +168,7 @@ struct virtio_transport {
> * extra checks and can perform zerocopy transmission by
> * default.
> */
>- bool (*can_msgzerocopy)(int bufs_num);
>+ bool (*can_msgzerocopy)(u32 cid, int bufs_num);
> };
>
> ssize_t
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 93d25aeafb83..998b22e5ce36 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -521,14 +521,13 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>-static bool virtio_transport_can_msgzerocopy(int bufs_num)
>+static bool virtio_transport_can_msgzerocopy(u32 cid, int bufs_num)
> {
> struct virtio_vsock *vsock;
> bool res = false;
>
> rcu_read_lock();
>-
>- vsock = rcu_dereference(the_virtio_vsock);
>+ vsock = virtio_transport_get_virtio_vsock(cid);
> if (vsock) {
> struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index bed75a41419e..e7315d7b9af1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -39,7 +39,7 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
>
> static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> struct virtio_vsock_pkt_info *info,
>- size_t pkt_len)
>+ size_t pkt_len, unsigned int cid)
> {
> struct iov_iter *iov_iter;
>
>@@ -62,7 +62,7 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
>
> /* +1 is for packet header. */
>- return t_ops->can_msgzerocopy(pages_to_send + 1);
>+ return t_ops->can_msgzerocopy(cid, pages_to_send + 1);
> }
>
> return true;
>@@ -375,7 +375,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> info->msg->msg_flags &= ~MSG_ZEROCOPY;
>
> if (info->msg->msg_flags & MSG_ZEROCOPY)
>- can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
>+ can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len, src_cid);
>
> if (can_zcopy)
> max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 4/5] vsock: seqpacket_allow adapts to multi-devices
2024-05-17 14:46 ` [RFC PATCH 4/5] vsock: seqpacket_allow " Xuewei Niu
@ 2024-05-23 10:44 ` Stefano Garzarella
0 siblings, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:44 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
On Fri, May 17, 2024 at 10:46:06PM GMT, Xuewei Niu wrote:
>Adds a new argument, named "src_cid", to let them know which `virtio_vsock`
>to be selected.
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>---
> include/net/af_vsock.h | 2 +-
> net/vmw_vsock/af_vsock.c | 15 +++++++++++++--
> net/vmw_vsock/virtio_transport.c | 4 ++--
> net/vmw_vsock/vsock_loopback.c | 4 ++--
> 4 files changed, 18 insertions(+), 7 deletions(-)
Same for this.
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 0151296a0bc5..25f7dc3d602d 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -143,7 +143,7 @@ struct vsock_transport {
> int flags);
> int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> size_t len);
>- bool (*seqpacket_allow)(u32 remote_cid);
>+ bool (*seqpacket_allow)(u32 src_cid, u32 remote_cid);
> u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
>
> /* Notification. */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index da06ddc940cd..3b34be802bf2 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -470,10 +470,12 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> {
> const struct vsock_transport *new_transport;
> struct sock *sk = sk_vsock(vsk);
>- unsigned int remote_cid = vsk->remote_addr.svm_cid;
>+ unsigned int src_cid, remote_cid;
> __u8 remote_flags;
> int ret;
>
>+ remote_cid = vsk->remote_addr.svm_cid;
>+
> /* If the packet is coming with the source and destination CIDs higher
> * than VMADDR_CID_HOST, then a vsock channel where all the packets are
> * forwarded to the host should be established. Then the host will
>@@ -527,8 +529,17 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> return -ENODEV;
>
> if (sk->sk_type == SOCK_SEQPACKET) {
>+ if (vsk->local_addr.svm_cid == VMADDR_CID_ANY) {
>+ if (new_transport->get_default_cid)
>+ src_cid = new_transport->get_default_cid();
>+ else
>+ src_cid = new_transport->get_local_cid();
>+ } else {
>+ src_cid = vsk->local_addr.svm_cid;
>+ }
>+
> if (!new_transport->seqpacket_allow ||
>- !new_transport->seqpacket_allow(remote_cid)) {
>+ !new_transport->seqpacket_allow(src_cid, remote_cid)) {
> module_put(new_transport->module);
> return -ESOCKTNOSUPPORT;
> }
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 998b22e5ce36..0bddcbd906a2 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -615,14 +615,14 @@ static struct virtio_transport virtio_transport = {
> .can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>+static bool virtio_transport_seqpacket_allow(u32 src_cid, u32 remote_cid)
> {
> struct virtio_vsock *vsock;
> bool seqpacket_allow;
>
> seqpacket_allow = false;
> rcu_read_lock();
>- vsock = rcu_dereference(the_virtio_vsock);
>+ vsock = virtio_transport_get_virtio_vsock(src_cid);
> if (vsock)
> seqpacket_allow = vsock->seqpacket_allow;
> rcu_read_unlock();
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index 6dea6119f5b2..b94358f5bb2c 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
> return 0;
> }
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
>+static bool vsock_loopback_seqpacket_allow(u32 src_cid, u32 remote_cid);
> static bool vsock_loopback_msgzerocopy_allow(void)
> {
> return true;
>@@ -104,7 +104,7 @@ static struct virtio_transport loopback_transport = {
> .send_pkt = vsock_loopback_send_pkt,
> };
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
>+static bool vsock_loopback_seqpacket_allow(u32 src_cid, u32 remote_cid)
> {
> return true;
> }
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs
2024-05-17 14:46 ` [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs Xuewei Niu
@ 2024-05-23 10:45 ` Stefano Garzarella
0 siblings, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:45 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
On Fri, May 17, 2024 at 10:46:07PM GMT, Xuewei Niu wrote:
>The new request is called `IOCTL_VM_SOCKETS_GET_LOCAL_CIDS`. And the old
>one, `IOCTL_VM_SOCKETS_GET_LOCAL_CID` is retained.
>
>For the transport that supports multi-devices:
>
>* `IOCTL_VM_SOCKETS_GET_LOCAL_CID` returns "-1";
What about returning the default CID (lower prio)?
>* `IOCTL_VM_SOCKETS_GET_LOCAL_CIDS` returns a vector of CIDS. The usage
>is
>shown as following.
>
>```
>struct vsock_local_cids local_cids;
>if ((ret = ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CIDS, &local_cids))) {
> perror("failed to get cids");
> exit(1);
>}
>for (i = 0; i<local_cids.nr; i++) {
> if (i == (local_cids.nr - 1))
> printf("%u", local_cids.data[i]);
> else
> printf("%u,", local_cids.data[i]);
>}
>```
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>---
> include/net/af_vsock.h | 7 +++++++
> include/uapi/linux/vm_sockets.h | 8 ++++++++
> net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
> 3 files changed, 34 insertions(+)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 25f7dc3d602d..2febc816e388 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -264,4 +264,11 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> {
> return t->msgzerocopy_allow && t->msgzerocopy_allow();
> }
>+
>+/**** IOCTL ****/
>+/* Type of return value of IOCTL_VM_SOCKETS_GET_LOCAL_CIDS. */
>+struct vsock_local_cids {
>+ int nr;
>+ unsigned int data[MAX_VSOCK_NUM];
>+};
> #endif /* __AF_VSOCK_H__ */
>diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
>index 36ca5023293a..01f73fb7af5a 100644
>--- a/include/uapi/linux/vm_sockets.h
>+++ b/include/uapi/linux/vm_sockets.h
>@@ -195,8 +195,16 @@ struct sockaddr_vm {
>
> #define MAX_VSOCK_NUM 16
Okay, now I see why you need this in the UAPI, but pleace try to follow
other defines.
What about VM_SOCKETS_MAX_DEVS ?
>
>+/* Return actual context id if the transport not support vsock
>+ * multi-devices. Otherwise, return `-1U`.
>+ */
>+
> #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
>
>+/* Only available in transports that support multiple devices. */
>+
>+#define IOCTL_VM_SOCKETS_GET_LOCAL_CIDS _IOR(7, 0xba, struct vsock_local_cids)
>+
> /* MSG_ZEROCOPY notifications are encoded in the standard error format,
> * sock_extended_err. See Documentation/networking/msg_zerocopy.rst in
> * kernel source tree for more details.
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 3b34be802bf2..2ea2ff52f15b 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -2454,6 +2454,7 @@ static long vsock_dev_do_ioctl(struct file *filp,
> u32 __user *p = ptr;
> u32 cid = VMADDR_CID_ANY;
> int retval = 0;
>+ struct vsock_local_cids local_cids;
>
> switch (cmd) {
> case IOCTL_VM_SOCKETS_GET_LOCAL_CID:
>@@ -2469,6 +2470,24 @@ static long vsock_dev_do_ioctl(struct file *filp,
> retval = -EFAULT;
> break;
>
>+ case IOCTL_VM_SOCKETS_GET_LOCAL_CIDS:
>+ if (!transport_g2h || !transport_g2h->get_local_cids)
>+ goto fault;
>+
>+ rcu_read_lock();
>+ local_cids.nr = transport_g2h->get_local_cids(local_cids.data);
>+ rcu_read_unlock();
>+
>+ if (local_cids.nr < 0 ||
>+ copy_to_user(p, &local_cids, sizeof(local_cids)))
>+ goto fault;
>+
>+ break;
>+
>+fault:
>+ retval = -EFAULT;
>+ break;
>+
> default:
> retval = -ENOIOCTLCMD;
> }
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
` (4 preceding siblings ...)
2024-05-17 14:46 ` [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs Xuewei Niu
@ 2024-05-23 10:46 ` Stefano Garzarella
2024-05-23 14:54 ` Michael S. Tsirkin
6 siblings, 0 replies; 15+ messages in thread
From: Stefano Garzarella @ 2024-05-23 10:46 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, mst, davem, kvm, virtualization, netdev, linux-kernel,
Xuewei Niu
Hi,
thanks for this RFC!
On Fri, May 17, 2024 at 10:46:02PM GMT, Xuewei Niu wrote:
># Motivition
>
>Vsock is a lightweight and widely used data exchange mechanism between host
>and guest. Kata Containers, a secure container runtime, leverages the
>capability to exchange control data between the shim and the kata-agent.
>
>The Linux kernel only supports one vsock device for virtio-vsock transport,
>resulting in the following limitations:
>
>* Poor performance isolation: All vsock connections share the same
>virtqueue.
This might be fixed if we implement multi-queue in virtio-vsock.
>* Cannot enable more than one backend: Virtio-vsock, vhost-vsock, and
>vhost-user-vsock cannot be enabled simultaneously on the transport.
>
>We’d like to transfer networking data, such as TSI (Transparent Socket
>Impersonation), over vsock via the vhost-user protocol to reduce overhead.
>However, by default, the vsock device is occupied by the kata-agent.
>
># Usages
>
>Principle: **Supporting virtio-vsock multi-devices while also being
>compatible with existing ones.**
>
>## Connection from Guest to Host
>
>There are two valuable questions to take about:
>
>1. How to be compatible with the existing usages?
>2. How do we specify a virtio-vsock device?
>
>### Question 1
>
>Before we delve into question 1, I'd like to provide a piece of pseudocode
>as an example of one of the existing use cases from the guest's
>perspective.
>
>Assuming there is one virtio-vsock device with CID 4. One of existing
>usages to connect to host is shown as following.
>
>```
>fd = socket(AF_VSOCK);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>The result is that a connection is established from the guest (4, ?) to the
>host (2, 1234), where "?" denotes a random port.
>
>In the context of multi-devices, there are more than two devices. If the
>users don’t specify one CID explicitly, the kernel becomes confused about
>which device to use. The new implementation should be compatible with the
>old one.
>
>We expanded the virtio-vsock specification to address this issue. The
>specification now includes a new field called "order".
>
>```
>struct virtio_vsock_config {
> __le64 guest_cid;
> __le64 order;
>} _attribute_((packed));
>```
>
>In the phase of virtio-vsock driver probing, the guest kernel reads
>from
>VMM to get the order of each device. **We stipulate that the device with the
>smallest order is regarded as the default device**(this mechanism functions
>as a 'default gateway' in networking).
>
>Assuming there are three virtio-vsock devices: device1 (CID=3), device2
>(CID=4), and device3 (CID=5). The arrangement of the list is as follows
>from the perspective of the guest kernel:
>
>```
>virtio_vsock_list =
>virtio_vsock { cid: 4, order: 0 } -> virtio_vsock { cid: 3, order: 1 } -> virtio_vsock { cid: 5, order: 10 }
>```
>
>At this time, the guest kernel realizes that the device2 (CID=4) is the
>default device. Execute the same code as before.
>
>```
>fd = socket(AF_VSOCK);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>A connection will be established from the guest (4, ?) to the host (2, 1234).
It seems that only the one with order 0 is used here though, so what is
the ordering for?
Wouldn't it suffice to simply indicate the default device (e.g., like
the default gateway for networking)?
>
>### Question 2
>
>Now, the user wants to specify a device instead of the default one. An
>explicit binding operation is required to be performed.
>
>Use the device (CID=3), where “-1” represents any port, the kernel will
We have a macro: VMADDR_PORT_ANY (which is -1)
>search an available port automatically.
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 3, -1);
>connect(fd, 2, 1234);)
>n = write(fd, buffer);
>```
>
>Use the device (CID=4).
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 4, -1);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>## Connection from Host to Guest
>
>Connection from host to guest is quite similar to the existing usages. The
>device’s CID is specified by the bind operation.
>
>Listen at the device (CID=3)’s port 10000.
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 3, 10000);
>listen(fd);
>new_fd = accept(fd, &host_cid, &host_port);
>n = write(fd, buffer);
>```
>
>Listen at the device (CID=4)’s port 10000.
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 4, 10000);
>listen(fd);
>new_fd = accept(fd, &host_cid, &host_port);
>n = write(fd, buffer);
>```
>
># Use Cases
>
>We've completed a POC with Kata Containers, Ztunnel, which is a
>purpose-built per-node proxy for Istio ambient mesh, and TSI. Please refer
>to the following link for more details.
>
>Link: https://bit.ly/4bdPJbU
Thank you for this RFC, I left several comments in the patches, we still
have some work to do, but I think it is something we can support :-)
Here I summarize the things that I think we need to fix:
1. Avoid adding transport-specific things in af_vsock.c
We need to have a generic API to allow other transports to implement
the same functionality.
2. We need to add negotiation of a new feature in virtio/vhost transports
We need to enable or disable support depending on whether the
feature is negotiated, since guest and host may not support it.
3. Re-work the patch order for bisectability (more detail on patches 3/4)
4. Do we really need the order or just a default device?
5. Check if we can add some tests in tools/testing/vsock
6. When we agree on the RFC, we should discuss the spec changes in the
virtio ML before sending a non-RFC series on Linux
These are the main things, but I left other comments in the patches.
Thanks,
Stefano
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
` (5 preceding siblings ...)
2024-05-23 10:46 ` [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Stefano Garzarella
@ 2024-05-23 14:54 ` Michael S. Tsirkin
2024-05-29 7:15 ` Xuewei Niu
6 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-05-23 14:54 UTC (permalink / raw)
To: Xuewei Niu
Cc: stefanha, sgarzare, davem, kvm, virtualization, netdev,
linux-kernel, Xuewei Niu
On Fri, May 17, 2024 at 10:46:02PM +0800, Xuewei Niu wrote:
> include/linux/virtio_vsock.h | 2 +-
> include/net/af_vsock.h | 25 ++-
> include/uapi/linux/virtio_vsock.h | 1 +
> include/uapi/linux/vm_sockets.h | 14 ++
> net/vmw_vsock/af_vsock.c | 116 +++++++++--
> net/vmw_vsock/virtio_transport.c | 255 ++++++++++++++++++------
> net/vmw_vsock/virtio_transport_common.c | 16 +-
> net/vmw_vsock/vsock_loopback.c | 4 +-
> 8 files changed, 352 insertions(+), 81 deletions(-)
As any change to virtio device/driver interface, this has to
go through the virtio TC. Please subscribe at
virtio-comment+subscribe@lists.linux.dev and then
contact the TC at virtio-comment@lists.linux.dev
You will likely eventually need to write a spec draft document, too.
--
MST
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices
2024-05-23 14:54 ` Michael S. Tsirkin
@ 2024-05-29 7:15 ` Xuewei Niu
0 siblings, 0 replies; 15+ messages in thread
From: Xuewei Niu @ 2024-05-29 7:15 UTC (permalink / raw)
To: mst
Cc: davem, kvm, linux-kernel, netdev, niuxuewei.nxw, niuxuewei97,
sgarzare, stefanha, virtualization, fupan.lfp
Hi, MST!
> > include/linux/virtio_vsock.h | 2 +-
> > include/net/af_vsock.h | 25 ++-
> > include/uapi/linux/virtio_vsock.h | 1 +
> > include/uapi/linux/vm_sockets.h | 14 ++
> > net/vmw_vsock/af_vsock.c | 116 +++++++++--
> > net/vmw_vsock/virtio_transport.c | 255 ++++++++++++++++++------
> > net/vmw_vsock/virtio_transport_common.c | 16 +-
> > net/vmw_vsock/vsock_loopback.c | 4 +-
> > 8 files changed, 352 insertions(+), 81 deletions(-)
>
> As any change to virtio device/driver interface, this has to
> go through the virtio TC. Please subscribe at
> virtio-comment+subscribe@lists.linux.dev and then
> contact the TC at virtio-comment@lists.linux.dev
>
> You will likely eventually need to write a spec draft document, too.
Thanks for your reply. I've sent a series of RFC documents for the spec
changes to virtio TC [1].
[1]: https://lore.kernel.org/virtio-comment/20240528054725.268173-1-niuxuewei.nxw@antgroup.com/
Thanks
Xuewei
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-05-29 7:15 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-17 14:46 [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Xuewei Niu
2024-05-17 14:46 ` [RFC PATCH 1/5] vsock/virtio: Extend virtio-vsock spec with an "order" field Xuewei Niu
2024-05-23 7:43 ` Alyssa Ross
2024-05-23 10:43 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 2/5] vsock/virtio: Add support for multi-devices Xuewei Niu
2024-05-23 10:43 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 3/5] vsock/virtio: can_msgzerocopy adapts to multi-devices Xuewei Niu
2024-05-23 10:44 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 4/5] vsock: seqpacket_allow " Xuewei Niu
2024-05-23 10:44 ` Stefano Garzarella
2024-05-17 14:46 ` [RFC PATCH 5/5] vsock: Add an ioctl request to get all CIDs Xuewei Niu
2024-05-23 10:45 ` Stefano Garzarella
2024-05-23 10:46 ` [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices Stefano Garzarella
2024-05-23 14:54 ` Michael S. Tsirkin
2024-05-29 7:15 ` Xuewei Niu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).