linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] Add multiple address spaces support to VDUSE
@ 2025-08-26 11:27 Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 1/6] vduse: add v1 API definition Eugenio Pérez
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

When the device is used by the vhost-vDPA bus driver for a VM, the
control virtqueue should be shadowed by the userspace VMM (QEMU)
instead of being assigned directly to the guest. This is because QEMU
needs to know the device state in order to start and stop the device
correctly (e.g. for Live Migration).

This requires isolating the memory mapping of the control virtqueue
presented by vhost-vDPA, to prevent the guest from accessing it
directly.

This series adds support for multiple address spaces to the VDUSE
device, allowing selective virtqueue isolation through address space
IDs (ASIDs).

The VDUSE device needs to report:
* Number of virtqueue groups
* Association of each virtqueue with its vq group
* Number of address spaces supported.

Then, the vDPA driver can modify the ASID assigned to each VQ group to
isolate the memory AS.  This aligns VDUSE with vdpa_sim and NVIDIA mlx5
devices, which already support ASIDs.

This helps to isolate the environments of the virtqueues that will not
be assigned directly to the guest, e.g. the control virtqueue in the
case of virtio-net.
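
For illustration, here is a minimal sketch of the userspace side once
this series is applied. The device layout, names and feature handling
are hypothetical and error handling is omitted; only the ngroups/nas
fields and VDUSE_API_VERSION_1 come from this series:

#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vduse.h>

int create_v1_net_dev(void)
{
	int ctrl = open("/dev/vduse/control", O_RDWR);
	uint64_t api = VDUSE_API_VERSION_1;
	struct vduse_dev_config *cfg;

	/* Opt in to the v1 API before creating the device */
	ioctl(ctrl, VDUSE_SET_API_VERSION, &api);

	cfg = calloc(1, sizeof(*cfg) + 256);
	strncpy(cfg->name, "vduse-net0", sizeof(cfg->name) - 1);
	cfg->device_id = 1;	/* VIRTIO_ID_NET */
	cfg->vq_num = 3;	/* rx, tx, cvq; feature bits elided */
	cfg->vq_align = 4096;
	cfg->ngroups = 3;	/* dataplane, cvq, shadowed vrings */
	cfg->nas = 2;		/* guest AS + isolated AS for the cvq */
	cfg->config_size = 256;

	return ioctl(ctrl, VDUSE_CREATE_DEV, cfg);
}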

This series depends on the series that reworks the virtio mapping API:
https://lore.kernel.org/all/20250821064641.5025-1-jasowang@redhat.com/

Also, to be able to test this series, the user needs to manually revert
56e71885b034 ("vduse: Temporarily fail if control queue feature requested").

PATCH v1:
* Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
* Using vduse_vq_group_int directly instead of an empty struct in union
  virtio_map.

RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
  value to reduce memory consumption, but vqs are already limited to
  that value and userspace VDUSE is able to allocate that many vqs.  Also, it's
  a dynamic array now.  Same with ASID.
* Move the valid vq groups range check to vduse_validate_config.
* Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
* Use of array_index_nospec in VDUSE device ioctls.
* Move the umem mutex to asid struct so there is no contention between
  ASIDs.
* Remove the descs vq group capability as it will not be used and we can
  add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
* Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
  VDUSE_IOTLB_GET_INFO.

RFC v2:
* Cache group information in kernel, as we need to provide the vq map
  tokens properly.
* Add descs vq group to optimize SVQ forwarding and support indirect
  descriptors out of the box.
* Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
  part of the struct is the same.
* Fixes detected while testing with OVS+VDUSE.

Eugenio Pérez (6):
  vduse: add v1 API definition
  vduse: add vq group support
  vduse: return internal vq group struct as map token
  vduse: create vduse_as to make it an array
  vduse: add vq group asid support
  vduse: bump version number

 drivers/vdpa/vdpa_user/vduse_dev.c | 385 ++++++++++++++++++++++-------
 include/linux/virtio.h             |   6 +-
 include/uapi/linux/vduse.h         |  73 +++++-
 3 files changed, 373 insertions(+), 91 deletions(-)

-- 
2.51.0



* [PATCH 1/6] vduse: add v1 API definition
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 2/6] vduse: add vq group support Eugenio Pérez
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

This allows the kernel to detect whether the userspace VDUSE device
supports the VQ group and ASID features.  VDUSE devices that don't set
the V1 API will not receive the new messages, and the vdpa device will
be created with only one vq group and one ASID.

The next patches implement the new features incrementally, only
letting the VDUSE device set the V1 API version at the end of the
series.
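
As a usage sketch (error handling omitted), a userspace device can
probe the kernel and opt in with the existing control ioctls; only
VDUSE_API_VERSION_1 is new here:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vduse.h>

/* Return the API version to run with: v1 when the kernel offers it,
 * v0 otherwise. ctrl_fd is an open fd to /dev/vduse/control.
 */
static uint64_t negotiate_api(int ctrl_fd)
{
	uint64_t api = 0;

	ioctl(ctrl_fd, VDUSE_GET_API_VERSION, &api);
	if (api >= VDUSE_API_VERSION_1) {
		api = VDUSE_API_VERSION_1;
		ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &api);
	}
	return api;
}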

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/uapi/linux/vduse.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 68a627d04afa..9a56d0416bfe 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -10,6 +10,10 @@
 
 #define VDUSE_API_VERSION	0
 
+/* VQ groups and ASID support */
+
+#define VDUSE_API_VERSION_1	1
+
 /*
  * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
  * This is used for future extension.
-- 
2.51.0



* [PATCH 2/6] vduse: add vq group support
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 1/6] vduse: add v1 API definition Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  2025-09-01  1:59   ` Jason Wang
  2025-08-26 11:27 ` [PATCH 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

This allows separating the different virtqueues into groups that share
the same address space.  The kernel asks the VDUSE device for each
virtqueue's group up front, as the groups are needed by the DMA API.

Allocate 3 vq groups, as net is the device that needs the most groups:
* Dataplane (guest passthrough)
* CVQ
* Shadowed vrings.

Future versions of the series can include dynamic allocation of the
groups array so VDUSE can declare more groups.
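
For illustration, a sketch of the userspace side answering the new
request on /dev/vduse/$NAME. The net-style layout (cvq alone in group
1) is a hypothetical example, not mandated by the API:

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <linux/vduse.h>

/* Reply to one device request; only VDUSE_GET_VQ_GROUP is handled. */
static void handle_one_msg(int dev_fd, uint16_t cvq_index)
{
	struct vduse_dev_request req;
	struct vduse_dev_response resp;

	if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
		return;

	memset(&resp, 0, sizeof(resp));
	resp.request_id = req.request_id;
	resp.result = VDUSE_REQ_RESULT_OK;

	if (req.type == VDUSE_GET_VQ_GROUP) {
		resp.vq_group.index = req.vq_group.index;
		resp.vq_group.group =
			req.vq_group.index == cvq_index ? 1 : 0;
	}

	write(dev_fd, &resp, sizeof(resp));
}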

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)

RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
  value to reduce memory consumption, but vqs are already limited to
  that value and userspace VDUSE is able to allocate that many vqs.
* Remove the descs vq group capability as it will not be used and we can
  add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
* Move the valid vq groups range check to vduse_validate_config.

RFC v2:
* Cache group information in kernel, as we need to provide the vq map
  tokens properly.
* Add descs vq group to optimize SVQ forwarding and support indirect
  descriptors out of the box.
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
 include/uapi/linux/vduse.h         | 21 +++++++++++-
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index e7bced0b5542..0f4e36dd167e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -58,6 +58,7 @@ struct vduse_virtqueue {
 	struct vdpa_vq_state state;
 	bool ready;
 	bool kicked;
+	u32 vq_group;
 	spinlock_t kick_lock;
 	spinlock_t irq_lock;
 	struct eventfd_ctx *kickfd;
@@ -114,6 +115,7 @@ struct vduse_dev {
 	u8 status;
 	u32 vq_num;
 	u32 vq_align;
+	u32 ngroups;
 	struct vduse_umem *umem;
 	struct mutex mem_lock;
 	unsigned int bounce_size;
@@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
 	return 0;
 }
 
+static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+	return dev->vqs[idx]->vq_group;
+}
+
 static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
 				struct vdpa_vq_state *state)
 {
@@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
 	return dev->status;
 }
 
+static int vduse_fill_vq_groups(struct vduse_dev *dev)
+{
+	/* All vqs and descs must be in vq group 0 if ngroups < 2 */
+	if (dev->ngroups < 2)
+		return 0;
+
+	for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
+		struct vduse_dev_msg msg = { 0 };
+		int ret;
+
+		msg.req.type = VDUSE_GET_VQ_GROUP;
+		msg.req.vq_group.index = i;
+		ret = vduse_dev_msg_sync(dev, &msg);
+		if (ret)
+			return ret;
+
+		dev->vqs[i]->vq_group = msg.resp.vq_group.group;
+	}
+
+	return 0;
+}
+
 static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
 {
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
@@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
 	if (vduse_dev_set_status(dev, status))
 		return;
 
+	if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
+	    (status & VIRTIO_CONFIG_S_FEATURES_OK))
+		if (vduse_fill_vq_groups(dev))
+			return;
+
 	dev->status = status;
 }
 
@@ -789,6 +825,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.set_vq_cb		= vduse_vdpa_set_vq_cb,
 	.set_vq_num             = vduse_vdpa_set_vq_num,
 	.get_vq_size		= vduse_vdpa_get_vq_size,
+	.get_vq_group		= vduse_get_vq_group,
 	.set_vq_ready		= vduse_vdpa_set_vq_ready,
 	.get_vq_ready		= vduse_vdpa_get_vq_ready,
 	.set_vq_state		= vduse_vdpa_set_vq_state,
@@ -1737,12 +1774,19 @@ static bool features_is_valid(struct vduse_dev_config *config)
 	return true;
 }
 
-static bool vduse_validate_config(struct vduse_dev_config *config)
+static bool vduse_validate_config(struct vduse_dev_config *config,
+				  u64 api_version)
 {
 	if (!is_mem_zero((const char *)config->reserved,
 			 sizeof(config->reserved)))
 		return false;
 
+	if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+		return false;
+
+	if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
+		return false;
+
 	if (config->vq_align > PAGE_SIZE)
 		return false;
 
@@ -1858,6 +1902,7 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 	dev->device_features = config->features;
 	dev->device_id = config->device_id;
 	dev->vendor_id = config->vendor_id;
+	dev->ngroups = (dev->api_version < 1) ? 1 : (config->ngroups ?: 1);
 	dev->name = kstrdup(config->name, GFP_KERNEL);
 	if (!dev->name)
 		goto err_str;
@@ -1936,7 +1981,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
 			break;
 
 		ret = -EINVAL;
-		if (vduse_validate_config(&config) == false)
+		if (!vduse_validate_config(&config, control->api_version))
 			break;
 
 		buf = vmemdup_user(argp + size, config.config_size);
@@ -2017,7 +2062,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
 
 	vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
 				 &vduse_vdpa_config_ops, &vduse_map_ops,
-				 1, 1, name, true);
+				 dev->ngroups, 1, name, true);
 	if (IS_ERR(vdev))
 		return PTR_ERR(vdev);
 
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 9a56d0416bfe..b1c0e47d71fb 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -31,6 +31,7 @@
  * @features: virtio features
  * @vq_num: the number of virtqueues
  * @vq_align: the allocation alignment of virtqueue's metadata
+ * @ngroups: number of vq groups that VDUSE device declares
  * @reserved: for future use, needs to be initialized to zero
  * @config_size: the size of the configuration space
  * @config: the buffer of the configuration space
@@ -45,7 +46,8 @@ struct vduse_dev_config {
 	__u64 features;
 	__u32 vq_num;
 	__u32 vq_align;
-	__u32 reserved[13];
+	__u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
+	__u32 reserved[12];
 	__u32 config_size;
 	__u8 config[];
 };
@@ -160,6 +162,16 @@ struct vduse_vq_state_packed {
 	__u16 last_used_idx;
 };
 
+/**
+ * struct vduse_vq_group - virtqueue group
+ * @index: Index of the virtqueue
+ * @group: Virtqueue group
+ */
+struct vduse_vq_group {
+	__u32 index;
+	__u32 group;
+};
+
 /**
  * struct vduse_vq_info - information of a virtqueue
  * @index: virtqueue index
@@ -274,6 +286,7 @@ enum vduse_req_type {
 	VDUSE_GET_VQ_STATE,
 	VDUSE_SET_STATUS,
 	VDUSE_UPDATE_IOTLB,
+	VDUSE_GET_VQ_GROUP,
 };
 
 /**
@@ -316,6 +329,7 @@ struct vduse_iova_range {
  * @vq_state: virtqueue state, only index field is available
  * @s: device status
  * @iova: IOVA range for updating
+ * @vq_group: virtqueue group of a virtqueue
  * @padding: padding
  *
  * Structure used by read(2) on /dev/vduse/$NAME.
@@ -328,6 +342,8 @@ struct vduse_dev_request {
 		struct vduse_vq_state vq_state;
 		struct vduse_dev_status s;
 		struct vduse_iova_range iova;
+		/* Only if vduse api version >= 1 */
+		struct vduse_vq_group vq_group;
 		__u32 padding[32];
 	};
 };
@@ -338,6 +354,7 @@ struct vduse_dev_request {
  * @result: the result of request
  * @reserved: for future use, needs to be initialized to zero
  * @vq_state: virtqueue state
+ * @vq_group: virtqueue group of a virtqueue
  * @padding: padding
  *
  * Structure used by write(2) on /dev/vduse/$NAME.
@@ -350,6 +367,8 @@ struct vduse_dev_response {
 	__u32 reserved[4];
 	union {
 		struct vduse_vq_state vq_state;
+		/* Only if vduse api version >= 1 */
+		struct vduse_vq_group vq_group;
 		__u32 padding[32];
 	};
 };
-- 
2.51.0



* [PATCH 3/6] vduse: return internal vq group struct as map token
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 1/6] vduse: add v1 API definition Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 2/6] vduse: add vq group support Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  2025-09-01  2:25   ` Jason Wang
  2025-08-26 11:27 ` [PATCH 4/6] vduse: create vduse_as to make it an array Eugenio Pérez
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

Return the internal struct that represents the vq group as the
virtqueue map token, instead of the device.  This allows the map
functions to access the information per group.

At this moment all the virtqueues share the same vq group, which can
only point to ASID 0.  This change prepares the infrastructure for
actual per-group address space handling.
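
As a standalone illustration of the token indirection (all names are
local to this sketch, not the kernel's): the map call resolves the
domain through the group at call time, so re-pointing the group later
changes which address space subsequent mappings use:

#include <stdio.h>

struct domain { const char *name; };
struct vq_group { struct domain *domain; };

/* Mirrors the union virtio_map idea: an opaque per-vq token */
union map_token { struct vq_group *group; };

static void map_page(union map_token t)
{
	/* The domain is resolved through the group at call time */
	printf("mapping via %s\n", t.group->domain->name);
}

int main(void)
{
	struct domain as0 = { "asid-0" }, as1 = { "asid-1" };
	struct vq_group g = { &as0 };
	union map_token t = { .group = &g };

	map_page(t);		/* resolves asid-0 */
	g.domain = &as1;	/* like set_group_asid re-pointing it */
	map_page(t);		/* same token now resolves asid-1 */
	return 0;
}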

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3:
* Make the vq groups a dynamic array to support an arbitrary number of
  them.
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 52 ++++++++++++++++++++++++------
 include/linux/virtio.h             |  6 ++--
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0f4e36dd167e..cdb3dc2b5e3f 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -22,6 +22,7 @@
 #include <linux/uio.h>
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
+#include <linux/virtio.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/mm.h>
 #include <uapi/linux/vduse.h>
@@ -84,6 +85,10 @@ struct vduse_umem {
 	struct mm_struct *mm;
 };
 
+struct vduse_vq_group_int {
+	struct vduse_dev *dev;
+};
+
 struct vduse_dev {
 	struct vduse_vdpa *vdev;
 	struct device *dev;
@@ -117,6 +122,7 @@ struct vduse_dev {
 	u32 vq_align;
 	u32 ngroups;
 	struct vduse_umem *umem;
+	struct vduse_vq_group_int *groups;
 	struct mutex mem_lock;
 	unsigned int bounce_size;
 	struct mutex domain_lock;
@@ -601,6 +607,15 @@ static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
 	return dev->vqs[idx]->vq_group;
 }
 
+static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	u32 vq_group = dev->vqs[idx]->vq_group;
+	union virtio_map ret = { .group = &dev->groups[vq_group] };
+
+	return ret;
+}
+
 static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
 				struct vdpa_vq_state *state)
 {
@@ -848,6 +863,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.get_vq_affinity	= vduse_vdpa_get_vq_affinity,
 	.reset			= vduse_vdpa_reset,
 	.set_map		= vduse_vdpa_set_map,
+	.get_vq_map		= vduse_get_vq_map,
 	.free			= vduse_vdpa_free,
 };
 
@@ -855,7 +871,8 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
 					     dma_addr_t dma_addr, size_t size,
 					     enum dma_data_direction dir)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
 }
@@ -864,7 +881,8 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
 					     dma_addr_t dma_addr, size_t size,
 					     enum dma_data_direction dir)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
 }
@@ -874,7 +892,8 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
 				     enum dma_data_direction dir,
 				     unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
 }
@@ -883,7 +902,8 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
 				 size_t size, enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
 }
@@ -891,7 +911,8 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
 static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
 				      dma_addr_t *dma_addr, gfp_t flag)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 	unsigned long iova;
 	void *addr;
 
@@ -910,14 +931,16 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
 				    void *vaddr, dma_addr_t dma_addr,
 				    unsigned long attrs)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
 }
 
 static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	return dma_addr < domain->bounce_size;
 }
@@ -931,7 +954,8 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
 
 static size_t vduse_dev_max_mapping_size(union virtio_map token)
 {
-	struct vduse_iova_domain *domain = token.iova_domain;
+	struct vduse_dev *vdev = token.group->dev;
+	struct vduse_iova_domain *domain = vdev->domain;
 
 	return domain->bounce_size;
 }
@@ -1737,6 +1761,7 @@ static int vduse_destroy_dev(char *name)
 	if (dev->domain)
 		vduse_domain_destroy(dev->domain);
 	kfree(dev->name);
+	kfree(dev->groups);
 	vduse_dev_destroy(dev);
 	module_put(THIS_MODULE);
 
@@ -1902,7 +1927,15 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 	dev->device_features = config->features;
 	dev->device_id = config->device_id;
 	dev->vendor_id = config->vendor_id;
+
 	dev->ngroups = (dev->api_version < 1) ? 1 : (config->ngroups ?: 1);
+	dev->groups = kcalloc(dev->ngroups, sizeof(dev->groups[0]),
+			      GFP_KERNEL);
+	if (!dev->groups)
+		goto err_vq_groups;
+	for (u32 i = 0; i < dev->ngroups; ++i)
+		dev->groups[i].dev = dev;
+
 	dev->name = kstrdup(config->name, GFP_KERNEL);
 	if (!dev->name)
 		goto err_str;
@@ -1939,6 +1972,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 err_idr:
 	kfree(dev->name);
 err_str:
+	kfree(dev->groups);
+err_vq_groups:
 	vduse_dev_destroy(dev);
 err:
 	return ret;
@@ -2100,7 +2135,6 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 		return -ENOMEM;
 	}
 
-	dev->vdev->vdpa.vmap.iova_domain = dev->domain;
 	ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
 	if (ret) {
 		put_device(&dev->vdev->vdpa.dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 07040c7f1f4d..ff46d287a003 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -40,13 +40,13 @@ struct virtqueue {
 	void *priv;
 };
 
-struct vduse_iova_domain;
+struct vduse_vq_group_int;
 
 union virtio_map {
 	/* Device that performs DMA */
 	struct device *dma_dev;
-	/* VDUSE specific mapping data */
-	struct vduse_iova_domain *iova_domain;
+	/* VDUSE specific virtqueue group for doing map */
+	struct vduse_vq_group_int *group;
 };
 
 int virtqueue_add_outbuf(struct virtqueue *vq,
-- 
2.51.0



* [PATCH 4/6] vduse: create vduse_as to make it an array
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
                   ` (2 preceding siblings ...)
  2025-08-26 11:27 ` [PATCH 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  2025-09-01  2:27   ` Jason Wang
  2025-08-26 11:27 ` [PATCH 5/6] vduse: add vq group asid support Eugenio Pérez
  2025-08-26 11:27 ` [PATCH 6/6] vduse: bump version number Eugenio Pérez
  5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

This is a first step toward supporting more than one address space.
No change in the code flow is intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 114 +++++++++++++++--------------
 1 file changed, 59 insertions(+), 55 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index cdb3dc2b5e3f..7d2a3ed77b1e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -85,6 +85,12 @@ struct vduse_umem {
 	struct mm_struct *mm;
 };
 
+struct vduse_as {
+	struct vduse_iova_domain *domain;
+	struct vduse_umem *umem;
+	struct mutex mem_lock;
+};
+
 struct vduse_vq_group_int {
 	struct vduse_dev *dev;
 };
@@ -93,7 +99,7 @@ struct vduse_dev {
 	struct vduse_vdpa *vdev;
 	struct device *dev;
 	struct vduse_virtqueue **vqs;
-	struct vduse_iova_domain *domain;
+	struct vduse_as as;
 	char *name;
 	struct mutex lock;
 	spinlock_t msg_lock;
@@ -121,9 +127,7 @@ struct vduse_dev {
 	u32 vq_num;
 	u32 vq_align;
 	u32 ngroups;
-	struct vduse_umem *umem;
 	struct vduse_vq_group_int *groups;
-	struct mutex mem_lock;
 	unsigned int bounce_size;
 	struct mutex domain_lock;
 };
@@ -438,7 +442,7 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
 static void vduse_dev_reset(struct vduse_dev *dev)
 {
 	int i;
-	struct vduse_iova_domain *domain = dev->domain;
+	struct vduse_iova_domain *domain = dev->as.domain;
 
 	/* The coherent mappings are handled in vduse_dev_free_coherent() */
 	if (domain && domain->bounce_map)
@@ -814,13 +818,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 	int ret;
 
-	ret = vduse_domain_set_map(dev->domain, iotlb);
+	ret = vduse_domain_set_map(dev->as.domain, iotlb);
 	if (ret)
 		return ret;
 
 	ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
 	if (ret) {
-		vduse_domain_clear_map(dev->domain, iotlb);
+		vduse_domain_clear_map(dev->as.domain, iotlb);
 		return ret;
 	}
 
@@ -872,7 +876,7 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
 					     enum dma_data_direction dir)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
 }
@@ -882,7 +886,7 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
 					     enum dma_data_direction dir)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
 }
@@ -893,7 +897,7 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
 				     unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
 }
@@ -903,7 +907,7 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
 				 unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
 }
@@ -912,7 +916,7 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
 				      dma_addr_t *dma_addr, gfp_t flag)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 	unsigned long iova;
 	void *addr;
 
@@ -932,7 +936,7 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
 				    unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
 }
@@ -940,7 +944,7 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
 static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	return dma_addr < domain->bounce_size;
 }
@@ -955,7 +959,7 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
 static size_t vduse_dev_max_mapping_size(union virtio_map token)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->domain;
+	struct vduse_iova_domain *domain = vdev->as.domain;
 
 	return domain->bounce_size;
 }
@@ -1102,29 +1106,29 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev,
 {
 	int ret;
 
-	mutex_lock(&dev->mem_lock);
+	mutex_lock(&dev->as.mem_lock);
 	ret = -ENOENT;
-	if (!dev->umem)
+	if (!dev->as.umem)
 		goto unlock;
 
 	ret = -EINVAL;
-	if (!dev->domain)
+	if (!dev->as.domain)
 		goto unlock;
 
-	if (dev->umem->iova != iova || size != dev->domain->bounce_size)
+	if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
 		goto unlock;
 
-	vduse_domain_remove_user_bounce_pages(dev->domain);
-	unpin_user_pages_dirty_lock(dev->umem->pages,
-				    dev->umem->npages, true);
-	atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
-	mmdrop(dev->umem->mm);
-	vfree(dev->umem->pages);
-	kfree(dev->umem);
-	dev->umem = NULL;
+	vduse_domain_remove_user_bounce_pages(dev->as.domain);
+	unpin_user_pages_dirty_lock(dev->as.umem->pages,
+				    dev->as.umem->npages, true);
+	atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
+	mmdrop(dev->as.umem->mm);
+	vfree(dev->as.umem->pages);
+	kfree(dev->as.umem);
+	dev->as.umem = NULL;
 	ret = 0;
 unlock:
-	mutex_unlock(&dev->mem_lock);
+	mutex_unlock(&dev->as.mem_lock);
 	return ret;
 }
 
@@ -1137,14 +1141,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 	unsigned long npages, lock_limit;
 	int ret;
 
-	if (!dev->domain || !dev->domain->bounce_map ||
-	    size != dev->domain->bounce_size ||
+	if (!dev->as.domain || !dev->as.domain->bounce_map ||
+	    size != dev->as.domain->bounce_size ||
 	    iova != 0 || uaddr & ~PAGE_MASK)
 		return -EINVAL;
 
-	mutex_lock(&dev->mem_lock);
+	mutex_lock(&dev->as.mem_lock);
 	ret = -EEXIST;
-	if (dev->umem)
+	if (dev->as.umem)
 		goto unlock;
 
 	ret = -ENOMEM;
@@ -1168,7 +1172,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 		goto out;
 	}
 
-	ret = vduse_domain_add_user_bounce_pages(dev->domain,
+	ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
 						 page_list, pinned);
 	if (ret)
 		goto out;
@@ -1181,7 +1185,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 	umem->mm = current->mm;
 	mmgrab(current->mm);
 
-	dev->umem = umem;
+	dev->as.umem = umem;
 out:
 	if (ret && pinned > 0)
 		unpin_user_pages(page_list, pinned);
@@ -1192,7 +1196,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 		vfree(page_list);
 		kfree(umem);
 	}
-	mutex_unlock(&dev->mem_lock);
+	mutex_unlock(&dev->as.mem_lock);
 	return ret;
 }
 
@@ -1238,12 +1242,12 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 			break;
 
 		mutex_lock(&dev->domain_lock);
-		if (!dev->domain) {
+		if (!dev->as.domain) {
 			mutex_unlock(&dev->domain_lock);
 			break;
 		}
-		spin_lock(&dev->domain->iotlb_lock);
-		map = vhost_iotlb_itree_first(dev->domain->iotlb,
+		spin_lock(&dev->as.domain->iotlb_lock);
+		map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
 					      entry.start, entry.last);
 		if (map) {
 			map_file = (struct vdpa_map_file *)map->opaque;
@@ -1253,7 +1257,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 			entry.last = map->last;
 			entry.perm = map->perm;
 		}
-		spin_unlock(&dev->domain->iotlb_lock);
+		spin_unlock(&dev->as.domain->iotlb_lock);
 		mutex_unlock(&dev->domain_lock);
 		ret = -EINVAL;
 		if (!f)
@@ -1447,22 +1451,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 			break;
 
 		mutex_lock(&dev->domain_lock);
-		if (!dev->domain) {
+		if (!dev->as.domain) {
 			mutex_unlock(&dev->domain_lock);
 			break;
 		}
-		spin_lock(&dev->domain->iotlb_lock);
-		map = vhost_iotlb_itree_first(dev->domain->iotlb,
+		spin_lock(&dev->as.domain->iotlb_lock);
+		map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
 					      info.start, info.last);
 		if (map) {
 			info.start = map->start;
 			info.last = map->last;
 			info.capability = 0;
-			if (dev->domain->bounce_map && map->start == 0 &&
-			    map->last == dev->domain->bounce_size - 1)
+			if (dev->as.domain->bounce_map && map->start == 0 &&
+			    map->last == dev->as.domain->bounce_size - 1)
 				info.capability |= VDUSE_IOVA_CAP_UMEM;
 		}
-		spin_unlock(&dev->domain->iotlb_lock);
+		spin_unlock(&dev->as.domain->iotlb_lock);
 		mutex_unlock(&dev->domain_lock);
 		if (!map)
 			break;
@@ -1487,8 +1491,8 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
 	struct vduse_dev *dev = file->private_data;
 
 	mutex_lock(&dev->domain_lock);
-	if (dev->domain)
-		vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
+	if (dev->as.domain)
+		vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
 	mutex_unlock(&dev->domain_lock);
 	spin_lock(&dev->msg_lock);
	/* Make sure the inflight messages can be processed after reconnection */
@@ -1707,7 +1711,7 @@ static struct vduse_dev *vduse_dev_create(void)
 		return NULL;
 
 	mutex_init(&dev->lock);
-	mutex_init(&dev->mem_lock);
+	mutex_init(&dev->as.mem_lock);
 	mutex_init(&dev->domain_lock);
 	spin_lock_init(&dev->msg_lock);
 	INIT_LIST_HEAD(&dev->send_list);
@@ -1758,8 +1762,8 @@ static int vduse_destroy_dev(char *name)
 	idr_remove(&vduse_idr, dev->minor);
 	kvfree(dev->config);
 	vduse_dev_deinit_vqs(dev);
-	if (dev->domain)
-		vduse_domain_destroy(dev->domain);
+	if (dev->as.domain)
+		vduse_domain_destroy(dev->as.domain);
 	kfree(dev->name);
 	kfree(dev->groups);
 	vduse_dev_destroy(dev);
@@ -1875,7 +1879,7 @@ static ssize_t bounce_size_store(struct device *device,
 
 	ret = -EPERM;
 	mutex_lock(&dev->domain_lock);
-	if (dev->domain)
+	if (dev->as.domain)
 		goto unlock;
 
 	ret = kstrtouint(buf, 10, &bounce_size);
@@ -2126,11 +2130,11 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 		return ret;
 
 	mutex_lock(&dev->domain_lock);
-	if (!dev->domain)
-		dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
+	if (!dev->as.domain)
+		dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
 						  dev->bounce_size);
 	mutex_unlock(&dev->domain_lock);
-	if (!dev->domain) {
+	if (!dev->as.domain) {
 		put_device(&dev->vdev->vdpa.dev);
 		return -ENOMEM;
 	}
@@ -2139,8 +2143,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	if (ret) {
 		put_device(&dev->vdev->vdpa.dev);
 		mutex_lock(&dev->domain_lock);
-		vduse_domain_destroy(dev->domain);
-		dev->domain = NULL;
+		vduse_domain_destroy(dev->as.domain);
+		dev->as.domain = NULL;
 		mutex_unlock(&dev->domain_lock);
 		return ret;
 	}
-- 
2.51.0



* [PATCH 5/6] vduse: add vq group asid support
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
                   ` (3 preceding siblings ...)
  2025-08-26 11:27 ` [PATCH 4/6] vduse: create vduse_as to make it an array Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  2025-09-01  2:46   ` Jason Wang
  2025-08-26 11:27 ` [PATCH 6/6] vduse: bump version number Eugenio Pérez
  5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

Add support for assigning Address Space Identifiers (ASIDs) to each VQ
group.  This enables mapping each group into a distinct memory space.

Now that the driver can change the ASID in the middle of operation,
the domain that each vq group points to is also protected by
domain_lock.
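
For illustration, a sketch of the userspace side of the new
VDUSE_SET_VQ_GROUP_ASID request; MAX_GROUPS and the lookup table are
hypothetical bookkeeping, not part of the uapi, and resp is assumed
zeroed by the caller:

#include <stdint.h>
#include <linux/vduse.h>

#define MAX_GROUPS 3

/* group -> ASID; reset to 0 for every group on device reset */
static uint32_t group_asid[MAX_GROUPS];

static void handle_set_vq_group_asid(const struct vduse_dev_request *req,
				     struct vduse_dev_response *resp)
{
	resp->request_id = req->request_id;

	if (req->vq_group_asid.group >= MAX_GROUPS) {
		resp->result = VDUSE_REQ_RESULT_FAILED;
		return;
	}

	group_asid[req->vq_group_asid.group] = req->vq_group_asid.asid;
	resp->result = VDUSE_REQ_RESULT_OK;
}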

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
  value to reduce memory consumption, but vqs are already limited to
  that value and userspace VDUSE is able to allocate that many vqs.
* Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
  VDUSE_IOTLB_GET_INFO.
* Use of array_index_nospec in VDUSE device ioctls.
* Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
* Move the umem mutex to asid struct so there is no contention between
  ASIDs.

v2:
* Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
  part of the struct is the same.
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 290 +++++++++++++++++++++--------
 include/uapi/linux/vduse.h         |  52 +++++-
 2 files changed, 259 insertions(+), 83 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 7d2a3ed77b1e..2fb227713972 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -92,6 +92,7 @@ struct vduse_as {
 };
 
 struct vduse_vq_group_int {
+	struct vduse_iova_domain *domain;
 	struct vduse_dev *dev;
 };
 
@@ -99,7 +100,7 @@ struct vduse_dev {
 	struct vduse_vdpa *vdev;
 	struct device *dev;
 	struct vduse_virtqueue **vqs;
-	struct vduse_as as;
+	struct vduse_as *as;
 	char *name;
 	struct mutex lock;
 	spinlock_t msg_lock;
@@ -127,6 +128,7 @@ struct vduse_dev {
 	u32 vq_num;
 	u32 vq_align;
 	u32 ngroups;
+	u32 nas;
 	struct vduse_vq_group_int *groups;
 	unsigned int bounce_size;
 	struct mutex domain_lock;
@@ -317,7 +319,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
 	return vduse_dev_msg_sync(dev, &msg);
 }
 
-static int vduse_dev_update_iotlb(struct vduse_dev *dev,
+static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
 				  u64 start, u64 last)
 {
 	struct vduse_dev_msg msg = { 0 };
@@ -326,8 +328,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
 		return -EINVAL;
 
 	msg.req.type = VDUSE_UPDATE_IOTLB;
-	msg.req.iova.start = start;
-	msg.req.iova.last = last;
+	if (dev->api_version < VDUSE_API_VERSION_1) {
+		msg.req.iova.start = start;
+		msg.req.iova.last = last;
+	} else {
+		msg.req.iova_v2.start = start;
+		msg.req.iova_v2.last = last;
+		msg.req.iova_v2.asid = asid;
+	}
 
 	return vduse_dev_msg_sync(dev, &msg);
 }
@@ -439,14 +447,28 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
 	return mask;
 }
 
+/* Force set the asid to a vq group without a message to the VDUSE device */
+static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
+				       unsigned int group, unsigned int asid)
+{
+	guard(mutex)(&dev->domain_lock);
+	dev->groups[group].domain = dev->as[asid].domain;
+}
+
 static void vduse_dev_reset(struct vduse_dev *dev)
 {
 	int i;
-	struct vduse_iova_domain *domain = dev->as.domain;
 
 	/* The coherent mappings are handled in vduse_dev_free_coherent() */
-	if (domain && domain->bounce_map)
-		vduse_domain_reset_bounce_map(domain);
+	for (i = 0; i < dev->nas; i++) {
+		struct vduse_iova_domain *domain = dev->as[i].domain;
+
+		if (domain && domain->bounce_map)
+			vduse_domain_reset_bounce_map(domain);
+	}
+
+	for (i = 0; i < dev->ngroups; i++)
+		vduse_set_group_asid_nomsg(dev, i, 0);
 
 	down_write(&dev->rwsem);
 
@@ -620,6 +642,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
 	return ret;
 }
 
+static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
+				unsigned int asid)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	struct vduse_dev_msg msg = { 0 };
+	int r;
+
+	if (dev->api_version < VDUSE_API_VERSION_1 ||
+	    group >= dev->ngroups || asid >= dev->nas)
+		return -EINVAL;
+
+	msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
+	msg.req.vq_group_asid.group = group;
+	msg.req.vq_group_asid.asid = asid;
+
+	r = vduse_dev_msg_sync(dev, &msg);
+	if (r < 0)
+		return r;
+
+	vduse_set_group_asid_nomsg(dev, group, asid);
+	return 0;
+}
+
 static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
 				struct vdpa_vq_state *state)
 {
@@ -818,13 +863,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 	int ret;
 
-	ret = vduse_domain_set_map(dev->as.domain, iotlb);
+	ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
 	if (ret)
 		return ret;
 
-	ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+	ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
 	if (ret) {
-		vduse_domain_clear_map(dev->as.domain, iotlb);
+		vduse_domain_clear_map(dev->as[asid].domain, iotlb);
 		return ret;
 	}
 
@@ -867,6 +912,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.get_vq_affinity	= vduse_vdpa_get_vq_affinity,
 	.reset			= vduse_vdpa_reset,
 	.set_map		= vduse_vdpa_set_map,
+	.set_group_asid		= vduse_set_group_asid,
 	.get_vq_map		= vduse_get_vq_map,
 	.free			= vduse_vdpa_free,
 };
@@ -876,8 +922,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
 					     enum dma_data_direction dir)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
 }
 
@@ -886,8 +934,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
 					     enum dma_data_direction dir)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
 }
 
@@ -897,8 +947,10 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
 				     unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
 }
 
@@ -907,8 +959,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
 				 unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
 }
 
@@ -916,11 +970,13 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
 				      dma_addr_t *dma_addr, gfp_t flag)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 	unsigned long iova;
 	void *addr;
 
 	*dma_addr = DMA_MAPPING_ERROR;
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	addr = vduse_domain_alloc_coherent(domain, size,
 					   (dma_addr_t *)&iova, flag);
 	if (!addr)
@@ -936,16 +992,20 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
 				    unsigned long attrs)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
 }
 
 static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	return dma_addr < domain->bounce_size;
 }
 
@@ -959,8 +1019,10 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
 static size_t vduse_dev_max_mapping_size(union virtio_map token)
 {
 	struct vduse_dev *vdev = token.group->dev;
-	struct vduse_iova_domain *domain = vdev->as.domain;
+	struct vduse_iova_domain *domain;
 
+	guard(mutex)(&vdev->domain_lock);
+	domain = token.group->domain;
 	return domain->bounce_size;
 }
 
@@ -1101,39 +1163,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
 	return ret;
 }
 
-static int vduse_dev_dereg_umem(struct vduse_dev *dev,
+static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
 				u64 iova, u64 size)
 {
 	int ret;
 
-	mutex_lock(&dev->as.mem_lock);
+	mutex_lock(&dev->as[asid].mem_lock);
 	ret = -ENOENT;
-	if (!dev->as.umem)
+	if (!dev->as[asid].umem)
 		goto unlock;
 
 	ret = -EINVAL;
-	if (!dev->as.domain)
+	if (!dev->as[asid].domain)
 		goto unlock;
 
-	if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
+	if (dev->as[asid].umem->iova != iova ||
+	    size != dev->as[asid].domain->bounce_size)
 		goto unlock;
 
-	vduse_domain_remove_user_bounce_pages(dev->as.domain);
-	unpin_user_pages_dirty_lock(dev->as.umem->pages,
-				    dev->as.umem->npages, true);
-	atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
-	mmdrop(dev->as.umem->mm);
-	vfree(dev->as.umem->pages);
-	kfree(dev->as.umem);
-	dev->as.umem = NULL;
+	vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
+	unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
+				    dev->as[asid].umem->npages, true);
+	atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
+	mmdrop(dev->as[asid].umem->mm);
+	vfree(dev->as[asid].umem->pages);
+	kfree(dev->as[asid].umem);
+	dev->as[asid].umem = NULL;
 	ret = 0;
 unlock:
-	mutex_unlock(&dev->as.mem_lock);
+	mutex_unlock(&dev->as[asid].mem_lock);
 	return ret;
 }
 
 static int vduse_dev_reg_umem(struct vduse_dev *dev,
-			      u64 iova, u64 uaddr, u64 size)
+			      u32 asid, u64 iova, u64 uaddr, u64 size)
 {
 	struct page **page_list = NULL;
 	struct vduse_umem *umem = NULL;
@@ -1141,14 +1204,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 	unsigned long npages, lock_limit;
 	int ret;
 
-	if (!dev->as.domain || !dev->as.domain->bounce_map ||
-	    size != dev->as.domain->bounce_size ||
+	if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
+	    size != dev->as[asid].domain->bounce_size ||
 	    iova != 0 || uaddr & ~PAGE_MASK)
 		return -EINVAL;
 
-	mutex_lock(&dev->as.mem_lock);
+	mutex_lock(&dev->as[asid].mem_lock);
 	ret = -EEXIST;
-	if (dev->as.umem)
+	if (dev->as[asid].umem)
 		goto unlock;
 
 	ret = -ENOMEM;
@@ -1172,7 +1235,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 		goto out;
 	}
 
-	ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
+	ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
 						 page_list, pinned);
 	if (ret)
 		goto out;
@@ -1185,7 +1248,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 	umem->mm = current->mm;
 	mmgrab(current->mm);
 
-	dev->as.umem = umem;
+	dev->as[asid].umem = umem;
 out:
 	if (ret && pinned > 0)
 		unpin_user_pages(page_list, pinned);
@@ -1196,7 +1259,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 		vfree(page_list);
 		kfree(umem);
 	}
-	mutex_unlock(&dev->as.mem_lock);
+	mutex_unlock(&dev->as[asid].mem_lock);
 	return ret;
 }
 
@@ -1228,47 +1291,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case VDUSE_IOTLB_GET_FD: {
-		struct vduse_iotlb_entry entry;
+		struct vduse_iotlb_entry_v2 entry;
 		struct vhost_iotlb_map *map;
 		struct vdpa_map_file *map_file;
 		struct file *f = NULL;
+		u32 asid;
 
 		ret = -EFAULT;
-		if (copy_from_user(&entry, argp, sizeof(entry)))
-			break;
+		if (dev->api_version >= VDUSE_API_VERSION_1) {
+			if (copy_from_user(&entry, argp, sizeof(entry)))
+				break;
+		} else {
+			entry.asid = 0;
+			if (copy_from_user(&entry.v1, argp,
+					   sizeof(entry.v1)))
+				break;
+		}
 
 		ret = -EINVAL;
-		if (entry.start > entry.last)
+		if (entry.v1.start > entry.v1.last)
+			break;
+
+		if (entry.asid >= dev->nas)
 			break;
 
 		mutex_lock(&dev->domain_lock);
-		if (!dev->as.domain) {
+		asid = array_index_nospec(entry.asid, dev->nas);
+		if (!dev->as[asid].domain) {
 			mutex_unlock(&dev->domain_lock);
 			break;
 		}
-		spin_lock(&dev->as.domain->iotlb_lock);
-		map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
-					      entry.start, entry.last);
+		spin_lock(&dev->as[asid].domain->iotlb_lock);
+		map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
+					      entry.v1.start, entry.v1.last);
 		if (map) {
 			map_file = (struct vdpa_map_file *)map->opaque;
 			f = get_file(map_file->file);
-			entry.offset = map_file->offset;
-			entry.start = map->start;
-			entry.last = map->last;
-			entry.perm = map->perm;
+			entry.v1.offset = map_file->offset;
+			entry.v1.start = map->start;
+			entry.v1.last = map->last;
+			entry.v1.perm = map->perm;
 		}
-		spin_unlock(&dev->as.domain->iotlb_lock);
+		spin_unlock(&dev->as[asid].domain->iotlb_lock);
 		mutex_unlock(&dev->domain_lock);
 		ret = -EINVAL;
 		if (!f)
 			break;
 
 		ret = -EFAULT;
-		if (copy_to_user(argp, &entry, sizeof(entry))) {
+		if (dev->api_version >= VDUSE_API_VERSION_1)
+			ret = copy_to_user(argp, &entry,
+					   sizeof(entry));
+		else
+			ret = copy_to_user(argp, &entry.v1,
+					   sizeof(entry.v1));
+
+		if (ret) {
 			fput(f);
 			break;
 		}
-		ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
+		ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
 		fput(f);
 		break;
 	}
@@ -1401,6 +1483,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 	}
 	case VDUSE_IOTLB_REG_UMEM: {
 		struct vduse_iova_umem umem;
+		u32 asid;
 
 		ret = -EFAULT;
 		if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1408,17 +1491,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 
 		ret = -EINVAL;
 		if (!is_mem_zero((const char *)umem.reserved,
-				 sizeof(umem.reserved)))
+				 sizeof(umem.reserved)) ||
+		    (dev->api_version < VDUSE_API_VERSION_1 &&
+		     umem.asid != 0) || umem.asid >= dev->nas)
 			break;
 
 		mutex_lock(&dev->domain_lock);
-		ret = vduse_dev_reg_umem(dev, umem.iova,
+		asid = array_index_nospec(umem.asid, dev->nas);
+		ret = vduse_dev_reg_umem(dev, asid, umem.iova,
 					 umem.uaddr, umem.size);
 		mutex_unlock(&dev->domain_lock);
 		break;
 	}
 	case VDUSE_IOTLB_DEREG_UMEM: {
 		struct vduse_iova_umem umem;
+		u32 asid;
 
 		ret = -EFAULT;
 		if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1426,10 +1513,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 
 		ret = -EINVAL;
 		if (!is_mem_zero((const char *)umem.reserved,
-				 sizeof(umem.reserved)))
+				 sizeof(umem.reserved)) ||
+		    (dev->api_version < VDUSE_API_VERSION_1 &&
+		     umem.asid != 0) ||
+		     umem.asid >= dev->nas)
 			break;
+
 		mutex_lock(&dev->domain_lock);
-		ret = vduse_dev_dereg_umem(dev, umem.iova,
+		asid = array_index_nospec(umem.asid, dev->nas);
+		ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
 					   umem.size);
 		mutex_unlock(&dev->domain_lock);
 		break;
@@ -1437,6 +1529,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 	case VDUSE_IOTLB_GET_INFO: {
 		struct vduse_iova_info info;
 		struct vhost_iotlb_map *map;
+		u32 asid;
 
 		ret = -EFAULT;
 		if (copy_from_user(&info, argp, sizeof(info)))
@@ -1450,23 +1543,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 				 sizeof(info.reserved)))
 			break;
 
+		if (dev->api_version < VDUSE_API_VERSION_1) {
+			if (info.asid)
+				break;
+		} else if (info.asid >= dev->nas)
+			break;
+
 		mutex_lock(&dev->domain_lock);
-		if (!dev->as.domain) {
+		asid = array_index_nospec(info.asid, dev->nas);
+		if (!dev->as[asid].domain) {
 			mutex_unlock(&dev->domain_lock);
 			break;
 		}
-		spin_lock(&dev->as.domain->iotlb_lock);
-		map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
+		spin_lock(&dev->as[asid].domain->iotlb_lock);
+		map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
 					      info.start, info.last);
 		if (map) {
 			info.start = map->start;
 			info.last = map->last;
 			info.capability = 0;
-			if (dev->as.domain->bounce_map && map->start == 0 &&
-			    map->last == dev->as.domain->bounce_size - 1)
+			if (dev->as[asid].domain->bounce_map &&
+			    map->start == 0 &&
+			    map->last == dev->as[asid].domain->bounce_size - 1)
 				info.capability |= VDUSE_IOVA_CAP_UMEM;
 		}
-		spin_unlock(&dev->as.domain->iotlb_lock);
+		spin_unlock(&dev->as[asid].domain->iotlb_lock);
 		mutex_unlock(&dev->domain_lock);
 		if (!map)
 			break;
@@ -1491,8 +1592,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
 	struct vduse_dev *dev = file->private_data;
 
 	mutex_lock(&dev->domain_lock);
-	if (dev->as.domain)
-		vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
+	for (int i = 0; i < dev->nas; i++)
+		if (dev->as[i].domain)
+			vduse_dev_dereg_umem(dev, i, 0,
+					     dev->as[i].domain->bounce_size);
 	mutex_unlock(&dev->domain_lock);
 	spin_lock(&dev->msg_lock);
	/* Make sure the inflight messages can be processed after reconnection */
@@ -1711,7 +1814,6 @@ static struct vduse_dev *vduse_dev_create(void)
 		return NULL;
 
 	mutex_init(&dev->lock);
-	mutex_init(&dev->as.mem_lock);
 	mutex_init(&dev->domain_lock);
 	spin_lock_init(&dev->msg_lock);
 	INIT_LIST_HEAD(&dev->send_list);
@@ -1762,8 +1864,11 @@ static int vduse_destroy_dev(char *name)
 	idr_remove(&vduse_idr, dev->minor);
 	kvfree(dev->config);
 	vduse_dev_deinit_vqs(dev);
-	if (dev->as.domain)
-		vduse_domain_destroy(dev->as.domain);
+	for (int i = 0; i < dev->nas; i++) {
+		if (dev->as[i].domain)
+			vduse_domain_destroy(dev->as[i].domain);
+	}
+	kfree(dev->as);
 	kfree(dev->name);
 	kfree(dev->groups);
 	vduse_dev_destroy(dev);
@@ -1810,12 +1915,16 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
 			 sizeof(config->reserved)))
 		return false;
 
-	if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+	if (api_version < VDUSE_API_VERSION_1 &&
+	    (config->ngroups || config->nas))
 		return false;
 
 	if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
 		return false;
 
+	if (api_version >= VDUSE_API_VERSION_1 && config->nas > 0xffff)
+		return false;
+
 	if (config->vq_align > PAGE_SIZE)
 		return false;
 
@@ -1879,7 +1988,8 @@ static ssize_t bounce_size_store(struct device *device,
 
 	ret = -EPERM;
 	mutex_lock(&dev->domain_lock);
-	if (dev->as.domain)
+	/* Assuming that if the first domain is allocated, all are allocated */
+	if (dev->as[0].domain)
 		goto unlock;
 
 	ret = kstrtouint(buf, 10, &bounce_size);
@@ -1940,6 +2050,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 	for (u32 i = 0; i < dev->ngroups; ++i)
 		dev->groups[i].dev = dev;
 
+	dev->nas = (dev->api_version < 1) ? 1 : (config->nas ?: 1);
+	dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
+	if (!dev->as)
+		goto err_as;
+	for (int i = 0; i < dev->nas; i++)
+		mutex_init(&dev->as[i].mem_lock);
+
 	dev->name = kstrdup(config->name, GFP_KERNEL);
 	if (!dev->name)
 		goto err_str;
@@ -1976,6 +2093,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
 err_idr:
 	kfree(dev->name);
 err_str:
+	kfree(dev->as);
+err_as:
 	kfree(dev->groups);
 err_vq_groups:
 	vduse_dev_destroy(dev);
@@ -2101,7 +2220,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
 
 	vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
 				 &vduse_vdpa_config_ops, &vduse_map_ops,
-				 dev->ngroups, 1, name, true);
+				 dev->ngroups, dev->nas, name, true);
 	if (IS_ERR(vdev))
 		return PTR_ERR(vdev);
 
@@ -2130,11 +2249,20 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 		return ret;
 
 	mutex_lock(&dev->domain_lock);
-	if (!dev->as.domain)
-		dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
-						  dev->bounce_size);
+	ret = 0;
+
+	for (int i = 0; i < dev->nas; ++i) {
+		dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
+							dev->bounce_size);
+		if (!dev->as[i].domain) {
+			ret = -ENOMEM;
+			for (int j = 0; j < i; ++j)
+				vduse_domain_destroy(dev->as[j].domain);
+		}
+	}
+
 	mutex_unlock(&dev->domain_lock);
-	if (!dev->as.domain) {
+	if (ret == -ENOMEM) {
 		put_device(&dev->vdev->vdpa.dev);
 		return -ENOMEM;
 	}
@@ -2143,8 +2271,12 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	if (ret) {
 		put_device(&dev->vdev->vdpa.dev);
 		mutex_lock(&dev->domain_lock);
-		vduse_domain_destroy(dev->as.domain);
-		dev->as.domain = NULL;
+		for (int i = 0; i < dev->nas; i++) {
+			if (dev->as[i].domain) {
+				vduse_domain_destroy(dev->as[i].domain);
+				dev->as[i].domain = NULL;
+			}
+		}
 		mutex_unlock(&dev->domain_lock);
 		return ret;
 	}
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index b1c0e47d71fb..54da965a65dc 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -47,7 +47,8 @@ struct vduse_dev_config {
 	__u32 vq_num;
 	__u32 vq_align;
 	__u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
-	__u32 reserved[12];
+	__u32 nas; /* if VDUSE_API_VERSION >= 1 */
+	__u32 reserved[11];
 	__u32 config_size;
 	__u8 config[];
 };
@@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
 	__u8 perm;
 };
 
+/**
+ * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
+ * @v1: the original vduse_iotlb_entry
+ * @asid: address space ID of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry_v2 {
+	struct vduse_iotlb_entry v1;
+	__u32 asid;
+};
+
 /*
  * Find the first IOVA region that overlaps with the range [start, last]
  * and return the corresponding file descriptor. Return -EINVAL means the
@@ -172,6 +185,16 @@ struct vduse_vq_group {
 	__u32 group;
 };
 
+/**
+ * struct vduse_vq_group_asid - ASID of a virtqueue group
+ * @group: Index of the virtqueue group
+ * @asid: Address space ID of the group
+ */
+struct vduse_vq_group_asid {
+	__u32 group;
+	__u32 asid;
+};
+
 /**
  * struct vduse_vq_info - information of a virtqueue
  * @index: virtqueue index
@@ -231,6 +254,7 @@ struct vduse_vq_eventfd {
  * @uaddr: start address of userspace memory, it must be aligned to page size
  * @iova: start of the IOVA region
  * @size: size of the IOVA region
+ * @asid: Address space ID of the IOVA region
  * @reserved: for future use, needs to be initialized to zero
  *
  * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
@@ -240,7 +264,8 @@ struct vduse_iova_umem {
 	__u64 uaddr;
 	__u64 iova;
 	__u64 size;
-	__u64 reserved[3];
+	__u32 asid;
+	__u32 reserved[5];
 };
 
 /* Register userspace memory for IOVA regions */
@@ -254,6 +279,7 @@ struct vduse_iova_umem {
  * @start: start of the IOVA region
  * @last: last of the IOVA region
  * @capability: capability of the IOVA region
+ * @asid: Address space ID of the IOVA region, only if device API version >= 1
  * @reserved: for future use, needs to be initialized to zero
  *
  * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
@@ -264,7 +290,8 @@ struct vduse_iova_info {
 	__u64 last;
 #define VDUSE_IOVA_CAP_UMEM (1 << 0)
 	__u64 capability;
-	__u64 reserved[3];
+	__u32 asid; /* Only if device API version >= 1 */
+	__u32 reserved[5];
 };
 
 /*
@@ -287,6 +314,7 @@ enum vduse_req_type {
 	VDUSE_SET_STATUS,
 	VDUSE_UPDATE_IOTLB,
 	VDUSE_GET_VQ_GROUP,
+	VDUSE_SET_VQ_GROUP_ASID,
 };
 
 /**
@@ -321,6 +349,18 @@ struct vduse_iova_range {
 	__u64 last;
 };
 
+/**
+ * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ * @asid: address space ID of the IOVA range
+ */
+struct vduse_iova_range_v2 {
+	__u64 start;
+	__u64 last;
+	__u32 asid;
+};
+
 /**
  * struct vduse_dev_request - control request
  * @type: request type
@@ -330,6 +370,8 @@ struct vduse_iova_range {
  * @s: device status
  * @iova: IOVA range for updating
  * @vq_group: virtqueue group of a virtqueue
+ * @iova_v2: IOVA range for updating if API_VERSION >= 1
+ * @vq_group_asid: ASID of a virtqueue group
  * @padding: padding
  *
  * Structure used by read(2) on /dev/vduse/$NAME.
@@ -342,8 +384,10 @@ struct vduse_dev_request {
 		struct vduse_vq_state vq_state;
 		struct vduse_dev_status s;
 		struct vduse_iova_range iova;
-		/* Only if vduse api version >= 1 */
+		/* Following members only if vduse api version >= 1 */
 		struct vduse_vq_group vq_group;
+		struct vduse_iova_range_v2 iova_v2;
+		struct vduse_vq_group_asid vq_group_asid;
 		__u32 padding[32];
 	};
 };
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread
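
On the userspace side, the new VDUSE_SET_VQ_GROUP_ASID request arrives
through the same read(2)/write(2) message loop as the existing requests.
A minimal sketch of the device side, assuming a hypothetical per-group
ASID table and a device declared with 3 vq groups:

#include <stdint.h>
#include <unistd.h>
#include <linux/vduse.h>

/* Hypothetical daemon state: the ASID currently backing each vq group. */
static uint32_t group_asid[3];

static void handle_set_vq_group_asid(int dev_fd)
{
	struct vduse_dev_request req;
	struct vduse_dev_response resp = { 0 };

	if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
		return;

	resp.request_id = req.request_id;
	if (req.type == VDUSE_SET_VQ_GROUP_ASID &&
	    req.vq_group_asid.group < 3) {
		/* All later mappings of this group use the new ASID */
		group_asid[req.vq_group_asid.group] = req.vq_group_asid.asid;
		resp.result = VDUSE_REQ_RESULT_OK;
	} else {
		resp.result = VDUSE_REQ_RESULT_FAILED;
	}

	write(dev_fd, &resp, sizeof(resp));
}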

* [PATCH 6/6] vduse: bump version number
  2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
                   ` (4 preceding siblings ...)
  2025-08-26 11:27 ` [PATCH 5/6] vduse: add vq group asid support Eugenio Pérez
@ 2025-08-26 11:27 ` Eugenio Pérez
  5 siblings, 0 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-08-26 11:27 UTC (permalink / raw)
  To: Michael S . Tsirkin 
  Cc: Eugenio Pérez, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, jasowang,
	Yongji Xie, Maxime Coquelin

Finalize the series by advertising VDUSE API v1 support to userspace.

Now that all required infrastructure for v1 (ASIDs, VQ groups,
update_iotlb_v2) is in place, VDUSE devices can opt in to the new
features.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 2fb227713972..e37e0352447e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -2122,7 +2122,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
 			break;
 
 		ret = -EINVAL;
-		if (api_version > VDUSE_API_VERSION)
+		if (api_version > VDUSE_API_VERSION_1)
 			break;
 
 		ret = 0;
@@ -2189,7 +2189,7 @@ static int vduse_open(struct inode *inode, struct file *file)
 	if (!control)
 		return -ENOMEM;
 
-	control->api_version = VDUSE_API_VERSION;
+	control->api_version = VDUSE_API_VERSION_1;
 	file->private_data = control;
 
 	return 0;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread
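
With the bump, a userspace device that wants vq groups and ASIDs must
opt in on the control fd before creating the device. A minimal sketch,
with error handling elided and the 3-group/2-ASID net layout only as an
example:

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vduse.h>

static int create_v1_device(void)
{
	uint64_t api = 1;	/* VDUSE_API_VERSION_1 */
	char buf[sizeof(struct vduse_dev_config) + 256] = { 0 };
	struct vduse_dev_config *config = (void *)buf;
	int ctrl = open("/dev/vduse/control", O_RDWR);

	if (ctrl < 0 || ioctl(ctrl, VDUSE_SET_API_VERSION, &api))
		return -1;

	strncpy(config->name, "vduse-net0", sizeof(config->name) - 1);
	/* ... device_id, features, vq_num, config_size, etc. ... */
	config->ngroups = 3;	/* dataplane, cvq, shadowed vrings */
	config->nas = 2;	/* guest passthrough AS + shadow AS */

	return ioctl(ctrl, VDUSE_CREATE_DEV, config);
}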

* Re: [PATCH 2/6] vduse: add vq group support
  2025-08-26 11:27 ` [PATCH 2/6] vduse: add vq group support Eugenio Pérez
@ 2025-09-01  1:59   ` Jason Wang
  2025-09-01  2:31     ` Jason Wang
  2025-09-01  8:39     ` Eugenio Perez Martin
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Wang @ 2025-09-01  1:59 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows separating the different virtqueues into groups that share
> the same address space.  The VDUSE device is asked for the group of
> each vq up front, as the groups are needed for the DMA API.
>
> Allocating 3 vq groups, as net is the device that needs the most groups:
> * Dataplane (guest passthrough)
> * CVQ
> * Shadowed vrings.
>
> Future versions of the series can include dynamic allocation of the
> groups array so VDUSE can declare more groups.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
>
> RFC v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
>   value to reduce memory consumption, but vqs are already limited to
>   that value and userspace VDUSE is able to allocate that many vqs.
> * Remove the descs vq group capability as it will not be used and we can
>   add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
> * Move the valid vq groups range check to vduse_validate_config.
>
> RFC v2:
> * Cache group information in kernel, as we need to provide the vq map
>   tokens properly.
> * Add descs vq group to optimize SVQ forwarding and support indirect
>   descriptors out of the box.
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
>  include/uapi/linux/vduse.h         | 21 +++++++++++-
>  2 files changed, 68 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index e7bced0b5542..0f4e36dd167e 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -58,6 +58,7 @@ struct vduse_virtqueue {
>         struct vdpa_vq_state state;
>         bool ready;
>         bool kicked;
> +       u32 vq_group;
>         spinlock_t kick_lock;
>         spinlock_t irq_lock;
>         struct eventfd_ctx *kickfd;
> @@ -114,6 +115,7 @@ struct vduse_dev {
>         u8 status;
>         u32 vq_num;
>         u32 vq_align;
> +       u32 ngroups;
>         struct vduse_umem *umem;
>         struct mutex mem_lock;
>         unsigned int bounce_size;
> @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
>         return 0;
>  }
>
> +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> +{
> +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> +       return dev->vqs[idx]->vq_group;
> +}
> +
>  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
>                                 struct vdpa_vq_state *state)
>  {
> @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
>         return dev->status;
>  }
>
> +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> +{
> +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> +       if (dev->ngroups < 2)
> +               return 0;
> +
> +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> +               struct vduse_dev_msg msg = { 0 };
> +               int ret;
> +
> +               msg.req.type = VDUSE_GET_VQ_GROUP;
> +               msg.req.vq_group.index = i;
> +               ret = vduse_dev_msg_sync(dev, &msg);
> +               if (ret)
> +                       return ret;
> +
> +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> +       }
> +
> +       return 0;
> +}
> +
>  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
>  {
>         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
>         if (vduse_dev_set_status(dev, status))
>                 return;
>
> +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> +               if (vduse_fill_vq_groups(dev))
> +                       return;

I may have lost some context, but I think we've agreed that we need to
extend the status response for this instead of having multiple
independent responses.
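
One possible shape of that suggestion, entirely hypothetical and only
to make the idea concrete: fold the vq -> group mapping into the reply
to VDUSE_SET_STATUS, so FEATURES_OK costs a single round trip instead
of one VDUSE_GET_VQ_GROUP message per vq:

#include <linux/types.h>

/* Hypothetical uapi extension, not part of this series. */
struct vduse_dev_status_resp {
	__u8  status;
	__u8  padding[3];
	__u32 vq_groups[32];	/* vq_groups[i] = group of vq i */
};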

> +
>         dev->status = status;
>  }
>
> @@ -789,6 +825,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
>         .set_vq_cb              = vduse_vdpa_set_vq_cb,
>         .set_vq_num             = vduse_vdpa_set_vq_num,
>         .get_vq_size            = vduse_vdpa_get_vq_size,
> +       .get_vq_group           = vduse_get_vq_group,
>         .set_vq_ready           = vduse_vdpa_set_vq_ready,
>         .get_vq_ready           = vduse_vdpa_get_vq_ready,
>         .set_vq_state           = vduse_vdpa_set_vq_state,
> @@ -1737,12 +1774,19 @@ static bool features_is_valid(struct vduse_dev_config *config)
>         return true;
>  }
>
> -static bool vduse_validate_config(struct vduse_dev_config *config)
> +static bool vduse_validate_config(struct vduse_dev_config *config,
> +                                 u64 api_version)
>  {
>         if (!is_mem_zero((const char *)config->reserved,
>                          sizeof(config->reserved)))
>                 return false;
>
> +       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> +               return false;
> +
> +       if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> +               return false;

Let's use a macro instead of a magic number.
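
Something like the following, with the name being just a suggestion:

/* Matches the existing 0xffff limit on the number of vqs. */
#define VDUSE_MAX_VQ_GROUPS	0xffff

	if (api_version >= VDUSE_API_VERSION_1 &&
	    config->ngroups > VDUSE_MAX_VQ_GROUPS)
		return false;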

> +
>         if (config->vq_align > PAGE_SIZE)
>                 return false;
>
> @@ -1858,6 +1902,7 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>         dev->device_features = config->features;
>         dev->device_id = config->device_id;
>         dev->vendor_id = config->vendor_id;
> +       dev->ngroups = (dev->api_version < 1) ? 1 : (config->ngroups ?: 1);
>         dev->name = kstrdup(config->name, GFP_KERNEL);
>         if (!dev->name)
>                 goto err_str;
> @@ -1936,7 +1981,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
>                         break;
>
>                 ret = -EINVAL;
> -               if (vduse_validate_config(&config) == false)
> +               if (!vduse_validate_config(&config, control->api_version))
>                         break;
>
>                 buf = vmemdup_user(argp + size, config.config_size);
> @@ -2017,7 +2062,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
>
>         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
>                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> -                                1, 1, name, true);
> +                                dev->ngroups, 1, name, true);
>         if (IS_ERR(vdev))
>                 return PTR_ERR(vdev);
>
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> index 9a56d0416bfe..b1c0e47d71fb 100644
> --- a/include/uapi/linux/vduse.h
> +++ b/include/uapi/linux/vduse.h
> @@ -31,6 +31,7 @@
>   * @features: virtio features
>   * @vq_num: the number of virtqueues
>   * @vq_align: the allocation alignment of virtqueue's metadata
> + * @ngroups: number of vq groups that VDUSE device declares
>   * @reserved: for future use, needs to be initialized to zero
>   * @config_size: the size of the configuration space
>   * @config: the buffer of the configuration space
> @@ -45,7 +46,8 @@ struct vduse_dev_config {
>         __u64 features;
>         __u32 vq_num;
>         __u32 vq_align;
> -       __u32 reserved[13];
> +       __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> +       __u32 reserved[12];
>         __u32 config_size;
>         __u8 config[];
>  };
> @@ -160,6 +162,16 @@ struct vduse_vq_state_packed {
>         __u16 last_used_idx;
>  };
>
> +/**
> + * struct vduse_vq_group - virtqueue group
> + * @index: Index of the virtqueue
> + * @group: Virtqueue group
> + */
> +struct vduse_vq_group {
> +       __u32 index;
> +       __u32 group;
> +};
> +
>  /**
>   * struct vduse_vq_info - information of a virtqueue
>   * @index: virtqueue index
> @@ -274,6 +286,7 @@ enum vduse_req_type {
>         VDUSE_GET_VQ_STATE,
>         VDUSE_SET_STATUS,
>         VDUSE_UPDATE_IOTLB,
> +       VDUSE_GET_VQ_GROUP,
>  };
>
>  /**
> @@ -316,6 +329,7 @@ struct vduse_iova_range {
>   * @vq_state: virtqueue state, only index field is available
>   * @s: device status
>   * @iova: IOVA range for updating
> + * @vq_group: virtqueue group of a virtqueue
>   * @padding: padding
>   *
>   * Structure used by read(2) on /dev/vduse/$NAME.
> @@ -328,6 +342,8 @@ struct vduse_dev_request {
>                 struct vduse_vq_state vq_state;
>                 struct vduse_dev_status s;
>                 struct vduse_iova_range iova;
> +               /* Only if vduse api version >= 1 */
> +               struct vduse_vq_group vq_group;
>                 __u32 padding[32];
>         };
>  };
> @@ -338,6 +354,7 @@ struct vduse_dev_request {
>   * @result: the result of request
>   * @reserved: for future use, needs to be initialized to zero
>   * @vq_state: virtqueue state
> + * @vq_group: virtqueue group of a virtqueue
>   * @padding: padding
>   *
>   * Structure used by write(2) on /dev/vduse/$NAME.
> @@ -350,6 +367,8 @@ struct vduse_dev_response {
>         __u32 reserved[4];
>         union {
>                 struct vduse_vq_state vq_state;
> +               /* Only if vduse api version >= 1 */
> +               struct vduse_vq_group vq_group;
>                 __u32 padding[32];
>         };
>  };
> --
> 2.51.0
>

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread
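
For reference, the device side of VDUSE_GET_VQ_GROUP is a plain
request/response exchange. A sketch that implements the net layout from
the cover letter (CVQ in its own group so it can later get its own
ASID; cvq_idx is an assumed daemon variable):

#include <stdint.h>
#include <linux/vduse.h>

static void handle_get_vq_group(const struct vduse_dev_request *req,
				struct vduse_dev_response *resp,
				uint32_t cvq_idx)
{
	resp->request_id = req->request_id;
	resp->result = VDUSE_REQ_RESULT_OK;
	/* Group 1 for the control vq, group 0 for the dataplane */
	resp->vq_group.group = (req->vq_group.index == cvq_idx) ? 1 : 0;
}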

* Re: [PATCH 3/6] vduse: return internal vq group struct as map token
  2025-08-26 11:27 ` [PATCH 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
@ 2025-09-01  2:25   ` Jason Wang
  2025-09-01  7:27     ` Eugenio Perez Martin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-01  2:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Return the internal struct that represents the vq group as virtqueue map
> token, instead of the device.  This allows the map functions to access
> the information per group.
>
> At this moment all the virtqueues share the same vq group, that only
> can point to ASID 0.  This change prepares the infrastructure for actual
> per-group address space handling
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Make the vq groups a dynamic array to support an arbitrary number of
>   them.
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 52 ++++++++++++++++++++++++------
>  include/linux/virtio.h             |  6 ++--
>  2 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 0f4e36dd167e..cdb3dc2b5e3f 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -22,6 +22,7 @@
>  #include <linux/uio.h>
>  #include <linux/vdpa.h>
>  #include <linux/nospec.h>
> +#include <linux/virtio.h>
>  #include <linux/vmalloc.h>
>  #include <linux/sched/mm.h>
>  #include <uapi/linux/vduse.h>
> @@ -84,6 +85,10 @@ struct vduse_umem {
>         struct mm_struct *mm;
>  };
>
> +struct vduse_vq_group_int {
> +       struct vduse_dev *dev;
> +};

Nit: I don't get the meaning of the "int" suffix.

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread
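
For context, the config op itself is small; a sketch of what it
presumably looks like after this patch, reconstructed from the hunks
quoted in this thread rather than copied from the full diff:

static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
{
	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
	union virtio_map ret = {
		/* The token identifies the vq group, so the map ops can
		 * later resolve token.group->domain for the right ASID. */
		.group = &dev->groups[dev->vqs[idx]->vq_group],
	};

	return ret;
}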

* Re: [PATCH 4/6] vduse: create vduse_as to make it an array
  2025-08-26 11:27 ` [PATCH 4/6] vduse: create vduse_as to make it an array Eugenio Pérez
@ 2025-09-01  2:27   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2025-09-01  2:27 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This is a first step towards supporting more than one address
> space.  No change in the code flow intended.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-01  1:59   ` Jason Wang
@ 2025-09-01  2:31     ` Jason Wang
  2025-09-01  8:39     ` Eugenio Perez Martin
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2025-09-01  2:31 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 9:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This allows separating the different virtqueues into groups that share
> > the same address space.  The VDUSE device is asked for the group of
> > each vq up front, as the groups are needed for the DMA API.
> >
> > Allocating 3 vq groups, as net is the device that needs the most groups:
> > * Dataplane (guest passthrough)
> > * CVQ
> > * Shadowed vrings.
> >
> > Future versions of the series can include dynamic allocation of the
> > groups array so VDUSE can declare more groups.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> >
> > RFC v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> >   value to reduce memory consumption, but vqs are already limited to
> >   that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove the descs vq group capability as it will not be used and we can
> >   add it on top.
> > * Do not ask for vq groups if the number of vq groups is < 2.
> > * Move the valid vq groups range check to vduse_validate_config.
> >
> > RFC v2:
> > * Cache group information in kernel, as we need to provide the vq map
> >   tokens properly.
> > * Add descs vq group to optimize SVQ forwarding and support indirect
> >   descriptors out of the box.
> > ---
> >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> >  2 files changed, 68 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index e7bced0b5542..0f4e36dd167e 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> >         struct vdpa_vq_state state;
> >         bool ready;
> >         bool kicked;
> > +       u32 vq_group;
> >         spinlock_t kick_lock;
> >         spinlock_t irq_lock;
> >         struct eventfd_ctx *kickfd;
> > @@ -114,6 +115,7 @@ struct vduse_dev {
> >         u8 status;
> >         u32 vq_num;
> >         u32 vq_align;
> > +       u32 ngroups;
> >         struct vduse_umem *umem;
> >         struct mutex mem_lock;
> >         unsigned int bounce_size;
> > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> >         return 0;
> >  }
> >
> > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > +{
> > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > +
> > +       return dev->vqs[idx]->vq_group;
> > +}
> > +
> >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> >                                 struct vdpa_vq_state *state)
> >  {
> > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> >         return dev->status;
> >  }
> >
> > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > +{
> > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > +       if (dev->ngroups < 2)
> > +               return 0;
> > +
> > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > +               struct vduse_dev_msg msg = { 0 };
> > +               int ret;
> > +
> > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > +               msg.req.vq_group.index = i;
> > +               ret = vduse_dev_msg_sync(dev, &msg);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> >  {
> >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> >         if (vduse_dev_set_status(dev, status))
> >                 return;
> >
> > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > +               if (vduse_fill_vq_groups(dev))
> > +                       return;
>
> I may have lost some context, but I think we've agreed that we need to
> extend the status response for this instead of having multiple
> independent responses.

Btw, I wonder why we don't get the vq group per .get_vq_group()

Thanks

>
> > +
> >         dev->status = status;
> >  }
> >
> > @@ -789,6 +825,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> >         .set_vq_cb              = vduse_vdpa_set_vq_cb,
> >         .set_vq_num             = vduse_vdpa_set_vq_num,
> >         .get_vq_size            = vduse_vdpa_get_vq_size,
> > +       .get_vq_group           = vduse_get_vq_group,
> >         .set_vq_ready           = vduse_vdpa_set_vq_ready,
> >         .get_vq_ready           = vduse_vdpa_get_vq_ready,
> >         .set_vq_state           = vduse_vdpa_set_vq_state,
> > @@ -1737,12 +1774,19 @@ static bool features_is_valid(struct vduse_dev_config *config)
> >         return true;
> >  }
> >
> > -static bool vduse_validate_config(struct vduse_dev_config *config)
> > +static bool vduse_validate_config(struct vduse_dev_config *config,
> > +                                 u64 api_version)
> >  {
> >         if (!is_mem_zero((const char *)config->reserved,
> >                          sizeof(config->reserved)))
> >                 return false;
> >
> > +       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > +               return false;
> > +
> > +       if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> > +               return false;
>
> Let's use a macro instead of a magic number.
>
> > +
> >         if (config->vq_align > PAGE_SIZE)
> >                 return false;
> >
> > @@ -1858,6 +1902,7 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> >         dev->device_features = config->features;
> >         dev->device_id = config->device_id;
> >         dev->vendor_id = config->vendor_id;
> > +       dev->ngroups = (dev->api_version < 1) ? 1 : (config->ngroups ?: 1);
> >         dev->name = kstrdup(config->name, GFP_KERNEL);
> >         if (!dev->name)
> >                 goto err_str;
> > @@ -1936,7 +1981,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
> >                         break;
> >
> >                 ret = -EINVAL;
> > -               if (vduse_validate_config(&config) == false)
> > +               if (!vduse_validate_config(&config, control->api_version))
> >                         break;
> >
> >                 buf = vmemdup_user(argp + size, config.config_size);
> > @@ -2017,7 +2062,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> >
> >         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> >                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> > -                                1, 1, name, true);
> > +                                dev->ngroups, 1, name, true);
> >         if (IS_ERR(vdev))
> >                 return PTR_ERR(vdev);
> >
> > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > index 9a56d0416bfe..b1c0e47d71fb 100644
> > --- a/include/uapi/linux/vduse.h
> > +++ b/include/uapi/linux/vduse.h
> > @@ -31,6 +31,7 @@
> >   * @features: virtio features
> >   * @vq_num: the number of virtqueues
> >   * @vq_align: the allocation alignment of virtqueue's metadata
> > + * @ngroups: number of vq groups that VDUSE device declares
> >   * @reserved: for future use, needs to be initialized to zero
> >   * @config_size: the size of the configuration space
> >   * @config: the buffer of the configuration space
> > @@ -45,7 +46,8 @@ struct vduse_dev_config {
> >         __u64 features;
> >         __u32 vq_num;
> >         __u32 vq_align;
> > -       __u32 reserved[13];
> > +       __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > +       __u32 reserved[12];
> >         __u32 config_size;
> >         __u8 config[];
> >  };
> > @@ -160,6 +162,16 @@ struct vduse_vq_state_packed {
> >         __u16 last_used_idx;
> >  };
> >
> > +/**
> > + * struct vduse_vq_group - virtqueue group
> > + * @index: Index of the virtqueue
> > + * @group: Virtqueue group
> > + */
> > +struct vduse_vq_group {
> > +       __u32 index;
> > +       __u32 group;
> > +};
> > +
> >  /**
> >   * struct vduse_vq_info - information of a virtqueue
> >   * @index: virtqueue index
> > @@ -274,6 +286,7 @@ enum vduse_req_type {
> >         VDUSE_GET_VQ_STATE,
> >         VDUSE_SET_STATUS,
> >         VDUSE_UPDATE_IOTLB,
> > +       VDUSE_GET_VQ_GROUP,
> >  };
> >
> >  /**
> > @@ -316,6 +329,7 @@ struct vduse_iova_range {
> >   * @vq_state: virtqueue state, only index field is available
> >   * @s: device status
> >   * @iova: IOVA range for updating
> > + * @vq_group: virtqueue group of a virtqueue
> >   * @padding: padding
> >   *
> >   * Structure used by read(2) on /dev/vduse/$NAME.
> > @@ -328,6 +342,8 @@ struct vduse_dev_request {
> >                 struct vduse_vq_state vq_state;
> >                 struct vduse_dev_status s;
> >                 struct vduse_iova_range iova;
> > +               /* Only if vduse api version >= 1 */
> > +               struct vduse_vq_group vq_group;
> >                 __u32 padding[32];
> >         };
> >  };
> > @@ -338,6 +354,7 @@ struct vduse_dev_request {
> >   * @result: the result of request
> >   * @reserved: for future use, needs to be initialized to zero
> >   * @vq_state: virtqueue state
> > + * @vq_group: virtqueue group of a virtqueue
> >   * @padding: padding
> >   *
> >   * Structure used by write(2) on /dev/vduse/$NAME.
> > @@ -350,6 +367,8 @@ struct vduse_dev_response {
> >         __u32 reserved[4];
> >         union {
> >                 struct vduse_vq_state vq_state;
> > +               /* Only if vduse api version >= 1 */
> > +               struct vduse_vq_group vq_group;
> >                 __u32 padding[32];
> >         };
> >  };
> > --
> > 2.51.0
> >
>
> Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] vduse: add vq group asid support
  2025-08-26 11:27 ` [PATCH 5/6] vduse: add vq group asid support Eugenio Pérez
@ 2025-09-01  2:46   ` Jason Wang
  2025-09-01  9:11     ` Eugenio Perez Martin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-01  2:46 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> group.  This enables mapping each group into a distinct memory space.
>
> Now that the driver can change the ASID in the middle of operation, the
> domain that each vq group points to is also protected by domain_lock.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
>   value to reduce memory consumption, but vqs are already limited to
>   that value and userspace VDUSE is able to allocate that many vqs.
> * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
>   VDUSE_IOTLB_GET_INFO.
> * Use of array_index_nospec in VDUSE device ioctls.
> * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> * Move the umem mutex to asid struct so there is no contention between
>   ASIDs.
>
> v2:
> * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
>   part of the struct is the same.
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 290 +++++++++++++++++++++--------
>  include/uapi/linux/vduse.h         |  52 +++++-
>  2 files changed, 259 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 7d2a3ed77b1e..2fb227713972 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -92,6 +92,7 @@ struct vduse_as {
>  };
>
>  struct vduse_vq_group_int {
> +       struct vduse_iova_domain *domain;

I'd expect this should be vduse_as. Anything I miss?

>         struct vduse_dev *dev;
>  };
>
> @@ -99,7 +100,7 @@ struct vduse_dev {
>         struct vduse_vdpa *vdev;
>         struct device *dev;
>         struct vduse_virtqueue **vqs;
> -       struct vduse_as as;
> +       struct vduse_as *as;
>         char *name;
>         struct mutex lock;
>         spinlock_t msg_lock;
> @@ -127,6 +128,7 @@ struct vduse_dev {
>         u32 vq_num;
>         u32 vq_align;
>         u32 ngroups;
> +       u32 nas;
>         struct vduse_vq_group_int *groups;
>         unsigned int bounce_size;
>         struct mutex domain_lock;
> @@ -317,7 +319,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
>         return vduse_dev_msg_sync(dev, &msg);
>  }
>
> -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
>                                   u64 start, u64 last)
>  {
>         struct vduse_dev_msg msg = { 0 };
> @@ -326,8 +328,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
>                 return -EINVAL;
>
>         msg.req.type = VDUSE_UPDATE_IOTLB;
> -       msg.req.iova.start = start;
> -       msg.req.iova.last = last;
> +       if (dev->api_version < VDUSE_API_VERSION_1) {
> +               msg.req.iova.start = start;
> +               msg.req.iova.last = last;
> +       } else {
> +               msg.req.iova_v2.start = start;
> +               msg.req.iova_v2.last = last;
> +               msg.req.iova_v2.asid = asid;
> +       }
>
>         return vduse_dev_msg_sync(dev, &msg);
>  }
> @@ -439,14 +447,28 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
>         return mask;
>  }
>
> +/* Force set the asid to a vq group without a message to the VDUSE device */
> +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> +                                      unsigned int group, unsigned int asid)
> +{
> +       guard(mutex)(&dev->domain_lock);
> +       dev->groups[group].domain = dev->as[asid].domain;
> +}
> +
>  static void vduse_dev_reset(struct vduse_dev *dev)
>  {
>         int i;
> -       struct vduse_iova_domain *domain = dev->as.domain;
>
>         /* The coherent mappings are handled in vduse_dev_free_coherent() */
> -       if (domain && domain->bounce_map)
> -               vduse_domain_reset_bounce_map(domain);
> +       for (i = 0; i < dev->nas; i++) {
> +               struct vduse_iova_domain *domain = dev->as[i].domain;
> +
> +               if (domain && domain->bounce_map)
> +                       vduse_domain_reset_bounce_map(domain);
> +       }
> +
> +       for (i = 0; i < dev->ngroups; i++)
> +               vduse_set_group_asid_nomsg(dev, i, 0);
>
>         down_write(&dev->rwsem);
>
> @@ -620,6 +642,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
>         return ret;
>  }
>
> +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> +                               unsigned int asid)
> +{
> +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +       struct vduse_dev_msg msg = { 0 };
> +       int r;
> +
> +       if (dev->api_version < VDUSE_API_VERSION_1 ||
> +           group >= dev->ngroups || asid >= dev->nas)
> +               return -EINVAL;
> +
> +       msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> +       msg.req.vq_group_asid.group = group;
> +       msg.req.vq_group_asid.asid = asid;
> +
> +       r = vduse_dev_msg_sync(dev, &msg);
> +       if (r < 0)
> +               return r;
> +
> +       vduse_set_group_asid_nomsg(dev, group, asid);
> +       return 0;
> +}
> +
>  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
>                                 struct vdpa_vq_state *state)
>  {
> @@ -818,13 +863,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
>         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
>         int ret;
>
> -       ret = vduse_domain_set_map(dev->as.domain, iotlb);
> +       ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
>         if (ret)
>                 return ret;
>
> -       ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> +       ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
>         if (ret) {
> -               vduse_domain_clear_map(dev->as.domain, iotlb);
> +               vduse_domain_clear_map(dev->as[asid].domain, iotlb);
>                 return ret;
>         }
>
> @@ -867,6 +912,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
>         .get_vq_affinity        = vduse_vdpa_get_vq_affinity,
>         .reset                  = vduse_vdpa_reset,
>         .set_map                = vduse_vdpa_set_map,
> +       .set_group_asid         = vduse_set_group_asid,
>         .get_vq_map             = vduse_get_vq_map,
>         .free                   = vduse_vdpa_free,
>  };
> @@ -876,8 +922,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
>                                              enum dma_data_direction dir)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);

Is this correct? I mean each AS should have its own lock instead of
having a BQL.

> +       domain = token.group->domain;
>         vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
>  }
>
> @@ -886,8 +934,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
>                                              enum dma_data_direction dir)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
>  }
>
> @@ -897,8 +947,10 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
>                                      unsigned long attrs)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
>  }
>
> @@ -907,8 +959,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
>                                  unsigned long attrs)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
>  }
>
> @@ -916,11 +970,13 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
>                                       dma_addr_t *dma_addr, gfp_t flag)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>         unsigned long iova;
>         void *addr;
>
>         *dma_addr = DMA_MAPPING_ERROR;
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         addr = vduse_domain_alloc_coherent(domain, size,
>                                            (dma_addr_t *)&iova, flag);
>         if (!addr)
> @@ -936,16 +992,20 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
>                                     unsigned long attrs)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
>  }
>
>  static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         return dma_addr < domain->bounce_size;
>  }
>
> @@ -959,8 +1019,10 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
>  static size_t vduse_dev_max_mapping_size(union virtio_map token)
>  {
>         struct vduse_dev *vdev = token.group->dev;
> -       struct vduse_iova_domain *domain = vdev->as.domain;
> +       struct vduse_iova_domain *domain;
>
> +       guard(mutex)(&vdev->domain_lock);
> +       domain = token.group->domain;
>         return domain->bounce_size;
>  }
>
> @@ -1101,39 +1163,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
>         return ret;
>  }
>
> -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
>                                 u64 iova, u64 size)
>  {
>         int ret;
>
> -       mutex_lock(&dev->as.mem_lock);
> +       mutex_lock(&dev->as[asid].mem_lock);
>         ret = -ENOENT;
> -       if (!dev->as.umem)
> +       if (!dev->as[asid].umem)
>                 goto unlock;
>
>         ret = -EINVAL;
> -       if (!dev->as.domain)
> +       if (!dev->as[asid].domain)
>                 goto unlock;
>
> -       if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
> +       if (dev->as[asid].umem->iova != iova ||
> +           size != dev->as[asid].domain->bounce_size)
>                 goto unlock;
>
> -       vduse_domain_remove_user_bounce_pages(dev->as.domain);
> -       unpin_user_pages_dirty_lock(dev->as.umem->pages,
> -                                   dev->as.umem->npages, true);
> -       atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
> -       mmdrop(dev->as.umem->mm);
> -       vfree(dev->as.umem->pages);
> -       kfree(dev->as.umem);
> -       dev->as.umem = NULL;
> +       vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> +       unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> +                                   dev->as[asid].umem->npages, true);
> +       atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> +       mmdrop(dev->as[asid].umem->mm);
> +       vfree(dev->as[asid].umem->pages);
> +       kfree(dev->as[asid].umem);
> +       dev->as[asid].umem = NULL;
>         ret = 0;
>  unlock:
> -       mutex_unlock(&dev->as.mem_lock);
> +       mutex_unlock(&dev->as[asid].mem_lock);
>         return ret;
>  }
>
>  static int vduse_dev_reg_umem(struct vduse_dev *dev,
> -                             u64 iova, u64 uaddr, u64 size)
> +                             u32 asid, u64 iova, u64 uaddr, u64 size)
>  {
>         struct page **page_list = NULL;
>         struct vduse_umem *umem = NULL;
> @@ -1141,14 +1204,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
>         unsigned long npages, lock_limit;
>         int ret;
>
> -       if (!dev->as.domain || !dev->as.domain->bounce_map ||
> -           size != dev->as.domain->bounce_size ||
> +       if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> +           size != dev->as[asid].domain->bounce_size ||
>             iova != 0 || uaddr & ~PAGE_MASK)
>                 return -EINVAL;
>
> -       mutex_lock(&dev->as.mem_lock);
> +       mutex_lock(&dev->as[asid].mem_lock);
>         ret = -EEXIST;
> -       if (dev->as.umem)
> +       if (dev->as[asid].umem)
>                 goto unlock;
>
>         ret = -ENOMEM;
> @@ -1172,7 +1235,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
>                 goto out;
>         }
>
> -       ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
> +       ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
>                                                  page_list, pinned);
>         if (ret)
>                 goto out;
> @@ -1185,7 +1248,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
>         umem->mm = current->mm;
>         mmgrab(current->mm);
>
> -       dev->as.umem = umem;
> +       dev->as[asid].umem = umem;
>  out:
>         if (ret && pinned > 0)
>                 unpin_user_pages(page_list, pinned);
> @@ -1196,7 +1259,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
>                 vfree(page_list);
>                 kfree(umem);
>         }
> -       mutex_unlock(&dev->as.mem_lock);
> +       mutex_unlock(&dev->as[asid].mem_lock);
>         return ret;
>  }
>
> @@ -1228,47 +1291,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
>         switch (cmd) {
>         case VDUSE_IOTLB_GET_FD: {
> -               struct vduse_iotlb_entry entry;
> +               struct vduse_iotlb_entry_v2 entry;
>                 struct vhost_iotlb_map *map;
>                 struct vdpa_map_file *map_file;
>                 struct file *f = NULL;
> +               u32 asid;
>
>                 ret = -EFAULT;
> -               if (copy_from_user(&entry, argp, sizeof(entry)))
> -                       break;
> +               if (dev->api_version >= VDUSE_API_VERSION_1) {
> +                       if (copy_from_user(&entry, argp, sizeof(entry)))
> +                               break;
> +               } else {
> +                       entry.asid = 0;
> +                       if (copy_from_user(&entry.v1, argp,
> +                                          sizeof(entry.v1)))
> +                               break;
> +               }
>
>                 ret = -EINVAL;
> -               if (entry.start > entry.last)
> +               if (entry.v1.start > entry.v1.last)
> +                       break;
> +
> +               if (entry.asid >= dev->nas)
>                         break;
>
>                 mutex_lock(&dev->domain_lock);
> -               if (!dev->as.domain) {
> +               asid = array_index_nospec(entry.asid, dev->nas);
> +               if (!dev->as[asid].domain) {
>                         mutex_unlock(&dev->domain_lock);
>                         break;
>                 }
> -               spin_lock(&dev->as.domain->iotlb_lock);
> -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> -                                             entry.start, entry.last);
> +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> +                                             entry.v1.start, entry.v1.last);
>                 if (map) {
>                         map_file = (struct vdpa_map_file *)map->opaque;
>                         f = get_file(map_file->file);
> -                       entry.offset = map_file->offset;
> -                       entry.start = map->start;
> -                       entry.last = map->last;
> -                       entry.perm = map->perm;
> +                       entry.v1.offset = map_file->offset;
> +                       entry.v1.start = map->start;
> +                       entry.v1.last = map->last;
> +                       entry.v1.perm = map->perm;
>                 }
> -               spin_unlock(&dev->as.domain->iotlb_lock);
> +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
>                 mutex_unlock(&dev->domain_lock);
>                 ret = -EINVAL;
>                 if (!f)
>                         break;
>
>                 ret = -EFAULT;
> -               if (copy_to_user(argp, &entry, sizeof(entry))) {
> +               if (dev->api_version >= VDUSE_API_VERSION_1)
> +                       ret = copy_to_user(argp, &entry,
> +                                          sizeof(entry));
> +               else
> +                       ret = copy_to_user(argp, &entry.v1,
> +                                          sizeof(entry.v1));
> +
> +               if (ret) {
>                         fput(f);
>                         break;
>                 }
> -               ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> +               ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
>                 fput(f);
>                 break;
>         }
> @@ -1401,6 +1483,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>         }
>         case VDUSE_IOTLB_REG_UMEM: {
>                 struct vduse_iova_umem umem;
> +               u32 asid;
>
>                 ret = -EFAULT;
>                 if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1408,17 +1491,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
>                 ret = -EINVAL;
>                 if (!is_mem_zero((const char *)umem.reserved,
> -                                sizeof(umem.reserved)))
> +                                sizeof(umem.reserved)) ||
> +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> +                    umem.asid != 0) || umem.asid >= dev->nas)
>                         break;

Does this mean umem is only supported for asid == 0? This looks like a bug.
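
As a sketch of the intended v1 usage from userspace (assuming an
already-created API v1 device fd and a page-aligned buffer of the
device's bounce size):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vduse.h>

static int reg_umem_for_asid(int dev_fd, void *uaddr, uint64_t bounce_size)
{
	struct vduse_iova_umem umem;

	memset(&umem, 0, sizeof(umem));
	umem.uaddr = (uintptr_t)uaddr;	/* must be page aligned */
	umem.iova = 0;			/* bounce region starts at 0 */
	umem.size = bounce_size;
	umem.asid = 1;			/* new in API v1 */

	return ioctl(dev_fd, VDUSE_IOTLB_REG_UMEM, &umem);
}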

>
>                 mutex_lock(&dev->domain_lock);
> -               ret = vduse_dev_reg_umem(dev, umem.iova,
> +               asid = array_index_nospec(umem.asid, dev->nas);
> +               ret = vduse_dev_reg_umem(dev, asid, umem.iova,
>                                          umem.uaddr, umem.size);
>                 mutex_unlock(&dev->domain_lock);
>                 break;
>         }
>         case VDUSE_IOTLB_DEREG_UMEM: {
>                 struct vduse_iova_umem umem;
> +               u32 asid;
>
>                 ret = -EFAULT;
>                 if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1426,10 +1513,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
>                 ret = -EINVAL;
>                 if (!is_mem_zero((const char *)umem.reserved,
> -                                sizeof(umem.reserved)))
> +                                sizeof(umem.reserved)) ||
> +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> +                    umem.asid != 0) ||
> +                    umem.asid >= dev->nas)
>                         break;
> +
>                 mutex_lock(&dev->domain_lock);
> -               ret = vduse_dev_dereg_umem(dev, umem.iova,
> +               asid = array_index_nospec(umem.asid, dev->nas);
> +               ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
>                                            umem.size);
>                 mutex_unlock(&dev->domain_lock);
>                 break;
> @@ -1437,6 +1529,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>         case VDUSE_IOTLB_GET_INFO: {
>                 struct vduse_iova_info info;
>                 struct vhost_iotlb_map *map;
> +               u32 asid;
>
>                 ret = -EFAULT;
>                 if (copy_from_user(&info, argp, sizeof(info)))
> @@ -1450,23 +1543,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>                                  sizeof(info.reserved)))
>                         break;
>
> +               if (dev->api_version < VDUSE_API_VERSION_1) {
> +                       if (info.asid)
> +                               break;
> +               } else if (info.asid >= dev->nas)

It would be simpler if we mandate dev->nas == 1 for API VERSION 0.

> +                       break;
> +
>                 mutex_lock(&dev->domain_lock);
> -               if (!dev->as.domain) {
> +               asid = array_index_nospec(info.asid, dev->nas);
> +               if (!dev->as[asid].domain) {
>                         mutex_unlock(&dev->domain_lock);
>                         break;
>                 }
> -               spin_lock(&dev->as.domain->iotlb_lock);
> -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
>                                               info.start, info.last);
>                 if (map) {
>                         info.start = map->start;
>                         info.last = map->last;
>                         info.capability = 0;
> -                       if (dev->as.domain->bounce_map && map->start == 0 &&
> -                           map->last == dev->as.domain->bounce_size - 1)
> +                       if (dev->as[asid].domain->bounce_map &&
> +                           map->start == 0 &&
> +                           map->last == dev->as[asid].domain->bounce_size - 1)
>                                 info.capability |= VDUSE_IOVA_CAP_UMEM;
>                 }
> -               spin_unlock(&dev->as.domain->iotlb_lock);
> +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
>                 mutex_unlock(&dev->domain_lock);
>                 if (!map)
>                         break;
> @@ -1491,8 +1592,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
>         struct vduse_dev *dev = file->private_data;
>
>         mutex_lock(&dev->domain_lock);
> -       if (dev->as.domain)
> -               vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
> +       for (int i = 0; i < dev->nas; i++)
> +               if (dev->as[i].domain)

Not related to this patch, but I wonder in which case we could get
domain == NULL here?

> +                       vduse_dev_dereg_umem(dev, i, 0,
> +                                            dev->as[i].domain->bounce_size);
>         mutex_unlock(&dev->domain_lock);
>         spin_lock(&dev->msg_lock);
> 	/* Make sure the inflight messages can be processed after reconnection */
> @@ -1711,7 +1814,6 @@ static struct vduse_dev *vduse_dev_create(void)
>                 return NULL;
>
>         mutex_init(&dev->lock);
> -       mutex_init(&dev->as.mem_lock);
>         mutex_init(&dev->domain_lock);
>         spin_lock_init(&dev->msg_lock);
>         INIT_LIST_HEAD(&dev->send_list);
> @@ -1762,8 +1864,11 @@ static int vduse_destroy_dev(char *name)
>         idr_remove(&vduse_idr, dev->minor);
>         kvfree(dev->config);
>         vduse_dev_deinit_vqs(dev);
> -       if (dev->as.domain)
> -               vduse_domain_destroy(dev->as.domain);
> +       for (int i = 0; i < dev->nas; i++) {
> +               if (dev->as[i].domain)
> +                       vduse_domain_destroy(dev->as[i].domain);
> +       }
> +       kfree(dev->as);
>         kfree(dev->name);
>         kfree(dev->groups);
>         vduse_dev_destroy(dev);
> @@ -1810,12 +1915,16 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
>                          sizeof(config->reserved)))
>                 return false;
>
> -       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> +       if (api_version < VDUSE_API_VERSION_1 &&
> +           (config->ngroups || config->nas))
>                 return false;
>
>         if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
>                 return false;
>
> +       if (api_version >= VDUSE_API_VERSION_1 && config->nas > 0xffff)
> +               return false;

Use a macro instead of a magic number, please.

> +
>         if (config->vq_align > PAGE_SIZE)
>                 return false;
>
> @@ -1879,7 +1988,8 @@ static ssize_t bounce_size_store(struct device *device,
>

So the real size of the bounce buffer would be bounce_size * nas? Should
we be conservative and adjust the per-AS bounce size to bounce_size / nas?
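
To make that concrete: each vduse_domain_create() above sets up its own
bounce range, so the worst-case bounce memory becomes bounce_size * nas
(e.g. 3 * 64 MiB = 192 MiB with the default size and three ASIDs) where
a v0 device was capped at bounce_size total; dividing would keep the
worst case unchanged at the cost of a smaller bounce buffer per AS.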

>         ret = -EPERM;
>         mutex_lock(&dev->domain_lock);
> -       if (dev->as.domain)
> +       /* Assuming that if the first domain is allocated, all are allocated */
> +       if (dev->as[0].domain)
>                 goto unlock;
>
>         ret = kstrtouint(buf, 10, &bounce_size);
> @@ -1940,6 +2050,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>         for (u32 i = 0; i < dev->ngroups; ++i)
>                 dev->groups[i].dev = dev;
>
> +       dev->nas = (dev->api_version < 1) ? 1 : (config->nas ?: 1);
> +       dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> +       if (!dev->as)
> +               goto err_as;
> +       for (int i = 0; i < dev->nas; i++)
> +               mutex_init(&dev->as[i].mem_lock);
> +
>         dev->name = kstrdup(config->name, GFP_KERNEL);
>         if (!dev->name)
>                 goto err_str;
> @@ -1976,6 +2093,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>  err_idr:
>         kfree(dev->name);
>  err_str:
> +       kfree(dev->as);
> +err_as:
>         kfree(dev->groups);
>  err_vq_groups:
>         vduse_dev_destroy(dev);
> @@ -2101,7 +2220,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
>
>         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
>                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> -                                dev->ngroups, 1, name, true);
> +                                dev->ngroups, dev->nas, name, true);
>         if (IS_ERR(vdev))
>                 return PTR_ERR(vdev);
>
> @@ -2130,11 +2249,20 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
>                 return ret;
>
>         mutex_lock(&dev->domain_lock);
> -       if (!dev->as.domain)
> -               dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> -                                                 dev->bounce_size);
> +       ret = 0;
> +
> +       for (int i = 0; i < dev->nas; ++i) {
> +               dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> +                                                       dev->bounce_size);
> +               if (!dev->as[i].domain) {
> +                       ret = -ENOMEM;
> +                       for (int j = 0; j < i; ++j)
> +                               vduse_domain_destroy(dev->as[j].domain);
> +               }
> +       }
> +
>         mutex_unlock(&dev->domain_lock);
> -       if (!dev->as.domain) {
> +       if (ret == -ENOMEM) {
>                 put_device(&dev->vdev->vdpa.dev);
>                 return -ENOMEM;
>         }
> @@ -2143,8 +2271,12 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
>         if (ret) {
>                 put_device(&dev->vdev->vdpa.dev);
>                 mutex_lock(&dev->domain_lock);
> -               vduse_domain_destroy(dev->as.domain);
> -               dev->as.domain = NULL;
> +               for (int i = 0; i < dev->nas; i++) {
> +                       if (dev->as[i].domain) {
> +                               vduse_domain_destroy(dev->as[i].domain);
> +                               dev->as[i].domain = NULL;
> +                       }

This duplicates the error handling above a bit; should we consider
switching to error labels?
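
Something like this, maybe (just a sketch, not tested; the registration
failure path below could jump to the same label):

        mutex_lock(&dev->domain_lock);
        for (i = 0; i < dev->nas; ++i) {
                dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
                                                        dev->bounce_size);
                if (!dev->as[i].domain)
                        goto err_domain;
        }
        mutex_unlock(&dev->domain_lock);
        ...
err_domain:
        while (i--) {
                vduse_domain_destroy(dev->as[i].domain);
                dev->as[i].domain = NULL;
        }
        mutex_unlock(&dev->domain_lock);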

> +               }
>                 mutex_unlock(&dev->domain_lock);
>                 return ret;
>         }
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> index b1c0e47d71fb..54da965a65dc 100644
> --- a/include/uapi/linux/vduse.h
> +++ b/include/uapi/linux/vduse.h
> @@ -47,7 +47,8 @@ struct vduse_dev_config {
>         __u32 vq_num;
>         __u32 vq_align;
>         __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> -       __u32 reserved[12];
> +       __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> +       __u32 reserved[11];
>         __u32 config_size;
>         __u8 config[];
>  };
> @@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
>         __u8 perm;
>  };
>
> +/**
> + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
> + * @v1: the original vduse_iotlb_entry
> + * @asid: address space ID of the IOVA region
> + *
> + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
> + */
> +struct vduse_iotlb_entry_v2 {
> +       struct vduse_iotlb_entry v1;
> +       __u32 asid;
> +};
> +
>  /*
>   * Find the first IOVA region that overlaps with the range [start, last]
>   * and return the corresponding file descriptor. Return -EINVAL means the
> @@ -172,6 +185,16 @@ struct vduse_vq_group {
>         __u32 group;
>  };
>
> +/**
> + * struct vduse_vq_group_asid - ASID of a virtqueue group
> + * @group: Index of the virtqueue group
> + * @asid: Address space ID of the group
> + */
> +struct vduse_vq_group_asid {
> +       __u32 group;
> +       __u32 asid;
> +};
> +
>  /**
>   * struct vduse_vq_info - information of a virtqueue
>   * @index: virtqueue index
> @@ -231,6 +254,7 @@ struct vduse_vq_eventfd {
>   * @uaddr: start address of userspace memory, it must be aligned to page size
>   * @iova: start of the IOVA region
>   * @size: size of the IOVA region
> + * @asid: Address space ID of the IOVA region
>   * @reserved: for future use, needs to be initialized to zero
>   *
>   * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> @@ -240,7 +264,8 @@ struct vduse_iova_umem {
>         __u64 uaddr;
>         __u64 iova;
>         __u64 size;
> -       __u64 reserved[3];
> +       __u32 asid;
> +       __u32 reserved[5];
>  };
>
>  /* Register userspace memory for IOVA regions */
> @@ -254,6 +279,7 @@ struct vduse_iova_umem {
>   * @start: start of the IOVA region
>   * @last: last of the IOVA region
>   * @capability: capability of the IOVA region
> + * @asid: Address space ID of the IOVA region, only if device API version >= 1
>   * @reserved: for future use, needs to be initialized to zero
>   *
>   * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> @@ -264,7 +290,8 @@ struct vduse_iova_info {
>         __u64 last;
>  #define VDUSE_IOVA_CAP_UMEM (1 << 0)
>         __u64 capability;
> -       __u64 reserved[3];
> +       __u32 asid; /* Only if device API version >= 1 */
> +       __u32 reserved[5];
>  };
>
>  /*
> @@ -287,6 +314,7 @@ enum vduse_req_type {
>         VDUSE_SET_STATUS,
>         VDUSE_UPDATE_IOTLB,
>         VDUSE_GET_VQ_GROUP,
> +       VDUSE_SET_VQ_GROUP_ASID,
>  };
>
>  /**
> @@ -321,6 +349,18 @@ struct vduse_iova_range {
>         __u64 last;
>  };
>
> +/**
> + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> + * @start: start of the IOVA range
> + * @last: last of the IOVA range
> + * @asid: address space ID of the IOVA range
> + */
> +struct vduse_iova_range_v2 {
> +       __u64 start;
> +       __u64 last;
> +       __u32 asid;
> +};
> +
>  /**
>   * struct vduse_dev_request - control request
>   * @type: request type
> @@ -330,6 +370,8 @@ struct vduse_iova_range {
>   * @s: device status
>   * @iova: IOVA range for updating
>   * @vq_group: virtqueue group of a virtqueue
> + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> + * @vq_group_asid: ASID of a virtqueue group
>   * @padding: padding
>   *
>   * Structure used by read(2) on /dev/vduse/$NAME.
> @@ -342,8 +384,10 @@ struct vduse_dev_request {
>                 struct vduse_vq_state vq_state;
>                 struct vduse_dev_status s;
>                 struct vduse_iova_range iova;
> -               /* Only if vduse api version >= 1 */;
> +               /* Following members only if vduse api version >= 1 */;
>                 struct vduse_vq_group vq_group;
> +               struct vduse_iova_range_v2 iova_v2;
> +               struct vduse_vq_group_asid vq_group_asid;
>                 __u32 padding[32];

This seems to break the uAPI for userspace that relies on
sizeof(struct vduse_dev_request)?

Thanks

>         };
>  };
> --
> 2.51.0
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/6] vduse: return internal vq group struct as map token
  2025-09-01  2:25   ` Jason Wang
@ 2025-09-01  7:27     ` Eugenio Perez Martin
  0 siblings, 0 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-01  7:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 4:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Return the internal struct that represents the vq group as virtqueue map
> > token, instead of the device.  This allows the map functions to access
> > the information per group.
> >
> > At this moment all the virtqueues share the same vq group, which can
> > only point to ASID 0.  This change prepares the infrastructure for
> > actual per-group address space handling.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3:
> > * Make the vq groups a dynamic array to support an arbitrary number of
> >   them.
> > ---
> >  drivers/vdpa/vdpa_user/vduse_dev.c | 52 ++++++++++++++++++++++++------
> >  include/linux/virtio.h             |  6 ++--
> >  2 files changed, 46 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 0f4e36dd167e..cdb3dc2b5e3f 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/uio.h>
> >  #include <linux/vdpa.h>
> >  #include <linux/nospec.h>
> > +#include <linux/virtio.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/sched/mm.h>
> >  #include <uapi/linux/vduse.h>
> > @@ -84,6 +85,10 @@ struct vduse_umem {
> >         struct mm_struct *mm;
> >  };
> >
> > +struct vduse_vq_group_int {
> > +       struct vduse_dev *dev;
> > +};
>
> Nit: I don't get the meaning of the "int" suffix.
>

It means "internal", but I don't think it is a great name so I'm ok
with changing it.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-01  1:59   ` Jason Wang
  2025-09-01  2:31     ` Jason Wang
@ 2025-09-01  8:39     ` Eugenio Perez Martin
  2025-09-03  3:57       ` Jason Wang
  2025-09-03  3:58       ` Jason Wang
  1 sibling, 2 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-01  8:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This allows separating the different virtqueues into groups that share
> > the same address space.  Ask the VDUSE device for the group of each vq
> > at the beginning, as the groups are needed for the DMA API.
> >
> > Allocating 3 vq groups, as net is the device that needs the most groups:
> > * Dataplane (guest passthrough)
> > * CVQ
> > * Shadowed vrings.
> >
> > Future versions of the series can include dynamic allocation of the
> > groups array so VDUSE can declare more groups.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> >
> > RFC v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> >   value to reduce memory consumption, but vqs are already limited to
> >   that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove the descs vq group capability as it will not be used and we can
> >   add it on top.
> > * Do not ask for vq groups in number of vq groups < 2.
> > * Move the valid vq groups range check to vduse_validate_config.
> >
> > RFC v2:
> > * Cache group information in kernel, as we need to provide the vq map
> >   tokens properly.
> > * Add descs vq group to optimize SVQ forwarding and support indirect
> >   descriptors out of the box.
> > ---
> >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> >  2 files changed, 68 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index e7bced0b5542..0f4e36dd167e 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> >         struct vdpa_vq_state state;
> >         bool ready;
> >         bool kicked;
> > +       u32 vq_group;
> >         spinlock_t kick_lock;
> >         spinlock_t irq_lock;
> >         struct eventfd_ctx *kickfd;
> > @@ -114,6 +115,7 @@ struct vduse_dev {
> >         u8 status;
> >         u32 vq_num;
> >         u32 vq_align;
> > +       u32 ngroups;
> >         struct vduse_umem *umem;
> >         struct mutex mem_lock;
> >         unsigned int bounce_size;
> > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> >         return 0;
> >  }
> >
> > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > +{
> > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > +
> > +       return dev->vqs[idx]->vq_group;
> > +}
> > +
> >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> >                                 struct vdpa_vq_state *state)
> >  {
> > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> >         return dev->status;
> >  }
> >
> > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > +{
> > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > +       if (dev->ngroups < 2)
> > +               return 0;
> > +
> > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > +               struct vduse_dev_msg msg = { 0 };
> > +               int ret;
> > +
> > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > +               msg.req.vq_group.index = i;
> > +               ret = vduse_dev_msg_sync(dev, &msg);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> >  {
> >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> >         if (vduse_dev_set_status(dev, status))
> >                 return;
> >
> > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > +               if (vduse_fill_vq_groups(dev))
> > +                       return;
>
> I may have lost some context, but I think we agreed that we need to
> extend the status response for this instead of having multiple
> independent responses.
>

My understanding from [1] was that it is ok to start with this version.
We can even make it asynchronous on top if we find this is a
bottleneck, and the VDUSE device would need no change. Would that work?

> > +
> >         dev->status = status;
> >  }
> >
> > @@ -789,6 +825,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> >         .set_vq_cb              = vduse_vdpa_set_vq_cb,
> >         .set_vq_num             = vduse_vdpa_set_vq_num,
> >         .get_vq_size            = vduse_vdpa_get_vq_size,
> > +       .get_vq_group           = vduse_get_vq_group,
> >         .set_vq_ready           = vduse_vdpa_set_vq_ready,
> >         .get_vq_ready           = vduse_vdpa_get_vq_ready,
> >         .set_vq_state           = vduse_vdpa_set_vq_state,
> > @@ -1737,12 +1774,19 @@ static bool features_is_valid(struct vduse_dev_config *config)
> >         return true;
> >  }
> >
> > -static bool vduse_validate_config(struct vduse_dev_config *config)
> > +static bool vduse_validate_config(struct vduse_dev_config *config,
> > +                                 u64 api_version)
> >  {
> >         if (!is_mem_zero((const char *)config->reserved,
> >                          sizeof(config->reserved)))
> >                 return false;
> >
> > +       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > +               return false;
> > +
> > +       if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> > +               return false;
>
> Let's use a macro instead of a magic number.
>

The rest of the limits are hardcoded, but I'm ok with changing this.
Is UINT16_MAX ok here, or do you prefer something like MAX_NGROUPS and
MAX_ASID?
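
E.g. something like this (macro names are just placeholders):

#define VDUSE_MAX_VQ_GROUPS    0xffff
#define VDUSE_MAX_ASID         0xffff

        if (api_version >= VDUSE_API_VERSION_1 &&
            (config->ngroups > VDUSE_MAX_VQ_GROUPS ||
             config->nas > VDUSE_MAX_ASID))
                return false;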

[...]

[1] https://patchew.org/linux/20250807115752.1663383-1-eperezma@redhat.com/20250807115752.1663383-3-eperezma@redhat.com/#CACGkMEuVngGjgPZXnajiPC+pcbt+dr6jqKRQr8OcX7HK1W3WNQ@mail.gmail.com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] vduse: add vq group asid support
  2025-09-01  2:46   ` Jason Wang
@ 2025-09-01  9:11     ` Eugenio Perez Martin
  2025-09-03  3:56       ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-01  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 4:46 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > group.  This enables mapping each group into a distinct memory space.
> >
> > Now that the driver can change ASID in the middle of operation, the
> > domain that each vq addresses is also protected by domain_lock.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> >   value to reduce memory consumption, but vqs are already limited to
> >   that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> >   VDUSE_IOTLB_GET_INFO.
> > * Use of array_index_nospec in VDUSE device ioctls.
> > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > * Move the umem mutex to asid struct so there is no contention between
> >   ASIDs.
> >
> > v2:
> > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> >   part of the struct is the same.
> > ---
> >  drivers/vdpa/vdpa_user/vduse_dev.c | 290 +++++++++++++++++++++--------
> >  include/uapi/linux/vduse.h         |  52 +++++-
> >  2 files changed, 259 insertions(+), 83 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 7d2a3ed77b1e..2fb227713972 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -92,6 +92,7 @@ struct vduse_as {
> >  };
> >
> >  struct vduse_vq_group_int {
> > +       struct vduse_iova_domain *domain;
>
> I'd expect this to be vduse_as. Anything I'm missing?
>

It just saves a memory indirection step, as the rest of the members of
vduse_as are not used in the vduse_map_ops callbacks. I can use
vduse_as if you prefer it.
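
To illustrate, the two options look like this (sketch):

struct vduse_vq_group_int {                /* current: one dereference */
        struct vduse_iova_domain *domain;
        struct vduse_dev *dev;
};

struct vduse_vq_group_int {                /* via vduse_as: group->as->domain */
        struct vduse_as *as;
        struct vduse_dev *dev;
};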

> >         struct vduse_dev *dev;
> >  };
> >
> > @@ -99,7 +100,7 @@ struct vduse_dev {
> >         struct vduse_vdpa *vdev;
> >         struct device *dev;
> >         struct vduse_virtqueue **vqs;
> > -       struct vduse_as as;
> > +       struct vduse_as *as;
> >         char *name;
> >         struct mutex lock;
> >         spinlock_t msg_lock;
> > @@ -127,6 +128,7 @@ struct vduse_dev {
> >         u32 vq_num;
> >         u32 vq_align;
> >         u32 ngroups;
> > +       u32 nas;
> >         struct vduse_vq_group_int *groups;
> >         unsigned int bounce_size;
> >         struct mutex domain_lock;
> > @@ -317,7 +319,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> >         return vduse_dev_msg_sync(dev, &msg);
> >  }
> >
> > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> >                                   u64 start, u64 last)
> >  {
> >         struct vduse_dev_msg msg = { 0 };
> > @@ -326,8 +328,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> >                 return -EINVAL;
> >
> >         msg.req.type = VDUSE_UPDATE_IOTLB;
> > -       msg.req.iova.start = start;
> > -       msg.req.iova.last = last;
> > +       if (dev->api_version < VDUSE_API_VERSION_1) {
> > +               msg.req.iova.start = start;
> > +               msg.req.iova.last = last;
> > +       } else {
> > +               msg.req.iova_v2.start = start;
> > +               msg.req.iova_v2.last = last;
> > +               msg.req.iova_v2.asid = asid;
> > +       }
> >
> >         return vduse_dev_msg_sync(dev, &msg);
> >  }
> > @@ -439,14 +447,28 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> >         return mask;
> >  }
> >
> > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > +                                      unsigned int group, unsigned int asid)
> > +{
> > +       guard(mutex)(&dev->domain_lock);
> > +       dev->groups[group].domain = dev->as[asid].domain;
> > +}
> > +
> >  static void vduse_dev_reset(struct vduse_dev *dev)
> >  {
> >         int i;
> > -       struct vduse_iova_domain *domain = dev->as.domain;
> >
> >         /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > -       if (domain && domain->bounce_map)
> > -               vduse_domain_reset_bounce_map(domain);
> > +       for (i = 0; i < dev->nas; i++) {
> > +               struct vduse_iova_domain *domain = dev->as[i].domain;
> > +
> > +               if (domain && domain->bounce_map)
> > +                       vduse_domain_reset_bounce_map(domain);
> > +       }
> > +
> > +       for (i = 0; i < dev->ngroups; i++)
> > +               vduse_set_group_asid_nomsg(dev, i, 0);
> >
> >         down_write(&dev->rwsem);
> >
> > @@ -620,6 +642,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> >         return ret;
> >  }
> >
> > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > +                               unsigned int asid)
> > +{
> > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > +       struct vduse_dev_msg msg = { 0 };
> > +       int r;
> > +
> > +       if (dev->api_version < VDUSE_API_VERSION_1 ||
> > +           group >= dev->ngroups || asid >= dev->nas)
> > +               return -EINVAL;
> > +
> > +       msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > +       msg.req.vq_group_asid.group = group;
> > +       msg.req.vq_group_asid.asid = asid;
> > +
> > +       r = vduse_dev_msg_sync(dev, &msg);
> > +       if (r < 0)
> > +               return r;
> > +
> > +       vduse_set_group_asid_nomsg(dev, group, asid);
> > +       return 0;
> > +}
> > +
> >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> >                                 struct vdpa_vq_state *state)
> >  {
> > @@ -818,13 +863,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> >         int ret;
> >
> > -       ret = vduse_domain_set_map(dev->as.domain, iotlb);
> > +       ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> >         if (ret)
> >                 return ret;
> >
> > -       ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > +       ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> >         if (ret) {
> > -               vduse_domain_clear_map(dev->as.domain, iotlb);
> > +               vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> >                 return ret;
> >         }
> >
> > @@ -867,6 +912,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> >         .get_vq_affinity        = vduse_vdpa_get_vq_affinity,
> >         .reset                  = vduse_vdpa_reset,
> >         .set_map                = vduse_vdpa_set_map,
> > +       .set_group_asid         = vduse_set_group_asid,
> >         .get_vq_map             = vduse_get_vq_map,
> >         .free                   = vduse_vdpa_free,
> >  };
> > @@ -876,8 +922,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> >                                              enum dma_data_direction dir)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
>
> Is this correct? I mean, each AS should have its own lock instead of
> having a single BQL-like lock.
>

That would not protect the pointer if vduse_dev_sync_single_for_device
(or an equivalent function that uses the domain) is called at the same
time as vduse_set_group_asid.
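
For example, without taking domain_lock in the map path something like
this can happen:

/*
 *   CPU0: vduse_dev_map_page()          CPU1: vduse_set_group_asid()
 *         domain = token.group->domain;
 *                                             group->domain = dev->as[asid].domain;
 *         vduse_domain_map_page(domain, ...)  // uses the stale domain
 */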

> > +       domain = token.group->domain;
> >         vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> >  }
> >
> > @@ -886,8 +934,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> >                                              enum dma_data_direction dir)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> >  }
> >
> > @@ -897,8 +947,10 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> >                                      unsigned long attrs)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> >  }
> >
> > @@ -907,8 +959,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> >                                  unsigned long attrs)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> >  }
> >
> > @@ -916,11 +970,13 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> >                                       dma_addr_t *dma_addr, gfp_t flag)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >         unsigned long iova;
> >         void *addr;
> >
> >         *dma_addr = DMA_MAPPING_ERROR;
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         addr = vduse_domain_alloc_coherent(domain, size,
> >                                            (dma_addr_t *)&iova, flag);
> >         if (!addr)
> > @@ -936,16 +992,20 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> >                                     unsigned long attrs)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
> >  }
> >
> >  static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         return dma_addr < domain->bounce_size;
> >  }
> >
> > @@ -959,8 +1019,10 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> >  static size_t vduse_dev_max_mapping_size(union virtio_map token)
> >  {
> >         struct vduse_dev *vdev = token.group->dev;
> > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > +       struct vduse_iova_domain *domain;
> >
> > +       guard(mutex)(&vdev->domain_lock);
> > +       domain = token.group->domain;
> >         return domain->bounce_size;
> >  }
> >
> > @@ -1101,39 +1163,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> >         return ret;
> >  }
> >
> > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> >                                 u64 iova, u64 size)
> >  {
> >         int ret;
> >
> > -       mutex_lock(&dev->as.mem_lock);
> > +       mutex_lock(&dev->as[asid].mem_lock);
> >         ret = -ENOENT;
> > -       if (!dev->as.umem)
> > +       if (!dev->as[asid].umem)
> >                 goto unlock;
> >
> >         ret = -EINVAL;
> > -       if (!dev->as.domain)
> > +       if (!dev->as[asid].domain)
> >                 goto unlock;
> >
> > -       if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
> > +       if (dev->as[asid].umem->iova != iova ||
> > +           size != dev->as[asid].domain->bounce_size)
> >                 goto unlock;
> >
> > -       vduse_domain_remove_user_bounce_pages(dev->as.domain);
> > -       unpin_user_pages_dirty_lock(dev->as.umem->pages,
> > -                                   dev->as.umem->npages, true);
> > -       atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
> > -       mmdrop(dev->as.umem->mm);
> > -       vfree(dev->as.umem->pages);
> > -       kfree(dev->as.umem);
> > -       dev->as.umem = NULL;
> > +       vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > +       unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > +                                   dev->as[asid].umem->npages, true);
> > +       atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > +       mmdrop(dev->as[asid].umem->mm);
> > +       vfree(dev->as[asid].umem->pages);
> > +       kfree(dev->as[asid].umem);
> > +       dev->as[asid].umem = NULL;
> >         ret = 0;
> >  unlock:
> > -       mutex_unlock(&dev->as.mem_lock);
> > +       mutex_unlock(&dev->as[asid].mem_lock);
> >         return ret;
> >  }
> >
> >  static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > -                             u64 iova, u64 uaddr, u64 size)
> > +                             u32 asid, u64 iova, u64 uaddr, u64 size)
> >  {
> >         struct page **page_list = NULL;
> >         struct vduse_umem *umem = NULL;
> > @@ -1141,14 +1204,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> >         unsigned long npages, lock_limit;
> >         int ret;
> >
> > -       if (!dev->as.domain || !dev->as.domain->bounce_map ||
> > -           size != dev->as.domain->bounce_size ||
> > +       if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > +           size != dev->as[asid].domain->bounce_size ||
> >             iova != 0 || uaddr & ~PAGE_MASK)
> >                 return -EINVAL;
> >
> > -       mutex_lock(&dev->as.mem_lock);
> > +       mutex_lock(&dev->as[asid].mem_lock);
> >         ret = -EEXIST;
> > -       if (dev->as.umem)
> > +       if (dev->as[asid].umem)
> >                 goto unlock;
> >
> >         ret = -ENOMEM;
> > @@ -1172,7 +1235,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> >                 goto out;
> >         }
> >
> > -       ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
> > +       ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> >                                                  page_list, pinned);
> >         if (ret)
> >                 goto out;
> > @@ -1185,7 +1248,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> >         umem->mm = current->mm;
> >         mmgrab(current->mm);
> >
> > -       dev->as.umem = umem;
> > +       dev->as[asid].umem = umem;
> >  out:
> >         if (ret && pinned > 0)
> >                 unpin_user_pages(page_list, pinned);
> > @@ -1196,7 +1259,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> >                 vfree(page_list);
> >                 kfree(umem);
> >         }
> > -       mutex_unlock(&dev->as.mem_lock);
> > +       mutex_unlock(&dev->as[asid].mem_lock);
> >         return ret;
> >  }
> >
> > @@ -1228,47 +1291,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> >         switch (cmd) {
> >         case VDUSE_IOTLB_GET_FD: {
> > -               struct vduse_iotlb_entry entry;
> > +               struct vduse_iotlb_entry_v2 entry;
> >                 struct vhost_iotlb_map *map;
> >                 struct vdpa_map_file *map_file;
> >                 struct file *f = NULL;
> > +               u32 asid;
> >
> >                 ret = -EFAULT;
> > -               if (copy_from_user(&entry, argp, sizeof(entry)))
> > -                       break;
> > +               if (dev->api_version >= VDUSE_API_VERSION_1) {
> > +                       if (copy_from_user(&entry, argp, sizeof(entry)))
> > +                               break;
> > +               } else {
> > +                       entry.asid = 0;
> > +                       if (copy_from_user(&entry.v1, argp,
> > +                                          sizeof(entry.v1)))
> > +                               break;
> > +               }
> >
> >                 ret = -EINVAL;
> > -               if (entry.start > entry.last)
> > +               if (entry.v1.start > entry.v1.last)
> > +                       break;
> > +
> > +               if (entry.asid >= dev->nas)
> >                         break;
> >
> >                 mutex_lock(&dev->domain_lock);
> > -               if (!dev->as.domain) {
> > +               asid = array_index_nospec(entry.asid, dev->nas);
> > +               if (!dev->as[asid].domain) {
> >                         mutex_unlock(&dev->domain_lock);
> >                         break;
> >                 }
> > -               spin_lock(&dev->as.domain->iotlb_lock);
> > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > -                                             entry.start, entry.last);
> > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > +                                             entry.v1.start, entry.v1.last);
> >                 if (map) {
> >                         map_file = (struct vdpa_map_file *)map->opaque;
> >                         f = get_file(map_file->file);
> > -                       entry.offset = map_file->offset;
> > -                       entry.start = map->start;
> > -                       entry.last = map->last;
> > -                       entry.perm = map->perm;
> > +                       entry.v1.offset = map_file->offset;
> > +                       entry.v1.start = map->start;
> > +                       entry.v1.last = map->last;
> > +                       entry.v1.perm = map->perm;
> >                 }
> > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> >                 mutex_unlock(&dev->domain_lock);
> >                 ret = -EINVAL;
> >                 if (!f)
> >                         break;
> >
> >                 ret = -EFAULT;
> > -               if (copy_to_user(argp, &entry, sizeof(entry))) {
> > +               if (dev->api_version >= VDUSE_API_VERSION_1)
> > +                       ret = copy_to_user(argp, &entry,
> > +                                          sizeof(entry));
> > +               else
> > +                       ret = copy_to_user(argp, &entry.v1,
> > +                                          sizeof(entry.v1));
> > +
> > +               if (ret) {
> >                         fput(f);
> >                         break;
> >                 }
> > -               ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > +               ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> >                 fput(f);
> >                 break;
> >         }
> > @@ -1401,6 +1483,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >         }
> >         case VDUSE_IOTLB_REG_UMEM: {
> >                 struct vduse_iova_umem umem;
> > +               u32 asid;
> >
> >                 ret = -EFAULT;
> >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1408,17 +1491,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> >                 ret = -EINVAL;
> >                 if (!is_mem_zero((const char *)umem.reserved,
> > -                                sizeof(umem.reserved)))
> > +                                sizeof(umem.reserved)) ||
> > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > +                    umem.asid != 0) || umem.asid >= dev->nas)
> >                         break;
>
> Does this mean umem is only supported for asid == 0? This looks like a bug.
>

No, that conditional means that:
* If dev->api_version < V1 (in other words, if it is 0), umem.asid must
be 0. Previous versions already return -EINVAL if the reserved members
are not zero, so we must keep this behavior.
* If the version is 1 or bigger, the asid must be smaller than the
number of address spaces of the device (dev->nas), as shown below.
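
Spelled out, the check is equivalent to something like this (sketch):

        if (dev->api_version < VDUSE_API_VERSION_1) {
                /* v0 devices only have ASID 0 */
                if (umem.asid != 0)
                        break;
        } else if (umem.asid >= dev->nas) {
                /* v1: the asid must be a valid index */
                break;
        }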

> >
> >                 mutex_lock(&dev->domain_lock);
> > -               ret = vduse_dev_reg_umem(dev, umem.iova,
> > +               asid = array_index_nospec(umem.asid, dev->nas);
> > +               ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> >                                          umem.uaddr, umem.size);
> >                 mutex_unlock(&dev->domain_lock);
> >                 break;
> >         }
> >         case VDUSE_IOTLB_DEREG_UMEM: {
> >                 struct vduse_iova_umem umem;
> > +               u32 asid;
> >
> >                 ret = -EFAULT;
> >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1426,10 +1513,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> >                 ret = -EINVAL;
> >                 if (!is_mem_zero((const char *)umem.reserved,
> > -                                sizeof(umem.reserved)))
> > +                                sizeof(umem.reserved)) ||
> > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > +                    umem.asid != 0) ||
> > +                    umem.asid >= dev->nas)
> >                         break;
> > +
> >                 mutex_lock(&dev->domain_lock);
> > -               ret = vduse_dev_dereg_umem(dev, umem.iova,
> > +               asid = array_index_nospec(umem.asid, dev->nas);
> > +               ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> >                                            umem.size);
> >                 mutex_unlock(&dev->domain_lock);
> >                 break;
> > @@ -1437,6 +1529,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >         case VDUSE_IOTLB_GET_INFO: {
> >                 struct vduse_iova_info info;
> >                 struct vhost_iotlb_map *map;
> > +               u32 asid;
> >
> >                 ret = -EFAULT;
> >                 if (copy_from_user(&info, argp, sizeof(info)))
> > @@ -1450,23 +1543,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >                                  sizeof(info.reserved)))
> >                         break;
> >
> > +               if (dev->api_version < VDUSE_API_VERSION_1) {
> > +                       if (info.asid)
> > +                               break;
> > +               } else if (info.asid >= dev->nas)
>
> It would be simpler if we mandated dev->nas == 1 for API version 0.
>

That's done later in this patch, in the change to the
vduse_validate_config function.

> > +                       break;
> > +
> >                 mutex_lock(&dev->domain_lock);
> > -               if (!dev->as.domain) {
> > +               asid = array_index_nospec(info.asid, dev->nas);
> > +               if (!dev->as[asid].domain) {
> >                         mutex_unlock(&dev->domain_lock);
> >                         break;
> >                 }
> > -               spin_lock(&dev->as.domain->iotlb_lock);
> > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> >                                               info.start, info.last);
> >                 if (map) {
> >                         info.start = map->start;
> >                         info.last = map->last;
> >                         info.capability = 0;
> > -                       if (dev->as.domain->bounce_map && map->start == 0 &&
> > -                           map->last == dev->as.domain->bounce_size - 1)
> > +                       if (dev->as[asid].domain->bounce_map &&
> > +                           map->start == 0 &&
> > +                           map->last == dev->as[asid].domain->bounce_size - 1)
> >                                 info.capability |= VDUSE_IOVA_CAP_UMEM;
> >                 }
> > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> >                 mutex_unlock(&dev->domain_lock);
> >                 if (!map)
> >                         break;
> > @@ -1491,8 +1592,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> >         struct vduse_dev *dev = file->private_data;
> >
> >         mutex_lock(&dev->domain_lock);
> > -       if (dev->as.domain)
> > -               vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
> > +       for (int i = 0; i < dev->nas; i++)
> > +               if (dev->as[i].domain)
>
> Not related to this patch, but I wonder in which case we could get
> domain == NULL here?
>

I never tried it myself, but the domain is set at vdpa_dev_add. If a
VDUSE device is created by the VDUSE_CREATE_DEV ioctl and then destroyed
with VDUSE_DESTROY_DEV without calling "vdpa dev add", the
vduse_dev_release function will find dev->as[i].domain == NULL.
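
Roughly this sequence (simplified; the device name is just an example):

        control_fd = open("/dev/vduse/control", O_RDWR);
        ioctl(control_fd, VDUSE_CREATE_DEV, &config);   /* name = "foo" */
        dev_fd = open("/dev/vduse/foo", O_RDWR);
        /* no "vdpa dev add", so no domain is ever created */
        close(dev_fd);          /* release sees dev->as[i].domain == NULL */
        ioctl(control_fd, VDUSE_DESTROY_DEV, "foo");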

> > +                       vduse_dev_dereg_umem(dev, i, 0,
> > +                                            dev->as[i].domain->bounce_size);
> >         mutex_unlock(&dev->domain_lock);
> >         spin_lock(&dev->msg_lock);
> >         /* Make sure the inflight messages can be processed after reconnection */
> > @@ -1711,7 +1814,6 @@ static struct vduse_dev *vduse_dev_create(void)
> >                 return NULL;
> >
> >         mutex_init(&dev->lock);
> > -       mutex_init(&dev->as.mem_lock);
> >         mutex_init(&dev->domain_lock);
> >         spin_lock_init(&dev->msg_lock);
> >         INIT_LIST_HEAD(&dev->send_list);
> > @@ -1762,8 +1864,11 @@ static int vduse_destroy_dev(char *name)
> >         idr_remove(&vduse_idr, dev->minor);
> >         kvfree(dev->config);
> >         vduse_dev_deinit_vqs(dev);
> > -       if (dev->as.domain)
> > -               vduse_domain_destroy(dev->as.domain);
> > +       for (int i = 0; i < dev->nas; i++) {
> > +               if (dev->as[i].domain)
> > +                       vduse_domain_destroy(dev->as[i].domain);
> > +       }
> > +       kfree(dev->as);
> >         kfree(dev->name);
> >         kfree(dev->groups);
> >         vduse_dev_destroy(dev);
> > @@ -1810,12 +1915,16 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> >                          sizeof(config->reserved)))
> >                 return false;
> >
> > -       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > +       if (api_version < VDUSE_API_VERSION_1 &&
> > +           (config->ngroups || config->nas))
> >                 return false;
> >
> >         if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> >                 return false;
> >
> > +       if (api_version >= VDUSE_API_VERSION_1 && config->nas > 0xffff)
> > +               return false;
>
> Please use a macro instead of a magic number.
>
> > +
> >         if (config->vq_align > PAGE_SIZE)
> >                 return false;
> >
> > @@ -1879,7 +1988,8 @@ static ssize_t bounce_size_store(struct device *device,
> >
>
> So the real size of the bounce buffer would be bounce_size * nas? Should we
> be conservative and adjust the per-AS bounce size to bounce_size / nas?
>

I think that's a bad idea for net devices in particular, as I expect
the dataplane to need way more memory than the CVQ. A per-device
bounce_size seems more correct to me, but I can divide it by nas for
sure.
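
For example, assuming the default 64 MiB bounce size and nas = 2: a
bounce_size / nas policy would cap the dataplane at 32 MiB while
reserving the other 32 MiB for the CVQ, which only moves a handful of
commands at a time.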

> >         ret = -EPERM;
> >         mutex_lock(&dev->domain_lock);
> > -       if (dev->as.domain)
> > +       /* Assuming that if the first domain is allocated, all are allocated */
> > +       if (dev->as[0].domain)
> >                 goto unlock;
> >
> >         ret = kstrtouint(buf, 10, &bounce_size);
> > @@ -1940,6 +2050,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> >         for (u32 i = 0; i < dev->ngroups; ++i)
> >                 dev->groups[i].dev = dev;
> >
> > +       dev->nas = (dev->api_version < 1) ? 1 : (config->nas ?: 1);
> > +       dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> > +       if (!dev->as)
> > +               goto err_as;
> > +       for (int i = 0; i < dev->nas; i++)
> > +               mutex_init(&dev->as[i].mem_lock);
> > +
> >         dev->name = kstrdup(config->name, GFP_KERNEL);
> >         if (!dev->name)
> >                 goto err_str;
> > @@ -1976,6 +2093,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> >  err_idr:
> >         kfree(dev->name);
> >  err_str:
> > +       kfree(dev->as);
> > +err_as:
> >         kfree(dev->groups);
> >  err_vq_groups:
> >         vduse_dev_destroy(dev);
> > @@ -2101,7 +2220,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> >
> >         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> >                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> > -                                dev->ngroups, 1, name, true);
> > +                                dev->ngroups, dev->nas, name, true);
> >         if (IS_ERR(vdev))
> >                 return PTR_ERR(vdev);
> >
> > @@ -2130,11 +2249,20 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> >                 return ret;
> >
> >         mutex_lock(&dev->domain_lock);
> > -       if (!dev->as.domain)
> > -               dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > -                                                 dev->bounce_size);
> > +       ret = 0;
> > +
> > +       for (int i = 0; i < dev->nas; ++i) {
> > +               dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > +                                                       dev->bounce_size);
> > +               if (!dev->as[i].domain) {
> > +                       ret = -ENOMEM;
> > +                       for (int j = 0; j < i; ++j)
> > +                               vduse_domain_destroy(dev->as[j].domain);
> > +                       break;
> > +               }
> > +       }
> > +
> >         mutex_unlock(&dev->domain_lock);
> > -       if (!dev->as.domain) {
> > +       if (ret == -ENOMEM) {
> >                 put_device(&dev->vdev->vdpa.dev);
> >                 return -ENOMEM;
> >         }
> > @@ -2143,8 +2271,12 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> >         if (ret) {
> >                 put_device(&dev->vdev->vdpa.dev);
> >                 mutex_lock(&dev->domain_lock);
> > -               vduse_domain_destroy(dev->as.domain);
> > -               dev->as.domain = NULL;
> > +               for (int i = 0; i < dev->nas; i++) {
> > +                       if (dev->as[i].domain) {
> > +                               vduse_domain_destroy(dev->as[i].domain);
> > +                               dev->as[i].domain = NULL;
> > +                       }
>
> This duplicates the error handling above a bit; should we consider
> switching to error labels?
>

Sure, I can change it.

> > +               }
> >                 mutex_unlock(&dev->domain_lock);
> >                 return ret;
> >         }
> > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > index b1c0e47d71fb..54da965a65dc 100644
> > --- a/include/uapi/linux/vduse.h
> > +++ b/include/uapi/linux/vduse.h
> > @@ -47,7 +47,8 @@ struct vduse_dev_config {
> >         __u32 vq_num;
> >         __u32 vq_align;
> >         __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > -       __u32 reserved[12];
> > +       __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> > +       __u32 reserved[11];
> >         __u32 config_size;
> >         __u8 config[];
> >  };
> > @@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
> >         __u8 perm;
> >  };
> >
> > +/**
> > + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
> > + * @v1: the original vduse_iotlb_entry
> > + * @asid: address space ID of the IOVA region
> > + *
> > + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
> > + */
> > +struct vduse_iotlb_entry_v2 {
> > +       struct vduse_iotlb_entry v1;
> > +       __u32 asid;
> > +};
> > +
> >  /*
> >   * Find the first IOVA region that overlaps with the range [start, last]
> >   * and return the corresponding file descriptor. Return -EINVAL means the
> > @@ -172,6 +185,16 @@ struct vduse_vq_group {
> >         __u32 group;
> >  };
> >
> > +/**
> > + * struct vduse_vq_group_asid - ASID of a virtqueue group
> > + * @group: Index of the virtqueue group
> > + * @asid: Address space ID of the group
> > + */
> > +struct vduse_vq_group_asid {
> > +       __u32 group;
> > +       __u32 asid;
> > +};
> > +
> >  /**
> >   * struct vduse_vq_info - information of a virtqueue
> >   * @index: virtqueue index
> > @@ -231,6 +254,7 @@ struct vduse_vq_eventfd {
> >   * @uaddr: start address of userspace memory, it must be aligned to page size
> >   * @iova: start of the IOVA region
> >   * @size: size of the IOVA region
> > + * @asid: Address space ID of the IOVA region
> >   * @reserved: for future use, needs to be initialized to zero
> >   *
> >   * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> > @@ -240,7 +264,8 @@ struct vduse_iova_umem {
> >         __u64 uaddr;
> >         __u64 iova;
> >         __u64 size;
> > -       __u64 reserved[3];
> > +       __u32 asid;
> > +       __u32 reserved[5];
> >  };
> >
> >  /* Register userspace memory for IOVA regions */
> > @@ -254,6 +279,7 @@ struct vduse_iova_umem {
> >   * @start: start of the IOVA region
> >   * @last: last of the IOVA region
> >   * @capability: capability of the IOVA region
> > + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> >   * @reserved: for future use, needs to be initialized to zero
> >   *
> >   * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> > @@ -264,7 +290,8 @@ struct vduse_iova_info {
> >         __u64 last;
> >  #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> >         __u64 capability;
> > -       __u64 reserved[3];
> > +       __u32 asid; /* Only if device API version >= 1 */
> > +       __u32 reserved[5];
> >  };
> >
> >  /*
> > @@ -287,6 +314,7 @@ enum vduse_req_type {
> >         VDUSE_SET_STATUS,
> >         VDUSE_UPDATE_IOTLB,
> >         VDUSE_GET_VQ_GROUP,
> > +       VDUSE_SET_VQ_GROUP_ASID,
> >  };
> >
> >  /**
> > @@ -321,6 +349,18 @@ struct vduse_iova_range {
> >         __u64 last;
> >  };
> >
> > +/**
> > + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> > + * @start: start of the IOVA range
> > + * @last: last of the IOVA range
> > + * @asid: address space ID of the IOVA range
> > + */
> > +struct vduse_iova_range_v2 {
> > +       __u64 start;
> > +       __u64 last;
> > +       __u32 asid;
> > +};
> > +
> >  /**
> >   * struct vduse_dev_request - control request
> >   * @type: request type
> > @@ -330,6 +370,8 @@ struct vduse_iova_range {
> >   * @s: device status
> >   * @iova: IOVA range for updating
> >   * @vq_group: virtqueue group of a virtqueue
> > + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> > + * @vq_group_asid: ASID of a virtqueue group
> >   * @padding: padding
> >   *
> >   * Structure used by read(2) on /dev/vduse/$NAME.
> > @@ -342,8 +384,10 @@ struct vduse_dev_request {
> >                 struct vduse_vq_state vq_state;
> >                 struct vduse_dev_status s;
> >                 struct vduse_iova_range iova;
> > -               /* Only if vduse api version >= 1 */;
> > +               /* Following members only if vduse api version >= 1 */;
> >                 struct vduse_vq_group vq_group;
> > +               struct vduse_iova_range_v2 iova_v2;
> > +               struct vduse_vq_group_asid vq_group_asid;
> >                 __u32 padding[32];
>
> This seems to break the uAPI for userspace that relies on
> sizeof(struct vduse_dev_request)?
>

No, I'm adding members to the union that are smaller than u32[32]:

https://patchew.org/linux/20250606115012.1331551-1-eperezma@redhat.com/20250606115012.1331551-4-eperezma@redhat.com/#CACGkMEvT._5F1ngR9Cs1A6ghNhZtyXiAb7qZq-Xj=7NWOzO9o5C=w@mail.gmail.com
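
It can be checked with something like this (not part of the patch):

        _Static_assert(sizeof(struct vduse_iova_range_v2) <=
                       sizeof(__u32[32]), "fits in padding");
        _Static_assert(sizeof(struct vduse_vq_group_asid) <=
                       sizeof(__u32[32]), "fits in padding");
        /* so sizeof(struct vduse_dev_request) does not change */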


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] vduse: add vq group asid support
  2025-09-01  9:11     ` Eugenio Perez Martin
@ 2025-09-03  3:56       ` Jason Wang
  2025-09-03  6:39         ` Eugenio Perez Martin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-03  3:56 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 5:12 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Sep 1, 2025 at 4:46 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > group.  This enables mapping each group into a distinct memory space.
> > >
> > > Now that the driver can change ASID in the middle of operation, the
> > > domain that each vq addresses is also protected by domain_lock.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v3:
> > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > >   value to reduce memory consumption, but vqs are already limited to
> > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > >   VDUSE_IOTLB_GET_INFO.
> > > * Use of array_index_nospec in VDUSE device ioctls.
> > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > * Move the umem mutex to asid struct so there is no contention between
> > >   ASIDs.
> > >
> > > v2:
> > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > >   part of the struct is the same.
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 290 +++++++++++++++++++++--------
> > >  include/uapi/linux/vduse.h         |  52 +++++-
> > >  2 files changed, 259 insertions(+), 83 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index 7d2a3ed77b1e..2fb227713972 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -92,6 +92,7 @@ struct vduse_as {
> > >  };
> > >
> > >  struct vduse_vq_group_int {
> > > +       struct vduse_iova_domain *domain;
> >
> > I'd expect this to be vduse_as. Anything I'm missing?
> >
>
> It just saves a memory indirection step, as the rest of the members of
> vduse_as are not used in the vduse_map_ops callbacks. I can use
> vduse_as if you prefer it.

I think it's better to use that, to follow the abstraction of the
group-to-address-space indirection.

>
> > >         struct vduse_dev *dev;
> > >  };
> > >
> > > @@ -99,7 +100,7 @@ struct vduse_dev {
> > >         struct vduse_vdpa *vdev;
> > >         struct device *dev;
> > >         struct vduse_virtqueue **vqs;
> > > -       struct vduse_as as;
> > > +       struct vduse_as *as;
> > >         char *name;
> > >         struct mutex lock;
> > >         spinlock_t msg_lock;
> > > @@ -127,6 +128,7 @@ struct vduse_dev {
> > >         u32 vq_num;
> > >         u32 vq_align;
> > >         u32 ngroups;
> > > +       u32 nas;
> > >         struct vduse_vq_group_int *groups;
> > >         unsigned int bounce_size;
> > >         struct mutex domain_lock;
> > > @@ -317,7 +319,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > >         return vduse_dev_msg_sync(dev, &msg);
> > >  }
> > >
> > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > >                                   u64 start, u64 last)
> > >  {
> > >         struct vduse_dev_msg msg = { 0 };
> > > @@ -326,8 +328,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > >                 return -EINVAL;
> > >
> > >         msg.req.type = VDUSE_UPDATE_IOTLB;
> > > -       msg.req.iova.start = start;
> > > -       msg.req.iova.last = last;
> > > +       if (dev->api_version < VDUSE_API_VERSION_1) {
> > > +               msg.req.iova.start = start;
> > > +               msg.req.iova.last = last;
> > > +       } else {
> > > +               msg.req.iova_v2.start = start;
> > > +               msg.req.iova_v2.last = last;
> > > +               msg.req.iova_v2.asid = asid;
> > > +       }
> > >
> > >         return vduse_dev_msg_sync(dev, &msg);
> > >  }
> > > @@ -439,14 +447,28 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > >         return mask;
> > >  }
> > >
> > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > +                                      unsigned int group, unsigned int asid)
> > > +{
> > > +       guard(mutex)(&dev->domain_lock);
> > > +       dev->groups[group].domain = dev->as[asid].domain;
> > > +}
> > > +
> > >  static void vduse_dev_reset(struct vduse_dev *dev)
> > >  {
> > >         int i;
> > > -       struct vduse_iova_domain *domain = dev->as.domain;
> > >
> > >         /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > -       if (domain && domain->bounce_map)
> > > -               vduse_domain_reset_bounce_map(domain);
> > > +       for (i = 0; i < dev->nas; i++) {
> > > +               struct vduse_iova_domain *domain = dev->as[i].domain;
> > > +
> > > +               if (domain && domain->bounce_map)
> > > +                       vduse_domain_reset_bounce_map(domain);
> > > +       }
> > > +
> > > +       for (i = 0; i < dev->ngroups; i++)
> > > +               vduse_set_group_asid_nomsg(dev, i, 0);
> > >
> > >         down_write(&dev->rwsem);
> > >
> > > @@ -620,6 +642,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > >         return ret;
> > >  }
> > >
> > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > +                               unsigned int asid)
> > > +{
> > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > +       struct vduse_dev_msg msg = { 0 };
> > > +       int r;
> > > +
> > > +       if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > +           group >= dev->ngroups || asid >= dev->nas)
> > > +               return -EINVAL;
> > > +
> > > +       msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > +       msg.req.vq_group_asid.group = group;
> > > +       msg.req.vq_group_asid.asid = asid;
> > > +
> > > +       r = vduse_dev_msg_sync(dev, &msg);
> > > +       if (r < 0)
> > > +               return r;
> > > +
> > > +       vduse_set_group_asid_nomsg(dev, group, asid);
> > > +       return 0;
> > > +}
> > > +
> > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > >                                 struct vdpa_vq_state *state)
> > >  {
> > > @@ -818,13 +863,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > >         int ret;
> > >
> > > -       ret = vduse_domain_set_map(dev->as.domain, iotlb);
> > > +       ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > >         if (ret)
> > >                 return ret;
> > >
> > > -       ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > +       ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > >         if (ret) {
> > > -               vduse_domain_clear_map(dev->as.domain, iotlb);
> > > +               vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > >                 return ret;
> > >         }
> > >
> > > @@ -867,6 +912,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > >         .get_vq_affinity        = vduse_vdpa_get_vq_affinity,
> > >         .reset                  = vduse_vdpa_reset,
> > >         .set_map                = vduse_vdpa_set_map,
> > > +       .set_group_asid         = vduse_set_group_asid,
> > >         .get_vq_map             = vduse_get_vq_map,
> > >         .free                   = vduse_vdpa_free,
> > >  };
> > > @@ -876,8 +922,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > >                                              enum dma_data_direction dir)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> >
> > Is this correct? I mean each AS should have its own lock instead of
> > having a BQL.
> >
>
> That would not protect the pointer if vduse_dev_sync_single_for_device
> (or an equivalent function that uses the domain) is called at the same
> time as vduse_set_group_asid.
>
> > > +       domain = token.group->domain;
> > >         vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > >  }
> > >
> > > @@ -886,8 +934,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > >                                              enum dma_data_direction dir)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > >  }
> > >
> > > @@ -897,8 +947,10 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > >                                      unsigned long attrs)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > >  }
> > >
> > > @@ -907,8 +959,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > >                                  unsigned long attrs)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > >  }
> > >
> > > @@ -916,11 +970,13 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > >                                       dma_addr_t *dma_addr, gfp_t flag)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >         unsigned long iova;
> > >         void *addr;
> > >
> > >         *dma_addr = DMA_MAPPING_ERROR;
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         addr = vduse_domain_alloc_coherent(domain, size,
> > >                                            (dma_addr_t *)&iova, flag);
> > >         if (!addr)
> > > @@ -936,16 +992,20 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > >                                     unsigned long attrs)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
> > >  }
> > >
> > >  static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         return dma_addr < domain->bounce_size;
> > >  }
> > >
> > > @@ -959,8 +1019,10 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > >  static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > >  {
> > >         struct vduse_dev *vdev = token.group->dev;
> > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > +       struct vduse_iova_domain *domain;
> > >
> > > +       guard(mutex)(&vdev->domain_lock);
> > > +       domain = token.group->domain;
> > >         return domain->bounce_size;
> > >  }
> > >
> > > @@ -1101,39 +1163,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > >         return ret;
> > >  }
> > >
> > > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > >                                 u64 iova, u64 size)
> > >  {
> > >         int ret;
> > >
> > > -       mutex_lock(&dev->as.mem_lock);
> > > +       mutex_lock(&dev->as[asid].mem_lock);
> > >         ret = -ENOENT;
> > > -       if (!dev->as.umem)
> > > +       if (!dev->as[asid].umem)
> > >                 goto unlock;
> > >
> > >         ret = -EINVAL;
> > > -       if (!dev->as.domain)
> > > +       if (!dev->as[asid].domain)
> > >                 goto unlock;
> > >
> > > -       if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
> > > +       if (dev->as[asid].umem->iova != iova ||
> > > +           size != dev->as[asid].domain->bounce_size)
> > >                 goto unlock;
> > >
> > > -       vduse_domain_remove_user_bounce_pages(dev->as.domain);
> > > -       unpin_user_pages_dirty_lock(dev->as.umem->pages,
> > > -                                   dev->as.umem->npages, true);
> > > -       atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
> > > -       mmdrop(dev->as.umem->mm);
> > > -       vfree(dev->as.umem->pages);
> > > -       kfree(dev->as.umem);
> > > -       dev->as.umem = NULL;
> > > +       vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > > +       unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > > +                                   dev->as[asid].umem->npages, true);
> > > +       atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > > +       mmdrop(dev->as[asid].umem->mm);
> > > +       vfree(dev->as[asid].umem->pages);
> > > +       kfree(dev->as[asid].umem);
> > > +       dev->as[asid].umem = NULL;
> > >         ret = 0;
> > >  unlock:
> > > -       mutex_unlock(&dev->as.mem_lock);
> > > +       mutex_unlock(&dev->as[asid].mem_lock);
> > >         return ret;
> > >  }
> > >
> > >  static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > -                             u64 iova, u64 uaddr, u64 size)
> > > +                             u32 asid, u64 iova, u64 uaddr, u64 size)
> > >  {
> > >         struct page **page_list = NULL;
> > >         struct vduse_umem *umem = NULL;
> > > @@ -1141,14 +1204,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > >         unsigned long npages, lock_limit;
> > >         int ret;
> > >
> > > -       if (!dev->as.domain || !dev->as.domain->bounce_map ||
> > > -           size != dev->as.domain->bounce_size ||
> > > +       if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > > +           size != dev->as[asid].domain->bounce_size ||
> > >             iova != 0 || uaddr & ~PAGE_MASK)
> > >                 return -EINVAL;
> > >
> > > -       mutex_lock(&dev->as.mem_lock);
> > > +       mutex_lock(&dev->as[asid].mem_lock);
> > >         ret = -EEXIST;
> > > -       if (dev->as.umem)
> > > +       if (dev->as[asid].umem)
> > >                 goto unlock;
> > >
> > >         ret = -ENOMEM;
> > > @@ -1172,7 +1235,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > >                 goto out;
> > >         }
> > >
> > > -       ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
> > > +       ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > >                                                  page_list, pinned);
> > >         if (ret)
> > >                 goto out;
> > > @@ -1185,7 +1248,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > >         umem->mm = current->mm;
> > >         mmgrab(current->mm);
> > >
> > > -       dev->as.umem = umem;
> > > +       dev->as[asid].umem = umem;
> > >  out:
> > >         if (ret && pinned > 0)
> > >                 unpin_user_pages(page_list, pinned);
> > > @@ -1196,7 +1259,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > >                 vfree(page_list);
> > >                 kfree(umem);
> > >         }
> > > -       mutex_unlock(&dev->as.mem_lock);
> > > +       mutex_unlock(&dev->as[asid].mem_lock);
> > >         return ret;
> > >  }
> > >
> > > @@ -1228,47 +1291,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >
> > >         switch (cmd) {
> > >         case VDUSE_IOTLB_GET_FD: {
> > > -               struct vduse_iotlb_entry entry;
> > > +               struct vduse_iotlb_entry_v2 entry;
> > >                 struct vhost_iotlb_map *map;
> > >                 struct vdpa_map_file *map_file;
> > >                 struct file *f = NULL;
> > > +               u32 asid;
> > >
> > >                 ret = -EFAULT;
> > > -               if (copy_from_user(&entry, argp, sizeof(entry)))
> > > -                       break;
> > > +               if (dev->api_version >= VDUSE_API_VERSION_1) {
> > > +                       if (copy_from_user(&entry, argp, sizeof(entry)))
> > > +                               break;
> > > +               } else {
> > > +                       entry.asid = 0;
> > > +                       if (copy_from_user(&entry.v1, argp,
> > > +                                          sizeof(entry.v1)))
> > > +                               break;
> > > +               }
> > >
> > >                 ret = -EINVAL;
> > > -               if (entry.start > entry.last)
> > > +               if (entry.v1.start > entry.v1.last)
> > > +                       break;
> > > +
> > > +               if (entry.asid >= dev->nas)
> > >                         break;
> > >
> > >                 mutex_lock(&dev->domain_lock);
> > > -               if (!dev->as.domain) {
> > > +               asid = array_index_nospec(entry.asid, dev->nas);
> > > +               if (!dev->as[asid].domain) {
> > >                         mutex_unlock(&dev->domain_lock);
> > >                         break;
> > >                 }
> > > -               spin_lock(&dev->as.domain->iotlb_lock);
> > > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > > -                                             entry.start, entry.last);
> > > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > > +                                             entry.v1.start, entry.v1.last);
> > >                 if (map) {
> > >                         map_file = (struct vdpa_map_file *)map->opaque;
> > >                         f = get_file(map_file->file);
> > > -                       entry.offset = map_file->offset;
> > > -                       entry.start = map->start;
> > > -                       entry.last = map->last;
> > > -                       entry.perm = map->perm;
> > > +                       entry.v1.offset = map_file->offset;
> > > +                       entry.v1.start = map->start;
> > > +                       entry.v1.last = map->last;
> > > +                       entry.v1.perm = map->perm;
> > >                 }
> > > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > >                 mutex_unlock(&dev->domain_lock);
> > >                 ret = -EINVAL;
> > >                 if (!f)
> > >                         break;
> > >
> > >                 ret = -EFAULT;
> > > -               if (copy_to_user(argp, &entry, sizeof(entry))) {
> > > +               if (dev->api_version >= VDUSE_API_VERSION_1)
> > > +                       ret = copy_to_user(argp, &entry,
> > > +                                          sizeof(entry));
> > > +               else
> > > +                       ret = copy_to_user(argp, &entry.v1,
> > > +                                          sizeof(entry.v1));
> > > +
> > > +               if (ret) {
> > >                         fput(f);
> > >                         break;
> > >                 }
> > > -               ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > > +               ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > >                 fput(f);
> > >                 break;
> > >         }
> > > @@ -1401,6 +1483,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >         }
> > >         case VDUSE_IOTLB_REG_UMEM: {
> > >                 struct vduse_iova_umem umem;
> > > +               u32 asid;
> > >
> > >                 ret = -EFAULT;
> > >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > > @@ -1408,17 +1491,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >
> > >                 ret = -EINVAL;
> > >                 if (!is_mem_zero((const char *)umem.reserved,
> > > -                                sizeof(umem.reserved)))
> > > +                                sizeof(umem.reserved)) ||
> > > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > > +                    umem.asid != 0) || umem.asid >= dev->nas)
> > >                         break;
> >
> > Does this mean umem is only supported for asid == 0? This looks like a bug.
> >
>
> No, that conditional means that:
> * If dev->api_version < V1 (in other words, if it is 0), umem.asid
> must be 0. Previous versions already return -EINVAL if the reserved
> members are not zero, so we must keep this behavior.
> * If the version is 1 or bigger, the asid must be smaller than the
> device number of ASID (dev->nas).

Ok, I see. I misread it as:

+                   (dev->api_version < VDUSE_API_VERSION_1 &&
+                    (umem.asid != 0) || umem.asid >= dev->nas))
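
i.e., since && binds tighter than ||, the condition actually groups as:

+                   ((dev->api_version < VDUSE_API_VERSION_1 &&
+                     umem.asid != 0) || umem.asid >= dev->nas)

so the umem.asid >= dev->nas bound is checked for every API version.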

>
> > >
> > >                 mutex_lock(&dev->domain_lock);
> > > -               ret = vduse_dev_reg_umem(dev, umem.iova,
> > > +               asid = array_index_nospec(umem.asid, dev->nas);
> > > +               ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > >                                          umem.uaddr, umem.size);
> > >                 mutex_unlock(&dev->domain_lock);
> > >                 break;
> > >         }
> > >         case VDUSE_IOTLB_DEREG_UMEM: {
> > >                 struct vduse_iova_umem umem;
> > > +               u32 asid;
> > >
> > >                 ret = -EFAULT;
> > >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > > @@ -1426,10 +1513,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >
> > >                 ret = -EINVAL;
> > >                 if (!is_mem_zero((const char *)umem.reserved,
> > > -                                sizeof(umem.reserved)))
> > > +                                sizeof(umem.reserved)) ||
> > > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > > +                    umem.asid != 0) ||
> > > +                    umem.asid >= dev->nas)
> > >                         break;
> > > +
> > >                 mutex_lock(&dev->domain_lock);
> > > -               ret = vduse_dev_dereg_umem(dev, umem.iova,
> > > +               asid = array_index_nospec(umem.asid, dev->nas);
> > > +               ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > >                                            umem.size);
> > >                 mutex_unlock(&dev->domain_lock);
> > >                 break;
> > > @@ -1437,6 +1529,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >         case VDUSE_IOTLB_GET_INFO: {
> > >                 struct vduse_iova_info info;
> > >                 struct vhost_iotlb_map *map;
> > > +               u32 asid;
> > >
> > >                 ret = -EFAULT;
> > >                 if (copy_from_user(&info, argp, sizeof(info)))
> > > @@ -1450,23 +1543,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >                                  sizeof(info.reserved)))
> > >                         break;
> > >
> > > +               if (dev->api_version < VDUSE_API_VERSION_1) {
> > > +                       if (info.asid)
> > > +                               break;
> > > +               } else if (info.asid >= dev->nas)
> >
> > It would be simpler if we mandate dev->nas == 1 for API VERSION 0.
> >
>
> That's done later in this patch, in the change of the
> vduse_validate_config function.
>
> > > +                       break;
> > > +
> > >                 mutex_lock(&dev->domain_lock);
> > > -               if (!dev->as.domain) {
> > > +               asid = array_index_nospec(info.asid, dev->nas);
> > > +               if (!dev->as[asid].domain) {
> > >                         mutex_unlock(&dev->domain_lock);
> > >                         break;
> > >                 }
> > > -               spin_lock(&dev->as.domain->iotlb_lock);
> > > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > >                                               info.start, info.last);
> > >                 if (map) {
> > >                         info.start = map->start;
> > >                         info.last = map->last;
> > >                         info.capability = 0;
> > > -                       if (dev->as.domain->bounce_map && map->start == 0 &&
> > > -                           map->last == dev->as.domain->bounce_size - 1)
> > > +                       if (dev->as[asid].domain->bounce_map &&
> > > +                           map->start == 0 &&
> > > +                           map->last == dev->as[asid].domain->bounce_size - 1)
> > >                                 info.capability |= VDUSE_IOVA_CAP_UMEM;
> > >                 }
> > > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > >                 mutex_unlock(&dev->domain_lock);
> > >                 if (!map)
> > >                         break;
> > > @@ -1491,8 +1592,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > >         struct vduse_dev *dev = file->private_data;
> > >
> > >         mutex_lock(&dev->domain_lock);
> > > -       if (dev->as.domain)
> > > -               vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
> > > +       for (int i = 0; i < dev->nas; i++)
> > > +               if (dev->as[i].domain)
> >
> > Not related to this patch, but I wonder which case could we get domain
> > == NULL here?
> >
>
> I never tried it myself, but the domain is set at vdpa_dev_add. If a VDUSE
> device is created by the VDUSE_CREATE_DEV ioctl and then destroyed
> with VDUSE_DESTROY_DEV without calling "vdpa dev add", the
> vduse_dev_release function will find dev->as[i].domain == NULL.
>

Ok, but I didn't see a similar if (dev->domain) check in
VDUSE_IOTLB_DEREG_UMEM. Is this a bug? If it is, maybe it's better to
move the address space check inside vduse_dev_dereg_umem.
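
One way to do that (a hypothetical sketch on top of this patch, not
tested) is to derive the size inside the helper, so no caller has to
dereference the domain first:

static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid, u64 iova)
{
        int ret;

        mutex_lock(&dev->as[asid].mem_lock);
        ret = -ENOENT;
        if (!dev->as[asid].umem)
                goto unlock;

        ret = -EINVAL;
        if (!dev->as[asid].domain)
                goto unlock;

        if (dev->as[asid].umem->iova != iova)
                goto unlock;

        vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
        unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
                                    dev->as[asid].umem->npages, true);
        atomic64_sub(dev->as[asid].umem->npages,
                     &dev->as[asid].umem->mm->pinned_vm);
        mmdrop(dev->as[asid].umem->mm);
        vfree(dev->as[asid].umem->pages);
        kfree(dev->as[asid].umem);
        dev->as[asid].umem = NULL;
        ret = 0;
unlock:
        mutex_unlock(&dev->as[asid].mem_lock);
        return ret;
}

The VDUSE_IOTLB_DEREG_UMEM handler would keep validating umem.size
against dev->bounce_size, and vduse_dev_release() could then loop over
all ASIDs unconditionally.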


> > > +                       vduse_dev_dereg_umem(dev, i, 0,
> > > +                                            dev->as[i].domain->bounce_size);
> > >         mutex_unlock(&dev->domain_lock);
> > >         spin_lock(&dev->msg_lock);
> > >         /* Make sure the inflight messages can be processed after reconnection */
> > > @@ -1711,7 +1814,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > >                 return NULL;
> > >
> > >         mutex_init(&dev->lock);
> > > -       mutex_init(&dev->as.mem_lock);
> > >         mutex_init(&dev->domain_lock);
> > >         spin_lock_init(&dev->msg_lock);
> > >         INIT_LIST_HEAD(&dev->send_list);
> > > @@ -1762,8 +1864,11 @@ static int vduse_destroy_dev(char *name)
> > >         idr_remove(&vduse_idr, dev->minor);
> > >         kvfree(dev->config);
> > >         vduse_dev_deinit_vqs(dev);
> > > -       if (dev->as.domain)
> > > -               vduse_domain_destroy(dev->as.domain);
> > > +       for (int i = 0; i < dev->nas; i++) {
> > > +               if (dev->as[i].domain)
> > > +                       vduse_domain_destroy(dev->as[i].domain);
> > > +       }
> > > +       kfree(dev->as);
> > >         kfree(dev->name);
> > >         kfree(dev->groups);
> > >         vduse_dev_destroy(dev);
> > > @@ -1810,12 +1915,16 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > >                          sizeof(config->reserved)))
> > >                 return false;
> > >
> > > -       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > > +       if (api_version < VDUSE_API_VERSION_1 &&
> > > +           (config->ngroups || config->nas))
> > >                 return false;
> > >
> > >         if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> > >                 return false;
> > >
> > > +       if (api_version >= VDUSE_API_VERSION_1 && config->nas > 0xffff)
> > > +               return false;
> >
> > Using macro instead of magic number please.
> >
> > > +
> > >         if (config->vq_align > PAGE_SIZE)
> > >                 return false;
> > >
> > > @@ -1879,7 +1988,8 @@ static ssize_t bounce_size_store(struct device *device,
> > >
> >
> > So the real size of the bounce should be bounce_size * nas? Should we
> > be conservative and adjust the per-AS bounce size to bounce_size / nas?
> >
>
> I think that's a bad idea for net devices in particular, as I expect
> the dataplane will need way more memory than the CVQ.

Ok, I just want to note that the bounce buffers might consume more
memory than what the user set here.
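
For example, assuming the default 64 MiB bounce size, a VDUSE device
created with nas = 3 can end up pinning up to 3 * 64 MiB = 192 MiB of
bounce pages, even though the user only tuned bounce_size to 64 MiB.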

> bounce_size per
> device seems more correct to me. But I can divide per nas for sure.

Or make bounce_size the total size across all virtqueues, so
management can increase it when it knows there's a multiqueue device.

>
> > >         ret = -EPERM;
> > >         mutex_lock(&dev->domain_lock);
> > > -       if (dev->as.domain)
> > > +       /* Assuming that if the first domain is allocated, all are allocated */
> > > +       if (dev->as[0].domain)
> > >                 goto unlock;
> > >
> > >         ret = kstrtouint(buf, 10, &bounce_size);
> > > @@ -1940,6 +2050,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > >         for (u32 i = 0; i < dev->ngroups; ++i)
> > >                 dev->groups[i].dev = dev;
> > >
> > > +       dev->nas = (dev->api_version < 1) ? 1 : (config->nas ?: 1);
> > > +       dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> > > +       if (!dev->as)
> > > +               goto err_as;
> > > +       for (int i = 0; i < dev->nas; i++)
> > > +               mutex_init(&dev->as[i].mem_lock);
> > > +
> > >         dev->name = kstrdup(config->name, GFP_KERNEL);
> > >         if (!dev->name)
> > >                 goto err_str;
> > > @@ -1976,6 +2093,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > >  err_idr:
> > >         kfree(dev->name);
> > >  err_str:
> > > +       kfree(dev->as);
> > > +err_as:
> > >         kfree(dev->groups);
> > >  err_vq_groups:
> > >         vduse_dev_destroy(dev);
> > > @@ -2101,7 +2220,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> > >
> > >         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> > >                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> > > -                                dev->ngroups, 1, name, true);
> > > +                                dev->ngroups, dev->nas, name, true);
> > >         if (IS_ERR(vdev))
> > >                 return PTR_ERR(vdev);
> > >
> > > @@ -2130,11 +2249,20 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > >                 return ret;
> > >
> > >         mutex_lock(&dev->domain_lock);
> > > -       if (!dev->as.domain)
> > > -               dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > > -                                                 dev->bounce_size);
> > > +       ret = 0;
> > > +
> > > +       for (int i = 0; i < dev->nas; ++i) {
> > > +               dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > > +                                                       dev->bounce_size);
> > > +               if (!dev->as[i].domain) {
> > > +                       ret = -ENOMEM;
> > > +                       for (int j = 0; j < i; ++j)
> > > +                               vduse_domain_destroy(dev->as[j].domain);
> > > +               }
> > > +       }
> > > +
> > >         mutex_unlock(&dev->domain_lock);
> > > -       if (!dev->as.domain) {
> > > +       if (ret == -ENOMEM) {
> > >                 put_device(&dev->vdev->vdpa.dev);
> > >                 return -ENOMEM;
> > >         }
> > > @@ -2143,8 +2271,12 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > >         if (ret) {
> > >                 put_device(&dev->vdev->vdpa.dev);
> > >                 mutex_lock(&dev->domain_lock);
> > > -               vduse_domain_destroy(dev->as.domain);
> > > -               dev->as.domain = NULL;
> > > +               for (int i = 0; i < dev->nas; i++) {
> > > +                       if (dev->as[i].domain) {
> > > +                               vduse_domain_destroy(dev->as[i].domain);
> > > +                               dev->as[i].domain = NULL;
> > > +                       }
> >
> > This is a little bit duplicated with the error handling above, should
> > we consider to switch to use err labels?
> >
>
> Sure, I can change it.
>
> > > +               }
> > >                 mutex_unlock(&dev->domain_lock);
> > >                 return ret;
> > >         }
> > > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > > index b1c0e47d71fb..54da965a65dc 100644
> > > --- a/include/uapi/linux/vduse.h
> > > +++ b/include/uapi/linux/vduse.h
> > > @@ -47,7 +47,8 @@ struct vduse_dev_config {
> > >         __u32 vq_num;
> > >         __u32 vq_align;
> > >         __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > > -       __u32 reserved[12];
> > > +       __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> > > +       __u32 reserved[11];
> > >         __u32 config_size;
> > >         __u8 config[];
> > >  };
> > > @@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
> > >         __u8 perm;
> > >  };
> > >
> > > +/**
> > > + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
> > > + * @v1: the original vduse_iotlb_entry
> > > + * @asid: address space ID of the IOVA region
> > > + *
> > > + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
> > > + */
> > > +struct vduse_iotlb_entry_v2 {
> > > +       struct vduse_iotlb_entry v1;
> > > +       __u32 asid;
> > > +};
> > > +
> > >  /*
> > >   * Find the first IOVA region that overlaps with the range [start, last]
> > >   * and return the corresponding file descriptor. Return -EINVAL means the
> > > @@ -172,6 +185,16 @@ struct vduse_vq_group {
> > >         __u32 group;
> > >  };
> > >
> > > +/**
> > > + * struct vduse_vq_group_asid - ASID of a virtqueue group
> > > + * @group: Index of the virtqueue group
> > > + * @asid: Address space ID of the group
> > > + */
> > > +struct vduse_vq_group_asid {
> > > +       __u32 group;
> > > +       __u32 asid;
> > > +};
> > > +
> > >  /**
> > >   * struct vduse_vq_info - information of a virtqueue
> > >   * @index: virtqueue index
> > > @@ -231,6 +254,7 @@ struct vduse_vq_eventfd {
> > >   * @uaddr: start address of userspace memory, it must be aligned to page size
> > >   * @iova: start of the IOVA region
> > >   * @size: size of the IOVA region
> > > + * @asid: Address space ID of the IOVA region
> > >   * @reserved: for future use, needs to be initialized to zero
> > >   *
> > >   * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> > > @@ -240,7 +264,8 @@ struct vduse_iova_umem {
> > >         __u64 uaddr;
> > >         __u64 iova;
> > >         __u64 size;
> > > -       __u64 reserved[3];
> > > +       __u32 asid;
> > > +       __u32 reserved[5];
> > >  };
> > >
> > >  /* Register userspace memory for IOVA regions */
> > > @@ -254,6 +279,7 @@ struct vduse_iova_umem {
> > >   * @start: start of the IOVA region
> > >   * @last: last of the IOVA region
> > >   * @capability: capability of the IOVA region
> > > + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> > >   * @reserved: for future use, needs to be initialized to zero
> > >   *
> > >   * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> > > @@ -264,7 +290,8 @@ struct vduse_iova_info {
> > >         __u64 last;
> > >  #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> > >         __u64 capability;
> > > -       __u64 reserved[3];
> > > +       __u32 asid; /* Only if device API version >= 1 */
> > > +       __u32 reserved[5];
> > >  };
> > >
> > >  /*
> > > @@ -287,6 +314,7 @@ enum vduse_req_type {
> > >         VDUSE_SET_STATUS,
> > >         VDUSE_UPDATE_IOTLB,
> > >         VDUSE_GET_VQ_GROUP,
> > > +       VDUSE_SET_VQ_GROUP_ASID,
> > >  };
> > >
> > >  /**
> > > @@ -321,6 +349,18 @@ struct vduse_iova_range {
> > >         __u64 last;
> > >  };
> > >
> > > +/**
> > > + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> > > + * @start: start of the IOVA range
> > > + * @last: last of the IOVA range
> > > + * @asid: address space ID of the IOVA range
> > > + */
> > > +struct vduse_iova_range_v2 {
> > > +       __u64 start;
> > > +       __u64 last;
> > > +       __u32 asid;
> > > +};
> > > +
> > >  /**
> > >   * struct vduse_dev_request - control request
> > >   * @type: request type
> > > @@ -330,6 +370,8 @@ struct vduse_iova_range {
> > >   * @s: device status
> > >   * @iova: IOVA range for updating
> > >   * @vq_group: virtqueue group of a virtqueue
> > > + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> > > + * @vq_group_asid: ASID of a virtqueue group
> > >   * @padding: padding
> > >   *
> > >   * Structure used by read(2) on /dev/vduse/$NAME.
> > > @@ -342,8 +384,10 @@ struct vduse_dev_request {
> > >                 struct vduse_vq_state vq_state;
> > >                 struct vduse_dev_status s;
> > >                 struct vduse_iova_range iova;
> > > -               /* Only if vduse api version >= 1 */;
> > > +               /* Following members only if vduse api version >= 1 */;
> > >                 struct vduse_vq_group vq_group;
> > > +               struct vduse_iova_range_v2 iova_v2;
> > > +               struct vduse_vq_group_asid vq_group_asid;
> > >                 __u32 padding[32];
> >
> > This seems to break the uAPI for the userspace that tries to use
> > sizeof(struct vduse_dev_request)?
> >
>
> No, I'm adding a member to the union that is smaller than u32[32]:
>
> https://patchew.org/linux/20250606115012.1331551-1-eperezma@redhat.com/20250606115012.1331551-4-eperezma@redhat.com/#CACGkMEvT._5F1ngR9Cs1A6ghNhZtyXiAb7qZq-Xj=7NWOzO9o5C=w@mail.gmail.com
>

You are right.
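
For the record, the invariant can even be checked at build time with
something like this (hypothetical test code, not part of the patch):

#include <assert.h>
#include <linux/vduse.h>

/* New union members must fit in the pre-existing __u32 padding[32],
 * so sizeof(struct vduse_dev_request) cannot change. */
static_assert(sizeof(struct vduse_iova_range_v2) <= sizeof(__u32[32]),
              "iova_v2 would grow vduse_dev_request");
static_assert(sizeof(struct vduse_vq_group_asid) <= sizeof(__u32[32]),
              "vq_group_asid would grow vduse_dev_request");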

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-01  8:39     ` Eugenio Perez Martin
@ 2025-09-03  3:57       ` Jason Wang
  2025-09-03  3:58       ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2025-09-03  3:57 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > This allows separating the different virtqueues into groups that share the
> > > same address space.  The VDUSE device is asked for the group of each vq at
> > > the beginning, as they're needed for the DMA API.
> > >
> > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > * Dataplane (guest passthrough)
> > > * CVQ
> > > * Shadowed vrings.
> > >
> > > Future versions of the series can include dynamic allocation of the
> > > groups array so VDUSE can declare more groups.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > >
> > > RFC v3:
> > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > >   value to reduce memory consumption, but vqs are already limited to
> > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > * Remove the descs vq group capability as it will not be used and we can
> > >   add it on top.
> > > * Do not ask for vq groups in number of vq groups < 2.
> > > * Move the valid vq groups range check to vduse_validate_config.
> > >
> > > RFC v2:
> > > * Cache group information in kernel, as we need to provide the vq map
> > >   tokens properly.
> > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > >   descriptors out of the box.
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index e7bced0b5542..0f4e36dd167e 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > >         struct vdpa_vq_state state;
> > >         bool ready;
> > >         bool kicked;
> > > +       u32 vq_group;
> > >         spinlock_t kick_lock;
> > >         spinlock_t irq_lock;
> > >         struct eventfd_ctx *kickfd;
> > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > >         u8 status;
> > >         u32 vq_num;
> > >         u32 vq_align;
> > > +       u32 ngroups;
> > >         struct vduse_umem *umem;
> > >         struct mutex mem_lock;
> > >         unsigned int bounce_size;
> > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > >         return 0;
> > >  }
> > >
> > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > +{
> > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > +
> > > +       return dev->vqs[idx]->vq_group;
> > > +}
> > > +
> > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > >                                 struct vdpa_vq_state *state)
> > >  {
> > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > >         return dev->status;
> > >  }
> > >
> > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > +{
> > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > +       if (dev->ngroups < 2)
> > > +               return 0;
> > > +
> > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > +               struct vduse_dev_msg msg = { 0 };
> > > +               int ret;
> > > +
> > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > +               msg.req.vq_group.index = i;
> > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > +               if (ret)
> > > +                       return ret;
> > > +
> > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > >  {
> > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > >         if (vduse_dev_set_status(dev, status))
> > >                 return;
> > >
> > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > +               if (vduse_fill_vq_groups(dev))
> > > +                       return;
> >
> > I may have lost some context, but I think we've agreed that we need to
> > extend the status response for this instead of having multiple
> > independent responses.
> >
>
> My understanding from [1] was that it is ok to start with this version. We
> can even make it asynchronous on top if we find this is a bottleneck,
> and the VDUSE device would need no change. Would that work?
>
> > > +
> > >         dev->status = status;
> > >  }
> > >
> > > @@ -789,6 +825,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > >         .set_vq_cb              = vduse_vdpa_set_vq_cb,
> > >         .set_vq_num             = vduse_vdpa_set_vq_num,
> > >         .get_vq_size            = vduse_vdpa_get_vq_size,
> > > +       .get_vq_group           = vduse_get_vq_group,
> > >         .set_vq_ready           = vduse_vdpa_set_vq_ready,
> > >         .get_vq_ready           = vduse_vdpa_get_vq_ready,
> > >         .set_vq_state           = vduse_vdpa_set_vq_state,
> > > @@ -1737,12 +1774,19 @@ static bool features_is_valid(struct vduse_dev_config *config)
> > >         return true;
> > >  }
> > >
> > > -static bool vduse_validate_config(struct vduse_dev_config *config)
> > > +static bool vduse_validate_config(struct vduse_dev_config *config,
> > > +                                 u64 api_version)
> > >  {
> > >         if (!is_mem_zero((const char *)config->reserved,
> > >                          sizeof(config->reserved)))
> > >                 return false;
> > >
> > > +       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > > +               return false;
> > > +
> > > +       if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> > > +               return false;
> >
> > Let's use a macro instead of magic number.
> >
>
> The rest of the limits are hardcoded, but I'm ok with changing this.
> Is UINT16_MAX ok here, or do you prefer something like MAX_NGROUPS and
> MAX_ASID?

MAX_NGROUPS and MAX_ASID seem to be better.
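
Something like this, then (a sketch with the suggested names):

#define VDUSE_MAX_NGROUPS 0xffff
#define VDUSE_MAX_ASID    0xffff

        if (api_version >= VDUSE_API_VERSION_1 &&
            (config->ngroups > VDUSE_MAX_NGROUPS ||
             config->nas > VDUSE_MAX_ASID))
                return false;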

Thanks

>
> [...]
>
> [1] https://patchew.org/linux/20250807115752.1663383-1-eperezma@redhat.com/20250807115752.1663383-3-eperezma@redhat.com/#CACGkMEuVngGjgPZXnajiPC+pcbt+dr6jqKRQr8OcX7HK1W3WNQ@mail.gmail.com
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-01  8:39     ` Eugenio Perez Martin
  2025-09-03  3:57       ` Jason Wang
@ 2025-09-03  3:58       ` Jason Wang
  2025-09-03  6:28         ` Eugenio Perez Martin
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-03  3:58 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > This allows separating the different virtqueues into groups that share the
> > > same address space.  The VDUSE device is asked for the group of each vq at
> > > the beginning, as they're needed for the DMA API.
> > >
> > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > * Dataplane (guest passthrough)
> > > * CVQ
> > > * Shadowed vrings.
> > >
> > > Future versions of the series can include dynamic allocation of the
> > > groups array so VDUSE can declare more groups.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > >
> > > RFC v3:
> > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > >   value to reduce memory consumption, but vqs are already limited to
> > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > * Remove the descs vq group capability as it will not be used and we can
> > >   add it on top.
> > > * Do not ask for vq groups in number of vq groups < 2.
> > > * Move the valid vq groups range check to vduse_validate_config.
> > >
> > > RFC v2:
> > > * Cache group information in kernel, as we need to provide the vq map
> > >   tokens properly.
> > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > >   descriptors out of the box.
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index e7bced0b5542..0f4e36dd167e 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > >         struct vdpa_vq_state state;
> > >         bool ready;
> > >         bool kicked;
> > > +       u32 vq_group;
> > >         spinlock_t kick_lock;
> > >         spinlock_t irq_lock;
> > >         struct eventfd_ctx *kickfd;
> > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > >         u8 status;
> > >         u32 vq_num;
> > >         u32 vq_align;
> > > +       u32 ngroups;
> > >         struct vduse_umem *umem;
> > >         struct mutex mem_lock;
> > >         unsigned int bounce_size;
> > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > >         return 0;
> > >  }
> > >
> > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > +{
> > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > +
> > > +       return dev->vqs[idx]->vq_group;
> > > +}
> > > +
> > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > >                                 struct vdpa_vq_state *state)
> > >  {
> > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > >         return dev->status;
> > >  }
> > >
> > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > +{
> > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > +       if (dev->ngroups < 2)
> > > +               return 0;
> > > +
> > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > +               struct vduse_dev_msg msg = { 0 };
> > > +               int ret;
> > > +
> > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > +               msg.req.vq_group.index = i;
> > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > +               if (ret)
> > > +                       return ret;
> > > +
> > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > >  {
> > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > >         if (vduse_dev_set_status(dev, status))
> > >                 return;
> > >
> > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > +               if (vduse_fill_vq_groups(dev))
> > > +                       return;
> >
> > I may have lost some context, but I think we've agreed that we need to
> > extend the status response for this instead of having multiple
> > independent responses.
> >
>
> My understanding from [1] was that it is ok to start with this version. We
> can even make it asynchronous on top if we find this is a bottleneck,
> and the VDUSE device would need no change. Would that work?

I think I need to understand why we cannot defer this to the get_group_asid() call.

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-03  3:58       ` Jason Wang
@ 2025-09-03  6:28         ` Eugenio Perez Martin
  2025-09-03  7:40           ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-03  6:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Wed, Sep 3, 2025 at 5:58 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > This allows separating the different virtqueues into groups that share the
> > > > same address space.  The VDUSE device is asked for the group of each vq at
> > > > the beginning, as they're needed for the DMA API.
> > > >
> > > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > > * Dataplane (guest passthrough)
> > > > * CVQ
> > > > * Shadowed vrings.
> > > >
> > > > Future versions of the series can include dynamic allocation of the
> > > > groups array so VDUSE can declare more groups.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > > >
> > > > RFC v3:
> > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > >   value to reduce memory consumption, but vqs are already limited to
> > > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > > * Remove the descs vq group capability as it will not be used and we can
> > > >   add it on top.
> > > > * Do not ask for vq groups in number of vq groups < 2.
> > > > * Move the valid vq groups range check to vduse_validate_config.
> > > >
> > > > RFC v2:
> > > > * Cache group information in kernel, as we need to provide the vq map
> > > >   tokens properly.
> > > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > > >   descriptors out of the box.
> > > > ---
> > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > index e7bced0b5542..0f4e36dd167e 100644
> > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > > >         struct vdpa_vq_state state;
> > > >         bool ready;
> > > >         bool kicked;
> > > > +       u32 vq_group;
> > > >         spinlock_t kick_lock;
> > > >         spinlock_t irq_lock;
> > > >         struct eventfd_ctx *kickfd;
> > > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > > >         u8 status;
> > > >         u32 vq_num;
> > > >         u32 vq_align;
> > > > +       u32 ngroups;
> > > >         struct vduse_umem *umem;
> > > >         struct mutex mem_lock;
> > > >         unsigned int bounce_size;
> > > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > >         return 0;
> > > >  }
> > > >
> > > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > > +{
> > > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > +
> > > > +       return dev->vqs[idx]->vq_group;
> > > > +}
> > > > +
> > > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > >                                 struct vdpa_vq_state *state)
> > > >  {
> > > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > > >         return dev->status;
> > > >  }
> > > >
> > > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > > +{
> > > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > > +       if (dev->ngroups < 2)
> > > > +               return 0;
> > > > +
> > > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > > +               struct vduse_dev_msg msg = { 0 };
> > > > +               int ret;
> > > > +
> > > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > > +               msg.req.vq_group.index = i;
> > > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > > +               if (ret)
> > > > +                       return ret;
> > > > +
> > > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > >  {
> > > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > >         if (vduse_dev_set_status(dev, status))
> > > >                 return;
> > > >
> > > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > > +               if (vduse_fill_vq_groups(dev))
> > > > +                       return;
> > >
> > > I may have lost some context, but I think we've agreed that we need to
> > > extend the status response for this instead of having multiple
> > > independent responses.
> > >
> >
> > My understanding from [1] was that it is ok to start with this version. We
> > can even make it asynchronous on top if we find this is a bottleneck,
> > and the VDUSE device would need no change. Would that work?
>
> I think I need to understand why we cannot defer this to the get_group_asid() call.
>

Because we need to know the vq group to ASID mapping in other calls
like set_group_asid or get_vq_group.

We could add a boolean to each virtqueue to track whether we already
know its virtqueue group, and then only ask the VDUSE device when
needed. Would that work?
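
i.e. something like this (untested sketch; vq_group_valid would be the
new boolean, cleared on reset):

static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
{
        struct vduse_dev *dev = vdpa_to_vduse(vdpa);
        struct vduse_virtqueue *vq = dev->vqs[idx];

        if (!vq->vq_group_valid) {
                struct vduse_dev_msg msg = { 0 };

                msg.req.type = VDUSE_GET_VQ_GROUP;
                msg.req.vq_group.index = idx;
                if (vduse_dev_msg_sync(dev, &msg))
                        return 0;

                vq->vq_group = msg.resp.vq_group.group;
                vq->vq_group_valid = true;
        }

        return vq->vq_group;
}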


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/6] vduse: add vq group asid support
  2025-09-03  3:56       ` Jason Wang
@ 2025-09-03  6:39         ` Eugenio Perez Martin
  0 siblings, 0 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-03  6:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Wed, Sep 3, 2025 at 5:57 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Sep 1, 2025 at 5:12 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Mon, Sep 1, 2025 at 4:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > group.  This enables mapping each group into a distinct memory space.
> > > >
> > > > Now that the driver can change ASID in the middle of operation, the
> > > > domain that each vq address point is also protected by domain_lock.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > v3:
> > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > >   value to reduce memory consumption, but vqs are already limited to
> > > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > >   VDUSE_IOTLB_GET_INFO.
> > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > * Move the umem mutex to asid struct so there is no contention between
> > > >   ASIDs.
> > > >
> > > > v2:
> > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > >   part of the struct is the same.
> > > > ---
> > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 290 +++++++++++++++++++++--------
> > > >  include/uapi/linux/vduse.h         |  52 +++++-
> > > >  2 files changed, 259 insertions(+), 83 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > index 7d2a3ed77b1e..2fb227713972 100644
> > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > @@ -92,6 +92,7 @@ struct vduse_as {
> > > >  };
> > > >
> > > >  struct vduse_vq_group_int {
> > > > +       struct vduse_iova_domain *domain;
> > >
> > > I'd expect this should be vduse_as. Anything I miss?
> > >
> >
> > It just saves a memory indirection step, as the rest of members of
> > vduse_as are not used in vduse_map_ops callbacks. I can use vduse_as
> > if you prefer it.
>
> I think it's better to use that to follow the abstraction of a group
> to address space indirection.
>
> >
> > > >         struct vduse_dev *dev;
> > > >  };
> > > >
> > > > @@ -99,7 +100,7 @@ struct vduse_dev {
> > > >         struct vduse_vdpa *vdev;
> > > >         struct device *dev;
> > > >         struct vduse_virtqueue **vqs;
> > > > -       struct vduse_as as;
> > > > +       struct vduse_as *as;
> > > >         char *name;
> > > >         struct mutex lock;
> > > >         spinlock_t msg_lock;
> > > > @@ -127,6 +128,7 @@ struct vduse_dev {
> > > >         u32 vq_num;
> > > >         u32 vq_align;
> > > >         u32 ngroups;
> > > > +       u32 nas;
> > > >         struct vduse_vq_group_int *groups;
> > > >         unsigned int bounce_size;
> > > >         struct mutex domain_lock;
> > > > @@ -317,7 +319,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > >         return vduse_dev_msg_sync(dev, &msg);
> > > >  }
> > > >
> > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > >                                   u64 start, u64 last)
> > > >  {
> > > >         struct vduse_dev_msg msg = { 0 };
> > > > @@ -326,8 +328,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > >                 return -EINVAL;
> > > >
> > > >         msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > -       msg.req.iova.start = start;
> > > > -       msg.req.iova.last = last;
> > > > +       if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > +               msg.req.iova.start = start;
> > > > +               msg.req.iova.last = last;
> > > > +       } else {
> > > > +               msg.req.iova_v2.start = start;
> > > > +               msg.req.iova_v2.last = last;
> > > > +               msg.req.iova_v2.asid = asid;
> > > > +       }
> > > >
> > > >         return vduse_dev_msg_sync(dev, &msg);
> > > >  }
> > > > @@ -439,14 +447,28 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > >         return mask;
> > > >  }
> > > >
> > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > +                                      unsigned int group, unsigned int asid)
> > > > +{
> > > > +       guard(mutex)(&dev->domain_lock);
> > > > +       dev->groups[group].domain = dev->as[asid].domain;
> > > > +}
> > > > +
> > > >  static void vduse_dev_reset(struct vduse_dev *dev)
> > > >  {
> > > >         int i;
> > > > -       struct vduse_iova_domain *domain = dev->as.domain;
> > > >
> > > >         /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > -       if (domain && domain->bounce_map)
> > > > -               vduse_domain_reset_bounce_map(domain);
> > > > +       for (i = 0; i < dev->nas; i++) {
> > > > +               struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > +
> > > > +               if (domain && domain->bounce_map)
> > > > +                       vduse_domain_reset_bounce_map(domain);
> > > > +       }
> > > > +
> > > > +       for (i = 0; i < dev->ngroups; i++)
> > > > +               vduse_set_group_asid_nomsg(dev, i, 0);
> > > >
> > > >         down_write(&dev->rwsem);
> > > >
> > > > @@ -620,6 +642,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > >         return ret;
> > > >  }
> > > >
> > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > +                               unsigned int asid)
> > > > +{
> > > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > +       struct vduse_dev_msg msg = { 0 };
> > > > +       int r;
> > > > +
> > > > +       if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > +           group >= dev->ngroups || asid >= dev->nas)
> > > > +               return -EINVAL;
> > > > +
> > > > +       msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > +       msg.req.vq_group_asid.group = group;
> > > > +       msg.req.vq_group_asid.asid = asid;
> > > > +
> > > > +       r = vduse_dev_msg_sync(dev, &msg);
> > > > +       if (r < 0)
> > > > +               return r;
> > > > +
> > > > +       vduse_set_group_asid_nomsg(dev, group, asid);
> > > > +       return 0;
> > > > +}
> > > > +
> > > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > >                                 struct vdpa_vq_state *state)
> > > >  {
> > > > @@ -818,13 +863,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > >         int ret;
> > > >
> > > > -       ret = vduse_domain_set_map(dev->as.domain, iotlb);
> > > > +       ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > >         if (ret)
> > > >                 return ret;
> > > >
> > > > -       ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > +       ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > >         if (ret) {
> > > > -               vduse_domain_clear_map(dev->as.domain, iotlb);
> > > > +               vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > >                 return ret;
> > > >         }
> > > >
> > > > @@ -867,6 +912,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > >         .get_vq_affinity        = vduse_vdpa_get_vq_affinity,
> > > >         .reset                  = vduse_vdpa_reset,
> > > >         .set_map                = vduse_vdpa_set_map,
> > > > +       .set_group_asid         = vduse_set_group_asid,
> > > >         .get_vq_map             = vduse_get_vq_map,
> > > >         .free                   = vduse_vdpa_free,
> > > >  };
> > > > @@ -876,8 +922,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > >                                              enum dma_data_direction dir)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > >
> > > Is this correct? I mean each AS should have its own lock instead of
> > > having a BQL.
> > >
> >
> > That would not protect the pointer if vduse_dev_sync_single_for_device
> > (or equivalent function that uses the domain) is called at the same
> > time as vduse_set_group_asid.
> >
> > > > +       domain = token.group->domain;
> > > >         vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > >  }
> > > >
> > > > @@ -886,8 +934,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > >                                              enum dma_data_direction dir)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > >  }
> > > >
> > > > @@ -897,8 +947,10 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > >                                      unsigned long attrs)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > > >  }
> > > >
> > > > @@ -907,8 +959,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > > >                                  unsigned long attrs)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > > >  }
> > > >
> > > > @@ -916,11 +970,13 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > >                                       dma_addr_t *dma_addr, gfp_t flag)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >         unsigned long iova;
> > > >         void *addr;
> > > >
> > > >         *dma_addr = DMA_MAPPING_ERROR;
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         addr = vduse_domain_alloc_coherent(domain, size,
> > > >                                            (dma_addr_t *)&iova, flag);
> > > >         if (!addr)
> > > > @@ -936,16 +992,20 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > > >                                     unsigned long attrs)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
> > > >  }
> > > >
> > > >  static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         return dma_addr < domain->bounce_size;
> > > >  }
> > > >
> > > > @@ -959,8 +1019,10 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > > >  static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > > >  {
> > > >         struct vduse_dev *vdev = token.group->dev;
> > > > -       struct vduse_iova_domain *domain = vdev->as.domain;
> > > > +       struct vduse_iova_domain *domain;
> > > >
> > > > +       guard(mutex)(&vdev->domain_lock);
> > > > +       domain = token.group->domain;
> > > >         return domain->bounce_size;
> > > >  }
> > > >
> > > > @@ -1101,39 +1163,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > >         return ret;
> > > >  }
> > > >
> > > > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > > >                                 u64 iova, u64 size)
> > > >  {
> > > >         int ret;
> > > >
> > > > -       mutex_lock(&dev->as.mem_lock);
> > > > +       mutex_lock(&dev->as[asid].mem_lock);
> > > >         ret = -ENOENT;
> > > > -       if (!dev->as.umem)
> > > > +       if (!dev->as[asid].umem)
> > > >                 goto unlock;
> > > >
> > > >         ret = -EINVAL;
> > > > -       if (!dev->as.domain)
> > > > +       if (!dev->as[asid].domain)
> > > >                 goto unlock;
> > > >
> > > > -       if (dev->as.umem->iova != iova || size != dev->as.domain->bounce_size)
> > > > +       if (dev->as[asid].umem->iova != iova ||
> > > > +           size != dev->as[asid].domain->bounce_size)
> > > >                 goto unlock;
> > > >
> > > > -       vduse_domain_remove_user_bounce_pages(dev->as.domain);
> > > > -       unpin_user_pages_dirty_lock(dev->as.umem->pages,
> > > > -                                   dev->as.umem->npages, true);
> > > > -       atomic64_sub(dev->as.umem->npages, &dev->as.umem->mm->pinned_vm);
> > > > -       mmdrop(dev->as.umem->mm);
> > > > -       vfree(dev->as.umem->pages);
> > > > -       kfree(dev->as.umem);
> > > > -       dev->as.umem = NULL;
> > > > +       vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > > > +       unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > > > +                                   dev->as[asid].umem->npages, true);
> > > > +       atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > > > +       mmdrop(dev->as[asid].umem->mm);
> > > > +       vfree(dev->as[asid].umem->pages);
> > > > +       kfree(dev->as[asid].umem);
> > > > +       dev->as[asid].umem = NULL;
> > > >         ret = 0;
> > > >  unlock:
> > > > -       mutex_unlock(&dev->as.mem_lock);
> > > > +       mutex_unlock(&dev->as[asid].mem_lock);
> > > >         return ret;
> > > >  }
> > > >
> > > >  static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > -                             u64 iova, u64 uaddr, u64 size)
> > > > +                             u32 asid, u64 iova, u64 uaddr, u64 size)
> > > >  {
> > > >         struct page **page_list = NULL;
> > > >         struct vduse_umem *umem = NULL;
> > > > @@ -1141,14 +1204,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > >         unsigned long npages, lock_limit;
> > > >         int ret;
> > > >
> > > > -       if (!dev->as.domain || !dev->as.domain->bounce_map ||
> > > > -           size != dev->as.domain->bounce_size ||
> > > > +       if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > > > +           size != dev->as[asid].domain->bounce_size ||
> > > >             iova != 0 || uaddr & ~PAGE_MASK)
> > > >                 return -EINVAL;
> > > >
> > > > -       mutex_lock(&dev->as.mem_lock);
> > > > +       mutex_lock(&dev->as[asid].mem_lock);
> > > >         ret = -EEXIST;
> > > > -       if (dev->as.umem)
> > > > +       if (dev->as[asid].umem)
> > > >                 goto unlock;
> > > >
> > > >         ret = -ENOMEM;
> > > > @@ -1172,7 +1235,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > >                 goto out;
> > > >         }
> > > >
> > > > -       ret = vduse_domain_add_user_bounce_pages(dev->as.domain,
> > > > +       ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > > >                                                  page_list, pinned);
> > > >         if (ret)
> > > >                 goto out;
> > > > @@ -1185,7 +1248,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > >         umem->mm = current->mm;
> > > >         mmgrab(current->mm);
> > > >
> > > > -       dev->as.umem = umem;
> > > > +       dev->as[asid].umem = umem;
> > > >  out:
> > > >         if (ret && pinned > 0)
> > > >                 unpin_user_pages(page_list, pinned);
> > > > @@ -1196,7 +1259,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > >                 vfree(page_list);
> > > >                 kfree(umem);
> > > >         }
> > > > -       mutex_unlock(&dev->as.mem_lock);
> > > > +       mutex_unlock(&dev->as[asid].mem_lock);
> > > >         return ret;
> > > >  }
> > > >
> > > > @@ -1228,47 +1291,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >
> > > >         switch (cmd) {
> > > >         case VDUSE_IOTLB_GET_FD: {
> > > > -               struct vduse_iotlb_entry entry;
> > > > +               struct vduse_iotlb_entry_v2 entry;
> > > >                 struct vhost_iotlb_map *map;
> > > >                 struct vdpa_map_file *map_file;
> > > >                 struct file *f = NULL;
> > > > +               u32 asid;
> > > >
> > > >                 ret = -EFAULT;
> > > > -               if (copy_from_user(&entry, argp, sizeof(entry)))
> > > > -                       break;
> > > > +               if (dev->api_version >= VDUSE_API_VERSION_1) {
> > > > +                       if (copy_from_user(&entry, argp, sizeof(entry)))
> > > > +                               break;
> > > > +               } else {
> > > > +                       entry.asid = 0;
> > > > +                       if (copy_from_user(&entry.v1, argp,
> > > > +                                          sizeof(entry.v1)))
> > > > +                               break;
> > > > +               }
> > > >
> > > >                 ret = -EINVAL;
> > > > -               if (entry.start > entry.last)
> > > > +               if (entry.v1.start > entry.v1.last)
> > > > +                       break;
> > > > +
> > > > +               if (entry.asid >= dev->nas)
> > > >                         break;
> > > >
> > > >                 mutex_lock(&dev->domain_lock);
> > > > -               if (!dev->as.domain) {
> > > > +               asid = array_index_nospec(entry.asid, dev->nas);
> > > > +               if (!dev->as[asid].domain) {
> > > >                         mutex_unlock(&dev->domain_lock);
> > > >                         break;
> > > >                 }
> > > > -               spin_lock(&dev->as.domain->iotlb_lock);
> > > > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > > > -                                             entry.start, entry.last);
> > > > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > > > +                                             entry.v1.start, entry.v1.last);
> > > >                 if (map) {
> > > >                         map_file = (struct vdpa_map_file *)map->opaque;
> > > >                         f = get_file(map_file->file);
> > > > -                       entry.offset = map_file->offset;
> > > > -                       entry.start = map->start;
> > > > -                       entry.last = map->last;
> > > > -                       entry.perm = map->perm;
> > > > +                       entry.v1.offset = map_file->offset;
> > > > +                       entry.v1.start = map->start;
> > > > +                       entry.v1.last = map->last;
> > > > +                       entry.v1.perm = map->perm;
> > > >                 }
> > > > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > > > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > > >                 mutex_unlock(&dev->domain_lock);
> > > >                 ret = -EINVAL;
> > > >                 if (!f)
> > > >                         break;
> > > >
> > > >                 ret = -EFAULT;
> > > > -               if (copy_to_user(argp, &entry, sizeof(entry))) {
> > > > +               if (dev->api_version >= VDUSE_API_VERSION_1)
> > > > +                       ret = copy_to_user(argp, &entry,
> > > > +                                          sizeof(entry));
> > > > +               else
> > > > +                       ret = copy_to_user(argp, &entry.v1,
> > > > +                                          sizeof(entry.v1));
> > > > +
> > > > +               if (ret) {
> > > >                         fput(f);
> > > >                         break;
> > > >                 }
> > > > -               ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > > > +               ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > > >                 fput(f);
> > > >                 break;
> > > >         }
> > > > @@ -1401,6 +1483,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >         }
> > > >         case VDUSE_IOTLB_REG_UMEM: {
> > > >                 struct vduse_iova_umem umem;
> > > > +               u32 asid;
> > > >
> > > >                 ret = -EFAULT;
> > > >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > > > @@ -1408,17 +1491,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >
> > > >                 ret = -EINVAL;
> > > >                 if (!is_mem_zero((const char *)umem.reserved,
> > > > -                                sizeof(umem.reserved)))
> > > > +                                sizeof(umem.reserved)) ||
> > > > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > > > +                    umem.asid != 0) || umem.asid >= dev->nas)
> > > >                         break;
> > >
> > > Does this mean umem is only supported for asid == 0? This looks like a bug.
> > >
> >
> > No, that conditional means that:
> > * If dev->api_version < V1 (in other words, if it is 0), umem.asid
> > must be 0. Previous versions already return -EINVAL if the reserved
> > members are not zero, so we must keep this behavior.
> > * If the version is 1 or bigger, the asid must be smaller than the
> > device's number of ASIDs (dev->nas).
>
> Ok I see, I misread it as
>
> +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> +                    (umem.asid != 0) || umem.asid >= dev->nas))
>
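To make the precedence explicit, the check as written parses like this
(same logic as the patch, parentheses added only for clarity):

		if (!is_mem_zero((const char *)umem.reserved,
				 sizeof(umem.reserved)) ||
		    ((dev->api_version < VDUSE_API_VERSION_1) &&
		     (umem.asid != 0)) ||
		    (umem.asid >= dev->nas))
			break;

so the umem.asid >= dev->nas bound applies to every API version, while
the umem.asid != 0 restriction only applies to version 0.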
> >
> > > >
> > > >                 mutex_lock(&dev->domain_lock);
> > > > -               ret = vduse_dev_reg_umem(dev, umem.iova,
> > > > +               asid = array_index_nospec(umem.asid, dev->nas);
> > > > +               ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > > >                                          umem.uaddr, umem.size);
> > > >                 mutex_unlock(&dev->domain_lock);
> > > >                 break;
> > > >         }
> > > >         case VDUSE_IOTLB_DEREG_UMEM: {
> > > >                 struct vduse_iova_umem umem;
> > > > +               u32 asid;
> > > >
> > > >                 ret = -EFAULT;
> > > >                 if (copy_from_user(&umem, argp, sizeof(umem)))
> > > > @@ -1426,10 +1513,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >
> > > >                 ret = -EINVAL;
> > > >                 if (!is_mem_zero((const char *)umem.reserved,
> > > > -                                sizeof(umem.reserved)))
> > > > +                                sizeof(umem.reserved)) ||
> > > > +                   (dev->api_version < VDUSE_API_VERSION_1 &&
> > > > +                    umem.asid != 0) ||
> > > > +                    umem.asid >= dev->nas)
> > > >                         break;
> > > > +
> > > >                 mutex_lock(&dev->domain_lock);
> > > > -               ret = vduse_dev_dereg_umem(dev, umem.iova,
> > > > +               asid = array_index_nospec(umem.asid, dev->nas);
> > > > +               ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > > >                                            umem.size);
> > > >                 mutex_unlock(&dev->domain_lock);
> > > >                 break;
> > > > @@ -1437,6 +1529,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >         case VDUSE_IOTLB_GET_INFO: {
> > > >                 struct vduse_iova_info info;
> > > >                 struct vhost_iotlb_map *map;
> > > > +               u32 asid;
> > > >
> > > >                 ret = -EFAULT;
> > > >                 if (copy_from_user(&info, argp, sizeof(info)))
> > > > @@ -1450,23 +1543,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >                                  sizeof(info.reserved)))
> > > >                         break;
> > > >
> > > > +               if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > +                       if (info.asid)
> > > > +                               break;
> > > > +               } else if (info.asid >= dev->nas)
> > >
> > > It would be simpler if we mandate dev->nas == 1 for VERSION 0.
> > >
> >
> > That's done later in this patch, in the change to the
> > vduse_validate_config function.
> >
> > > > +                       break;
> > > > +
> > > >                 mutex_lock(&dev->domain_lock);
> > > > -               if (!dev->as.domain) {
> > > > +               asid = array_index_nospec(info.asid, dev->nas);
> > > > +               if (!dev->as[asid].domain) {
> > > >                         mutex_unlock(&dev->domain_lock);
> > > >                         break;
> > > >                 }
> > > > -               spin_lock(&dev->as.domain->iotlb_lock);
> > > > -               map = vhost_iotlb_itree_first(dev->as.domain->iotlb,
> > > > +               spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > > +               map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > > >                                               info.start, info.last);
> > > >                 if (map) {
> > > >                         info.start = map->start;
> > > >                         info.last = map->last;
> > > >                         info.capability = 0;
> > > > -                       if (dev->as.domain->bounce_map && map->start == 0 &&
> > > > -                           map->last == dev->as.domain->bounce_size - 1)
> > > > +                       if (dev->as[asid].domain->bounce_map &&
> > > > +                           map->start == 0 &&
> > > > +                           map->last == dev->as[asid].domain->bounce_size - 1)
> > > >                                 info.capability |= VDUSE_IOVA_CAP_UMEM;
> > > >                 }
> > > > -               spin_unlock(&dev->as.domain->iotlb_lock);
> > > > +               spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > > >                 mutex_unlock(&dev->domain_lock);
> > > >                 if (!map)
> > > >                         break;
> > > > @@ -1491,8 +1592,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > > >         struct vduse_dev *dev = file->private_data;
> > > >
> > > >         mutex_lock(&dev->domain_lock);
> > > > -       if (dev->as.domain)
> > > > -               vduse_dev_dereg_umem(dev, 0, dev->as.domain->bounce_size);
> > > > +       for (int i = 0; i < dev->nas; i++)
> > > > +               if (dev->as[i].domain)
> > >
> > > Not related to this patch, but I wonder which case could we get domain
> > > == NULL here?
> > >
> >
> > I never tried it myself, but the domain is set at vdpa_dev_add. If a VDUSE
> > device is created by the VDUSE_CREATE_DEV ioctl and then destroyed
> > with VDUSE_DESTROY_DEV without calling "vdpa dev add", the
> > vduse_dev_release function will find dev->as[i].domain == NULL.
> >
>
> Ok, but I didn't see a similar if(dev->domain) check in
> VDUSE_IOTLB_DEREG_UMEM. Is this a bug? If it is, maybe it's better to
> move the as check inside vduse_dev_dereg_umem.
>

Actually the check is duplicated here, as it is also present in
vduse_dev_dereg_umem and vduse_dev_reg_umem. I can remove the
conditional from the caller in the next series.
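A possible shape for that (a sketch only; it assumes the helper derives
the size from the domain itself, so callers no longer dereference a
possibly-NULL domain):

	static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
					u64 iova)
	{
		int ret;

		mutex_lock(&dev->as[asid].mem_lock);
		ret = -ENOENT;
		if (!dev->as[asid].umem)
			goto unlock;

		ret = -EINVAL;
		/* The NULL check lives here, so callers need no conditional */
		if (!dev->as[asid].domain)
			goto unlock;

		if (dev->as[asid].umem->iova != iova)
			goto unlock;

		/* ... unpin the pages and free the umem as today ... */
		ret = 0;
	unlock:
		mutex_unlock(&dev->as[asid].mem_lock);
		return ret;
	}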

>
> > > > +                       vduse_dev_dereg_umem(dev, i, 0,
> > > > +                                            dev->as[i].domain->bounce_size);
> > > >         mutex_unlock(&dev->domain_lock);
> > > >         spin_lock(&dev->msg_lock);
> > > >         /* Make sure the inflight messages can be processed after reconnection */
> > > > @@ -1711,7 +1814,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > > >                 return NULL;
> > > >
> > > >         mutex_init(&dev->lock);
> > > > -       mutex_init(&dev->as.mem_lock);
> > > >         mutex_init(&dev->domain_lock);
> > > >         spin_lock_init(&dev->msg_lock);
> > > >         INIT_LIST_HEAD(&dev->send_list);
> > > > @@ -1762,8 +1864,11 @@ static int vduse_destroy_dev(char *name)
> > > >         idr_remove(&vduse_idr, dev->minor);
> > > >         kvfree(dev->config);
> > > >         vduse_dev_deinit_vqs(dev);
> > > > -       if (dev->as.domain)
> > > > -               vduse_domain_destroy(dev->as.domain);
> > > > +       for (int i = 0; i < dev->nas; i++) {
> > > > +               if (dev->as[i].domain)
> > > > +                       vduse_domain_destroy(dev->as[i].domain);
> > > > +       }
> > > > +       kfree(dev->as);
> > > >         kfree(dev->name);
> > > >         kfree(dev->groups);
> > > >         vduse_dev_destroy(dev);
> > > > @@ -1810,12 +1915,16 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > > >                          sizeof(config->reserved)))
> > > >                 return false;
> > > >
> > > > -       if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > > > +       if (api_version < VDUSE_API_VERSION_1 &&
> > > > +           (config->ngroups || config->nas))
> > > >                 return false;
> > > >
> > > >         if (api_version >= VDUSE_API_VERSION_1 && config->ngroups > 0xffff)
> > > >                 return false;
> > > >
> > > > +       if (api_version >= VDUSE_API_VERSION_1 && config->nas > 0xffff)
> > > > +               return false;
> > >
> > > Using macro instead of magic number please.
> > >
> > > > +
> > > >         if (config->vq_align > PAGE_SIZE)
> > > >                 return false;
> > > >
> > > > @@ -1879,7 +1988,8 @@ static ssize_t bounce_size_store(struct device *device,
> > > >
> > >
> > > So the real size of the bounce should be bounce_size * nas? Should we
> > > be conservative and adjust the per-AS bounce size to bounce_size / nas?
> > >
> >
> > I think that's a bad idea in net devices in particular, as I expect
> > the dataplane will need way more memory than the CVQ.
>
> Ok, I just want to note that the bounce might consume more than what
> the user set here.
>
> > bounce_size per
> > device seems more correct to me. But I can divide it per nas for sure.
>
> Or make bounce_size the total size of all virtqueues, so
> management will increase it when it knows there's a multiqueue device.
>

I think this is the best option, yes. Changing for the next version!
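A minimal sketch of that direction in vdpa_dev_add (assuming the
device-wide total is split evenly across address spaces; the split
policy is an assumption):

	/* hypothetical: dev->bounce_size is the total set by management */
	unsigned long per_as_bounce = dev->bounce_size / dev->nas;

	for (int i = 0; i < dev->nas; ++i) {
		dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
							per_as_bounce);
		/* error handling as in the patch */
	}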

> >
> > > >         ret = -EPERM;
> > > >         mutex_lock(&dev->domain_lock);
> > > > -       if (dev->as.domain)
> > > > +       /* Assuming that if the first domain is allocated, all are allocated */
> > > > +       if (dev->as[0].domain)
> > > >                 goto unlock;
> > > >
> > > >         ret = kstrtouint(buf, 10, &bounce_size);
> > > > @@ -1940,6 +2050,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > > >         for (u32 i = 0; i < dev->ngroups; ++i)
> > > >                 dev->groups[i].dev = dev;
> > > >
> > > > +       dev->nas = (dev->api_version < 1) ? 1 : (config->nas ?: 1);
> > > > +       dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> > > > +       if (!dev->as)
> > > > +               goto err_as;
> > > > +       for (int i = 0; i < dev->nas; i++)
> > > > +               mutex_init(&dev->as[i].mem_lock);
> > > > +
> > > >         dev->name = kstrdup(config->name, GFP_KERNEL);
> > > >         if (!dev->name)
> > > >                 goto err_str;
> > > > @@ -1976,6 +2093,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > > >  err_idr:
> > > >         kfree(dev->name);
> > > >  err_str:
> > > > +       kfree(dev->as);
> > > > +err_as:
> > > >         kfree(dev->groups);
> > > >  err_vq_groups:
> > > >         vduse_dev_destroy(dev);
> > > > @@ -2101,7 +2220,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> > > >
> > > >         vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> > > >                                  &vduse_vdpa_config_ops, &vduse_map_ops,
> > > > -                                dev->ngroups, 1, name, true);
> > > > +                                dev->ngroups, dev->nas, name, true);
> > > >         if (IS_ERR(vdev))
> > > >                 return PTR_ERR(vdev);
> > > >
> > > > @@ -2130,11 +2249,20 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > > >                 return ret;
> > > >
> > > >         mutex_lock(&dev->domain_lock);
> > > > -       if (!dev->as.domain)
> > > > -               dev->as.domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > > > -                                                 dev->bounce_size);
> > > > +       ret = 0;
> > > > +
> > > > +       for (int i = 0; i < dev->nas; ++i) {
> > > > +               dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > > > +                                                       dev->bounce_size);
> > > > +               if (!dev->as[i].domain) {
> > > > +                       ret = -ENOMEM;
> > > > +                       for (int j = 0; j < i; ++j)
> > > > +                               vduse_domain_destroy(dev->as[j].domain);
> > > > +               }
> > > > +       }
> > > > +
> > > >         mutex_unlock(&dev->domain_lock);
> > > > -       if (!dev->as.domain) {
> > > > +       if (ret == -ENOMEM) {
> > > >                 put_device(&dev->vdev->vdpa.dev);
> > > >                 return -ENOMEM;
> > > >         }
> > > > @@ -2143,8 +2271,12 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > > >         if (ret) {
> > > >                 put_device(&dev->vdev->vdpa.dev);
> > > >                 mutex_lock(&dev->domain_lock);
> > > > -               vduse_domain_destroy(dev->as.domain);
> > > > -               dev->as.domain = NULL;
> > > > +               for (int i = 0; i < dev->nas; i++) {
> > > > +                       if (dev->as[i].domain) {
> > > > +                               vduse_domain_destroy(dev->as[i].domain);
> > > > +                               dev->as[i].domain = NULL;
> > > > +                       }
> > >
> > > This is a little bit duplicated with the error handling above, should
> > > we consider to switch to use err labels?
> > >
> >
> > Sure, I can change it.
> >
> > > > +               }
> > > >                 mutex_unlock(&dev->domain_lock);
> > > >                 return ret;
> > > >         }
> > > > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > > > index b1c0e47d71fb..54da965a65dc 100644
> > > > --- a/include/uapi/linux/vduse.h
> > > > +++ b/include/uapi/linux/vduse.h
> > > > @@ -47,7 +47,8 @@ struct vduse_dev_config {
> > > >         __u32 vq_num;
> > > >         __u32 vq_align;
> > > >         __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > > > -       __u32 reserved[12];
> > > > +       __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> > > > +       __u32 reserved[11];
> > > >         __u32 config_size;
> > > >         __u8 config[];
> > > >  };
> > > > @@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
> > > >         __u8 perm;
> > > >  };
> > > >
> > > > +/**
> > > > + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
> > > > + * @v1: the original vduse_iotlb_entry
> > > > + * @asid: address space ID of the IOVA region
> > > > + *
> > > > + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
> > > > + */
> > > > +struct vduse_iotlb_entry_v2 {
> > > > +       struct vduse_iotlb_entry v1;
> > > > +       __u32 asid;
> > > > +};
> > > > +
> > > >  /*
> > > >   * Find the first IOVA region that overlaps with the range [start, last]
> > > >   * and return the corresponding file descriptor. Return -EINVAL means the
> > > > @@ -172,6 +185,16 @@ struct vduse_vq_group {
> > > >         __u32 group;
> > > >  };
> > > >
> > > > +/**
> > > > + * struct vduse_vq_group_asid - ASID of a virtqueue group
> > > > + * @group: Index of the virtqueue group
> > > > + * @asid: Address space ID of the group
> > > > + */
> > > > +struct vduse_vq_group_asid {
> > > > +       __u32 group;
> > > > +       __u32 asid;
> > > > +};
> > > > +
> > > >  /**
> > > >   * struct vduse_vq_info - information of a virtqueue
> > > >   * @index: virtqueue index
> > > > @@ -231,6 +254,7 @@ struct vduse_vq_eventfd {
> > > >   * @uaddr: start address of userspace memory, it must be aligned to page size
> > > >   * @iova: start of the IOVA region
> > > >   * @size: size of the IOVA region
> > > > + * @asid: Address space ID of the IOVA region
> > > >   * @reserved: for future use, needs to be initialized to zero
> > > >   *
> > > >   * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> > > > @@ -240,7 +264,8 @@ struct vduse_iova_umem {
> > > >         __u64 uaddr;
> > > >         __u64 iova;
> > > >         __u64 size;
> > > > -       __u64 reserved[3];
> > > > +       __u32 asid;
> > > > +       __u32 reserved[5];
> > > >  };
> > > >
> > > >  /* Register userspace memory for IOVA regions */
> > > > @@ -254,6 +279,7 @@ struct vduse_iova_umem {
> > > >   * @start: start of the IOVA region
> > > >   * @last: last of the IOVA region
> > > >   * @capability: capability of the IOVA regsion
> > > > + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> > > >   * @reserved: for future use, needs to be initialized to zero
> > > >   *
> > > >   * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> > > > @@ -264,7 +290,8 @@ struct vduse_iova_info {
> > > >         __u64 last;
> > > >  #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> > > >         __u64 capability;
> > > > -       __u64 reserved[3];
> > > > +       __u32 asid; /* Only if device API version >= 1 */
> > > > +       __u32 reserved[5];
> > > >  };
> > > >
> > > >  /*
> > > > @@ -287,6 +314,7 @@ enum vduse_req_type {
> > > >         VDUSE_SET_STATUS,
> > > >         VDUSE_UPDATE_IOTLB,
> > > >         VDUSE_GET_VQ_GROUP,
> > > > +       VDUSE_SET_VQ_GROUP_ASID,
> > > >  };
> > > >
> > > >  /**
> > > > @@ -321,6 +349,18 @@ struct vduse_iova_range {
> > > >         __u64 last;
> > > >  };
> > > >
> > > > +/**
> > > > + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> > > > + * @start: start of the IOVA range
> > > > + * @last: last of the IOVA range
> > > > + * @asid: address space ID of the IOVA range
> > > > + */
> > > > +struct vduse_iova_range_v2 {
> > > > +       __u64 start;
> > > > +       __u64 last;
> > > > +       __u32 asid;
> > > > +};
> > > > +
> > > >  /**
> > > >   * struct vduse_dev_request - control request
> > > >   * @type: request type
> > > > @@ -330,6 +370,8 @@ struct vduse_iova_range {
> > > >   * @s: device status
> > > >   * @iova: IOVA range for updating
> > > >   * @vq_group: virtqueue group of a virtqueue
> > > > + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> > > > + * @vq_group_asid: ASID of a virtqueue group
> > > >   * @padding: padding
> > > >   *
> > > >   * Structure used by read(2) on /dev/vduse/$NAME.
> > > > @@ -342,8 +384,10 @@ struct vduse_dev_request {
> > > >                 struct vduse_vq_state vq_state;
> > > >                 struct vduse_dev_status s;
> > > >                 struct vduse_iova_range iova;
> > > > -               /* Only if vduse api version >= 1 */;
> > > > +               /* Following members only if vduse api version >= 1 */
> > > >                 struct vduse_vq_group vq_group;
> > > > +               struct vduse_iova_range_v2 iova_v2;
> > > > +               struct vduse_vq_group_asid vq_group_asid;
> > > >                 __u32 padding[32];
> > >
> > > This seems to break the uAPI for the userspace that tries to use
> > > sizeof(struct vduse_dev_request)?
> > >
> >
> > No, I'm adding a member to the union that is smaller than u32[32]:
> >
> > https://patchew.org/linux/20250606115012.1331551-1-eperezma@redhat.com/20250606115012.1331551-4-eperezma@redhat.com/#CACGkMEvT._5F1ngR9Cs1A6ghNhZtyXiAb7qZq-Xj=7NWOzO9o5C=w@mail.gmail.com
> >
>
> You are right.
>
> Thanks
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-03  6:28         ` Eugenio Perez Martin
@ 2025-09-03  7:40           ` Jason Wang
  2025-09-03 10:30             ` Eugenio Perez Martin
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-03  7:40 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Wed, Sep 3, 2025 at 2:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Sep 3, 2025 at 5:58 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > This allows separating the different virtqueues into groups that share the
> > > > > same address space.  The VDUSE device is asked for the groups of the vqs at
> > > > > the beginning, as they're needed for the DMA API.
> > > > >
> > > > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > > > * Dataplane (guest passthrough)
> > > > > * CVQ
> > > > > * Shadowed vrings.
> > > > >
> > > > > Future versions of the series can include dynamic allocation of the
> > > > > groups array so VDUSE can declare more groups.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > > > >
> > > > > RFC v3:
> > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > >   value to reduce memory consumption, but vqs are already limited to
> > > > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > > > * Remove the descs vq group capability as it will not be used and we can
> > > > >   add it on top.
> > > > > * Do not ask for vq groups in number of vq groups < 2.
> > > > > * Move the valid vq groups range check to vduse_validate_config.
> > > > >
> > > > > RFC v2:
> > > > > * Cache group information in kernel, as we need to provide the vq map
> > > > >   tokens properly.
> > > > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > > > >   descriptors out of the box.
> > > > > ---
> > > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > > > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > > > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > index e7bced0b5542..0f4e36dd167e 100644
> > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > > > >         struct vdpa_vq_state state;
> > > > >         bool ready;
> > > > >         bool kicked;
> > > > > +       u32 vq_group;
> > > > >         spinlock_t kick_lock;
> > > > >         spinlock_t irq_lock;
> > > > >         struct eventfd_ctx *kickfd;
> > > > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > > > >         u8 status;
> > > > >         u32 vq_num;
> > > > >         u32 vq_align;
> > > > > +       u32 ngroups;
> > > > >         struct vduse_umem *umem;
> > > > >         struct mutex mem_lock;
> > > > >         unsigned int bounce_size;
> > > > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > > > +{
> > > > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > +
> > > > > +       return dev->vqs[idx]->vq_group;
> > > > > +}
> > > > > +
> > > > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > >                                 struct vdpa_vq_state *state)
> > > > >  {
> > > > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > > > >         return dev->status;
> > > > >  }
> > > > >
> > > > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > > > +{
> > > > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > > > +       if (dev->ngroups < 2)
> > > > > +               return 0;
> > > > > +
> > > > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > > > +               struct vduse_dev_msg msg = { 0 };
> > > > > +               int ret;
> > > > > +
> > > > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > > > +               msg.req.vq_group.index = i;
> > > > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > > > +               if (ret)
> > > > > +                       return ret;
> > > > > +
> > > > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > >  {
> > > > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > >         if (vduse_dev_set_status(dev, status))
> > > > >                 return;
> > > > >
> > > > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > > > +               if (vduse_fill_vq_groups(dev))
> > > > > +                       return;
> > > >
> > > > I may lose some context, but I think we've agreed that we need to
> > > > extend the status response for this instead of having multiple
> > > > independent responses.
> > > >
> > >
> > > My understanding from [1] was that it is ok to start with this version. We
> > > can even make it asynchronous on top if we find this is a bottleneck,
> > > and the VDUSE device would need no change. Would that work?
> >
> > I think I need to understand why we can not defer this to get_group_asid() call.
> >
>
> Because we need to know the vq_groups->asid mapping in other calls
> like set_group_asid or get_vq_group.

I think we don't need the mapping for those, or is there anything I miss?

And the vq-to-group mappings could be piggybacked on the device
creation request.

>
> We could add a boolean on each virtqueue to track if we know its
> virtqueue group, and then only ask the VDUSE device for it if needed; would
> that work?

Thanks

>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-03  7:40           ` Jason Wang
@ 2025-09-03 10:30             ` Eugenio Perez Martin
  2025-09-04  3:08               ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-09-03 10:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Wed, Sep 3, 2025 at 9:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Sep 3, 2025 at 2:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 5:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > This allows separating the different virtqueues into groups that share the
> > > > > > same address space.  The VDUSE device is asked for the groups of the vqs at
> > > > > > the beginning, as they're needed for the DMA API.
> > > > > >
> > > > > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > > > > * Dataplane (guest passthrough)
> > > > > > * CVQ
> > > > > > * Shadowed vrings.
> > > > > >
> > > > > > Future versions of the series can include dynamic allocation of the
> > > > > > groups array so VDUSE can declare more groups.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > ---
> > > > > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > > > > >
> > > > > > RFC v3:
> > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > >   value to reduce memory consumption, but vqs are already limited to
> > > > > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > * Remove the descs vq group capability as it will not be used and we can
> > > > > >   add it on top.
> > > > > > * Do not ask for vq groups in number of vq groups < 2.
> > > > > > * Move the valid vq groups range check to vduse_validate_config.
> > > > > >
> > > > > > RFC v2:
> > > > > > * Cache group information in kernel, as we need to provide the vq map
> > > > > >   tokens properly.
> > > > > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > > > > >   descriptors out of the box.
> > > > > > ---
> > > > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > > > > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > > > > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > index e7bced0b5542..0f4e36dd167e 100644
> > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > > > > >         struct vdpa_vq_state state;
> > > > > >         bool ready;
> > > > > >         bool kicked;
> > > > > > +       u32 vq_group;
> > > > > >         spinlock_t kick_lock;
> > > > > >         spinlock_t irq_lock;
> > > > > >         struct eventfd_ctx *kickfd;
> > > > > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > > > > >         u8 status;
> > > > > >         u32 vq_num;
> > > > > >         u32 vq_align;
> > > > > > +       u32 ngroups;
> > > > > >         struct vduse_umem *umem;
> > > > > >         struct mutex mem_lock;
> > > > > >         unsigned int bounce_size;
> > > > > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > >         return 0;
> > > > > >  }
> > > > > >
> > > > > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > > > > +{
> > > > > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > +
> > > > > > +       return dev->vqs[idx]->vq_group;
> > > > > > +}
> > > > > > +
> > > > > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > >                                 struct vdpa_vq_state *state)
> > > > > >  {
> > > > > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > > > > >         return dev->status;
> > > > > >  }
> > > > > >
> > > > > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > > > > +{
> > > > > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > > > > +       if (dev->ngroups < 2)
> > > > > > +               return 0;
> > > > > > +
> > > > > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > > > > +               struct vduse_dev_msg msg = { 0 };
> > > > > > +               int ret;
> > > > > > +
> > > > > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > > > > +               msg.req.vq_group.index = i;
> > > > > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > > > > +               if (ret)
> > > > > > +                       return ret;
> > > > > > +
> > > > > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > > > > +       }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > > >  {
> > > > > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > > >         if (vduse_dev_set_status(dev, status))
> > > > > >                 return;
> > > > > >
> > > > > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > > > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > > > > +               if (vduse_fill_vq_groups(dev))
> > > > > > +                       return;
> > > > >
> > > > > I may lose some context, but I think we've agreed that we need to
> > > > > extend the status response for this instead of having multiple
> > > > > independent responses.
> > > > >
> > > >
> > > > My understanding from [1] was that it is ok to start with this version. We
> > > > can even make it asynchronous on top if we find this is a bottleneck,
> > > > and the VDUSE device would need no change. Would that work?
> > >
> > > I think I need to understand why we can not defer this to get_group_asid() call.
> > >
> >
> > Because we need to know the vq_groups->asid mapping in other calls
> > like set_group_asid or get_vq_group.
>
> I think we don't need the mapping for those, or is there anything I miss?
>

If the kernel module does not ask the userland device for the actual
vq group of a virtqueue, what should it return in vduse_get_vq_group?
0 for all vqs, even if the CVQ is in vq group 1?

That's also valid for vduse_get_vq_map, whose return value is assumed not to
change during the whole life of the device, as it is not protected by a
mutex.

> And the vq-to-group mappings could be piggybacked on the device
> creation request.
>

I'm not sure; I think it involves a vduse request per asid or vq group
operation, even get_vq_map. But I'm open to exploring this possibility
for sure.

> >
> > We could add a boolean on each virtqueue to track if we know its
> > virtqueue group, and then only ask the VDUSE device for it if needed; would
> > that work?
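A sketch of that lazy variant (the vq_group_valid flag is hypothetical;
the message plumbing mirrors vduse_fill_vq_groups from the patch):

	static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
	{
		struct vduse_dev *dev = vdpa_to_vduse(vdpa);
		struct vduse_virtqueue *vq = dev->vqs[idx];
		struct vduse_dev_msg msg = { 0 };

		if (vq->vq_group_valid)
			return vq->vq_group;

		msg.req.type = VDUSE_GET_VQ_GROUP;
		msg.req.vq_group.index = idx;
		if (vduse_dev_msg_sync(dev, &msg))
			return 0;	/* fall back to group 0 on error */

		vq->vq_group = msg.resp.vq_group.group;
		vq->vq_group_valid = true;
		return vq->vq_group;
	}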
>
> Thanks
>
> >
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-03 10:30             ` Eugenio Perez Martin
@ 2025-09-04  3:08               ` Jason Wang
  2025-09-04  3:20                 ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-09-04  3:08 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Wed, Sep 3, 2025 at 6:31 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Sep 3, 2025 at 9:40 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 2:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Sep 3, 2025 at 5:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > This allows separating the different virtqueues into groups that share the
> > > > > > > same address space.  The VDUSE device is asked for the groups of the vqs at
> > > > > > > the beginning, as they're needed for the DMA API.
> > > > > > >
> > > > > > > Allocating 3 vq groups, as net is the device that needs the most groups:
> > > > > > > * Dataplane (guest passthrough)
> > > > > > > * CVQ
> > > > > > > * Shadowed vrings.
> > > > > > >
> > > > > > > Future versions of the series can include dynamic allocation of the
> > > > > > > groups array so VDUSE can declare more groups.
> > > > > > >
> > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > ---
> > > > > > > v1: Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
> > > > > > >
> > > > > > > RFC v3:
> > > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > >   value to reduce memory consumption, but vqs are already limited to
> > > > > > >   that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > > * Remove the descs vq group capability as it will not be used and we can
> > > > > > >   add it on top.
> > > > > > > * Do not ask for vq groups in number of vq groups < 2.
> > > > > > > * Move the valid vq groups range check to vduse_validate_config.
> > > > > > >
> > > > > > > RFC v2:
> > > > > > > * Cache group information in kernel, as we need to provide the vq map
> > > > > > >   tokens properly.
> > > > > > > * Add descs vq group to optimize SVQ forwarding and support indirect
> > > > > > >   descriptors out of the box.
> > > > > > > ---
> > > > > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++--
> > > > > > >  include/uapi/linux/vduse.h         | 21 +++++++++++-
> > > > > > >  2 files changed, 68 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > index e7bced0b5542..0f4e36dd167e 100644
> > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > @@ -58,6 +58,7 @@ struct vduse_virtqueue {
> > > > > > >         struct vdpa_vq_state state;
> > > > > > >         bool ready;
> > > > > > >         bool kicked;
> > > > > > > +       u32 vq_group;
> > > > > > >         spinlock_t kick_lock;
> > > > > > >         spinlock_t irq_lock;
> > > > > > >         struct eventfd_ctx *kickfd;
> > > > > > > @@ -114,6 +115,7 @@ struct vduse_dev {
> > > > > > >         u8 status;
> > > > > > >         u32 vq_num;
> > > > > > >         u32 vq_align;
> > > > > > > +       u32 ngroups;
> > > > > > >         struct vduse_umem *umem;
> > > > > > >         struct mutex mem_lock;
> > > > > > >         unsigned int bounce_size;
> > > > > > > @@ -592,6 +594,13 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > >         return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> > > > > > > +{
> > > > > > > +       struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > +
> > > > > > > +       return dev->vqs[idx]->vq_group;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > >                                 struct vdpa_vq_state *state)
> > > > > > >  {
> > > > > > > @@ -678,6 +687,28 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
> > > > > > >         return dev->status;
> > > > > > >  }
> > > > > > >
> > > > > > > +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> > > > > > > +{
> > > > > > > +       /* All vqs and descs must be in vq group 0 if ngroups < 2 */
> > > > > > > +       if (dev->ngroups < 2)
> > > > > > > +               return 0;
> > > > > > > +
> > > > > > > +       for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> > > > > > > +               struct vduse_dev_msg msg = { 0 };
> > > > > > > +               int ret;
> > > > > > > +
> > > > > > > +               msg.req.type = VDUSE_GET_VQ_GROUP;
> > > > > > > +               msg.req.vq_group.index = i;
> > > > > > > +               ret = vduse_dev_msg_sync(dev, &msg);
> > > > > > > +               if (ret)
> > > > > > > +                       return ret;
> > > > > > > +
> > > > > > > +               dev->vqs[i]->vq_group = msg.resp.vq_group.group;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > > > >  {
> > > > > > >         struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > @@ -685,6 +716,11 @@ static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
> > > > > > >         if (vduse_dev_set_status(dev, status))
> > > > > > >                 return;
> > > > > > >
> > > > > > > +       if (((dev->status ^ status) & VIRTIO_CONFIG_S_FEATURES_OK) &&
> > > > > > > +           (status & VIRTIO_CONFIG_S_FEATURES_OK))
> > > > > > > +               if (vduse_fill_vq_groups(dev))
> > > > > > > +                       return;
> > > > > >
> > > > > > I may have lost some context, but I think we've agreed that we need
> > > > > > to extend the status response for this instead of having multiple
> > > > > > independent responses.
> > > > > >
> > > > >
> > > > > My understanding from [1] was that it is ok to start with this
> > > > > version. We can even make it asynchronous on top if we find this is a
> > > > > bottleneck, and the VDUSE device would need no change. Would that work?
> > > >
> > > > I think I need to understand why we cannot defer this to the
> > > > get_group_asid() call.
> > > >
> > >
> > > Because we need to know the vq group -> ASID mapping in other calls
> > > like set_group_asid or get_vq_group.
> >
> > I don't think we need the mapping for those, or is there anything I
> > missed?
> >
>
> If the kernel module does not ask the userland device for the actual
> vq group of a virtqueue, what should it return in vduse_get_vq_group?
> 0 for all vqs, even if the CVQ is in vq group 1?

Since the topology is fixed, I think userspace should provide this when
creating the device.
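
For illustration, a minimal kernel-side sketch of that direction (not
part of the patch: the creation-time plumbing and the "groups" array
are hypothetical; only dev->ngroups, dev->vqs and vq_group exist in
the series):

static int vduse_cache_vq_groups(struct vduse_dev *dev,
                                 const u32 *groups, u32 nvqs)
{
        /* The topology is fixed, so store it once at creation time */
        for (u32 i = 0; i < nvqs; i++) {
                /* Reject a group outside the range the device declared */
                if (groups[i] >= dev->ngroups)
                        return -EINVAL;
                dev->vqs[i]->vq_group = groups[i];
        }
        return 0;
}

With the groups cached like this, vduse_get_vq_group() can answer from
kernel memory without any round trip to the userspace daemon.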

>
> That also applies to vduse_get_vq_map, whose return value is assumed
> not to change over the whole lifetime of the device, as it is not
> protected by a mutex.
>
> > And the vq to group mappings could be piggybacked on the device
> > creation request.
> >
>
> I'm not sure; I think it involves a VDUSE request per ASID or vq group
> operation, even get_vq_map. But I'm certainly open to exploring this
> possibility.

Something like this?

struct vduse_vq_config {
        __u32 index;
        __u16 max_size;
        __u16 reserved[13];
};

?

Thanks

>
> > >
> > > We could add a boolean to each virtqueue to track whether we know its
> > > virtqueue group, and then only ask the VDUSE device if needed. Would
> > > that work?
> >
> > Thanks
> >
> > >
> >
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] vduse: add vq group support
  2025-09-04  3:08               ` Jason Wang
@ 2025-09-04  3:20                 ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2025-09-04  3:20 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Michael S . Tsirkin, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Laurent Vivier, virtualization, linux-kernel, Yongji Xie,
	Maxime Coquelin

On Thu, Sep 4, 2025 at 11:08 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Sep 3, 2025 at 6:31 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 9:40 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Sep 3, 2025 at 2:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Sep 3, 2025 at 5:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 1, 2025 at 4:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 1, 2025 at 3:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Aug 26, 2025 at 7:27 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > This allows separating the different virtqueues into groups
> > > > > > > > that share the same address space.  Ask the VDUSE device for
> > > > > > > > the groups of the vqs at the beginning, as they're needed for
> > > > > > > > the DMA API.
> > > > > > > >
> > > > > > > > Allocating 3 vq groups as net is the device that need the most groups:
> > > > > > > > * Dataplane (guest passthrough)
> > > > > > > > * CVQ
> > > > > > > > * Shadowed vrings.
> > > > > > > >
> > > > > > > > Future versions of the series can include dynamic allocation of the
> > > > > > > > groups array so VDUSE can declare more groups.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > ---
> > > > > > > > [...]
> > > > > > >
> > > > > > > I may have lost some context, but I think we've agreed that we
> > > > > > > need to extend the status response for this instead of having
> > > > > > > multiple independent responses.
> > > > > > >
> > > > > >
> > > > > > My understanding from [1] was that it is ok to start with this
> > > > > > version. We can even make it asynchronous on top if we find this is
> > > > > > a bottleneck, and the VDUSE device would need no change. Would that
> > > > > > work?
> > > > >
> > > > > I think I need to understand why we cannot defer this to the
> > > > > get_group_asid() call.
> > > > >
> > > >
> > > > Because we need to know the vq group -> ASID mapping in other calls
> > > > like set_group_asid or get_vq_group.
> > >
> > > I don't think we need the mapping for those, or is there anything I
> > > missed?
> > >
> >
> > If the kernel module does not ask the userland device for the actual
> > vq group of a virtqueue, what should it return in vduse_get_vq_group?
> > 0 for all vqs, even if the CVQ is in vq group 1?
>
> Since the topology is fixed, I think userspace should provide this when
> creating the device.
>
> >
> > That also applies to vduse_get_vq_map, whose return value is assumed
> > not to change over the whole lifetime of the device, as it is not
> > protected by a mutex.
> >
> > > And the vq to group mappings could be piggybacked on the device
> > > creation request.
> > >
> >
> > I'm not sure; I think it involves a VDUSE request per ASID or vq group
> > operation, even get_vq_map. But I'm certainly open to exploring this
> > possibility.
>
> Something like this?
>
> struct vduse_vq_config {
>         __u32 index;
>         __u16 max_size;
>         __u16 reserved[13];
> };

I meant this actually:

struct vduse_vq_config {
         __u32 index;
         __u16 max_size;
         __u16 group;
         __u16 reserved[12];
 };
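
For illustration, a userspace sketch of how a device could then declare
each vq's group at setup time. This is hypothetical: the "group" field
is only proposed above and is not in the current UAPI; it reuses one
reserved __u16, so the struct size, and hence the VDUSE_VQ_SETUP ioctl
number, stays unchanged:

#include <sys/ioctl.h>
#include <linux/vduse.h>

/* Proposed layout from above, declared locally until merged */
struct vduse_vq_config_v2 {
        __u32 index;
        __u16 max_size;
        __u16 group;            /* vq group this virtqueue belongs to */
        __u16 reserved[12];
};

/* e.g. for virtio-net: dataplane vqs in group 0, the CVQ in group 1 */
static int setup_vq(int dev_fd, __u32 index, __u16 max_size, __u16 group)
{
        struct vduse_vq_config_v2 cfg = {
                .index = index,
                .max_size = max_size,
                .group = group,
        };

        return ioctl(dev_fd, VDUSE_VQ_SETUP, &cfg);
}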

Thanks

>
> ?
>
> Thanks
>
> >
> > > >
> > > > We could add a boolean to each virtqueue to track whether we know its
> > > > virtqueue group, and then only ask the VDUSE device if needed. Would
> > > > that work?
> > >
> > > Thanks
> > >
> > > >
> > >
> >


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-09-04  3:20 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-26 11:27 [PATCH 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
2025-08-26 11:27 ` [PATCH 1/6] vduse: add v1 API definition Eugenio Pérez
2025-08-26 11:27 ` [PATCH 2/6] vduse: add vq group support Eugenio Pérez
2025-09-01  1:59   ` Jason Wang
2025-09-01  2:31     ` Jason Wang
2025-09-01  8:39     ` Eugenio Perez Martin
2025-09-03  3:57       ` Jason Wang
2025-09-03  3:58       ` Jason Wang
2025-09-03  6:28         ` Eugenio Perez Martin
2025-09-03  7:40           ` Jason Wang
2025-09-03 10:30             ` Eugenio Perez Martin
2025-09-04  3:08               ` Jason Wang
2025-09-04  3:20                 ` Jason Wang
2025-08-26 11:27 ` [PATCH 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
2025-09-01  2:25   ` Jason Wang
2025-09-01  7:27     ` Eugenio Perez Martin
2025-08-26 11:27 ` [PATCH 4/6] vduse: create vduse_as to make it an array Eugenio Pérez
2025-09-01  2:27   ` Jason Wang
2025-08-26 11:27 ` [PATCH 5/6] vduse: add vq group asid support Eugenio Pérez
2025-09-01  2:46   ` Jason Wang
2025-09-01  9:11     ` Eugenio Perez Martin
2025-09-03  3:56       ` Jason Wang
2025-09-03  6:39         ` Eugenio Perez Martin
2025-08-26 11:27 ` [PATCH 6/6] vduse: bump version number Eugenio Pérez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).