* [PATCH v8 0/6] Add multiple address spaces support to VDUSE
@ 2025-10-30 10:00 Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 1/6] vduse: add v1 API definition Eugenio Pérez
` (5 more replies)
0 siblings, 6 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
When used by the vhost-vDPA bus driver for a VM, the control virtqueue
should be shadowed via the userspace VMM (QEMU) instead of being assigned
directly to the guest. This is because QEMU needs to know the device state
in order to start and stop the device correctly (e.g. for Live Migration).
This requires isolating the memory mapping for the control virtqueue
presented by vhost-vDPA to prevent the guest from accessing it directly.
This series adds support for multiple address spaces in the VDUSE device,
allowing selective virtqueue isolation through address space IDs (ASIDs).
The VDUSE device needs to report:
* Number of virtqueue groups
* Association of each vq group with each virtqueue
* Number of address spaces supported.
Then, the vDPA driver can modify the ASID assigned to each VQ group to
isolate the memory AS. This aligns VDUSE with vdpa_sim and NVIDIA mlx5
devices, which already support ASIDs.
This helps to isolate the environments for the virtqueues that will not
be assigned directly to the guest. E.g. in the case of virtio-net, the
control virtqueue will not be assigned directly.
Also, to be able to test this patch, the user needs to manually revert
56e71885b034 ("vduse: Temporarily fail if control queue feature requested").
Tested by creating a VDUSE device with OVS, with and without MQ, and live
migrating between two hosts back and forth while keeping a ping alive through
all the stages. All tested with and without lockdep.
PATCH v8:
* Revert the change from mutex to rwlock (MST).
PATCH v7:
* Fix not taking the write lock in the registering vdpa device error path
(Jason).
PATCH v6:
* Make vdpa_dev_add use gotos for error handling (MST).
* s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
in group and nas handling at device creation (MST).
* Fix struct name not matching in the doc.
* s/sepparate/separate (MST).
PATCH v5:
* Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
ioctl (Jason).
* Properly set domain bounce size to divide equally between nas (Jason).
* Revert core vdpa changes (Jason).
* Fix group == ngroup case in checking VQ_SETUP argument (Jason).
* Exclude the "padding" member from the V1-only members of
vduse_dev_request.
PATCH v4:
* Consider config->nas == 0 and config->ngroups == 0 as a fail (Jason).
* Revert the "invalid vq group" concept and assume 0 if not set.
* Divide the device bounce size equally between each domain (Jason).
* Revert unneeded addr = NULL assignment (Jason)
* Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
return; } (Jason)
* Change a bad multiline comment, using @ character instead of * (Jason).
PATCH v3:
* Make the default group an invalid group as long as VDUSE device does
not set it to some valid u32 value. Modify the vdpa core to take that
into account (Jason). Adapt all the virtio_map_ops callbacks to it.
* Make setting status DRIVER_OK fail if vq group is not valid.
* Create the VDUSE_DEV_MAX_GROUPS and VDUSE_DEV_MAX_AS instead of using a magic
number
* Remove the _int name suffix from struct vduse_vq_group.
* Get the vduse domain through the vduse_as in the map functions (Jason).
* Squash the patch implementing the AS logic with the patch creating the
vduse_as struct (Jason).
PATCH v2:
* Now the vq group is in vduse_vq_config struct instead of issuing one
VDUSE message per vq.
* Convert the use of mutex to rwlock (Xie Yongji).
PATCH v1:
* Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
* Using vduse_vq_group_int directly instead of an empty struct in union
virtio_map.
RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
value to reduce memory consumption, but vqs are already limited to
that value and userspace VDUSE is able to allocate that many vqs. Also, it's
a dynamic array now. Same with ASID.
* Move the valid vq groups range check to vduse_validate_config.
* Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
* Use of array_index_nospec in VDUSE device ioctls.
* Move the umem mutex to asid struct so there is no contention between
ASIDs.
* Remove the descs vq group capability as it will not be used and we can
add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
* Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
VDUSE_IOTLB_GET_INFO.
RFC v2:
* Cache group information in kernel, as we need to provide the vq map
tokens properly.
* Add descs vq group to optimize SVQ forwarding and support indirect
descriptors out of the box.
* Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
part of the struct is the same.
* Fixes detected testing with OVS+VDUSE.
Eugenio Pérez (6):
vduse: add v1 API definition
vduse: add vq group support
vduse: return internal vq group struct as map token
vduse: refactor vdpa_dev_add for goto err handling
vduse: add vq group asid support
vduse: bump version number
drivers/vdpa/vdpa_user/vduse_dev.c | 474 ++++++++++++++++++++++-------
include/linux/virtio.h | 6 +-
include/uapi/linux/vduse.h | 67 +++-
3 files changed, 433 insertions(+), 114 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v8 1/6] vduse: add v1 API definition
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-30 12:10 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 2/6] vduse: add vq group support Eugenio Pérez
` (4 subsequent siblings)
5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
This allows the kernel to detect whether the userspace VDUSE device
supports the VQ group and ASID features. VDUSE devices that don't set
the V1 API will not receive the new messages, and the vdpa device will be
created with only one vq group and ASID.
The next patches implement the new feature incrementally, only enabling
the VDUSE device to set the V1 API version by the end of the series.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
include/uapi/linux/vduse.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 10ad71aa00d6..ccb92a1efce0 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -10,6 +10,10 @@
#define VDUSE_API_VERSION 0
+/* VQ groups and ASID support */
+
+#define VDUSE_API_VERSION_1 1
+
/*
* Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
* This is used for future extension.
--
2.51.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v8 2/6] vduse: add vq group support
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 1/6] vduse: add v1 API definition Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-30 12:18 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
` (3 subsequent siblings)
5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
This allows separating the different virtqueues into groups that share the
same address space. The VDUSE device is asked for the group of each vq at
the beginning, as they're needed for the DMA API.
Allocating 3 vq groups, as net is the device that needs the most groups:
* Dataplane (guest passthrough)
* CVQ
* Shadowed vrings.
Future versions of the series can include dynamic allocation of the
groups array so VDUSE can declare more groups.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v6:
* s/sepparate/separate (MST).
* s/dev->api_version < 1/dev->api_version < VDUSE_API_VERSION_1
v5:
* Revert core vdpa changes (Jason).
* Fix group == ngroup case in checking VQ_SETUP argument (Jason).
v4:
* Revert the "invalid vq group" concept and assume 0 if not set (Jason).
* Make config->ngroups == 0 invalid (Jason).
v3:
* Make the default group an invalid group as long as VDUSE device does
not set it to some valid u32 value. Modify the vdpa core to take that
into account (Jason).
* Create the VDUSE_DEV_MAX_GROUPS instead of using a magic number
v2:
* Now the vq group is in vduse_vq_config struct instead of issuing one
VDUSE message per vq.
v1:
* Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
value to reduce memory consumption, but vqs are already limited to
that value and userspace VDUSE is able to allocate that many vqs.
* Remove the descs vq group capability as it will not be used and we can
add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
* Move the valid vq groups range check to vduse_validate_config.
RFC v2:
* Cache group information in kernel, as we need to provide the vq map
tokens properly.
* Add descs vq group to optimize SVQ forwarding and support indirect
descriptors out of the box.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 48 ++++++++++++++++++++++++++----
include/uapi/linux/vduse.h | 12 ++++++--
2 files changed, 52 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index e7bced0b5542..d5c90c599698 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -39,6 +39,7 @@
#define DRV_LICENSE "GPL v2"
#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_DEV_MAX_GROUPS 0xffff
#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
#define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
@@ -58,6 +59,7 @@ struct vduse_virtqueue {
struct vdpa_vq_state state;
bool ready;
bool kicked;
+ u32 vq_group;
spinlock_t kick_lock;
spinlock_t irq_lock;
struct eventfd_ctx *kickfd;
@@ -114,6 +116,7 @@ struct vduse_dev {
u8 status;
u32 vq_num;
u32 vq_align;
+ u32 ngroups;
struct vduse_umem *umem;
struct mutex mem_lock;
unsigned int bounce_size;
@@ -455,6 +458,7 @@ static void vduse_dev_reset(struct vduse_dev *dev)
vq->driver_addr = 0;
vq->device_addr = 0;
vq->num = 0;
+ vq->vq_group = 0;
memset(&vq->state, 0, sizeof(vq->state));
spin_lock(&vq->kick_lock);
@@ -592,6 +596,16 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
return 0;
}
+static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ if (dev->api_version < VDUSE_API_VERSION_1)
+ return 0;
+
+ return dev->vqs[idx]->vq_group;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -789,6 +803,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.set_vq_cb = vduse_vdpa_set_vq_cb,
.set_vq_num = vduse_vdpa_set_vq_num,
.get_vq_size = vduse_vdpa_get_vq_size,
+ .get_vq_group = vduse_get_vq_group,
.set_vq_ready = vduse_vdpa_set_vq_ready,
.get_vq_ready = vduse_vdpa_get_vq_ready,
.set_vq_state = vduse_vdpa_set_vq_state,
@@ -1252,12 +1267,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (config.index >= dev->vq_num)
break;
- if (!is_mem_zero((const char *)config.reserved,
- sizeof(config.reserved)))
+ if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
+ break;
+
+ if (dev->api_version >= VDUSE_API_VERSION_1) {
+ if (config.group >= dev->ngroups)
+ break;
+ if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+ break;
+ }
+
+ if (config.reserved1 ||
+ !is_mem_zero((const char *)config.reserved2,
+ sizeof(config.reserved2)))
break;
index = array_index_nospec(config.index, dev->vq_num);
dev->vqs[index]->num_max = config.max_size;
+ dev->vqs[index]->vq_group = config.group;
ret = 0;
break;
}
@@ -1737,12 +1764,20 @@ static bool features_is_valid(struct vduse_dev_config *config)
return true;
}
-static bool vduse_validate_config(struct vduse_dev_config *config)
+static bool vduse_validate_config(struct vduse_dev_config *config,
+ u64 api_version)
{
if (!is_mem_zero((const char *)config->reserved,
sizeof(config->reserved)))
return false;
+ if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+ return false;
+
+ if (api_version >= VDUSE_API_VERSION_1 &&
+ (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
+ return false;
+
if (config->vq_align > PAGE_SIZE)
return false;
@@ -1858,6 +1893,9 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->device_features = config->features;
dev->device_id = config->device_id;
dev->vendor_id = config->vendor_id;
+ dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
+ ? 1
+ : config->ngroups;
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
goto err_str;
@@ -1936,7 +1974,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
break;
ret = -EINVAL;
- if (vduse_validate_config(&config) == false)
+ if (!vduse_validate_config(&config, control->api_version))
break;
buf = vmemdup_user(argp + size, config.config_size);
@@ -2017,7 +2055,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
&vduse_vdpa_config_ops, &vduse_map_ops,
- 1, 1, name, true);
+ dev->ngroups, 1, name, true);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index ccb92a1efce0..a3d51cf6df3a 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -31,6 +31,7 @@
* @features: virtio features
* @vq_num: the number of virtqueues
* @vq_align: the allocation alignment of virtqueue's metadata
+ * @ngroups: number of vq groups that VDUSE device declares
* @reserved: for future use, needs to be initialized to zero
* @config_size: the size of the configuration space
* @config: the buffer of the configuration space
@@ -45,7 +46,8 @@ struct vduse_dev_config {
__u64 features;
__u32 vq_num;
__u32 vq_align;
- __u32 reserved[13];
+ __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
+ __u32 reserved[12];
__u32 config_size;
__u8 config[];
};
@@ -122,14 +124,18 @@ struct vduse_config_data {
* struct vduse_vq_config - basic configuration of a virtqueue
* @index: virtqueue index
* @max_size: the max size of virtqueue
- * @reserved: for future use, needs to be initialized to zero
+ * @reserved1: for future use, needs to be initialized to zero
+ * @group: virtqueue group
+ * @reserved2: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
*/
struct vduse_vq_config {
__u32 index;
__u16 max_size;
- __u16 reserved[13];
+ __u16 reserved1;
+ __u32 group;
+ __u16 reserved2[10];
};
/*
--
2.51.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v8 3/6] vduse: return internal vq group struct as map token
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 1/6] vduse: add v1 API definition Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 2/6] vduse: add vq group support Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
` (2 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
Return the internal struct that represents the vq group as the virtqueue
map token, instead of the device. This allows the map functions to access
the information per group.
At this moment all the virtqueues share the same vq group, which can only
point to ASID 0. This change prepares the infrastructure for actual
per-group address space handling.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Revert the "invalid vq group" concept, and assume 0 by default.
* Revert unnecessary blank line addition (Jason)
v3:
* Adapt all virtio_map_ops callbacks to handle empty tokens in case of
invalid groups.
* Make setting status DRIVER_OK fail if vq group is not valid.
* Remove the _int name suffix from struct vduse_vq_group.
RFC v3:
* Make the vq groups a dynamic array to support an arbitrary number of
them.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 100 ++++++++++++++++++++++++++---
include/linux/virtio.h | 6 +-
2 files changed, 94 insertions(+), 12 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index d5c90c599698..b01f55a1c22d 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -22,6 +22,7 @@
#include <linux/uio.h>
#include <linux/vdpa.h>
#include <linux/nospec.h>
+#include <linux/virtio.h>
#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include <uapi/linux/vduse.h>
@@ -85,6 +86,10 @@ struct vduse_umem {
struct mm_struct *mm;
};
+struct vduse_vq_group {
+ struct vduse_dev *dev;
+};
+
struct vduse_dev {
struct vduse_vdpa *vdev;
struct device *dev;
@@ -118,6 +123,7 @@ struct vduse_dev {
u32 vq_align;
u32 ngroups;
struct vduse_umem *umem;
+ struct vduse_vq_group *groups;
struct mutex mem_lock;
unsigned int bounce_size;
struct mutex domain_lock;
@@ -606,6 +612,17 @@ static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
return dev->vqs[idx]->vq_group;
}
+static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ u32 vq_group = vduse_get_vq_group(vdpa, idx);
+ union virtio_map ret = {
+ .group = &dev->groups[vq_group],
+ };
+
+ return ret;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -826,6 +843,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.get_vq_affinity = vduse_vdpa_get_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
+ .get_vq_map = vduse_get_vq_map,
.free = vduse_vdpa_free,
};
@@ -833,7 +851,14 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
}
@@ -842,7 +867,14 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
}
@@ -852,7 +884,14 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return DMA_MAPPING_ERROR;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
}
@@ -861,7 +900,14 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
}
@@ -869,11 +915,17 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
dma_addr_t *dma_addr, gfp_t flag)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
unsigned long iova;
void *addr;
*dma_addr = DMA_MAPPING_ERROR;
+ if (!token.group)
+ return NULL;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
addr = vduse_domain_alloc_coherent(domain, size,
(dma_addr_t *)&iova, flag);
if (!addr)
@@ -888,14 +940,28 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
void *vaddr, dma_addr_t dma_addr,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return false;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return dma_addr < domain->bounce_size;
}
@@ -909,7 +975,14 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
static size_t vduse_dev_max_mapping_size(union virtio_map token)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return 0;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return domain->bounce_size;
}
@@ -1727,6 +1800,7 @@ static int vduse_destroy_dev(char *name)
if (dev->domain)
vduse_domain_destroy(dev->domain);
kfree(dev->name);
+ kfree(dev->groups);
vduse_dev_destroy(dev);
module_put(THIS_MODULE);
@@ -1896,6 +1970,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
? 1
: config->ngroups;
+ dev->groups = kcalloc(dev->ngroups, sizeof(dev->groups[0]),
+ GFP_KERNEL);
+ if (!dev->groups)
+ goto err_vq_groups;
+ for (u32 i = 0; i < dev->ngroups; ++i)
+ dev->groups[i].dev = dev;
+
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
goto err_str;
@@ -1932,6 +2013,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_idr:
kfree(dev->name);
err_str:
+ kfree(dev->groups);
+err_vq_groups:
vduse_dev_destroy(dev);
err:
return ret;
@@ -2093,7 +2176,6 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
return -ENOMEM;
}
- dev->vdev->vdpa.vmap.iova_domain = dev->domain;
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
if (ret) {
put_device(&dev->vdev->vdpa.dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 96c66126c074..302109029700 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -41,13 +41,13 @@ struct virtqueue {
void *priv;
};
-struct vduse_iova_domain;
+struct vduse_vq_group;
union virtio_map {
/* Device that performs DMA */
struct device *dma_dev;
- /* VDUSE specific mapping data */
- struct vduse_iova_domain *iova_domain;
+ /* VDUSE specific virtqueue group for doing map */
+ struct vduse_vq_group *group;
};
int virtqueue_add_outbuf(struct virtqueue *vq,
--
2.51.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
` (2 preceding siblings ...)
2025-10-30 10:00 ` [PATCH v8 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-31 2:33 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 5/6] vduse: add vq group asid support Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 6/6] vduse: bump version number Eugenio Pérez
5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
Next patches introduce more error paths in this function. Refactor it
so they can be accommodated through gotos.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v6: New in v6.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index b01f55a1c22d..97be04f73fbf 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -2172,21 +2172,27 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
dev->bounce_size);
mutex_unlock(&dev->domain_lock);
if (!dev->domain) {
- put_device(&dev->vdev->vdpa.dev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto domain_err;
}
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
if (ret) {
- put_device(&dev->vdev->vdpa.dev);
- mutex_lock(&dev->domain_lock);
- vduse_domain_destroy(dev->domain);
- dev->domain = NULL;
- mutex_unlock(&dev->domain_lock);
- return ret;
+ goto register_err;
}
return 0;
+
+register_err:
+ mutex_lock(&dev->domain_lock);
+ vduse_domain_destroy(dev->domain);
+ dev->domain = NULL;
+ mutex_unlock(&dev->domain_lock);
+
+domain_err:
+ put_device(&dev->vdev->vdpa.dev);
+
+ return ret;
}
static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
--
2.51.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v8 5/6] vduse: add vq group asid support
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
` (3 preceding siblings ...)
2025-10-30 10:00 ` [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-30 11:52 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 6/6] vduse: bump version number Eugenio Pérez
5 siblings, 1 reply; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
Add support for assigning Address Space Identifiers (ASIDs) to each VQ
group. This enables mapping each group into a distinct memory space.
Now that the driver can change the ASID in the middle of operation, the
domain that each vq address points to is also protected by domain_lock.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v8:
* Revert the mutex to rwlock change, it needs proper profiling to
justify it.
v7:
* Take write lock in the error path (Jason).
v6:
* Make vdpa_dev_add use gotos for error handling (MST).
* s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
(MST).
* Fix struct name not matching in the doc.
v5:
* Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
ioctl (Jason).
* Properly set domain bounce size to divide equally between nas (Jason).
* Exclude the "padding" member from the V1-only members of
vduse_dev_request.
v4:
* Divide the device bounce size equally between each domain (Jason).
* revert unneeded addr = NULL assignment (Jason)
* Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
return; } (Jason)
* Change a bad multiline comment, using @ character instead of * (Jason).
* Consider config->nas == 0 as a fail (Jason).
v3:
* Get the vduse domain through the vduse_as in the map functions
(Jason).
* Squash with the patch creating the vduse_as struct (Jason).
* Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
(Jason)
v2:
* Convert the use of mutex to rwlock.
RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
value to reduce memory consumption, but vqs are already limited to
that value and userspace VDUSE is able to allocate that many vqs.
* Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
VDUSE_IOTLB_GET_INFO.
* Use of array_index_nospec in VDUSE device ioctls.
* Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
* Move the umem mutex to asid struct so there is no contention between
ASIDs.
RFC v2:
* Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
part of the struct is the same.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
include/uapi/linux/vduse.h | 53 ++++-
2 files changed, 292 insertions(+), 109 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 97be04f73fbf..c6909d73d06d 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -41,6 +41,7 @@
#define VDUSE_DEV_MAX (1U << MINORBITS)
#define VDUSE_DEV_MAX_GROUPS 0xffff
+#define VDUSE_DEV_MAX_AS 0xffff
#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
#define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
@@ -86,7 +87,14 @@ struct vduse_umem {
struct mm_struct *mm;
};
+struct vduse_as {
+ struct vduse_iova_domain *domain;
+ struct vduse_umem *umem;
+ struct mutex mem_lock;
+};
+
struct vduse_vq_group {
+ struct vduse_as *as;
struct vduse_dev *dev;
};
@@ -94,7 +102,7 @@ struct vduse_dev {
struct vduse_vdpa *vdev;
struct device *dev;
struct vduse_virtqueue **vqs;
- struct vduse_iova_domain *domain;
+ struct vduse_as *as;
char *name;
struct mutex lock;
spinlock_t msg_lock;
@@ -122,9 +130,8 @@ struct vduse_dev {
u32 vq_num;
u32 vq_align;
u32 ngroups;
- struct vduse_umem *umem;
+ u32 nas;
struct vduse_vq_group *groups;
- struct mutex mem_lock;
unsigned int bounce_size;
struct mutex domain_lock;
};
@@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
return vduse_dev_msg_sync(dev, &msg);
}
-static int vduse_dev_update_iotlb(struct vduse_dev *dev,
+static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
u64 start, u64 last)
{
struct vduse_dev_msg msg = { 0 };
@@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
return -EINVAL;
msg.req.type = VDUSE_UPDATE_IOTLB;
- msg.req.iova.start = start;
- msg.req.iova.last = last;
+ if (dev->api_version < VDUSE_API_VERSION_1) {
+ msg.req.iova.start = start;
+ msg.req.iova.last = last;
+ } else {
+ msg.req.iova_v2.start = start;
+ msg.req.iova_v2.last = last;
+ msg.req.iova_v2.asid = asid;
+ }
return vduse_dev_msg_sync(dev, &msg);
}
@@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
return mask;
}
+/* Force set the asid to a vq group without a message to the VDUSE device */
+static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
+ unsigned int group, unsigned int asid)
+{
+ mutex_lock(&dev->domain_lock);
+ dev->groups[group].as = &dev->as[asid];
+ mutex_unlock(&dev->domain_lock);
+}
+
static void vduse_dev_reset(struct vduse_dev *dev)
{
int i;
- struct vduse_iova_domain *domain = dev->domain;
/* The coherent mappings are handled in vduse_dev_free_coherent() */
- if (domain && domain->bounce_map)
- vduse_domain_reset_bounce_map(domain);
+ for (i = 0; i < dev->nas; i++) {
+ struct vduse_iova_domain *domain = dev->as[i].domain;
+
+ if (domain && domain->bounce_map)
+ vduse_domain_reset_bounce_map(domain);
+ }
+
+ for (i = 0; i < dev->ngroups; i++)
+ vduse_set_group_asid_nomsg(dev, i, 0);
down_write(&dev->rwsem);
@@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
return ret;
}
+static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
+ unsigned int asid)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ struct vduse_dev_msg msg = { 0 };
+ int r;
+
+ if (dev->api_version < VDUSE_API_VERSION_1 ||
+ group >= dev->ngroups || asid >= dev->nas)
+ return -EINVAL;
+
+ msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
+ msg.req.vq_group_asid.group = group;
+ msg.req.vq_group_asid.asid = asid;
+
+ r = vduse_dev_msg_sync(dev, &msg);
+ if (r < 0)
+ return r;
+
+ vduse_set_group_asid_nomsg(dev, group, asid);
+ return 0;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
int ret;
- ret = vduse_domain_set_map(dev->domain, iotlb);
+ ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
if (ret)
return ret;
- ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+ ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
if (ret) {
- vduse_domain_clear_map(dev->domain, iotlb);
+ vduse_domain_clear_map(dev->as[asid].domain, iotlb);
return ret;
}
@@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.get_vq_affinity = vduse_vdpa_get_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
+ .set_group_asid = vduse_set_group_asid,
.get_vq_map = vduse_get_vq_map,
.free = vduse_vdpa_free,
};
@@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
return;
vdev = token.group->dev;
- domain = vdev->domain;
-
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
+ mutex_unlock(&vdev->domain_lock);
}
static void vduse_dev_sync_single_for_cpu(union virtio_map token,
@@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
return;
vdev = token.group->dev;
- domain = vdev->domain;
-
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
+ mutex_unlock(&vdev->domain_lock);
}
static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
@@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
{
struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
+ dma_addr_t r;
if (!token.group)
return DMA_MAPPING_ERROR;
vdev = token.group->dev;
- domain = vdev->domain;
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
+ r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
+ mutex_unlock(&vdev->domain_lock);
- return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
+ return r;
}
static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
@@ -907,9 +965,10 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
return;
vdev = token.group->dev;
- domain = vdev->domain;
-
- return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
+ vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+ mutex_unlock(&vdev->domain_lock);
}
static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
@@ -925,14 +984,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
return NULL;
vdev = token.group->dev;
- domain = vdev->domain;
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
addr = vduse_domain_alloc_coherent(domain, size,
(dma_addr_t *)&iova, flag);
- if (!addr)
- return NULL;
-
- *dma_addr = (dma_addr_t)iova;
+ if (addr)
+ *dma_addr = (dma_addr_t)iova;
+ mutex_unlock(&vdev->domain_lock);
return addr;
}
@@ -947,23 +1006,28 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
return;
vdev = token.group->dev;
- domain = vdev->domain;
-
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
+ mutex_unlock(&vdev->domain_lock);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
{
struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
+ size_t bounce_size;
if (!token.group)
return false;
vdev = token.group->dev;
- domain = vdev->domain;
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
+ bounce_size = domain->bounce_size;
+ mutex_unlock(&vdev->domain_lock);
- return dma_addr < domain->bounce_size;
+ return dma_addr < bounce_size;
}
static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
@@ -977,14 +1041,18 @@ static size_t vduse_dev_max_mapping_size(union virtio_map token)
{
struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
+ size_t bounce_size;
if (!token.group)
return 0;
vdev = token.group->dev;
- domain = vdev->domain;
+ mutex_lock(&vdev->domain_lock);
+ domain = token.group->as->domain;
+ bounce_size = domain->bounce_size;
+ mutex_unlock(&vdev->domain_lock);
- return domain->bounce_size;
+ return bounce_size;
}
static const struct virtio_map_ops vduse_map_ops = {
@@ -1124,39 +1192,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
return ret;
}
-static int vduse_dev_dereg_umem(struct vduse_dev *dev,
+static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
u64 iova, u64 size)
{
int ret;
- mutex_lock(&dev->mem_lock);
+ mutex_lock(&dev->as[asid].mem_lock);
ret = -ENOENT;
- if (!dev->umem)
+ if (!dev->as[asid].umem)
goto unlock;
ret = -EINVAL;
- if (!dev->domain)
+ if (!dev->as[asid].domain)
goto unlock;
- if (dev->umem->iova != iova || size != dev->domain->bounce_size)
+ if (dev->as[asid].umem->iova != iova ||
+ size != dev->as[asid].domain->bounce_size)
goto unlock;
- vduse_domain_remove_user_bounce_pages(dev->domain);
- unpin_user_pages_dirty_lock(dev->umem->pages,
- dev->umem->npages, true);
- atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
- mmdrop(dev->umem->mm);
- vfree(dev->umem->pages);
- kfree(dev->umem);
- dev->umem = NULL;
+ vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
+ unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
+ dev->as[asid].umem->npages, true);
+ atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
+ mmdrop(dev->as[asid].umem->mm);
+ vfree(dev->as[asid].umem->pages);
+ kfree(dev->as[asid].umem);
+ dev->as[asid].umem = NULL;
ret = 0;
unlock:
- mutex_unlock(&dev->mem_lock);
+ mutex_unlock(&dev->as[asid].mem_lock);
return ret;
}
static int vduse_dev_reg_umem(struct vduse_dev *dev,
- u64 iova, u64 uaddr, u64 size)
+ u32 asid, u64 iova, u64 uaddr, u64 size)
{
struct page **page_list = NULL;
struct vduse_umem *umem = NULL;
@@ -1164,14 +1233,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
unsigned long npages, lock_limit;
int ret;
- if (!dev->domain || !dev->domain->bounce_map ||
- size != dev->domain->bounce_size ||
+ if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
+ size != dev->as[asid].domain->bounce_size ||
iova != 0 || uaddr & ~PAGE_MASK)
return -EINVAL;
- mutex_lock(&dev->mem_lock);
+ mutex_lock(&dev->as[asid].mem_lock);
ret = -EEXIST;
- if (dev->umem)
+ if (dev->as[asid].umem)
goto unlock;
ret = -ENOMEM;
@@ -1195,7 +1264,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
goto out;
}
- ret = vduse_domain_add_user_bounce_pages(dev->domain,
+ ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
page_list, pinned);
if (ret)
goto out;
@@ -1208,7 +1277,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
umem->mm = current->mm;
mmgrab(current->mm);
- dev->umem = umem;
+ dev->as[asid].umem = umem;
out:
if (ret && pinned > 0)
unpin_user_pages(page_list, pinned);
@@ -1219,7 +1288,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
vfree(page_list);
kfree(umem);
}
- mutex_unlock(&dev->mem_lock);
+ mutex_unlock(&dev->as[asid].mem_lock);
return ret;
}
@@ -1251,47 +1320,66 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
switch (cmd) {
case VDUSE_IOTLB_GET_FD: {
- struct vduse_iotlb_entry entry;
+ struct vduse_iotlb_entry_v2 entry;
struct vhost_iotlb_map *map;
struct vdpa_map_file *map_file;
struct file *f = NULL;
+ u32 asid;
ret = -EFAULT;
- if (copy_from_user(&entry, argp, sizeof(entry)))
- break;
+ if (dev->api_version >= VDUSE_API_VERSION_1) {
+ if (copy_from_user(&entry, argp, sizeof(entry)))
+ break;
+ } else {
+ entry.asid = 0;
+ if (copy_from_user(&entry.v1, argp,
+ sizeof(entry.v1)))
+ break;
+ }
ret = -EINVAL;
- if (entry.start > entry.last)
+ if (entry.v1.start > entry.v1.last)
+ break;
+
+ if (entry.asid >= dev->nas)
break;
mutex_lock(&dev->domain_lock);
- if (!dev->domain) {
+ asid = array_index_nospec(entry.asid, dev->nas);
+ if (!dev->as[asid].domain) {
mutex_unlock(&dev->domain_lock);
break;
}
- spin_lock(&dev->domain->iotlb_lock);
- map = vhost_iotlb_itree_first(dev->domain->iotlb,
- entry.start, entry.last);
+ spin_lock(&dev->as[asid].domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
+ entry.v1.start, entry.v1.last);
if (map) {
map_file = (struct vdpa_map_file *)map->opaque;
f = get_file(map_file->file);
- entry.offset = map_file->offset;
- entry.start = map->start;
- entry.last = map->last;
- entry.perm = map->perm;
+ entry.v1.offset = map_file->offset;
+ entry.v1.start = map->start;
+ entry.v1.last = map->last;
+ entry.v1.perm = map->perm;
}
- spin_unlock(&dev->domain->iotlb_lock);
+ spin_unlock(&dev->as[asid].domain->iotlb_lock);
mutex_unlock(&dev->domain_lock);
ret = -EINVAL;
if (!f)
break;
- ret = -EFAULT;
- if (copy_to_user(argp, &entry, sizeof(entry))) {
+ if (dev->api_version >= VDUSE_API_VERSION_1)
+ ret = copy_to_user(argp, &entry,
+ sizeof(entry));
+ else
+ ret = copy_to_user(argp, &entry.v1,
+ sizeof(entry.v1));
+
+ if (ret) {
+ ret = -EFAULT;
fput(f);
break;
}
- ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
+ ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
fput(f);
break;
}
@@ -1436,6 +1524,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
}
case VDUSE_IOTLB_REG_UMEM: {
struct vduse_iova_umem umem;
+ u32 asid;
ret = -EFAULT;
if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1443,17 +1532,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
ret = -EINVAL;
if (!is_mem_zero((const char *)umem.reserved,
- sizeof(umem.reserved)))
+ sizeof(umem.reserved)) ||
+ (dev->api_version < VDUSE_API_VERSION_1 &&
+ umem.asid != 0) || umem.asid >= dev->nas)
break;
mutex_lock(&dev->domain_lock);
- ret = vduse_dev_reg_umem(dev, umem.iova,
+ asid = array_index_nospec(umem.asid, dev->nas);
+ ret = vduse_dev_reg_umem(dev, asid, umem.iova,
umem.uaddr, umem.size);
mutex_unlock(&dev->domain_lock);
break;
}
case VDUSE_IOTLB_DEREG_UMEM: {
struct vduse_iova_umem umem;
+ u32 asid;
ret = -EFAULT;
if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1461,10 +1554,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
ret = -EINVAL;
if (!is_mem_zero((const char *)umem.reserved,
- sizeof(umem.reserved)))
+ sizeof(umem.reserved)) ||
+ (dev->api_version < VDUSE_API_VERSION_1 &&
+ umem.asid != 0) ||
+ umem.asid >= dev->nas)
break;
+
mutex_lock(&dev->domain_lock);
- ret = vduse_dev_dereg_umem(dev, umem.iova,
+ asid = array_index_nospec(umem.asid, dev->nas);
+ ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
umem.size);
mutex_unlock(&dev->domain_lock);
break;
@@ -1472,6 +1570,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
case VDUSE_IOTLB_GET_INFO: {
struct vduse_iova_info info;
struct vhost_iotlb_map *map;
+ u32 asid;
ret = -EFAULT;
if (copy_from_user(&info, argp, sizeof(info)))
@@ -1485,23 +1584,31 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
sizeof(info.reserved)))
break;
+ if (dev->api_version < VDUSE_API_VERSION_1) {
+ if (info.asid)
+ break;
+ } else if (info.asid >= dev->nas)
+ break;
+
mutex_lock(&dev->domain_lock);
- if (!dev->domain) {
+ asid = array_index_nospec(info.asid, dev->nas);
+ if (!dev->as[asid].domain) {
mutex_unlock(&dev->domain_lock);
break;
}
- spin_lock(&dev->domain->iotlb_lock);
- map = vhost_iotlb_itree_first(dev->domain->iotlb,
+ spin_lock(&dev->as[asid].domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
info.start, info.last);
if (map) {
info.start = map->start;
info.last = map->last;
info.capability = 0;
- if (dev->domain->bounce_map && map->start == 0 &&
- map->last == dev->domain->bounce_size - 1)
+ if (dev->as[asid].domain->bounce_map &&
+ map->start == 0 &&
+ map->last == dev->as[asid].domain->bounce_size - 1)
info.capability |= VDUSE_IOVA_CAP_UMEM;
}
- spin_unlock(&dev->domain->iotlb_lock);
+ spin_unlock(&dev->as[asid].domain->iotlb_lock);
mutex_unlock(&dev->domain_lock);
if (!map)
break;
@@ -1526,8 +1633,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
struct vduse_dev *dev = file->private_data;
mutex_lock(&dev->domain_lock);
- if (dev->domain)
- vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
+ for (int i = 0; i < dev->nas; i++)
+ if (dev->as[i].domain)
+ vduse_dev_dereg_umem(dev, i, 0,
+ dev->as[i].domain->bounce_size);
mutex_unlock(&dev->domain_lock);
spin_lock(&dev->msg_lock);
/* Make sure the inflight messages can processed after reconncection */
@@ -1746,7 +1855,6 @@ static struct vduse_dev *vduse_dev_create(void)
return NULL;
mutex_init(&dev->lock);
- mutex_init(&dev->mem_lock);
mutex_init(&dev->domain_lock);
spin_lock_init(&dev->msg_lock);
INIT_LIST_HEAD(&dev->send_list);
@@ -1797,8 +1905,11 @@ static int vduse_destroy_dev(char *name)
idr_remove(&vduse_idr, dev->minor);
kvfree(dev->config);
vduse_dev_deinit_vqs(dev);
- if (dev->domain)
- vduse_domain_destroy(dev->domain);
+ for (int i = 0; i < dev->nas; i++) {
+ if (dev->as[i].domain)
+ vduse_domain_destroy(dev->as[i].domain);
+ }
+ kfree(dev->as);
kfree(dev->name);
kfree(dev->groups);
vduse_dev_destroy(dev);
@@ -1845,12 +1956,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
sizeof(config->reserved)))
return false;
- if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+ if (api_version < VDUSE_API_VERSION_1 &&
+ (config->ngroups || config->nas))
return false;
- if (api_version >= VDUSE_API_VERSION_1 &&
- (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
- return false;
+ if (api_version >= VDUSE_API_VERSION_1) {
+ if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
+ return false;
+
+ if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
+ return false;
+ }
if (config->vq_align > PAGE_SIZE)
return false;
@@ -1915,7 +2031,8 @@ static ssize_t bounce_size_store(struct device *device,
ret = -EPERM;
mutex_lock(&dev->domain_lock);
- if (dev->domain)
+ /* Assuming that if the first domain is allocated, all are allocated */
+ if (dev->as[0].domain)
goto unlock;
ret = kstrtouint(buf, 10, &bounce_size);
@@ -1977,6 +2094,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
for (u32 i = 0; i < dev->ngroups; ++i)
dev->groups[i].dev = dev;
+ dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
+ dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
+ if (!dev->as)
+ goto err_as;
+ for (int i = 0; i < dev->nas; i++)
+ mutex_init(&dev->as[i].mem_lock);
+
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
goto err_str;
@@ -2013,6 +2137,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_idr:
kfree(dev->name);
err_str:
+ kfree(dev->as);
+err_as:
kfree(dev->groups);
err_vq_groups:
vduse_dev_destroy(dev);
@@ -2138,7 +2264,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
&vduse_vdpa_config_ops, &vduse_map_ops,
- dev->ngroups, 1, name, true);
+ dev->ngroups, dev->nas, name, true);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
@@ -2153,7 +2279,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
const struct vdpa_dev_set_config *config)
{
struct vduse_dev *dev;
- int ret;
+ size_t domain_bounce_size;
+ int ret, i;
mutex_lock(&vduse_lock);
dev = vduse_find_dev(name);
@@ -2167,29 +2294,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
return ret;
mutex_lock(&dev->domain_lock);
- if (!dev->domain)
- dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
- dev->bounce_size);
- mutex_unlock(&dev->domain_lock);
- if (!dev->domain) {
- ret = -ENOMEM;
- goto domain_err;
+ ret = 0;
+
+ domain_bounce_size = dev->bounce_size / dev->nas;
+ for (i = 0; i < dev->nas; ++i) {
+ dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
+ domain_bounce_size);
+ if (!dev->as[i].domain) {
+ ret = -ENOMEM;
+ goto err;
+ }
}
+ mutex_unlock(&dev->domain_lock);
+
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
- if (ret) {
- goto register_err;
- }
+ if (ret)
+ goto err_register;
return 0;
-register_err:
+err_register:
mutex_lock(&dev->domain_lock);
- vduse_domain_destroy(dev->domain);
- dev->domain = NULL;
+
+err:
+ for (int j = 0; j < i; j++) {
+ if (dev->as[j].domain) {
+ vduse_domain_destroy(dev->as[j].domain);
+ dev->as[j].domain = NULL;
+ }
+ }
mutex_unlock(&dev->domain_lock);
-domain_err:
put_device(&dev->vdev->vdpa.dev);
return ret;
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index a3d51cf6df3a..da2c5e47990e 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -47,7 +47,8 @@ struct vduse_dev_config {
__u32 vq_num;
__u32 vq_align;
__u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
- __u32 reserved[12];
+ __u32 nas; /* if VDUSE_API_VERSION >= 1 */
+ __u32 reserved[11];
__u32 config_size;
__u8 config[];
};
@@ -82,6 +83,18 @@ struct vduse_iotlb_entry {
__u8 perm;
};
+/**
+ * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region in an ASID
+ * @v1: the original vduse_iotlb_entry
+ * @asid: address space ID of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry_v2 {
+ struct vduse_iotlb_entry v1;
+ __u32 asid;
+};
+
/*
* Find the first IOVA region that overlaps with the range [start, last]
* and return the corresponding file descriptor. Return -EINVAL means the
@@ -166,6 +179,16 @@ struct vduse_vq_state_packed {
__u16 last_used_idx;
};
+/**
+ * struct vduse_vq_group_asid - virtqueue group ASID
+ * @group: Index of the virtqueue group
+ * @asid: Address space ID of the group
+ */
+struct vduse_vq_group_asid {
+ __u32 group;
+ __u32 asid;
+};
+
/**
* struct vduse_vq_info - information of a virtqueue
* @index: virtqueue index
@@ -225,6 +248,7 @@ struct vduse_vq_eventfd {
* @uaddr: start address of userspace memory, it must be aligned to page size
* @iova: start of the IOVA region
* @size: size of the IOVA region
+ * @asid: Address space ID of the IOVA region
* @reserved: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
@@ -234,7 +258,8 @@ struct vduse_iova_umem {
__u64 uaddr;
__u64 iova;
__u64 size;
- __u64 reserved[3];
+ __u32 asid;
+ __u32 reserved[5];
};
/* Register userspace memory for IOVA regions */
@@ -248,6 +273,7 @@ struct vduse_iova_umem {
* @start: start of the IOVA region
* @last: last of the IOVA region
* @capability: capability of the IOVA region
+ * @asid: Address space ID of the IOVA region, only if device API version >= 1
* @reserved: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
@@ -258,7 +284,8 @@ struct vduse_iova_info {
__u64 last;
#define VDUSE_IOVA_CAP_UMEM (1 << 0)
__u64 capability;
- __u64 reserved[3];
+ __u32 asid; /* Only if device API version >= 1 */
+ __u32 reserved[5];
};
/*
@@ -280,6 +307,7 @@ enum vduse_req_type {
VDUSE_GET_VQ_STATE,
VDUSE_SET_STATUS,
VDUSE_UPDATE_IOTLB,
+ VDUSE_SET_VQ_GROUP_ASID,
};
/**
@@ -314,6 +342,18 @@ struct vduse_iova_range {
__u64 last;
};
+/**
+ * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ * @asid: address space ID of the IOVA range
+ */
+struct vduse_iova_range_v2 {
+ __u64 start;
+ __u64 last;
+ __u32 asid;
+};
+
/**
* struct vduse_dev_request - control request
* @type: request type
@@ -322,6 +362,8 @@ struct vduse_iova_range {
* @vq_state: virtqueue state, only index field is available
* @s: device status
* @iova: IOVA range for updating
+ * @iova_v2: IOVA range for updating if API_VERSION >= 1
+ * @vq_group_asid: ASID of a virtqueue group
* @padding: padding
*
* Structure used by read(2) on /dev/vduse/$NAME.
@@ -334,6 +376,11 @@ struct vduse_dev_request {
struct vduse_vq_state vq_state;
struct vduse_dev_status s;
struct vduse_iova_range iova;
+ /* Following members except padding exist only if vduse api
+ * version >= 1
+ */
+ struct vduse_iova_range_v2 iova_v2;
+ struct vduse_vq_group_asid vq_group_asid;
__u32 padding[32];
};
};
--
2.51.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v8 6/6] vduse: bump version number
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
` (4 preceding siblings ...)
2025-10-30 10:00 ` [PATCH v8 5/6] vduse: add vq group asid support Eugenio Pérez
@ 2025-10-30 10:00 ` Eugenio Pérez
2025-10-31 2:33 ` Yongji Xie
2025-10-31 2:34 ` Yongji Xie
5 siblings, 2 replies; 24+ messages in thread
From: Eugenio Pérez @ 2025-10-30 10:00 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: Yongji Xie, Eugenio Pérez, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, jasowang, Cindy Lu, Xuan Zhuo,
linux-kernel
Finalize the series by advertising VDUSE API v1 support to userspace.
Now that all required infrastructure for v1 (ASIDs, VQ groups,
update_iotlb_v2) is in place, VDUSE devices can opt in to the new
features.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c6909d73d06d..7977a1be5ad6 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -2166,7 +2166,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
break;
ret = -EINVAL;
- if (api_version > VDUSE_API_VERSION)
+ if (api_version > VDUSE_API_VERSION_1)
break;
ret = 0;
@@ -2233,7 +2233,7 @@ static int vduse_open(struct inode *inode, struct file *file)
if (!control)
return -ENOMEM;
- control->api_version = VDUSE_API_VERSION;
+ control->api_version = VDUSE_API_VERSION_1;
file->private_data = control;
return 0;
--
2.51.0
* Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-30 10:00 ` [PATCH v8 5/6] vduse: add vq group asid support Eugenio Pérez
@ 2025-10-30 11:52 ` Yongji Xie
2025-10-31 6:30 ` Eugenio Perez Martin
0 siblings, 1 reply; 24+ messages in thread
From: Yongji Xie @ 2025-10-30 11:52 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> group. This enables mapping each group into a distinct memory space.
>
> Now that the driver can change the ASID in the middle of operation, the
> domain that each vq address points to is also protected by domain_lock.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v8:
> * Revert the mutex to rwlock change, it needs proper profiling to
> justify it.
>
> v7:
> * Take write lock in the error path (Jason).
>
> v6:
> * Make vdpa_dev_add use gotos for error handling (MST).
> * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> (MST).
> * Fix struct name not matching in the doc.
>
> v5:
> * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> ioctl (Jason).
> * Properly set domain bounce size to divide equally between nas (Jason).
> * Exclude "padding" member from the only >V1 members in
> vduse_dev_request.
>
> v4:
> * Divide each domain bounce size between the device bounce size (Jason).
> * revert unneeded addr = NULL assignment (Jason)
> * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> return; } (Jason)
> * Change a bad multiline comment, using @ character instead of * (Jason).
> * Consider config->nas == 0 as a fail (Jason).
>
> v3:
> * Get the vduse domain through the vduse_as in the map functions
> (Jason).
> * Squash with the patch creating the vduse_as struct (Jason).
> * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> (Jason)
>
> v2:
> * Convert the use of mutex to rwlock.
>
> RFC v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> value to reduce memory consumption, but vqs are already limited to
> that value and userspace VDUSE is able to allocate that many vqs.
> * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> VDUSE_IOTLB_GET_INFO.
> * Use of array_index_nospec in VDUSE device ioctls.
> * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> * Move the umem mutex to asid struct so there is no contention between
> ASIDs.
>
> RFC v2:
> * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> part of the struct is the same.
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> include/uapi/linux/vduse.h | 53 ++++-
> 2 files changed, 292 insertions(+), 109 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 97be04f73fbf..c6909d73d06d 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -41,6 +41,7 @@
>
> #define VDUSE_DEV_MAX (1U << MINORBITS)
> #define VDUSE_DEV_MAX_GROUPS 0xffff
> +#define VDUSE_DEV_MAX_AS 0xffff
> #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> @@ -86,7 +87,14 @@ struct vduse_umem {
> struct mm_struct *mm;
> };
>
> +struct vduse_as {
> + struct vduse_iova_domain *domain;
> + struct vduse_umem *umem;
> + struct mutex mem_lock;
> +};
> +
> struct vduse_vq_group {
> + struct vduse_as *as;
> struct vduse_dev *dev;
> };
>
> @@ -94,7 +102,7 @@ struct vduse_dev {
> struct vduse_vdpa *vdev;
> struct device *dev;
> struct vduse_virtqueue **vqs;
> - struct vduse_iova_domain *domain;
> + struct vduse_as *as;
> char *name;
> struct mutex lock;
> spinlock_t msg_lock;
> @@ -122,9 +130,8 @@ struct vduse_dev {
> u32 vq_num;
> u32 vq_align;
> u32 ngroups;
> - struct vduse_umem *umem;
> + u32 nas;
> struct vduse_vq_group *groups;
> - struct mutex mem_lock;
> unsigned int bounce_size;
> struct mutex domain_lock;
> };
> @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> return vduse_dev_msg_sync(dev, &msg);
> }
>
> -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> u64 start, u64 last)
> {
> struct vduse_dev_msg msg = { 0 };
> @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> return -EINVAL;
>
> msg.req.type = VDUSE_UPDATE_IOTLB;
> - msg.req.iova.start = start;
> - msg.req.iova.last = last;
> + if (dev->api_version < VDUSE_API_VERSION_1) {
> + msg.req.iova.start = start;
> + msg.req.iova.last = last;
> + } else {
> + msg.req.iova_v2.start = start;
> + msg.req.iova_v2.last = last;
> + msg.req.iova_v2.asid = asid;
> + }
>
> return vduse_dev_msg_sync(dev, &msg);
> }
> @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> return mask;
> }
>
> +/* Force set the asid to a vq group without a message to the VDUSE device */
> +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> + unsigned int group, unsigned int asid)
> +{
> + mutex_lock(&dev->domain_lock);
> + dev->groups[group].as = &dev->as[asid];
> + mutex_unlock(&dev->domain_lock);
> +}
> +
> static void vduse_dev_reset(struct vduse_dev *dev)
> {
> int i;
> - struct vduse_iova_domain *domain = dev->domain;
>
> /* The coherent mappings are handled in vduse_dev_free_coherent() */
> - if (domain && domain->bounce_map)
> - vduse_domain_reset_bounce_map(domain);
> + for (i = 0; i < dev->nas; i++) {
> + struct vduse_iova_domain *domain = dev->as[i].domain;
> +
> + if (domain && domain->bounce_map)
> + vduse_domain_reset_bounce_map(domain);
> + }
> +
> + for (i = 0; i < dev->ngroups; i++)
> + vduse_set_group_asid_nomsg(dev, i, 0);
>
> down_write(&dev->rwsem);
>
> @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> return ret;
> }
>
> +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> + unsigned int asid)
> +{
> + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> + struct vduse_dev_msg msg = { 0 };
> + int r;
> +
> + if (dev->api_version < VDUSE_API_VERSION_1 ||
> + group >= dev->ngroups || asid >= dev->nas)
> + return -EINVAL;
> +
> + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> + msg.req.vq_group_asid.group = group;
> + msg.req.vq_group_asid.asid = asid;
> +
> + r = vduse_dev_msg_sync(dev, &msg);
> + if (r < 0)
> + return r;
> +
> + vduse_set_group_asid_nomsg(dev, group, asid);
> + return 0;
> +}
> +
> static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> struct vdpa_vq_state *state)
> {
> @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> int ret;
>
> - ret = vduse_domain_set_map(dev->domain, iotlb);
> + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> if (ret)
> return ret;
>
> - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> if (ret) {
> - vduse_domain_clear_map(dev->domain, iotlb);
> + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> return ret;
> }
>
> @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> .reset = vduse_vdpa_reset,
> .set_map = vduse_vdpa_set_map,
> + .set_group_asid = vduse_set_group_asid,
> .get_vq_map = vduse_get_vq_map,
> .free = vduse_vdpa_free,
> };
> @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> return;
>
> vdev = token.group->dev;
> - domain = vdev->domain;
> -
> + mutex_lock(&vdev->domain_lock);
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> + mutex_unlock(&vdev->domain_lock);
> }
>
> static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> return;
>
> vdev = token.group->dev;
> - domain = vdev->domain;
> -
> + mutex_lock(&vdev->domain_lock);
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> + mutex_unlock(&vdev->domain_lock);
> }
>
> static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> {
> struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
> + dma_addr_t r;
>
> if (!token.group)
> return DMA_MAPPING_ERROR;
>
> vdev = token.group->dev;
> - domain = vdev->domain;
> + mutex_lock(&vdev->domain_lock);
The mutex_lock can't be used here since the dma ops might be called in
atomic context. And I think we can just remove it since creation and
deletion operations of the iova domain are guaranteed not to execute
concurrently with I/O operations.
Thanks,
Yongji
* Re: [PATCH v8 1/6] vduse: add v1 API definition
2025-10-30 10:00 ` [PATCH v8 1/6] vduse: add v1 API definition Eugenio Pérez
@ 2025-10-30 12:10 ` Yongji Xie
0 siblings, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-30 12:10 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:00 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows the kernel to detect whether the userspace VDUSE device
> supports the VQ group and ASID features. VDUSE devices that don't set
> the V1 API will not receive the new messages, and the vdpa device will be
> created with only one vq group and one asid.
>
> The next patches implement the new feature incrementally, only enabling
> the VDUSE device to set the V1 API version by the end of the series.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v8 2/6] vduse: add vq group support
2025-10-30 10:00 ` [PATCH v8 2/6] vduse: add vq group support Eugenio Pérez
@ 2025-10-30 12:18 ` Yongji Xie
0 siblings, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-30 12:18 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows separating the different virtqueues into groups that share the
> same address space. The VDUSE device is asked for the group of each vq at
> the beginning, as the groups are needed for the DMA API.
>
> Allocating 3 vq groups, as net is the device that needs the most groups:
> * Dataplane (guest passthrough)
> * CVQ
> * Shadowed vrings.
>
> Future versions of the series can include dynamic allocation of the
> groups array so VDUSE can declare more groups.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling
2025-10-30 10:00 ` [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
@ 2025-10-31 2:33 ` Yongji Xie
0 siblings, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-31 2:33 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> The next patches introduce more error paths in this function. Refactor it
> so they can be accommodated through gotos.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v8 6/6] vduse: bump version number
2025-10-30 10:00 ` [PATCH v8 6/6] vduse: bump version number Eugenio Pérez
@ 2025-10-31 2:33 ` Yongji Xie
2025-10-31 2:34 ` Yongji Xie
1 sibling, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-31 2:33 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Finalize the series by advertising VDUSE API v1 support to userspace.
>
> Now that all required infrastructure for v1 (ASIDs, VQ groups,
> update_iotlb_v2) is in place, VDUSE devices can opt in to the new
> features.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v8 6/6] vduse: bump version number
2025-10-30 10:00 ` [PATCH v8 6/6] vduse: bump version number Eugenio Pérez
2025-10-31 2:33 ` Yongji Xie
@ 2025-10-31 2:34 ` Yongji Xie
1 sibling, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-31 2:34 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Finalize the series by advertising VDUSE API v1 support to userspace.
>
> Now that all required infrastructure for v1 (ASIDs, VQ groups,
> update_iotlb_v2) is in place, VDUSE devices can opt in to the new
> features.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-30 11:52 ` Yongji Xie
@ 2025-10-31 6:30 ` Eugenio Perez Martin
2025-10-31 6:51 ` [External] " Yongji Xie
2025-10-31 7:01 ` Yongji Xie
0 siblings, 2 replies; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-10-31 6:30 UTC (permalink / raw)
To: Yongji Xie
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > group. This enables mapping each group into a distinct memory space.
> >
> > Now that the driver can change the ASID in the middle of operation, the
> > domain that each vq points to is also protected by domain_lock.
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v8:
> > * Revert the mutex to rwlock change, it needs proper profiling to
> > justify it.
> >
> > v7:
> > * Take write lock in the error path (Jason).
> >
> > v6:
> > * Make vdpa_dev_add use gotos for error handling (MST).
> > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > (MST).
> > * Fix struct name not matching in the doc.
> >
> > v5:
> > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > ioctl (Jason).
> > * Properly set domain bounce size to divide equally between nas (Jason).
> > * Exclude "padding" member from the only >V1 members in
> > vduse_dev_request.
> >
> > v4:
> > * Divide the device bounce size between each domain (Jason).
> > * revert unneeded addr = NULL assignment (Jason)
> > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > return; } (Jason)
> > * Change a bad multiline comment, using @ character instead of * (Jason).
> > * Consider config->nas == 0 as a fail (Jason).
> >
> > v3:
> > * Get the vduse domain through the vduse_as in the map functions
> > (Jason).
> > * Squash with the patch creating the vduse_as struct (Jason).
> > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > (Jason)
> >
> > v2:
> > * Convert the use of mutex to rwlock.
> >
> > RFC v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > value to reduce memory consumption, but vqs are already limited to
> > that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > VDUSE_IOTLB_GET_INFO.
> > * Use of array_index_nospec in VDUSE device ioctls.
> > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > * Move the umem mutex to asid struct so there is no contention between
> > ASIDs.
> >
> > RFC v2:
> > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > part of the struct is the same.
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > include/uapi/linux/vduse.h | 53 ++++-
> > 2 files changed, 292 insertions(+), 109 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 97be04f73fbf..c6909d73d06d 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -41,6 +41,7 @@
> >
> > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > +#define VDUSE_DEV_MAX_AS 0xffff
> > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > @@ -86,7 +87,14 @@ struct vduse_umem {
> > struct mm_struct *mm;
> > };
> >
> > +struct vduse_as {
> > + struct vduse_iova_domain *domain;
> > + struct vduse_umem *umem;
> > + struct mutex mem_lock;
> > +};
> > +
> > struct vduse_vq_group {
> > + struct vduse_as *as;
> > struct vduse_dev *dev;
> > };
> >
> > @@ -94,7 +102,7 @@ struct vduse_dev {
> > struct vduse_vdpa *vdev;
> > struct device *dev;
> > struct vduse_virtqueue **vqs;
> > - struct vduse_iova_domain *domain;
> > + struct vduse_as *as;
> > char *name;
> > struct mutex lock;
> > spinlock_t msg_lock;
> > @@ -122,9 +130,8 @@ struct vduse_dev {
> > u32 vq_num;
> > u32 vq_align;
> > u32 ngroups;
> > - struct vduse_umem *umem;
> > + u32 nas;
> > struct vduse_vq_group *groups;
> > - struct mutex mem_lock;
> > unsigned int bounce_size;
> > struct mutex domain_lock;
> > };
> > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> >
> > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > u64 start, u64 last)
> > {
> > struct vduse_dev_msg msg = { 0 };
> > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > return -EINVAL;
> >
> > msg.req.type = VDUSE_UPDATE_IOTLB;
> > - msg.req.iova.start = start;
> > - msg.req.iova.last = last;
> > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > + msg.req.iova.start = start;
> > + msg.req.iova.last = last;
> > + } else {
> > + msg.req.iova_v2.start = start;
> > + msg.req.iova_v2.last = last;
> > + msg.req.iova_v2.asid = asid;
> > + }
> >
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > return mask;
> > }
> >
> > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > + unsigned int group, unsigned int asid)
> > +{
> > + mutex_lock(&dev->domain_lock);
> > + dev->groups[group].as = &dev->as[asid];
> > + mutex_unlock(&dev->domain_lock);
> > +}
> > +
> > static void vduse_dev_reset(struct vduse_dev *dev)
> > {
> > int i;
> > - struct vduse_iova_domain *domain = dev->domain;
> >
> > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > - if (domain && domain->bounce_map)
> > - vduse_domain_reset_bounce_map(domain);
> > + for (i = 0; i < dev->nas; i++) {
> > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > +
> > + if (domain && domain->bounce_map)
> > + vduse_domain_reset_bounce_map(domain);
> > + }
> > +
> > + for (i = 0; i < dev->ngroups; i++)
> > + vduse_set_group_asid_nomsg(dev, i, 0);
> >
> > down_write(&dev->rwsem);
> >
> > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > return ret;
> > }
> >
> > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > + unsigned int asid)
> > +{
> > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > + struct vduse_dev_msg msg = { 0 };
> > + int r;
> > +
> > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > + group >= dev->ngroups || asid >= dev->nas)
> > + return -EINVAL;
> > +
> > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > + msg.req.vq_group_asid.group = group;
> > + msg.req.vq_group_asid.asid = asid;
> > +
> > + r = vduse_dev_msg_sync(dev, &msg);
> > + if (r < 0)
> > + return r;
> > +
> > + vduse_set_group_asid_nomsg(dev, group, asid);
> > + return 0;
> > +}
> > +
> > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > struct vdpa_vq_state *state)
> > {
> > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > int ret;
> >
> > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > if (ret)
> > return ret;
> >
> > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > if (ret) {
> > - vduse_domain_clear_map(dev->domain, iotlb);
> > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > return ret;
> > }
> >
> > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > .reset = vduse_vdpa_reset,
> > .set_map = vduse_vdpa_set_map,
> > + .set_group_asid = vduse_set_group_asid,
> > .get_vq_map = vduse_get_vq_map,
> > .free = vduse_vdpa_free,
> > };
> > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > return;
> >
> > vdev = token.group->dev;
> > - domain = vdev->domain;
> > -
> > + mutex_lock(&vdev->domain_lock);
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > + mutex_unlock(&vdev->domain_lock);
> > }
> >
> > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > return;
> >
> > vdev = token.group->dev;
> > - domain = vdev->domain;
> > -
> > + mutex_lock(&vdev->domain_lock);
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > + mutex_unlock(&vdev->domain_lock);
> > }
> >
> > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > {
> > struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> > + dma_addr_t r;
> >
> > if (!token.group)
> > return DMA_MAPPING_ERROR;
> >
> > vdev = token.group->dev;
> > - domain = vdev->domain;
> > + mutex_lock(&vdev->domain_lock);
>
> The mutex_lock can't be used here since the dma ops might be called in
> atomic context. And I think we can just remove it since creation and
> deletion operations of the iova domain are guaranteed not to execute
> concurrently with I/O operations.
>
That would be great indeed! Can you expand on this: what protects us here
from the moment the two syscalls are issued from userland?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [External] Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-31 6:30 ` Eugenio Perez Martin
@ 2025-10-31 6:51 ` Yongji Xie
2025-10-31 7:01 ` Yongji Xie
1 sibling, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-10-31 6:51 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > group. This enables mapping each group into a distinct memory space.
> > >
> > > Now that the driver can change the ASID in the middle of operation, the
> > > domain that each vq points to is also protected by domain_lock.
> > >
> > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > [...]
> > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > {
> > > struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > > + dma_addr_t r;
> > >
> > > if (!token.group)
> > > return DMA_MAPPING_ERROR;
> > >
> > > vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + mutex_lock(&vdev->domain_lock);
> >
> > The mutex_lock can't be used here since the dma ops might be called in
> > atomic context. And I think we can just remove it since creation and
> > deletion operations of the iova domain are guaranteed not to execute
> > concurrently with I/O operations.
> >
>
> That would be great indeed! Can you expand on this: what protects us here
> from the moment the two syscalls are issued from userland?
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-31 6:30 ` Eugenio Perez Martin
2025-10-31 6:51 ` [External] " Yongji Xie
@ 2025-10-31 7:01 ` Yongji Xie
2025-10-31 14:08 ` Eugenio Perez Martin
1 sibling, 1 reply; 24+ messages in thread
From: Yongji Xie @ 2025-10-31 7:01 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > group. This enables mapping each group into a distinct memory space.
> > >
> > > Now that the driver can change the ASID in the middle of operation, the
> > > domain that each vq points to is also protected by domain_lock.
> > >
> > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > [...]
> > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > .reset = vduse_vdpa_reset,
> > > .set_map = vduse_vdpa_set_map,
> > > + .set_group_asid = vduse_set_group_asid,
> > > .get_vq_map = vduse_get_vq_map,
> > > .free = vduse_vdpa_free,
> > > };
> > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > return;
> > >
> > > vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > -
> > > + mutex_lock(&vdev->domain_lock);
> > > + domain = token.group->as->domain;
> > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > + mutex_unlock(&vdev->domain_lock);
> > > }
> > >
> > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > return;
> > >
> > > vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > -
> > > + mutex_lock(&vdev->domain_lock);
> > > + domain = token.group->as->domain;
> > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > + mutex_unlock(&vdev->domain_lock);
> > > }
> > >
> > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > {
> > > struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > > + dma_addr_t r;
> > >
> > > if (!token.group)
> > > return DMA_MAPPING_ERROR;
> > >
> > > vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + mutex_lock(&vdev->domain_lock);
> >
> > The mutex_lock can't be used here since the dma ops might be called in
> > atomic context. And I think we can just remove it since creation and
> > deletion operations of the iova domain are guaranteed not to execute
> > concurrently with I/O operations.
> >
>
> That would be great indeed! Can you expand on this: what protects here
> from the moment the two syscalls are issued from userland?
>
The domain mutex lock was introduced in commit
d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
creation"). It is used to prevent concurrent execution between
vdpa_dev_add() and the VDUSE device ioctls that need to access the
iova domain, such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
will not be called until vdpa_dev_add() has completed, so the mutex
lock is not needed.
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-31 7:01 ` Yongji Xie
@ 2025-10-31 14:08 ` Eugenio Perez Martin
2025-11-03 2:55 ` Yongji Xie
0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-10-31 14:08 UTC (permalink / raw)
To: Yongji Xie
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > group. This enables mapping each group into a distinct memory space.
> > > >
> > > > Now that the driver can change the ASID in the middle of operation, the
> > > > domain that each vq address points to is also protected by domain_lock.
> > > >
> > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > v8:
> > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > justify it.
> > > >
> > > > v7:
> > > > * Take write lock in the error path (Jason).
> > > >
> > > > v6:
> > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > (MST).
> > > > * Fix struct name not matching in the doc.
> > > >
> > > > v5:
> > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > ioctl (Jason).
> > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > * Exclude "padding" member from the only >V1 members in
> > > > vduse_dev_request.
> > > >
> > > > v4:
> > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > * revert unneeded addr = NULL assignment (Jason)
> > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > return; } (Jason)
> > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > * Consider config->nas == 0 as a fail (Jason).
> > > >
> > > > v3:
> > > > * Get the vduse domain through the vduse_as in the map functions
> > > > (Jason).
> > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > (Jason)
> > > >
> > > > v2:
> > > > * Convert the use of mutex to rwlock.
> > > >
> > > > RFC v3:
> > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > value to reduce memory consumption, but vqs are already limited to
> > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > VDUSE_IOTLB_GET_INFO.
> > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > * Move the umem mutex to asid struct so there is no contention between
> > > > ASIDs.
> > > >
> > > > RFC v2:
> > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > part of the struct is the same.
> > > > ---
> > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > @@ -41,6 +41,7 @@
> > > >
> > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > struct mm_struct *mm;
> > > > };
> > > >
> > > > +struct vduse_as {
> > > > + struct vduse_iova_domain *domain;
> > > > + struct vduse_umem *umem;
> > > > + struct mutex mem_lock;
> > > > +};
> > > > +
> > > > struct vduse_vq_group {
> > > > + struct vduse_as *as;
> > > > struct vduse_dev *dev;
> > > > };
> > > >
> > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > struct vduse_vdpa *vdev;
> > > > struct device *dev;
> > > > struct vduse_virtqueue **vqs;
> > > > - struct vduse_iova_domain *domain;
> > > > + struct vduse_as *as;
> > > > char *name;
> > > > struct mutex lock;
> > > > spinlock_t msg_lock;
> > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > u32 vq_num;
> > > > u32 vq_align;
> > > > u32 ngroups;
> > > > - struct vduse_umem *umem;
> > > > + u32 nas;
> > > > struct vduse_vq_group *groups;
> > > > - struct mutex mem_lock;
> > > > unsigned int bounce_size;
> > > > struct mutex domain_lock;
> > > > };
> > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > return vduse_dev_msg_sync(dev, &msg);
> > > > }
> > > >
> > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > u64 start, u64 last)
> > > > {
> > > > struct vduse_dev_msg msg = { 0 };
> > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > return -EINVAL;
> > > >
> > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > - msg.req.iova.start = start;
> > > > - msg.req.iova.last = last;
> > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > + msg.req.iova.start = start;
> > > > + msg.req.iova.last = last;
> > > > + } else {
> > > > + msg.req.iova_v2.start = start;
> > > > + msg.req.iova_v2.last = last;
> > > > + msg.req.iova_v2.asid = asid;
> > > > + }
> > > >
> > > > return vduse_dev_msg_sync(dev, &msg);
> > > > }
> > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > return mask;
> > > > }
> > > >
> > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > + unsigned int group, unsigned int asid)
> > > > +{
> > > > + mutex_lock(&dev->domain_lock);
> > > > + dev->groups[group].as = &dev->as[asid];
> > > > + mutex_unlock(&dev->domain_lock);
> > > > +}
> > > > +
> > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > {
> > > > int i;
> > > > - struct vduse_iova_domain *domain = dev->domain;
> > > >
> > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > - if (domain && domain->bounce_map)
> > > > - vduse_domain_reset_bounce_map(domain);
> > > > + for (i = 0; i < dev->nas; i++) {
> > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > +
> > > > + if (domain && domain->bounce_map)
> > > > + vduse_domain_reset_bounce_map(domain);
> > > > + }
> > > > +
> > > > + for (i = 0; i < dev->ngroups; i++)
> > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > >
> > > > down_write(&dev->rwsem);
> > > >
> > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > return ret;
> > > > }
> > > >
> > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > + unsigned int asid)
> > > > +{
> > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > + struct vduse_dev_msg msg = { 0 };
> > > > + int r;
> > > > +
> > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > + return -EINVAL;
> > > > +
> > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > + msg.req.vq_group_asid.group = group;
> > > > + msg.req.vq_group_asid.asid = asid;
> > > > +
> > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > + if (r < 0)
> > > > + return r;
> > > > +
> > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > + return 0;
> > > > +}
> > > > +
> > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > struct vdpa_vq_state *state)
> > > > {
> > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > int ret;
> > > >
> > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > if (ret)
> > > > return ret;
> > > >
> > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > if (ret) {
> > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > return ret;
> > > > }
> > > >
> > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > .reset = vduse_vdpa_reset,
> > > > .set_map = vduse_vdpa_set_map,
> > > > + .set_group_asid = vduse_set_group_asid,
> > > > .get_vq_map = vduse_get_vq_map,
> > > > .free = vduse_vdpa_free,
> > > > };
> > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > return;
> > > >
> > > > vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > -
> > > > + mutex_lock(&vdev->domain_lock);
> > > > + domain = token.group->as->domain;
> > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > + mutex_unlock(&vdev->domain_lock);
> > > > }
> > > >
> > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > return;
> > > >
> > > > vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > -
> > > > + mutex_lock(&vdev->domain_lock);
> > > > + domain = token.group->as->domain;
> > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > + mutex_unlock(&vdev->domain_lock);
> > > > }
> > > >
> > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > {
> > > > struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > > + dma_addr_t r;
> > > >
> > > > if (!token.group)
> > > > return DMA_MAPPING_ERROR;
> > > >
> > > > vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + mutex_lock(&vdev->domain_lock);
> > >
> > > The mutex_lock can't be used here since the dma ops might be called in
> > > atomic context. And I think we can just remove it since creation and
> > > deletion operations of the iova domain are guaranteed not to execute
> > > concurrently with I/O operations.
> > >
> >
> > That would be great indeed! Can you expand on this: what protects here
> > from the moment the two syscalls are issued from userland?
> >
>
> The domain mutex lock was introduced in commit
> d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> creation"). It is used to prevent concurrent execution between
> vdpa_dev_add() and the VDUSE device ioctls that need to access the
> iova domain, such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> will not be called until vdpa_dev_add() has completed, so the mutex
> lock is not needed.
>
Yes, but the usage is extended here to also protect against concurrent
calls to vduse_dev_map_page() and vduse_set_group_asid() or similar.
So I guess the best option is to replace it with a spinlock or RCU.
* Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-10-31 14:08 ` Eugenio Perez Martin
@ 2025-11-03 2:55 ` Yongji Xie
2025-11-03 6:40 ` Eugenio Perez Martin
0 siblings, 1 reply; 24+ messages in thread
From: Yongji Xie @ 2025-11-03 2:55 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > group. This enables mapping each group into a distinct memory space.
> > > > >
> > > > > Now that the driver can change the ASID in the middle of operation, the
> > > > > domain that each vq address points to is also protected by domain_lock.
> > > > >
> > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > v8:
> > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > justify it.
> > > > >
> > > > > v7:
> > > > > * Take write lock in the error path (Jason).
> > > > >
> > > > > v6:
> > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > (MST).
> > > > > * Fix struct name not matching in the doc.
> > > > >
> > > > > v5:
> > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > ioctl (Jason).
> > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > * Exclude "padding" member from the only >V1 members in
> > > > > vduse_dev_request.
> > > > >
> > > > > v4:
> > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > return; } (Jason)
> > > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > >
> > > > > v3:
> > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > (Jason).
> > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > > (Jason)
> > > > >
> > > > > v2:
> > > > > * Convert the use of mutex to rwlock.
> > > > >
> > > > > RFC v3:
> > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > VDUSE_IOTLB_GET_INFO.
> > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > ASIDs.
> > > > >
> > > > > RFC v2:
> > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > part of the struct is the same.
> > > > > ---
> > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > @@ -41,6 +41,7 @@
> > > > >
> > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > struct mm_struct *mm;
> > > > > };
> > > > >
> > > > > +struct vduse_as {
> > > > > + struct vduse_iova_domain *domain;
> > > > > + struct vduse_umem *umem;
> > > > > + struct mutex mem_lock;
> > > > > +};
> > > > > +
> > > > > struct vduse_vq_group {
> > > > > + struct vduse_as *as;
> > > > > struct vduse_dev *dev;
> > > > > };
> > > > >
> > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > struct vduse_vdpa *vdev;
> > > > > struct device *dev;
> > > > > struct vduse_virtqueue **vqs;
> > > > > - struct vduse_iova_domain *domain;
> > > > > + struct vduse_as *as;
> > > > > char *name;
> > > > > struct mutex lock;
> > > > > spinlock_t msg_lock;
> > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > u32 vq_num;
> > > > > u32 vq_align;
> > > > > u32 ngroups;
> > > > > - struct vduse_umem *umem;
> > > > > + u32 nas;
> > > > > struct vduse_vq_group *groups;
> > > > > - struct mutex mem_lock;
> > > > > unsigned int bounce_size;
> > > > > struct mutex domain_lock;
> > > > > };
> > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > }
> > > > >
> > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > u64 start, u64 last)
> > > > > {
> > > > > struct vduse_dev_msg msg = { 0 };
> > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > return -EINVAL;
> > > > >
> > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > - msg.req.iova.start = start;
> > > > > - msg.req.iova.last = last;
> > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > + msg.req.iova.start = start;
> > > > > + msg.req.iova.last = last;
> > > > > + } else {
> > > > > + msg.req.iova_v2.start = start;
> > > > > + msg.req.iova_v2.last = last;
> > > > > + msg.req.iova_v2.asid = asid;
> > > > > + }
> > > > >
> > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > }
> > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > return mask;
> > > > > }
> > > > >
> > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > + unsigned int group, unsigned int asid)
> > > > > +{
> > > > > + mutex_lock(&dev->domain_lock);
> > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > + mutex_unlock(&dev->domain_lock);
> > > > > +}
> > > > > +
> > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > {
> > > > > int i;
> > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > >
> > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > - if (domain && domain->bounce_map)
> > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > +
> > > > > + if (domain && domain->bounce_map)
> > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > + }
> > > > > +
> > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > >
> > > > > down_write(&dev->rwsem);
> > > > >
> > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > + unsigned int asid)
> > > > > +{
> > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > + int r;
> > > > > +
> > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > + msg.req.vq_group_asid.group = group;
> > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > +
> > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > + if (r < 0)
> > > > > + return r;
> > > > > +
> > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > struct vdpa_vq_state *state)
> > > > > {
> > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > int ret;
> > > > >
> > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > if (ret)
> > > > > return ret;
> > > > >
> > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > if (ret) {
> > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > return ret;
> > > > > }
> > > > >
> > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > .reset = vduse_vdpa_reset,
> > > > > .set_map = vduse_vdpa_set_map,
> > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > .get_vq_map = vduse_get_vq_map,
> > > > > .free = vduse_vdpa_free,
> > > > > };
> > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > return;
> > > > >
> > > > > vdev = token.group->dev;
> > > > > - domain = vdev->domain;
> > > > > -
> > > > > + mutex_lock(&vdev->domain_lock);
> > > > > + domain = token.group->as->domain;
> > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > }
> > > > >
> > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > return;
> > > > >
> > > > > vdev = token.group->dev;
> > > > > - domain = vdev->domain;
> > > > > -
> > > > > + mutex_lock(&vdev->domain_lock);
> > > > > + domain = token.group->as->domain;
> > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > }
> > > > >
> > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > {
> > > > > struct vduse_dev *vdev;
> > > > > struct vduse_iova_domain *domain;
> > > > > + dma_addr_t r;
> > > > >
> > > > > if (!token.group)
> > > > > return DMA_MAPPING_ERROR;
> > > > >
> > > > > vdev = token.group->dev;
> > > > > - domain = vdev->domain;
> > > > > + mutex_lock(&vdev->domain_lock);
> > > >
> > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > atomic context. And I think we can just remove it since creation and
> > > > deletion operations of the iova domain are guaranteed not to execute
> > > > concurrently with I/O operations.
> > > >
> > >
> > > That would be great indeed! Can you expand on this: what protects here
> > > from the moment the two syscalls are issued from userland?
> > >
> >
> > The domain mutex lock was introduced in commit
> > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > creation"). It is used to prevent concurrent execution between
> > vdpa_dev_add() and the VDUSE device ioctls that need to access the
> > iova domain, such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > will not be called until vdpa_dev_add() has completed, so the mutex
> > lock is not needed.
> >
>
> Yes, but the usage is extended here to also protect against concurrent
> calls to vduse_dev_map_page() and vduse_set_group_asid() or similar.
>
> So I guess the best option is to replace it with a spinlock or RCU.
>
OK, I see: we simply aim to prevent concurrent access to the group->as
variable here. But I wonder whether the .set_group_asid callback can be
called during I/O. I think unmap_page() would fail if we changed
group->as between map_page() and unmap_page(). Besides, it seems that
.set_group_asid is currently only called in the vhost-vdpa case, while
the dma ops such as map_page/unmap_page only service the virtio-vdpa
case, so they would not be called concurrently with .set_group_asid for
now.
Thanks,
Yongji
* Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 2:55 ` Yongji Xie
@ 2025-11-03 6:40 ` Eugenio Perez Martin
2025-11-03 7:13 ` Yongji Xie
0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-11-03 6:40 UTC (permalink / raw)
To: Yongji Xie
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > > group. This enables mapping each group into a distinct memory space.
> > > > > >
> > > > > > Now that the driver can change the ASID in the middle of operation, the
> > > > > > domain that each vq address points to is also protected by domain_lock.
> > > > > >
> > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > ---
> > > > > > v8:
> > > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > > justify it.
> > > > > >
> > > > > > v7:
> > > > > > * Take write lock in the error path (Jason).
> > > > > >
> > > > > > v6:
> > > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > > (MST).
> > > > > > * Fix struct name not matching in the doc.
> > > > > >
> > > > > > v5:
> > > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > > ioctl (Jason).
> > > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > > * Exclude "padding" member from the only >V1 members in
> > > > > > vduse_dev_request.
> > > > > >
> > > > > > v4:
> > > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > > return; } (Jason)
> > > > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > > >
> > > > > > v3:
> > > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > > (Jason).
> > > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > > > (Jason)
> > > > > >
> > > > > > v2:
> > > > > > * Convert the use of mutex to rwlock.
> > > > > >
> > > > > > RFC v3:
> > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > > VDUSE_IOTLB_GET_INFO.
> > > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > > ASIDs.
> > > > > >
> > > > > > RFC v2:
> > > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > > part of the struct is the same.
> > > > > > ---
> > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > @@ -41,6 +41,7 @@
> > > > > >
> > > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > > struct mm_struct *mm;
> > > > > > };
> > > > > >
> > > > > > +struct vduse_as {
> > > > > > + struct vduse_iova_domain *domain;
> > > > > > + struct vduse_umem *umem;
> > > > > > + struct mutex mem_lock;
> > > > > > +};
> > > > > > +
> > > > > > struct vduse_vq_group {
> > > > > > + struct vduse_as *as;
> > > > > > struct vduse_dev *dev;
> > > > > > };
> > > > > >
> > > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > > struct vduse_vdpa *vdev;
> > > > > > struct device *dev;
> > > > > > struct vduse_virtqueue **vqs;
> > > > > > - struct vduse_iova_domain *domain;
> > > > > > + struct vduse_as *as;
> > > > > > char *name;
> > > > > > struct mutex lock;
> > > > > > spinlock_t msg_lock;
> > > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > > u32 vq_num;
> > > > > > u32 vq_align;
> > > > > > u32 ngroups;
> > > > > > - struct vduse_umem *umem;
> > > > > > + u32 nas;
> > > > > > struct vduse_vq_group *groups;
> > > > > > - struct mutex mem_lock;
> > > > > > unsigned int bounce_size;
> > > > > > struct mutex domain_lock;
> > > > > > };
> > > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > }
> > > > > >
> > > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > > u64 start, u64 last)
> > > > > > {
> > > > > > struct vduse_dev_msg msg = { 0 };
> > > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > return -EINVAL;
> > > > > >
> > > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > > - msg.req.iova.start = start;
> > > > > > - msg.req.iova.last = last;
> > > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > > + msg.req.iova.start = start;
> > > > > > + msg.req.iova.last = last;
> > > > > > + } else {
> > > > > > + msg.req.iova_v2.start = start;
> > > > > > + msg.req.iova_v2.last = last;
> > > > > > + msg.req.iova_v2.asid = asid;
> > > > > > + }
> > > > > >
> > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > }
> > > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > > return mask;
> > > > > > }
> > > > > >
> > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > + unsigned int group, unsigned int asid)
> > > > > > +{
> > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > +}
> > > > > > +
> > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > {
> > > > > > int i;
> > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > >
> > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > - if (domain && domain->bounce_map)
> > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > +
> > > > > > + if (domain && domain->bounce_map)
> > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > + }
> > > > > > +
> > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > >
> > > > > > down_write(&dev->rwsem);
> > > > > >
> > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > + unsigned int asid)
> > > > > > +{
> > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > + int r;
> > > > > > +
> > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > +
> > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > + if (r < 0)
> > > > > > + return r;
> > > > > > +
> > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > struct vdpa_vq_state *state)
> > > > > > {
> > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > int ret;
> > > > > >
> > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > if (ret)
> > > > > > return ret;
> > > > > >
> > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > if (ret) {
> > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > .reset = vduse_vdpa_reset,
> > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > .free = vduse_vdpa_free,
> > > > > > };
> > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > return;
> > > > > >
> > > > > > vdev = token.group->dev;
> > > > > > - domain = vdev->domain;
> > > > > > -
> > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > + domain = token.group->as->domain;
> > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > }
> > > > > >
> > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > return;
> > > > > >
> > > > > > vdev = token.group->dev;
> > > > > > - domain = vdev->domain;
> > > > > > -
> > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > + domain = token.group->as->domain;
> > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > }
> > > > > >
> > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > {
> > > > > > struct vduse_dev *vdev;
> > > > > > struct vduse_iova_domain *domain;
> > > > > > + dma_addr_t r;
> > > > > >
> > > > > > if (!token.group)
> > > > > > return DMA_MAPPING_ERROR;
> > > > > >
> > > > > > vdev = token.group->dev;
> > > > > > - domain = vdev->domain;
> > > > > > + mutex_lock(&vdev->domain_lock);
> > > > >
> > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > atomic context. And I think we can just remove it since creation and
> > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > concurrently with I/O operations.
> > > > >
> > > >
> > > > That would be great indeed! Can you expand on this: what protects here
> > > > from the moment the two syscalls are issued from userland?
> > > >
> > >
> > > The domain mutex lock is introduced in commit
> > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > creation"). It's used to prevent concurrent execution between
> > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > is not needed.
> > >
> >
> > Yes, but the usage is extended here to also protect from concurrent
> > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> >
> > So I guess the best is to replace it with a spinlock or RCU.
> >
>
> OK, I see, we simply aim to prevent concurrent access to the group->as
> variable here. But I wonder if the .set_group_asid function can be
> called during I/O.

At least a malicious userland app could do it.

> I think the unmap_page() would fail if we change
> the group->as between map_page() and unmap_page().

I guess a SIGSEGV or similar could be a valid outcome of that, but we
need to protect the "as" pointer anyway.

> Besides, it seems
> that .set_group_asid is only called in the vhost-vdpa case currently,
> but the dma ops such as map_page/unmap_page only service the
> virtio-vdpa case, so they would not be called concurrently with
> .set_group_asid now.

That's totally true but a malicious app can do it anyway. So it is
better to protect it properly.

Preparing a new version with a spinlock or RCU, thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 6:40 ` Eugenio Perez Martin
@ 2025-11-03 7:13 ` Yongji Xie
2025-11-03 8:19 ` Eugenio Perez Martin
0 siblings, 1 reply; 24+ messages in thread
From: Yongji Xie @ 2025-11-03 7:13 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 2:41 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > > > group. This enables mapping each group into a distinct memory space.
> > > > > > >
> > > > > > > Now that the driver can change ASID in the middle of operation, the
> > > > > > > domain that each vq address point is also protected by domain_lock.
> > > > > > >
> > > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > ---
> > > > > > > v8:
> > > > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > > > justify it.
> > > > > > >
> > > > > > > v7:
> > > > > > > * Take write lock in the error path (Jason).
> > > > > > >
> > > > > > > v6:
> > > > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > > > (MST).
> > > > > > > * Fix struct name not matching in the doc.
> > > > > > >
> > > > > > > v5:
> > > > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > > > ioctl (Jason).
> > > > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > > > * Exclude "padding" member from the only >V1 members in
> > > > > > > vduse_dev_request.
> > > > > > >
> > > > > > > v4:
> > > > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > > > return; } (Jason)
> > > > > > > * Change a bad multiline comment, using @ caracter instead of * (Jason).
> > > > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > > > >
> > > > > > > v3:
> > > > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > > > (Jason).
> > > > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > > > * Create VDUSE_DEV_MAX_AS instead of comparing agains a magic number
> > > > > > > (Jason)
> > > > > > >
> > > > > > > v2:
> > > > > > > * Convert the use of mutex to rwlock.
> > > > > > >
> > > > > > > RFC v3:
> > > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > > > VDUSE_IOTLB_GET_INFO.
> > > > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > > > ASIDs.
> > > > > > >
> > > > > > > RFC v2:
> > > > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > > > part of the struct is the same.
> > > > > > > ---
> > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > @@ -41,6 +41,7 @@
> > > > > > >
> > > > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > > > struct mm_struct *mm;
> > > > > > > };
> > > > > > >
> > > > > > > +struct vduse_as {
> > > > > > > + struct vduse_iova_domain *domain;
> > > > > > > + struct vduse_umem *umem;
> > > > > > > + struct mutex mem_lock;
> > > > > > > +};
> > > > > > > +
> > > > > > > struct vduse_vq_group {
> > > > > > > + struct vduse_as *as;
> > > > > > > struct vduse_dev *dev;
> > > > > > > };
> > > > > > >
> > > > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > > > struct vduse_vdpa *vdev;
> > > > > > > struct device *dev;
> > > > > > > struct vduse_virtqueue **vqs;
> > > > > > > - struct vduse_iova_domain *domain;
> > > > > > > + struct vduse_as *as;
> > > > > > > char *name;
> > > > > > > struct mutex lock;
> > > > > > > spinlock_t msg_lock;
> > > > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > > > u32 vq_num;
> > > > > > > u32 vq_align;
> > > > > > > u32 ngroups;
> > > > > > > - struct vduse_umem *umem;
> > > > > > > + u32 nas;
> > > > > > > struct vduse_vq_group *groups;
> > > > > > > - struct mutex mem_lock;
> > > > > > > unsigned int bounce_size;
> > > > > > > struct mutex domain_lock;
> > > > > > > };
> > > > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > }
> > > > > > >
> > > > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > > > u64 start, u64 last)
> > > > > > > {
> > > > > > > struct vduse_dev_msg msg = { 0 };
> > > > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > return -EINVAL;
> > > > > > >
> > > > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > > > - msg.req.iova.start = start;
> > > > > > > - msg.req.iova.last = last;
> > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > > > + msg.req.iova.start = start;
> > > > > > > + msg.req.iova.last = last;
> > > > > > > + } else {
> > > > > > > + msg.req.iova_v2.start = start;
> > > > > > > + msg.req.iova_v2.last = last;
> > > > > > > + msg.req.iova_v2.asid = asid;
> > > > > > > + }
> > > > > > >
> > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > }
> > > > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > > > return mask;
> > > > > > > }
> > > > > > >
> > > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > > + unsigned int group, unsigned int asid)
> > > > > > > +{
> > > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > > +}
> > > > > > > +
> > > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > > {
> > > > > > > int i;
> > > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > > >
> > > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > > - if (domain && domain->bounce_map)
> > > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > > +
> > > > > > > + if (domain && domain->bounce_map)
> > > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > > + }
> > > > > > > +
> > > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > > >
> > > > > > > down_write(&dev->rwsem);
> > > > > > >
> > > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > > + unsigned int asid)
> > > > > > > +{
> > > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > > + int r;
> > > > > > > +
> > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > > +
> > > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > > + if (r < 0)
> > > > > > > + return r;
> > > > > > > +
> > > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > > struct vdpa_vq_state *state)
> > > > > > > {
> > > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > int ret;
> > > > > > >
> > > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > > if (ret)
> > > > > > > return ret;
> > > > > > >
> > > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > > if (ret) {
> > > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > > .reset = vduse_vdpa_reset,
> > > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > > .free = vduse_vdpa_free,
> > > > > > > };
> > > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > > return;
> > > > > > >
> > > > > > > vdev = token.group->dev;
> > > > > > > - domain = vdev->domain;
> > > > > > > -
> > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > + domain = token.group->as->domain;
> > > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > }
> > > > > > >
> > > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > return;
> > > > > > >
> > > > > > > vdev = token.group->dev;
> > > > > > > - domain = vdev->domain;
> > > > > > > -
> > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > + domain = token.group->as->domain;
> > > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > }
> > > > > > >
> > > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > {
> > > > > > > struct vduse_dev *vdev;
> > > > > > > struct vduse_iova_domain *domain;
> > > > > > > + dma_addr_t r;
> > > > > > >
> > > > > > > if (!token.group)
> > > > > > > return DMA_MAPPING_ERROR;
> > > > > > >
> > > > > > > vdev = token.group->dev;
> > > > > > > - domain = vdev->domain;
> > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > >
> > > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > > atomic context. And I think we can just remove it since creation and
> > > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > > concurrently with I/O operations.
> > > > > >
> > > > >
> > > > > That would be great indeed! Can you expand on this: what protects here
> > > > > from the moment the two syscalls are issued from userland?
> > > > >
> > > >
> > > > The domain mutex lock is introduced in commit
> > > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > > creation"). It's used to prevent concurrent execution between
> > > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > > is not needed.
> > > >
> > >
> > > Yes, but the usage is extended here to also protect from concurrent
> > > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> > >
> > > So I guess the best is to replace it with a spinlock or RCU.
> > >
> >
> > OK, I see, we simply aim to prevent concurrent access to the group->as
> > variable here. But I wonder if the .set_group_asid function can be
> > called during I/O.
>
> At least a malicious userland app could do it.
>
> > I think the unmap_page() would fail if we change
> > the group->as between map_page() and unmap_page().
>
> I guess a SIGSEGV or similar could be a valid outcome of that, but we
> need to protect the "as" pointer anyway.
>
> > Besides, it seems
> > that .set_group_asid is only called in the vhost-vdpa case currently,
> > but the dma ops such as map_page/unmap_page only service the
> > virtio-vdpa case, so they would not be called concurrently with
> > .set_group_asid now.
> >
>
> That's totally true but a malicious app can do it anyway. So it is
> better to protect it properly.
>
Actually there is no interface for userspace to call .set_group_asid
if the vduse device is bound to the virtio-vdpa driver.
> Preparing a new version with a spinlock or RCU, thanks!
>
If so, we'd better introduce RCU here rather than using the
domain lock.
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 7:13 ` Yongji Xie
@ 2025-11-03 8:19 ` Eugenio Perez Martin
2025-11-03 8:56 ` Yongji Xie
0 siblings, 1 reply; 24+ messages in thread
From: Eugenio Perez Martin @ 2025-11-03 8:19 UTC (permalink / raw)
To: Yongji Xie
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 8:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Mon, Nov 3, 2025 at 2:41 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > > > + unsigned int group, unsigned int asid)
> > > > > > > > +{
> > > > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > > > {
> > > > > > > > int i;
> > > > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > > > >
> > > > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > > > - if (domain && domain->bounce_map)
> > > > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > > > +
> > > > > > > > + if (domain && domain->bounce_map)
> > > > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > > > >
> > > > > > > > down_write(&dev->rwsem);
> > > > > > > >
> > > > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > > > + unsigned int asid)
> > > > > > > > +{
> > > > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > > > + int r;
> > > > > > > > +
> > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > > > + return -EINVAL;
> > > > > > > > +
> > > > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > > > +
> > > > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > > > + if (r < 0)
> > > > > > > > + return r;
> > > > > > > > +
> > > > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > > > + return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > > > struct vdpa_vq_state *state)
> > > > > > > > {
> > > > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > int ret;
> > > > > > > >
> > > > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > > > if (ret)
> > > > > > > > return ret;
> > > > > > > >
> > > > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > > > if (ret) {
> > > > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > > > .reset = vduse_vdpa_reset,
> > > > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > > > .free = vduse_vdpa_free,
> > > > > > > > };
> > > > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > > > return;
> > > > > > > >
> > > > > > > > vdev = token.group->dev;
> > > > > > > > - domain = vdev->domain;
> > > > > > > > -
> > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > + domain = token.group->as->domain;
> > > > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > }
> > > > > > > >
> > > > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > return;
> > > > > > > >
> > > > > > > > vdev = token.group->dev;
> > > > > > > > - domain = vdev->domain;
> > > > > > > > -
> > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > + domain = token.group->as->domain;
> > > > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > }
> > > > > > > >
> > > > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > {
> > > > > > > > struct vduse_dev *vdev;
> > > > > > > > struct vduse_iova_domain *domain;
> > > > > > > > + dma_addr_t r;
> > > > > > > >
> > > > > > > > if (!token.group)
> > > > > > > > return DMA_MAPPING_ERROR;
> > > > > > > >
> > > > > > > > vdev = token.group->dev;
> > > > > > > > - domain = vdev->domain;
> > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > >
> > > > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > > > atomic context. And I think we can just remove it since creation and
> > > > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > > > concurrently with I/O operations.
> > > > > > >
> > > > > >
> > > > > > That would be great indeed! Can you expand on this: what protects us here
> > > > > > from the moment the two syscalls are issued from userland?
> > > > > >
> > > > >
> > > > > The domain mutex lock is introduced in commit
> > > > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > > > creation"). It's used to prevent concurrent execution between
> > > > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > > > is not needed.
> > > > >
> > > >
> > > > Yes, but the usage is extended here to also protect from concurrent
> > > > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> > > >
> > > > So I guess the best is to replace it with a spinlock or RCU.
> > > >
> > >
> > > OK, I see, we simply aim to prevent concurrent access to the group->as
> > > variable here. But I wonder if the .set_group_asid function can be
> > > called during I/O.
> >
> > At least a malicious userland app could do it.
> >
> > > I think the unmap_page() would fail if we change
> > > the group->as between map_page() and unmap_page().
> >
> > I guess a SIGSEGV or similar could be a valid outcome of that, but we
> > need to protect the "as" pointer anyway.
> >
> > > Besides, it seems
> > > that .set_group_asid is only called in the vhost-vdpa case currently,
> > > but the dma ops such as map_page/unmap_page only service the
> > > virtio-vdpa case, so they would not be called concurrently with
> > > .set_group_asid now.
> > >
> >
> > That's totally true but a malicious app can do it anyway. So it is
> > better to protect it properly.
> >
>
> Actually there is no interface for userspace to call .set_group_asid
> > if the vduse device is bound to the virtio-vdpa driver.
>
It is possible from vhost-vdpa.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 8:19 ` Eugenio Perez Martin
@ 2025-11-03 8:56 ` Yongji Xie
2025-11-03 9:26 ` Jason Wang
0 siblings, 1 reply; 24+ messages in thread
From: Yongji Xie @ 2025-11-03 8:56 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, Laurent Vivier, Maxime Coquelin,
virtualization, Stefano Garzarella, Jason Wang, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 4:20 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Nov 3, 2025 at 8:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 2:41 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > > > > > group. This enables mapping each group into a distinct memory space.
> > > > > > > > >
> > > > > > > > > Now that the driver can change ASID in the middle of operation, the
> > > > > > > > > domain that each vq address points to is also protected by domain_lock.
> > > > > > > > >
> > > > > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > > ---
> > > > > > > > > v8:
> > > > > > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > > > > > justify it.
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > * Take write lock in the error path (Jason).
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > > > > > (MST).
> > > > > > > > > * Fix struct name not matching in the doc.
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > > > > > ioctl (Jason).
> > > > > > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > > > > > * Exclude "padding" member from the only >V1 members in
> > > > > > > > > vduse_dev_request.
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > > > > > return; } (Jason)
> > > > > > > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > > > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > > > > > (Jason).
> > > > > > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > > > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > > > > > > (Jason)
> > > > > > > > >
> > > > > > > > > v2:
> > > > > > > > > * Convert the use of mutex to rwlock.
> > > > > > > > >
> > > > > > > > > RFC v3:
> > > > > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > > > > > VDUSE_IOTLB_GET_INFO.
> > > > > > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > > > > > ASIDs.
> > > > > > > > >
> > > > > > > > > RFC v2:
> > > > > > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > > > > > part of the struct is the same.
> > > > > > > > > ---
> > > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > @@ -41,6 +41,7 @@
> > > > > > > > >
> > > > > > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > > > > > struct mm_struct *mm;
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > +struct vduse_as {
> > > > > > > > > + struct vduse_iova_domain *domain;
> > > > > > > > > + struct vduse_umem *umem;
> > > > > > > > > + struct mutex mem_lock;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > struct vduse_vq_group {
> > > > > > > > > + struct vduse_as *as;
> > > > > > > > > struct vduse_dev *dev;
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > > > > > struct vduse_vdpa *vdev;
> > > > > > > > > struct device *dev;
> > > > > > > > > struct vduse_virtqueue **vqs;
> > > > > > > > > - struct vduse_iova_domain *domain;
> > > > > > > > > + struct vduse_as *as;
> > > > > > > > > char *name;
> > > > > > > > > struct mutex lock;
> > > > > > > > > spinlock_t msg_lock;
> > > > > > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > > > > > u32 vq_num;
> > > > > > > > > u32 vq_align;
> > > > > > > > > u32 ngroups;
> > > > > > > > > - struct vduse_umem *umem;
> > > > > > > > > + u32 nas;
> > > > > > > > > struct vduse_vq_group *groups;
> > > > > > > > > - struct mutex mem_lock;
> > > > > > > > > unsigned int bounce_size;
> > > > > > > > > struct mutex domain_lock;
> > > > > > > > > };
> > > > > > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > > > > > u64 start, u64 last)
> > > > > > > > > {
> > > > > > > > > struct vduse_dev_msg msg = { 0 };
> > > > > > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > return -EINVAL;
> > > > > > > > >
> > > > > > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > > > > > - msg.req.iova.start = start;
> > > > > > > > > - msg.req.iova.last = last;
> > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > > > > > + msg.req.iova.start = start;
> > > > > > > > > + msg.req.iova.last = last;
> > > > > > > > > + } else {
> > > > > > > > > + msg.req.iova_v2.start = start;
> > > > > > > > > + msg.req.iova_v2.last = last;
> > > > > > > > > + msg.req.iova_v2.asid = asid;
> > > > > > > > > + }
> > > > > > > > >
> > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > }
> > > > > > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > > > > > return mask;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > > > > + unsigned int group, unsigned int asid)
> > > > > > > > > +{
> > > > > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > > > > {
> > > > > > > > > int i;
> > > > > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > > > > >
> > > > > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > > > > - if (domain && domain->bounce_map)
> > > > > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > > > > +
> > > > > > > > > + if (domain && domain->bounce_map)
> > > > > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > > > > >
> > > > > > > > > down_write(&dev->rwsem);
> > > > > > > > >
> > > > > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > > > > return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > > > > + unsigned int asid)
> > > > > > > > > +{
> > > > > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > > > > + int r;
> > > > > > > > > +
> > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > > > > + return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > > > > +
> > > > > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > + if (r < 0)
> > > > > > > > > + return r;
> > > > > > > > > +
> > > > > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > > > > + return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > > > > struct vdpa_vq_state *state)
> > > > > > > > > {
> > > > > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > int ret;
> > > > > > > > >
> > > > > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > > > > if (ret)
> > > > > > > > > return ret;
> > > > > > > > >
> > > > > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > > > > if (ret) {
> > > > > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > > > > return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > > > > .reset = vduse_vdpa_reset,
> > > > > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > > > > .free = vduse_vdpa_free,
> > > > > > > > > };
> > > > > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > > > > return;
> > > > > > > > >
> > > > > > > > > vdev = token.group->dev;
> > > > > > > > > - domain = vdev->domain;
> > > > > > > > > -
> > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > return;
> > > > > > > > >
> > > > > > > > > vdev = token.group->dev;
> > > > > > > > > - domain = vdev->domain;
> > > > > > > > > -
> > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > {
> > > > > > > > > struct vduse_dev *vdev;
> > > > > > > > > struct vduse_iova_domain *domain;
> > > > > > > > > + dma_addr_t r;
> > > > > > > > >
> > > > > > > > > if (!token.group)
> > > > > > > > > return DMA_MAPPING_ERROR;
> > > > > > > > >
> > > > > > > > > vdev = token.group->dev;
> > > > > > > > > - domain = vdev->domain;
> > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > >
> > > > > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > > > > atomic context. And I think we can just remove it since creation and
> > > > > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > > > > concurrently with I/O operations.
> > > > > > > >
> > > > > > >
> > > > > > > That would be great indeed! Can you expand on this: what protects us here
> > > > > > > from the moment the two syscalls are issued from userland?
> > > > > > >
> > > > > >
> > > > > > The domain mutex lock is introduced in commit
> > > > > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > > > > creation"). It's used to prevent concurrent execution between
> > > > > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > > > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > > > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > > > > is not needed.
> > > > > >
> > > > >
> > > > > Yes, but the usage is extended here to also protect from concurrent
> > > > > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> > > > >
> > > > > So I guess the best is to replace it with a spinlock or RCU.
> > > > >
> > > >
> > > > OK, I see, we simply aim to prevent concurrent access to the group->as
> > > > variable here. But I wonder if the .set_group_asid function can be
> > > > called during I/O.
> > >
> > > At least a malicious userland app could do it.
> > >
> > > > I think the unmap_page() would fail if we change
> > > > the group->as between map_page() and unmap_page().
> > >
> > > I guess a SIGSEGV or similar could be a valid outcome of that, but we
> > > need to protect the "as" pointer anyway.
> > >
> > > > Besides, it seems
> > > > that .set_group_asid is only called in the vhost-vdpa case currently,
> > > > but the dma ops such as map_page/unmap_page only service the
> > > > virtio-vdpa case, so they would not be called concurrently with
> > > > .set_group_asid now.
> > > >
> > >
> > > That's totally true but a malicious app can do it anyway. So it is
> > > better to protect it properly.
> > >
> >
> > Actually there is no interface for userspace to call .set_group_asid
> > > if the vduse device is bound to the virtio-vdpa driver.
> >
>
> It is possible from vhost-vdpa.
>
Then there is no interface for userspace to call dma_ops such as
map_page/unmap_page if the device is bound to the vhost-vdpa driver.
Thanks,
Yongji
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 8:56 ` Yongji Xie
@ 2025-11-03 9:26 ` Jason Wang
2025-11-03 9:45 ` Yongji Xie
0 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-11-03 9:26 UTC (permalink / raw)
To: Yongji Xie
Cc: Eugenio Perez Martin, Michael S . Tsirkin, Laurent Vivier,
Maxime Coquelin, virtualization, Stefano Garzarella, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 4:56 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Mon, Nov 3, 2025 at 4:20 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 8:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Mon, Nov 3, 2025 at 2:41 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > > > > > > group. This enables mapping each group into a distinct memory space.
> > > > > > > > > >
> > > > > > > > > > Now that the driver can change ASID in the middle of operation, the
> > > > > > > > > > domain that each vq address points to is also protected by domain_lock.
> > > > > > > > > >
> > > > > > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > > > ---
> > > > > > > > > > v8:
> > > > > > > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > > > > > > justify it.
> > > > > > > > > >
> > > > > > > > > > v7:
> > > > > > > > > > * Take write lock in the error path (Jason).
> > > > > > > > > >
> > > > > > > > > > v6:
> > > > > > > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > > > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > > > > > > (MST).
> > > > > > > > > > * Fix struct name not matching in the doc.
> > > > > > > > > >
> > > > > > > > > > v5:
> > > > > > > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > > > > > > ioctl (Jason).
> > > > > > > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > > > > > > * Exclude "padding" member from the only >V1 members in
> > > > > > > > > > vduse_dev_request.
> > > > > > > > > >
> > > > > > > > > > v4:
> > > > > > > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > > > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > > > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > > > > > > return; } (Jason)
> > > > > > > > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > > > > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > > > > > > >
> > > > > > > > > > v3:
> > > > > > > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > > > > > > (Jason).
> > > > > > > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > > > > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > > > > > > > (Jason)
> > > > > > > > > >
> > > > > > > > > > v2:
> > > > > > > > > > * Convert the use of mutex to rwlock.
> > > > > > > > > >
> > > > > > > > > > RFC v3:
> > > > > > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > > > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > > > > > > VDUSE_IOTLB_GET_INFO.
> > > > > > > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > > > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > > > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > > > > > > ASIDs.
> > > > > > > > > >
> > > > > > > > > > RFC v2:
> > > > > > > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > > > > > > part of the struct is the same.
> > > > > > > > > > ---
> > > > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > > > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > > > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > @@ -41,6 +41,7 @@
> > > > > > > > > >
> > > > > > > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > > > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > > > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > > > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > > > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > > > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > > > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > > > > > > struct mm_struct *mm;
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > +struct vduse_as {
> > > > > > > > > > + struct vduse_iova_domain *domain;
> > > > > > > > > > + struct vduse_umem *umem;
> > > > > > > > > > + struct mutex mem_lock;
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > > struct vduse_vq_group {
> > > > > > > > > > + struct vduse_as *as;
> > > > > > > > > > struct vduse_dev *dev;
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > > > > > > struct vduse_vdpa *vdev;
> > > > > > > > > > struct device *dev;
> > > > > > > > > > struct vduse_virtqueue **vqs;
> > > > > > > > > > - struct vduse_iova_domain *domain;
> > > > > > > > > > + struct vduse_as *as;
> > > > > > > > > > char *name;
> > > > > > > > > > struct mutex lock;
> > > > > > > > > > spinlock_t msg_lock;
> > > > > > > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > > > > > > u32 vq_num;
> > > > > > > > > > u32 vq_align;
> > > > > > > > > > u32 ngroups;
> > > > > > > > > > - struct vduse_umem *umem;
> > > > > > > > > > + u32 nas;
> > > > > > > > > > struct vduse_vq_group *groups;
> > > > > > > > > > - struct mutex mem_lock;
> > > > > > > > > > unsigned int bounce_size;
> > > > > > > > > > struct mutex domain_lock;
> > > > > > > > > > };
> > > > > > > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > > > > > > u64 start, u64 last)
> > > > > > > > > > {
> > > > > > > > > > struct vduse_dev_msg msg = { 0 };
> > > > > > > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > > return -EINVAL;
> > > > > > > > > >
> > > > > > > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > > > > > > - msg.req.iova.start = start;
> > > > > > > > > > - msg.req.iova.last = last;
> > > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > > > > > > + msg.req.iova.start = start;
> > > > > > > > > > + msg.req.iova.last = last;
> > > > > > > > > > + } else {
> > > > > > > > > > + msg.req.iova_v2.start = start;
> > > > > > > > > > + msg.req.iova_v2.last = last;
> > > > > > > > > > + msg.req.iova_v2.asid = asid;
> > > > > > > > > > + }
> > > > > > > > > >
> > > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > }
> > > > > > > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > > > > > > return mask;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > > > > > + unsigned int group, unsigned int asid)
> > > > > > > > > > +{
> > > > > > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > > > > > {
> > > > > > > > > > int i;
> > > > > > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > > > > > >
> > > > > > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > > > > > - if (domain && domain->bounce_map)
> > > > > > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > > > > > +
> > > > > > > > > > + if (domain && domain->bounce_map)
> > > > > > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > > > > > >
> > > > > > > > > > down_write(&dev->rwsem);
> > > > > > > > > >
> > > > > > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > > > > > return ret;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > > > > > + unsigned int asid)
> > > > > > > > > > +{
> > > > > > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > > > > > + int r;
> > > > > > > > > > +
> > > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > > > > > + return -EINVAL;
> > > > > > > > > > +
> > > > > > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > > > > > +
> > > > > > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > + if (r < 0)
> > > > > > > > > > + return r;
> > > > > > > > > > +
> > > > > > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > > > > > + return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > > > > > struct vdpa_vq_state *state)
> > > > > > > > > > {
> > > > > > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > > int ret;
> > > > > > > > > >
> > > > > > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > > > > > if (ret)
> > > > > > > > > > return ret;
> > > > > > > > > >
> > > > > > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > > > > > if (ret) {
> > > > > > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > > > > > return ret;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > > > > > .reset = vduse_vdpa_reset,
> > > > > > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > > > > > .free = vduse_vdpa_free,
> > > > > > > > > > };
> > > > > > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > > > > > return;
> > > > > > > > > >
> > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > -
> > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > > return;
> > > > > > > > > >
> > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > -
> > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > > {
> > > > > > > > > > struct vduse_dev *vdev;
> > > > > > > > > > struct vduse_iova_domain *domain;
> > > > > > > > > > + dma_addr_t r;
> > > > > > > > > >
> > > > > > > > > > if (!token.group)
> > > > > > > > > > return DMA_MAPPING_ERROR;
> > > > > > > > > >
> > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > >
> > > > > > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > > > > > atomic context. And I think we can just remove it since creation and
> > > > > > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > > > > > concurrently with I/O operations.
> > > > > > > > >
> > > > > > > >
> > > > > > > > That would be great indeed! Can you expand on this, what protects here
> > > > > > > > from the moment the two syscalls are issued from userland?
> > > > > > > >
> > > > > > >
> > > > > > > The domain mutex lock is introduced in commit
> > > > > > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > > > > > creation"). It's used to prevent concurrent execution between
> > > > > > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > > > > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > > > > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > > > > > is not needed.
> > > > > > >
> > > > > >
> > > > > > Yes, but the usage is extended here to also protect from concurrent
> > > > > > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> > > > > >
> > > > > > So I guess the best is to replace it with a spinlock or RCU.
> > > > > >
> > > > >
> > > > > OK, I see, we simply aim to prevent concurrent access to the group->as
> > > > > variable here. But I wonder if the .set_group_asid function can be
> > > > > called during I/O.
> > > >
> > > > At least a malicious userland app could do it.
> > > >
> > > > > I think the unmap_page() would fail if we change
> > > > > the group->as between map_page() and unmap_page().

It's better for the VDUSE to prepare for this.

> > > >
> > > > I guess a SIGSEGV or similar could be a valid outcome of that, but we
> > > > need to protect the "as" pointer anyway.

SIGSEGV is probably not suitable for the case of map/unmap as they
might be executed in bh.

> > > >
> > > > > Besides, it seems
> > > > > that .set_group_asid is only called in the vhost-vdpa case currently,
> > > > > but the dma ops such as map_page/unmap_page only service the
> > > > > virtio-vdpa case, so they would not be called concurrently with
> > > > > .set_group_asid now.
> > > > >
> > > >
> > > > That's totally true but a malicious app can do it anyway. So it is
> > > > better to protect it properly.
> > > >
> > >
> > > Actually there is no interface for userspace to call .set_group_asid
> > > if the vduse device is bound to the virtio-vdpa driver.
> > >
> >
> > It is possible from vhost-vdpa.
> >
>
> Then there is no interface for userspace to call dma_ops such as
> map_page/unmap_page if the device is bound to the vhost-vdpa driver.
>
> Thanks,
> Yongji
>

Right now, but we can't say that for future drivers. Unless we
have a guard in the vdpa core, we probably need to prepare for that.

Thanks
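[Editor's note: the spinlock alternative Eugenio mentions above could look roughly like the following kernel-style pseudocode. This is an illustration only, not part of the patch; the `as_lock` field is invented for the sketch, and it assumes the map path under the lock does not sleep.]

```c
/* Sketch only: protect group->as with a spinlock instead of the
 * domain_lock mutex, so the dma ops stay safe in atomic context.
 * The as_lock field is hypothetical; the patch uses dev->domain_lock. */
struct vduse_vq_group {
	struct vduse_as *as;		/* written by set_group_asid, read by dma ops */
	struct vduse_dev *dev;
	spinlock_t as_lock;		/* hypothetical */
};

static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
				       unsigned int group, unsigned int asid)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->groups[group].as_lock, flags);
	dev->groups[group].as = &dev->as[asid];
	spin_unlock_irqrestore(&dev->groups[group].as_lock, flags);
}

static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction dir,
				     unsigned long attrs)
{
	struct vduse_iova_domain *domain;
	unsigned long flags;
	dma_addr_t r;

	if (!token.group)
		return DMA_MAPPING_ERROR;

	/* Unlike mutex_lock(), this is legal in atomic context. */
	spin_lock_irqsave(&token.group->as_lock, flags);
	domain = token.group->as->domain;
	r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
	spin_unlock_irqrestore(&token.group->as_lock, flags);
	return r;
}
```

Holding the spinlock across the whole map call is only valid if the domain map path never sleeps; if it can, only the `->as` dereference can be covered, which is one reason the thread also considers RCU.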
* Re: Re: Re: Re: [PATCH v8 5/6] vduse: add vq group asid support
2025-11-03 9:26 ` Jason Wang
@ 2025-11-03 9:45 ` Yongji Xie
0 siblings, 0 replies; 24+ messages in thread
From: Yongji Xie @ 2025-11-03 9:45 UTC (permalink / raw)
To: Jason Wang
Cc: Eugenio Perez Martin, Michael S . Tsirkin, Laurent Vivier,
Maxime Coquelin, virtualization, Stefano Garzarella, Cindy Lu,
Xuan Zhuo, linux-kernel
On Mon, Nov 3, 2025 at 5:26 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Nov 3, 2025 at 4:56 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 4:20 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Mon, Nov 3, 2025 at 8:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Mon, Nov 3, 2025 at 2:41 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Mon, Nov 3, 2025 at 3:55 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 31, 2025 at 10:09 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 31, 2025 at 8:01 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Oct 31, 2025 at 2:31 PM Eugenio Perez Martin
> > > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 30, 2025 at 12:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 30, 2025 at 6:01 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > > > > > > > > group. This enables mapping each group into a distinct memory space.
> > > > > > > > > > >
> > > > > > > > > > > Now that the driver can change ASID in the middle of operation, the
> > > > > > > > > > > domain that each vq address point is also protected by domain_lock.
> > > > > > > > > > >
> > > > > > > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > > > > ---
> > > > > > > > > > > v8:
> > > > > > > > > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > > > > > > > > justify it.
> > > > > > > > > > >
> > > > > > > > > > > v7:
> > > > > > > > > > > * Take write lock in the error path (Jason).
> > > > > > > > > > >
> > > > > > > > > > > v6:
> > > > > > > > > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > > > > > > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > > > > > > > > (MST).
> > > > > > > > > > > * Fix struct name not matching in the doc.
> > > > > > > > > > >
> > > > > > > > > > > v5:
> > > > > > > > > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > > > > > > > > ioctl (Jason).
> > > > > > > > > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > > > > > > > > * Exclude "padding" member from the only >V1 members in
> > > > > > > > > > > vduse_dev_request.
> > > > > > > > > > >
> > > > > > > > > > > v4:
> > > > > > > > > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > > > > > > > > * revert unneeded addr = NULL assignment (Jason)
> > > > > > > > > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > > > > > > > > return; } (Jason)
> > > > > > > > > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > > > > > > > > * Consider config->nas == 0 as a fail (Jason).
> > > > > > > > > > >
> > > > > > > > > > > v3:
> > > > > > > > > > > * Get the vduse domain through the vduse_as in the map functions
> > > > > > > > > > > (Jason).
> > > > > > > > > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > > > > > > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > > > > > > > > (Jason)
> > > > > > > > > > >
> > > > > > > > > > > v2:
> > > > > > > > > > > * Convert the use of mutex to rwlock.
> > > > > > > > > > >
> > > > > > > > > > > RFC v3:
> > > > > > > > > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > > > > > > > > value to reduce memory consumption, but vqs are already limited to
> > > > > > > > > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > > > > > > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > > > > > > > > VDUSE_IOTLB_GET_INFO.
> > > > > > > > > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > > > > > > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > > > > > > > > * Move the umem mutex to asid struct so there is no contention between
> > > > > > > > > > > ASIDs.
> > > > > > > > > > >
> > > > > > > > > > > RFC v2:
> > > > > > > > > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > > > > > > > > part of the struct is the same.
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 348 ++++++++++++++++++++---------
> > > > > > > > > > > include/uapi/linux/vduse.h | 53 ++++-
> > > > > > > > > > > 2 files changed, 292 insertions(+), 109 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > > index 97be04f73fbf..c6909d73d06d 100644
> > > > > > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > > > @@ -41,6 +41,7 @@
> > > > > > > > > > >
> > > > > > > > > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > > > > > > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > > > > > > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > > > > > > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > > > > > > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > > > > > > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > > > > > > > > @@ -86,7 +87,14 @@ struct vduse_umem {
> > > > > > > > > > > struct mm_struct *mm;
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > +struct vduse_as {
> > > > > > > > > > > + struct vduse_iova_domain *domain;
> > > > > > > > > > > + struct vduse_umem *umem;
> > > > > > > > > > > + struct mutex mem_lock;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > struct vduse_vq_group {
> > > > > > > > > > > + struct vduse_as *as;
> > > > > > > > > > > struct vduse_dev *dev;
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > @@ -94,7 +102,7 @@ struct vduse_dev {
> > > > > > > > > > > struct vduse_vdpa *vdev;
> > > > > > > > > > > struct device *dev;
> > > > > > > > > > > struct vduse_virtqueue **vqs;
> > > > > > > > > > > - struct vduse_iova_domain *domain;
> > > > > > > > > > > + struct vduse_as *as;
> > > > > > > > > > > char *name;
> > > > > > > > > > > struct mutex lock;
> > > > > > > > > > > spinlock_t msg_lock;
> > > > > > > > > > > @@ -122,9 +130,8 @@ struct vduse_dev {
> > > > > > > > > > > u32 vq_num;
> > > > > > > > > > > u32 vq_align;
> > > > > > > > > > > u32 ngroups;
> > > > > > > > > > > - struct vduse_umem *umem;
> > > > > > > > > > > + u32 nas;
> > > > > > > > > > > struct vduse_vq_group *groups;
> > > > > > > > > > > - struct mutex mem_lock;
> > > > > > > > > > > unsigned int bounce_size;
> > > > > > > > > > > struct mutex domain_lock;
> > > > > > > > > > > };
> > > > > > > > > > > @@ -314,7 +321,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > > > > > > > > u64 start, u64 last)
> > > > > > > > > > > {
> > > > > > > > > > > struct vduse_dev_msg msg = { 0 };
> > > > > > > > > > > @@ -323,8 +330,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > > > > > > > > return -EINVAL;
> > > > > > > > > > >
> > > > > > > > > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > > > > > > > > - msg.req.iova.start = start;
> > > > > > > > > > > - msg.req.iova.last = last;
> > > > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > > > > > > > > + msg.req.iova.start = start;
> > > > > > > > > > > + msg.req.iova.last = last;
> > > > > > > > > > > + } else {
> > > > > > > > > > > + msg.req.iova_v2.start = start;
> > > > > > > > > > > + msg.req.iova_v2.last = last;
> > > > > > > > > > > + msg.req.iova_v2.asid = asid;
> > > > > > > > > > > + }
> > > > > > > > > > >
> > > > > > > > > > > return vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > > }
> > > > > > > > > > > @@ -436,14 +449,29 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > > > > > > > > return mask;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > +/* Force set the asid to a vq group without a message to the VDUSE device */
> > > > > > > > > > > +static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
> > > > > > > > > > > + unsigned int group, unsigned int asid)
> > > > > > > > > > > +{
> > > > > > > > > > > + mutex_lock(&dev->domain_lock);
> > > > > > > > > > > + dev->groups[group].as = &dev->as[asid];
> > > > > > > > > > > + mutex_unlock(&dev->domain_lock);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > > > > > > > > {
> > > > > > > > > > > int i;
> > > > > > > > > > > - struct vduse_iova_domain *domain = dev->domain;
> > > > > > > > > > >
> > > > > > > > > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > > > > > > > > - if (domain && domain->bounce_map)
> > > > > > > > > > > - vduse_domain_reset_bounce_map(domain);
> > > > > > > > > > > + for (i = 0; i < dev->nas; i++) {
> > > > > > > > > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > > > > > > > > +
> > > > > > > > > > > + if (domain && domain->bounce_map)
> > > > > > > > > > > + vduse_domain_reset_bounce_map(domain);
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > + for (i = 0; i < dev->ngroups; i++)
> > > > > > > > > > > + vduse_set_group_asid_nomsg(dev, i, 0);
> > > > > > > > > > >
> > > > > > > > > > > down_write(&dev->rwsem);
> > > > > > > > > > >
> > > > > > > > > > > @@ -623,6 +651,29 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > > > > > > > > return ret;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > > > > > > > > + unsigned int asid)
> > > > > > > > > > > +{
> > > > > > > > > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > > > + struct vduse_dev_msg msg = { 0 };
> > > > > > > > > > > + int r;
> > > > > > > > > > > +
> > > > > > > > > > > + if (dev->api_version < VDUSE_API_VERSION_1 ||
> > > > > > > > > > > + group >= dev->ngroups || asid >= dev->nas)
> > > > > > > > > > > + return -EINVAL;
> > > > > > > > > > > +
> > > > > > > > > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > > > > > > > > + msg.req.vq_group_asid.group = group;
> > > > > > > > > > > + msg.req.vq_group_asid.asid = asid;
> > > > > > > > > > > +
> > > > > > > > > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > > > > > > > > + if (r < 0)
> > > > > > > > > > > + return r;
> > > > > > > > > > > +
> > > > > > > > > > > + vduse_set_group_asid_nomsg(dev, group, asid);
> > > > > > > > > > > + return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > > > > > > > > struct vdpa_vq_state *state)
> > > > > > > > > > > {
> > > > > > > > > > > @@ -794,13 +845,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > > > > > > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > > > > > > > > int ret;
> > > > > > > > > > >
> > > > > > > > > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > > > > > > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > > > > > > > > if (ret)
> > > > > > > > > > > return ret;
> > > > > > > > > > >
> > > > > > > > > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > > > > > > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > > > > > > > > if (ret) {
> > > > > > > > > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > > > > > > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > > > > > > > > return ret;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > @@ -843,6 +894,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > > > > > > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > > > > > > > > .reset = vduse_vdpa_reset,
> > > > > > > > > > > .set_map = vduse_vdpa_set_map,
> > > > > > > > > > > + .set_group_asid = vduse_set_group_asid,
> > > > > > > > > > > .get_vq_map = vduse_get_vq_map,
> > > > > > > > > > > .free = vduse_vdpa_free,
> > > > > > > > > > > };
> > > > > > > > > > > @@ -858,9 +910,10 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > > > > > > > > return;
> > > > > > > > > > >
> > > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > > -
> > > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > > > @@ -874,9 +927,10 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > > > > > > > > return;
> > > > > > > > > > >
> > > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > > -
> > > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > > > + domain = token.group->as->domain;
> > > > > > > > > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > > > > > > > > + mutex_unlock(&vdev->domain_lock);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > > > @@ -886,14 +940,18 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > > > > > > > > {
> > > > > > > > > > > struct vduse_dev *vdev;
> > > > > > > > > > > struct vduse_iova_domain *domain;
> > > > > > > > > > > + dma_addr_t r;
> > > > > > > > > > >
> > > > > > > > > > > if (!token.group)
> > > > > > > > > > > return DMA_MAPPING_ERROR;
> > > > > > > > > > >
> > > > > > > > > > > vdev = token.group->dev;
> > > > > > > > > > > - domain = vdev->domain;
> > > > > > > > > > > + mutex_lock(&vdev->domain_lock);
> > > > > > > > > >
> > > > > > > > > > The mutex_lock can't be used here since the dma ops might be called in
> > > > > > > > > > atomic context. And I think we can just remove it since creation and
> > > > > > > > > > deletion operations of the iova domain are guaranteed not to execute
> > > > > > > > > > concurrently with I/O operations.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > That would be great indeed! Can you expand on this, what protects here
> > > > > > > > > from the moment the two syscalls are issued from userland?
> > > > > > > > >
> > > > > > > >
> > > > > > > > The domain mutex lock is introduced in commit
> > > > > > > > d4438d23eeeef8bb1b3a8abe418abffd60bb544a ("vduse: Delay iova domain
> > > > > > > > creation"). It's used to prevent concurrent execution between
> > > > > > > > vdpa_dev_add() and some vduse device ioctl which needs to access the
> > > > > > > > iova domain such as VDUSE_IOTLB_REG_UMEM/DEREG_UMEM. But the dma ops
> > > > > > > > would not be called until vdpa_dev_add() completed, so the mutex lock
> > > > > > > > is not needed.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, but the usage is extended here to also protect from concurrent
> > > > > > > calls to vduse_dev_map_page and vduse_set_group_asid or similar.
> > > > > > >
> > > > > > > So I guess the best is to replace it with a spinlock or RCU.
> > > > > > >
> > > > > >
> > > > > > OK, I see, we simply aim to prevent concurrent access to the group->as
> > > > > > variable here. But I wonder if the .set_group_asid function can be
> > > > > > called during I/O.
> > > > >
> > > > > At least a malicious userland app could do it.
> > > > >
> > > > > > I think the unmap_page() would fail if we change
> > > > > > the group->as between map_page() and unmap_page().
>
> It's better for the VDUSE to prepare for this.
>
> > > > >
> > > > > I guess a SIGSEGV or similar could be a valid outcome of that, but we
> > > > > need to protect the "as" pointer anyway.
>
> SIGSEGV is probably not suitable for the case of map/unmap as they
> might be executed in bh.
>
> > > > >
> > > > > > Besides, it seems
> > > > > > that .set_group_asid is only called in the vhost-vdpa case currently,
> > > > > > but the dma ops such as map_page/unmap_page only service the
> > > > > > virtio-vdpa case, so they would not be called concurrently with
> > > > > > .set_group_asid now.
> > > > > >
> > > > >
> > > > > That's totally true but a malicious app can do it anyway. So it is
> > > > > better to protect it properly.
> > > > >
> > > >
> > > > Actually there is no interface for userspace to call .set_group_asid
> > > > if the vduse device is bound to the virtio-vdpa driver.
> > > >
> > >
> > > It is possible from vhost-vdpa.
> > >
> >
> > Then there is no interface for userspace to call dma_ops such as
> > map_page/unmap_page if the device is bound to the vhost-vdpa driver.
> >
> > Thanks,
> > Yongji
> >
>
> Right now, but we can't say that for future drivers. Unless we
> have a guard in the vdpa core, we probably need to prepare for that.
>
OK, then I think we can introduce a new RCU lock here, which is harmless.
Thanks,
Yongji
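[Editor's note: the RCU variant Yongji suggests might look roughly like this. A kernel-style sketch only, not the code that was merged; it assumes the `as` field is annotated `struct vduse_as __rcu *as;` and that the domain map path does not sleep inside the read-side critical section.]

```c
/* Sketch: publish group->as with RCU so readers need no sleeping lock. */
static void vduse_set_group_asid_nomsg(struct vduse_dev *dev,
				       unsigned int group, unsigned int asid)
{
	/* Pairs with rcu_dereference() in the dma ops. */
	rcu_assign_pointer(dev->groups[group].as, &dev->as[asid]);
	synchronize_rcu();	/* process context: wait out current readers */
}

static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction dir,
				     unsigned long attrs)
{
	struct vduse_iova_domain *domain;
	dma_addr_t r;

	if (!token.group)
		return DMA_MAPPING_ERROR;

	rcu_read_lock();	/* safe in atomic context, unlike mutex_lock() */
	domain = rcu_dereference(token.group->as)->domain;
	r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
	rcu_read_unlock();
	return r;
}
```

As discussed above, this only makes the pointer read safe; a map/unmap pair that straddles an ASID change would still resolve to different domains, which the VDUSE device side has to be prepared for.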
end of thread (newest: 2025-11-03 9:45 UTC)
Thread overview: 24+ messages
2025-10-30 10:00 [PATCH v8 0/6] Add multiple address spaces support to VDUSE Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 1/6] vduse: add v1 API definition Eugenio Pérez
2025-10-30 12:10 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 2/6] vduse: add vq group support Eugenio Pérez
2025-10-30 12:18 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 3/6] vduse: return internal vq group struct as map token Eugenio Pérez
2025-10-30 10:00 ` [PATCH v8 4/6] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
2025-10-31 2:33 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 5/6] vduse: add vq group asid support Eugenio Pérez
2025-10-30 11:52 ` Yongji Xie
2025-10-31 6:30 ` Eugenio Perez Martin
2025-10-31 6:51 ` [External] " Yongji Xie
2025-10-31 7:01 ` Yongji Xie
2025-10-31 14:08 ` Eugenio Perez Martin
2025-11-03 2:55 ` Yongji Xie
2025-11-03 6:40 ` Eugenio Perez Martin
2025-11-03 7:13 ` Yongji Xie
2025-11-03 8:19 ` Eugenio Perez Martin
2025-11-03 8:56 ` Yongji Xie
2025-11-03 9:26 ` Jason Wang
2025-11-03 9:45 ` Yongji Xie
2025-10-30 10:00 ` [PATCH v8 6/6] vduse: bump version number Eugenio Pérez
2025-10-31 2:33 ` Yongji Xie
2025-10-31 2:34 ` Yongji Xie