* [PATCH v11 01/12] vduse: add v1 API definition
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 02/12] vduse: add vq group support Eugenio Pérez
` (10 subsequent siblings)
11 siblings, 0 replies; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
This allows the kernel to detect whether the userspace VDUSE device
supports the VQ group and ASID features. VDUSE devices that don't set
the V1 API will not receive the new messages, and the vdpa device will
be created with only one vq group and ASID.
The next patches implement the new feature incrementally, only enabling
the VDUSE device to set the V1 API version by the end of the series.
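The fallback this gating implies can be pictured with a tiny standalone C helper (the function name and flattened arguments are illustrative, not part of the patch): a device that never sets the V1 API keeps exactly one vq group, whatever it declares.

```c
#include <assert.h>
#include <stdint.h>

#define VDUSE_API_VERSION   0
#define VDUSE_API_VERSION_1 1

/* Illustrative helper: number of vq groups the vdpa device ends up
 * with, mirroring the V0 fallback described above. */
static inline uint32_t effective_ngroups(uint64_t api_version,
					 uint32_t config_ngroups)
{
	return api_version < VDUSE_API_VERSION_1 ? 1 : config_ngroups;
}
```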
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
include/uapi/linux/vduse.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 10ad71aa00d6..ccb92a1efce0 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -10,6 +10,10 @@
#define VDUSE_API_VERSION 0
+/* VQ groups and ASID support */
+
+#define VDUSE_API_VERSION_1 1
+
/*
* Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
* This is used for future extension.
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v11 02/12] vduse: add vq group support
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 01/12] vduse: add v1 API definition Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:44 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 03/12] vduse: return internal vq group struct as map token Eugenio Pérez
` (9 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
This allows separating the different virtqueues into groups that share
the same address space. The VDUSE device is asked for the group of each
vq at initialization, as the groups are needed for the DMA API.
Three vq groups are allocated, as net is the device that needs the most
groups:
* Dataplane (guest passthrough)
* CVQ
* Shadowed vrings.
Future versions of the series can include dynamic allocation of the
groups array so VDUSE can declare more groups.
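The resulting ngroups rule can be condensed into a standalone predicate; this is a sketch for illustration only, with the kernel context stripped out (the helper name is hypothetical, the constants match the patch): a V0 device must not declare groups at all, while a V1 device must declare between 1 and 0xffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VDUSE_API_VERSION_1  1
#define VDUSE_DEV_MAX_GROUPS 0xffff

/* Condensed form of the ngroups checks in vduse_validate_config(). */
static bool ngroups_valid(uint64_t api_version, uint32_t ngroups)
{
	if (api_version < VDUSE_API_VERSION_1)
		return ngroups == 0;
	return ngroups >= 1 && ngroups <= VDUSE_DEV_MAX_GROUPS;
}
```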
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v11:
* Rename vq->vq_group to vq->group (Jason).
* Do not reset vq group at virtio reset (Jason).
v6:
* s/sepparate/separate (MST).
* s/dev->api_version < 1/dev->api_version < VDUSE_API_VERSION_1
v5:
* Revert core vdpa changes (Jason).
* Fix group == ngroup case in checking VQ_SETUP argument (Jason).
v4:
* Revert the "invalid vq group" concept and assume 0 if not set (Jason).
* Make config->ngroups == 0 invalid (Jason).
v3:
* Make the default group an invalid group as long as VDUSE device does
not set it to some valid u32 value. Modify the vdpa core to take that
into account (Jason).
* Create the VDUSE_DEV_MAX_GROUPS instead of using a magic number
v2:
* Now the vq group is in vduse_vq_config struct instead of issuing one
VDUSE message per vq.
v1:
* Fix: Remove BIT_ULL(VIRTIO_S_*), as _S_ is already the bit (Maxime)
RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
value to reduce memory consumption, but vqs are already limited to
that value and userspace VDUSE is able to allocate that many vqs.
* Remove the descs vq group capability as it will not be used and we can
add it on top.
* Do not ask for vq groups if the number of vq groups is < 2.
* Move the valid vq groups range check to vduse_validate_config.
RFC v2:
* Cache group information in kernel, as we need to provide the vq map
tokens properly.
* Add descs vq group to optimize SVQ forwarding and support indirect
descriptors out of the box.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 47 ++++++++++++++++++++++++++----
include/uapi/linux/vduse.h | 12 ++++++--
2 files changed, 51 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index ae357d014564..87ce140c08fc 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -39,6 +39,7 @@
#define DRV_LICENSE "GPL v2"
#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_DEV_MAX_GROUPS 0xffff
#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
#define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
@@ -58,6 +59,7 @@ struct vduse_virtqueue {
struct vdpa_vq_state state;
bool ready;
bool kicked;
+ u32 group;
spinlock_t kick_lock;
spinlock_t irq_lock;
struct eventfd_ctx *kickfd;
@@ -114,6 +116,7 @@ struct vduse_dev {
u8 status;
u32 vq_num;
u32 vq_align;
+ u32 ngroups;
struct vduse_umem *umem;
struct mutex mem_lock;
unsigned int bounce_size;
@@ -592,6 +595,16 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
return 0;
}
+static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ if (dev->api_version < VDUSE_API_VERSION_1)
+ return 0;
+
+ return dev->vqs[idx]->group;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -789,6 +802,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.set_vq_cb = vduse_vdpa_set_vq_cb,
.set_vq_num = vduse_vdpa_set_vq_num,
.get_vq_size = vduse_vdpa_get_vq_size,
+ .get_vq_group = vduse_get_vq_group,
.set_vq_ready = vduse_vdpa_set_vq_ready,
.get_vq_ready = vduse_vdpa_get_vq_ready,
.set_vq_state = vduse_vdpa_set_vq_state,
@@ -1252,12 +1266,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (config.index >= dev->vq_num)
break;
- if (!is_mem_zero((const char *)config.reserved,
- sizeof(config.reserved)))
+ if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
+ break;
+
+ if (dev->api_version >= VDUSE_API_VERSION_1) {
+ if (config.group >= dev->ngroups)
+ break;
+ if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+ break;
+ }
+
+ if (config.reserved1 ||
+ !is_mem_zero((const char *)config.reserved2,
+ sizeof(config.reserved2)))
break;
index = array_index_nospec(config.index, dev->vq_num);
dev->vqs[index]->num_max = config.max_size;
+ dev->vqs[index]->group = config.group;
ret = 0;
break;
}
@@ -1737,12 +1763,20 @@ static bool features_is_valid(struct vduse_dev_config *config)
return true;
}
-static bool vduse_validate_config(struct vduse_dev_config *config)
+static bool vduse_validate_config(struct vduse_dev_config *config,
+ u64 api_version)
{
if (!is_mem_zero((const char *)config->reserved,
sizeof(config->reserved)))
return false;
+ if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+ return false;
+
+ if (api_version >= VDUSE_API_VERSION_1 &&
+ (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
+ return false;
+
if (config->vq_align > PAGE_SIZE)
return false;
@@ -1858,6 +1892,9 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->device_features = config->features;
dev->device_id = config->device_id;
dev->vendor_id = config->vendor_id;
+ dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
+ ? 1
+ : config->ngroups;
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
goto err_str;
@@ -1936,7 +1973,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
break;
ret = -EINVAL;
- if (vduse_validate_config(&config) == false)
+ if (!vduse_validate_config(&config, control->api_version))
break;
buf = vmemdup_user(argp + size, config.config_size);
@@ -2017,7 +2054,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
&vduse_vdpa_config_ops, &vduse_map_ops,
- 1, 1, name, true);
+ dev->ngroups, 1, name, true);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index ccb92a1efce0..a3d51cf6df3a 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -31,6 +31,7 @@
* @features: virtio features
* @vq_num: the number of virtqueues
* @vq_align: the allocation alignment of virtqueue's metadata
+ * @ngroups: number of vq groups that VDUSE device declares
* @reserved: for future use, needs to be initialized to zero
* @config_size: the size of the configuration space
* @config: the buffer of the configuration space
@@ -45,7 +46,8 @@ struct vduse_dev_config {
__u64 features;
__u32 vq_num;
__u32 vq_align;
- __u32 reserved[13];
+ __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
+ __u32 reserved[12];
__u32 config_size;
__u8 config[];
};
@@ -122,14 +124,18 @@ struct vduse_config_data {
* struct vduse_vq_config - basic configuration of a virtqueue
* @index: virtqueue index
* @max_size: the max size of virtqueue
- * @reserved: for future use, needs to be initialized to zero
+ * @reserved1: for future use, needs to be initialized to zero
+ * @group: virtqueue group
+ * @reserved2: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
*/
struct vduse_vq_config {
__u32 index;
__u16 max_size;
- __u16 reserved[13];
+ __u16 reserved1;
+ __u32 group;
+ __u16 reserved2[10];
};
/*
--
2.52.0
* Re: [PATCH v11 02/12] vduse: add vq group support
2026-01-09 15:24 ` [PATCH v11 02/12] vduse: add vq group support Eugenio Pérez
@ 2026-01-10 23:44 ` Michael S. Tsirkin
2026-01-12 7:35 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:44 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:20PM +0100, Eugenio Pérez wrote:
> @@ -1252,12 +1266,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> if (config.index >= dev->vq_num)
> break;
>
> - if (!is_mem_zero((const char *)config.reserved,
> - sizeof(config.reserved)))
> + if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
> + break;
> +
> + if (dev->api_version >= VDUSE_API_VERSION_1) {
> + if (config.group >= dev->ngroups)
> + break;
> + if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
> + break;
> + }
> +
> + if (config.reserved1 ||
> + !is_mem_zero((const char *)config.reserved2,
> + sizeof(config.reserved2)))
Hmm but if api version is 0 then group should be 0 no?
We should validate.
* Re: [PATCH v11 02/12] vduse: add vq group support
2026-01-10 23:44 ` Michael S. Tsirkin
@ 2026-01-12 7:35 ` Eugenio Perez Martin
2026-01-12 8:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 7:35 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:44 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:20PM +0100, Eugenio Pérez wrote:
> > @@ -1252,12 +1266,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > if (config.index >= dev->vq_num)
> > break;
> >
> > - if (!is_mem_zero((const char *)config.reserved,
> > - sizeof(config.reserved)))
> > + if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
> > + break;
(Bookmarking the piece of code above as [1] to reference later)
> > +
> > + if (dev->api_version >= VDUSE_API_VERSION_1) {
> > + if (config.group >= dev->ngroups)
> > + break;
> > + if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
> > + break;
> > + }
> > +
> > + if (config.reserved1 ||
> > + !is_mem_zero((const char *)config.reserved2,
> > + sizeof(config.reserved2)))
>
> Hmm but if api version is 0 then group should be 0 no?
> We should validate.
>
The check (dev->api_version < VDUSE_API_VERSION_1 && config.group) is
above this check in this set of changes [1], am I missing something?
Would you prefer it to be reordered here or written differently?
* Re: [PATCH v11 02/12] vduse: add vq group support
2026-01-12 7:35 ` Eugenio Perez Martin
@ 2026-01-12 8:00 ` Michael S. Tsirkin
2026-01-12 12:09 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-12 8:00 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Mon, Jan 12, 2026 at 08:35:37AM +0100, Eugenio Perez Martin wrote:
> On Sun, Jan 11, 2026 at 12:44 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jan 09, 2026 at 04:24:20PM +0100, Eugenio Pérez wrote:
> > > @@ -1252,12 +1266,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > if (config.index >= dev->vq_num)
> > > break;
> > >
> > > - if (!is_mem_zero((const char *)config.reserved,
> > > - sizeof(config.reserved)))
> > > + if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
> > > + break;
>
> (Bookmarking the piece of code above as [1] to reference later)
>
> > > +
> > > + if (dev->api_version >= VDUSE_API_VERSION_1) {
> > > + if (config.group >= dev->ngroups)
> > > + break;
> > > + if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
> > > + break;
> > > + }
> > > +
> > > + if (config.reserved1 ||
> > > + !is_mem_zero((const char *)config.reserved2,
> > > + sizeof(config.reserved2)))
> >
> > Hmm but if api version is 0 then group should be 0 no?
> > We should validate.
> >
>
> The check (dev->api_version < VDUSE_API_VERSION_1 && config.group) is
> above this check in this set of changes [1], am I missing something?
> Would you prefer it to be reordered here or written differently?
Oh you are right. It's just not very clear that everything is covered.
if (dev->api_version < VDUSE_API_VERSION_1) {
if (config.group)
....
} else {
....
}
would be clearer.
BTW I don't really like this idiom of "break to return".
Just return -EINVAL would be more explicit.
But this is the way current code handles it, so I'm not demanding
it is changed as part of this patchset.
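Filling in the skeleton above, the complete check might look like this standalone sketch (the helper name and flattened arguments are hypothetical; the constants are the kernel's), with one branch per API version so coverage is explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VDUSE_API_VERSION_1       1
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical restructuring of the VQ_SETUP group checks. */
static bool vq_setup_group_ok(uint64_t api_version, uint32_t group,
			      uint32_t ngroups, uint8_t status)
{
	if (api_version < VDUSE_API_VERSION_1) {
		if (group)	/* v0 devices must leave group at 0 */
			return false;
	} else {
		if (group >= ngroups)
			return false;
		if (status & VIRTIO_CONFIG_S_DRIVER_OK)
			return false;	/* no regrouping after DRIVER_OK */
	}
	return true;
}
```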
--
MST
* Re: [PATCH v11 02/12] vduse: add vq group support
2026-01-12 8:00 ` Michael S. Tsirkin
@ 2026-01-12 12:09 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 12:09 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Mon, Jan 12, 2026 at 9:00 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jan 12, 2026 at 08:35:37AM +0100, Eugenio Perez Martin wrote:
> > On Sun, Jan 11, 2026 at 12:44 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Jan 09, 2026 at 04:24:20PM +0100, Eugenio Pérez wrote:
> > > > @@ -1252,12 +1266,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > > if (config.index >= dev->vq_num)
> > > > break;
> > > >
> > > > - if (!is_mem_zero((const char *)config.reserved,
> > > > - sizeof(config.reserved)))
> > > > + if (dev->api_version < VDUSE_API_VERSION_1 && config.group)
> > > > + break;
> >
> > (Bookmarking the piece of code above as [1] to reference later)
> >
> > > > +
> > > > + if (dev->api_version >= VDUSE_API_VERSION_1) {
> > > > + if (config.group >= dev->ngroups)
> > > > + break;
> > > > + if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
> > > > + break;
> > > > + }
> > > > +
> > > > + if (config.reserved1 ||
> > > > + !is_mem_zero((const char *)config.reserved2,
> > > > + sizeof(config.reserved2)))
> > >
> > > Hmm but if api version is 0 then group should be 0 no?
> > > We should validate.
> > >
> >
> > The check (dev->api_version < VDUSE_API_VERSION_1 && config.group) is
> > above this check in this set of changes [1], am I missing something?
> > Would you prefer it to be reordered here or written differently?
>
>
> Oh you are right. It's just not very clear that everything is covered.
>
> if (dev->api_version < VDUSE_API_VERSION_1) {
> if (config.group)
> ....
> } else {
> ....
> }
>
>
> would be clearer.
>
>
> BTW I don't really like this idiom of "break to return".
> Just return -EINVAL would be more explicit.
>
I agree. Maybe if I change an entire case block I could move it to
return style in the future, case by case. Would that be ok?
> But this is the way current code handles it, so I'm not demanding
> it is changed as part of this patchset.
>
> --
> MST
>
* [PATCH v11 03/12] vduse: return internal vq group struct as map token
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 01/12] vduse: add v1 API definition Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 02/12] vduse: add vq group support Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa Eugenio Pérez
` (8 subsequent siblings)
11 siblings, 0 replies; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Return the internal struct that represents the vq group as the
virtqueue map token, instead of the device. This allows the map
functions to access the information per group.
At this moment all the virtqueues share the same vq group, which can
only point to ASID 0. This change prepares the infrastructure for
actual per-group address space handling.
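The indirection can be pictured with a reduced model (struct members cut down to just what the lookup needs; `token_domain()` is a hypothetical stand-in for the lookup each map op now performs): the token carries the vq group, the group points back at its owning device, and the device owns the IOVA domain.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced model: `int` stands in for the real iova domain. */
struct vduse_dev { int domain; };
struct vduse_vq_group { struct vduse_dev *dev; };
union virtio_map { struct vduse_vq_group *group; };

/* What every map op now does first: token -> group -> dev -> domain,
 * bailing out on an empty token as the patch's defensive checks do. */
static int *token_domain(union virtio_map token)
{
	if (!token.group)
		return NULL;
	return &token.group->dev->domain;
}
```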
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Revert the "invalid vq group" concept, and assume 0 by default.
* Revert unnecessary blank line addition (Jason)
v3:
* Adapt all virtio_map_ops callbacks to handle empty tokens in case of
invalid groups.
* Make setting status DRIVER_OK fail if vq group is not valid.
* Remove the _int name suffix from struct vduse_vq_group.
RFC v3:
* Make the vq groups a dynamic array to support an arbitrary number of
them.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 100 ++++++++++++++++++++++++++---
include/linux/virtio.h | 6 +-
2 files changed, 94 insertions(+), 12 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 87ce140c08fc..0386434577b5 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -22,6 +22,7 @@
#include <linux/uio.h>
#include <linux/vdpa.h>
#include <linux/nospec.h>
+#include <linux/virtio.h>
#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include <uapi/linux/vduse.h>
@@ -85,6 +86,10 @@ struct vduse_umem {
struct mm_struct *mm;
};
+struct vduse_vq_group {
+ struct vduse_dev *dev;
+};
+
struct vduse_dev {
struct vduse_vdpa *vdev;
struct device *dev;
@@ -118,6 +123,7 @@ struct vduse_dev {
u32 vq_align;
u32 ngroups;
struct vduse_umem *umem;
+ struct vduse_vq_group *groups;
struct mutex mem_lock;
unsigned int bounce_size;
struct mutex domain_lock;
@@ -605,6 +611,17 @@ static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
return dev->vqs[idx]->group;
}
+static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ u32 vq_group = vduse_get_vq_group(vdpa, idx);
+ union virtio_map ret = {
+ .group = &dev->groups[vq_group],
+ };
+
+ return ret;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -825,6 +842,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.get_vq_affinity = vduse_vdpa_get_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
+ .get_vq_map = vduse_get_vq_map,
.free = vduse_vdpa_free,
};
@@ -832,7 +850,14 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
}
@@ -841,7 +866,14 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
}
@@ -851,7 +883,14 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return DMA_MAPPING_ERROR;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
}
@@ -860,7 +899,14 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
}
@@ -868,11 +914,17 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
dma_addr_t *dma_addr, gfp_t flag)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
unsigned long iova;
void *addr;
*dma_addr = DMA_MAPPING_ERROR;
+ if (!token.group)
+ return NULL;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
addr = vduse_domain_alloc_coherent(domain, size,
(dma_addr_t *)&iova, flag);
if (!addr)
@@ -887,14 +939,28 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
void *vaddr, dma_addr_t dma_addr,
unsigned long attrs)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return false;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return dma_addr < domain->bounce_size;
}
@@ -908,7 +974,14 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
static size_t vduse_dev_max_mapping_size(union virtio_map token)
{
- struct vduse_iova_domain *domain = token.iova_domain;
+ struct vduse_dev *vdev;
+ struct vduse_iova_domain *domain;
+
+ if (!token.group)
+ return 0;
+
+ vdev = token.group->dev;
+ domain = vdev->domain;
return domain->bounce_size;
}
@@ -1726,6 +1799,7 @@ static int vduse_destroy_dev(char *name)
if (dev->domain)
vduse_domain_destroy(dev->domain);
kfree(dev->name);
+ kfree(dev->groups);
vduse_dev_destroy(dev);
module_put(THIS_MODULE);
@@ -1895,6 +1969,13 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
? 1
: config->ngroups;
+ dev->groups = kcalloc(dev->ngroups, sizeof(dev->groups[0]),
+ GFP_KERNEL);
+ if (!dev->groups)
+ goto err_vq_groups;
+ for (u32 i = 0; i < dev->ngroups; ++i)
+ dev->groups[i].dev = dev;
+
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
goto err_str;
@@ -1931,6 +2012,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_idr:
kfree(dev->name);
err_str:
+ kfree(dev->groups);
+err_vq_groups:
vduse_dev_destroy(dev);
err:
return ret;
@@ -2092,7 +2175,6 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
return -ENOMEM;
}
- dev->vdev->vdpa.vmap.iova_domain = dev->domain;
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
if (ret) {
put_device(&dev->vdev->vdpa.dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 63bb05ece8c5..3bbc4cb6a672 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -43,13 +43,13 @@ struct virtqueue {
void *priv;
};
-struct vduse_iova_domain;
+struct vduse_vq_group;
union virtio_map {
/* Device that performs DMA */
struct device *dma_dev;
- /* VDUSE specific mapping data */
- struct vduse_iova_domain *iova_domain;
+ /* VDUSE specific virtqueue group for doing map */
+ struct vduse_vq_group *group;
};
int virtqueue_add_outbuf(struct virtqueue *vq,
--
2.52.0
* [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (2 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 03/12] vduse: return internal vq group struct as map token Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:46 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 05/12] vdpa: document set_group_asid thread safety Eugenio Pérez
` (7 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Remove duplication by consolidating these here. This reduces the
possibility of a parent driver missing them.
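The consolidated check in vhost_vdpa_vring_ioctl() reduces to this predicate (a sketch with the vdpa state flattened into plain arguments): both bounds are checked once in the vhost_vdpa core, so parents like mlx5_vnet and vdpa_sim no longer repeat them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the single VHOST_VDPA_SET_GROUP_ASID bound check. */
static bool group_asid_args_ok(uint32_t group, uint32_t ngroups,
			       uint32_t asid, uint32_t nas)
{
	return group < ngroups && asid < nas;
}
```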
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ---
drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 ------
drivers/vhost/vdpa.c | 2 +-
3 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ddaa1366704b..44062e9d68f0 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3640,9 +3640,6 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
int err = 0;
- if (group >= MLX5_VDPA_NUMVQ_GROUPS)
- return -EINVAL;
-
mvdev->mres.group2asid[group] = asid;
mutex_lock(&mvdev->mres.lock);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index c1c6431950e1..df9c7ddc5d78 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -606,12 +606,6 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
struct vhost_iotlb *iommu;
int i;
- if (group > vdpasim->dev_attr.ngroups)
- return -EINVAL;
-
- if (asid >= vdpasim->dev_attr.nas)
- return -EINVAL;
-
iommu = &vdpasim->iommu[asid];
mutex_lock(&vdpasim->mutex);
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 05a481e4c385..9d25b735b43d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -680,7 +680,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
case VHOST_VDPA_SET_GROUP_ASID:
if (copy_from_user(&s, argp, sizeof(s)))
return -EFAULT;
- if (s.num >= vdpa->nas)
+ if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
return -EINVAL;
if (!ops->set_group_asid)
return -EOPNOTSUPP;
--
2.52.0
* Re: [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa
2026-01-09 15:24 ` [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa Eugenio Pérez
@ 2026-01-10 23:46 ` Michael S. Tsirkin
2026-01-12 7:38 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:46 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:22PM +0100, Eugenio Pérez wrote:
> Remove duplication by consolidating these here. This reduces the
> possibility of a parent driver missing them.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ---
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 ------
> drivers/vhost/vdpa.c | 2 +-
> 3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index ddaa1366704b..44062e9d68f0 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3640,9 +3640,6 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> int err = 0;
>
> - if (group >= MLX5_VDPA_NUMVQ_GROUPS)
> - return -EINVAL;
> -
> mvdev->mres.group2asid[group] = asid;
>
> mutex_lock(&mvdev->mres.lock);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index c1c6431950e1..df9c7ddc5d78 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -606,12 +606,6 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> struct vhost_iotlb *iommu;
> int i;
>
> - if (group > vdpasim->dev_attr.ngroups)
> - return -EINVAL;
> -
BTW is the original ">" here an off-by-one error? Should it have been >=?
If yes then this is a kind of bugfix and maybe needs a Fixes tag.
> - if (asid >= vdpasim->dev_attr.nas)
> - return -EINVAL;
> -
> iommu = &vdpasim->iommu[asid];
>
> mutex_lock(&vdpasim->mutex);
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 05a481e4c385..9d25b735b43d 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -680,7 +680,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> case VHOST_VDPA_SET_GROUP_ASID:
> if (copy_from_user(&s, argp, sizeof(s)))
> return -EFAULT;
> - if (s.num >= vdpa->nas)
> + if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
> return -EINVAL;
> if (!ops->set_group_asid)
> return -EOPNOTSUPP;
> --
> 2.52.0
* Re: [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa
2026-01-10 23:46 ` Michael S. Tsirkin
@ 2026-01-12 7:38 ` Eugenio Perez Martin
2026-01-12 7:56 ` Michael S. Tsirkin
0 siblings, 1 reply; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 7:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:46 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:22PM +0100, Eugenio Pérez wrote:
> > Remove duplication by consolidating these here. This reduces the
> > possibility of a parent driver missing them.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ---
> > drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 ------
> > drivers/vhost/vdpa.c | 2 +-
> > 3 files changed, 1 insertion(+), 10 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index ddaa1366704b..44062e9d68f0 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3640,9 +3640,6 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > int err = 0;
> >
> > - if (group >= MLX5_VDPA_NUMVQ_GROUPS)
> > - return -EINVAL;
> > -
> > mvdev->mres.group2asid[group] = asid;
> >
> > mutex_lock(&mvdev->mres.lock);
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > index c1c6431950e1..df9c7ddc5d78 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > @@ -606,12 +606,6 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > struct vhost_iotlb *iommu;
> > int i;
> >
> > - if (group > vdpasim->dev_attr.ngroups)
> > - return -EINVAL;
> > -
>
> BTW is the original ">" here an off by one error? Should have been >= ?
> if yes then this is a kind of bugfix and maybe needs a fixes tag.
>
Ouch, that's a good catch, thanks! Do you prefer me to mark this patch
as "Fixes:" and send it for backporting to stable, or to create a
new patch just adding the ">=" and then moving the check to the vdpa
core on top?
> > - if (asid >= vdpasim->dev_attr.nas)
> > - return -EINVAL;
> > -
> > iommu = &vdpasim->iommu[asid];
> >
> > mutex_lock(&vdpasim->mutex);
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index 05a481e4c385..9d25b735b43d 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -680,7 +680,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> > case VHOST_VDPA_SET_GROUP_ASID:
> > if (copy_from_user(&s, argp, sizeof(s)))
> > return -EFAULT;
> > - if (s.num >= vdpa->nas)
> > + if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
> > return -EINVAL;
> > if (!ops->set_group_asid)
> > return -EOPNOTSUPP;
> > --
> > 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa
2026-01-12 7:38 ` Eugenio Perez Martin
@ 2026-01-12 7:56 ` Michael S. Tsirkin
0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-12 7:56 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Mon, Jan 12, 2026 at 08:38:26AM +0100, Eugenio Perez Martin wrote:
> On Sun, Jan 11, 2026 at 12:46 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jan 09, 2026 at 04:24:22PM +0100, Eugenio Pérez wrote:
> > > Remove duplication by consolidating these here. This reduces the
> > > possibility of a parent driver missing them.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ---
> > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 ------
> > > drivers/vhost/vdpa.c | 2 +-
> > > 3 files changed, 1 insertion(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index ddaa1366704b..44062e9d68f0 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -3640,9 +3640,6 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> > > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > int err = 0;
> > >
> > > - if (group >= MLX5_VDPA_NUMVQ_GROUPS)
> > > - return -EINVAL;
> > > -
> > > mvdev->mres.group2asid[group] = asid;
> > >
> > > mutex_lock(&mvdev->mres.lock);
> > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > > index c1c6431950e1..df9c7ddc5d78 100644
> > > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > > @@ -606,12 +606,6 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > struct vhost_iotlb *iommu;
> > > int i;
> > >
> > > - if (group > vdpasim->dev_attr.ngroups)
> > > - return -EINVAL;
> > > -
> >
> > BTW is the original ">" here an off by one error? Should have been >= ?
> > if yes then this is a kind of bugfix and maybe needs a fixes tag.
> >
>
> Ouch, that's a good catch, thanks! Do you prefer me to mark this patch
> as "Fixes:" and send it for backporting to stable, or to create a
> new patch just adding the ">=" and then moving the check to the vdpa
> core on top?
It seems adequate to just send this for backporting.
Do document that this is a fix in the commit log though.
> > > - if (asid >= vdpasim->dev_attr.nas)
> > > - return -EINVAL;
> > > -
> > > iommu = &vdpasim->iommu[asid];
> > >
> > > mutex_lock(&vdpasim->mutex);
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index 05a481e4c385..9d25b735b43d 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -680,7 +680,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> > > case VHOST_VDPA_SET_GROUP_ASID:
> > > if (copy_from_user(&s, argp, sizeof(s)))
> > > return -EFAULT;
> > > - if (s.num >= vdpa->nas)
> > > + if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
> > > return -EINVAL;
> > > if (!ops->set_group_asid)
> > > return -EOPNOTSUPP;
> > > --
> > > 2.52.0
> >
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v11 05/12] vdpa: document set_group_asid thread safety
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (3 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 04/12] vhost: move vdpa group bound check to vhost_vdpa Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:48 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set Eugenio Pérez
` (6 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Document that the function races with the check of DRIVER_OK.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
Requested at
https://lore.kernel.org/lkml/CACGkMEvXdV4ukZE6xhLL0sSN70G=AWVQgpRnH98Fr4btzMkK9g@mail.gmail.com/
---
include/linux/vdpa.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 4cf21d6e9cfd..cd1b1f1321b9 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -312,7 +312,9 @@ struct vdpa_map_file {
* @idx: virtqueue index
* Returns the affinity mask
* @set_group_asid: Set address space identifier for a
- * virtqueue group (optional)
+ * virtqueue group (optional). It's not thread
+ * safe to call this function concurrently with
+ * set_status.
* @vdev: vdpa device
* @group: virtqueue group
* @asid: address space id for this group
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 05/12] vdpa: document set_group_asid thread safety
2026-01-09 15:24 ` [PATCH v11 05/12] vdpa: document set_group_asid thread safety Eugenio Pérez
@ 2026-01-10 23:48 ` Michael S. Tsirkin
2026-01-12 7:39 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:48 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:23PM +0100, Eugenio Pérez wrote:
> Document that the function races with the check of DRIVER_OK.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> Requested at
> https://lore.kernel.org/lkml/CACGkMEvXdV4ukZE6xhLL0sSN70G=AWVQgpRnH98Fr4btzMkK9g@mail.gmail.com/
> ---
> include/linux/vdpa.h | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 4cf21d6e9cfd..cd1b1f1321b9 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -312,7 +312,9 @@ struct vdpa_map_file {
> * @idx: virtqueue index
> * Returns the affinity mask
> * @set_group_asid: Set address space identifier for a
> - * virtqueue group (optional)
> + * virtqueue group (optional). It's not thread
> + * safe to call this function concurrently with
> + * set_status.
Let's be explicit about what to do.
"Caller must prevent this from being executed concurrently with
set_status"?
> * @vdev: vdpa device
> * @group: virtqueue group
> * @asid: address space id for this group
> --
> 2.52.0
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v11 05/12] vdpa: document set_group_asid thread safety
2026-01-10 23:48 ` Michael S. Tsirkin
@ 2026-01-12 7:39 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 7:39 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:23PM +0100, Eugenio Pérez wrote:
> > Document that the function races with the check of DRIVER_OK.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > Requested at
> > https://lore.kernel.org/lkml/CACGkMEvXdV4ukZE6xhLL0sSN70G=AWVQgpRnH98Fr4btzMkK9g@mail.gmail.com/
> > ---
> > include/linux/vdpa.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> > index 4cf21d6e9cfd..cd1b1f1321b9 100644
> > --- a/include/linux/vdpa.h
> > +++ b/include/linux/vdpa.h
> > @@ -312,7 +312,9 @@ struct vdpa_map_file {
> > * @idx: virtqueue index
> > * Returns the affinity mask
> > * @set_group_asid: Set address space identifier for a
> > - * virtqueue group (optional)
> > + * virtqueue group (optional). It's not thread
> > + * safe to call this function concurrently with
> > + * set_status.
>
> Let's be explicit about what to do.
> "Caller must prevent this from being executed concurrently with
> set_status"?
>
Sure, I prefer your version too. Rewording for the next version, thanks!
>
> > * @vdev: vdpa device
> > * @group: virtqueue group
> > * @asid: address space id for this group
> > --
> > 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
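The contract settled on in this thread — the caller must prevent `set_group_asid` from running concurrently with `set_status` — can be sketched from the caller's side as both operations taking the same lock. This is a hypothetical userspace sketch of the serialization pattern, not the vhost-vdpa implementation; the struct and function names are illustrative only.

```c
#include <pthread.h>

/* Hypothetical device state standing in for a vdpa parent driver. */
struct dev {
	pthread_mutex_t lock; /* serializes the two ops, per the doc */
	unsigned char status;
	unsigned int group2asid[2];
};

/* Caller-side serialization: both operations take the same lock, so
 * set_group_asid can never interleave with a status update. */
static void dev_set_status(struct dev *d, unsigned char status)
{
	pthread_mutex_lock(&d->lock);
	d->status = status;
	pthread_mutex_unlock(&d->lock);
}

static int dev_set_group_asid(struct dev *d, unsigned int group,
			      unsigned int asid)
{
	pthread_mutex_lock(&d->lock);
	d->group2asid[group] = asid;
	pthread_mutex_unlock(&d->lock);
	return 0;
}
```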
* [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (4 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 05/12] vdpa: document set_group_asid thread safety Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:49 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
` (5 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Only vdpa_sim support it. Forbid this behavious as there is no use for
it right now, we can always enable it in the future with a feature flag.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vhost/vdpa.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 9d25b735b43d..3f0184d42075 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -682,6 +682,8 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
return -EFAULT;
if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
return -EINVAL;
+ if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)
+ return -EBUSY;
if (!ops->set_group_asid)
return -EOPNOTSUPP;
return ops->set_group_asid(vdpa, idx, s.num);
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set
2026-01-09 15:24 ` [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set Eugenio Pérez
@ 2026-01-10 23:49 ` Michael S. Tsirkin
2026-01-12 7:50 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:49 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:24PM +0100, Eugenio Pérez wrote:
> Only vdpa_sim support it. Forbid this behavious as there is no use for
behaviour
> it right now, we can always enable it in the future with a feature flag.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vhost/vdpa.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 9d25b735b43d..3f0184d42075 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -682,6 +682,8 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> return -EFAULT;
> if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
> return -EINVAL;
> + if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)
> + return -EBUSY;
> if (!ops->set_group_asid)
> return -EOPNOTSUPP;
> return ops->set_group_asid(vdpa, idx, s.num);
> --
> 2.52.0
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set
2026-01-10 23:49 ` Michael S. Tsirkin
@ 2026-01-12 7:50 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 7:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:24PM +0100, Eugenio Pérez wrote:
> > Only vdpa_sim support it. Forbid this behavious as there is no use for
>
> behaviour
>
Fixing in the next version, thanks!
> > it right now, we can always enable it in the future with a feature flag.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/vhost/vdpa.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index 9d25b735b43d..3f0184d42075 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -682,6 +682,8 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> > return -EFAULT;
> > if (idx >= vdpa->ngroups || s.num >= vdpa->nas)
> > return -EINVAL;
> > + if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK)
> > + return -EBUSY;
> > if (!ops->set_group_asid)
> > return -EOPNOTSUPP;
> > return ops->set_group_asid(vdpa, idx, s.num);
> > --
> > 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
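After this patch, the `VHOST_VDPA_SET_GROUP_ASID` path checks its preconditions in a fixed order: range validation first (-EINVAL), then rejection of a live device (-EBUSY), then a missing backend op (-EOPNOTSUPP). A condensed, standalone sketch of that ordering follows; the constant and the flattened parameter list are stand-ins, not the real ioctl signature.

```c
#include <errno.h>

#define DRIVER_OK 0x4 /* stand-in for VIRTIO_CONFIG_S_DRIVER_OK */

/* Condensed validation order from the VHOST_VDPA_SET_GROUP_ASID path:
 * range-check first, then reject devices with DRIVER_OK set, then
 * report a backend that lacks the op. */
static int set_group_asid(unsigned int idx, unsigned int asid,
			  unsigned int ngroups, unsigned int nas,
			  unsigned char status, int have_op)
{
	if (idx >= ngroups || asid >= nas)
		return -EINVAL;
	if (status & DRIVER_OK)
		return -EBUSY;
	if (!have_op)
		return -EOPNOTSUPP;
	return 0;
}
```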
* [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (5 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 06/12] vhost: forbid change vq groups ASID if DRIVER_OK is set Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:49 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 08/12] vduse: remove unused vaddr parameter of vduse_domain_free_coherent Eugenio Pérez
` (4 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Next patches introduce more error paths in this function. Refactor it
so they can be accomodated through gotos.
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v6: New in v6.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0386434577b5..f7a45f396cf8 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -2171,21 +2171,27 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
dev->bounce_size);
mutex_unlock(&dev->domain_lock);
if (!dev->domain) {
- put_device(&dev->vdev->vdpa.dev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto domain_err;
}
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
if (ret) {
- put_device(&dev->vdev->vdpa.dev);
- mutex_lock(&dev->domain_lock);
- vduse_domain_destroy(dev->domain);
- dev->domain = NULL;
- mutex_unlock(&dev->domain_lock);
- return ret;
+ goto register_err;
}
return 0;
+
+register_err:
+ mutex_lock(&dev->domain_lock);
+ vduse_domain_destroy(dev->domain);
+ dev->domain = NULL;
+ mutex_unlock(&dev->domain_lock);
+
+domain_err:
+ put_device(&dev->vdev->vdpa.dev);
+
+ return ret;
}
static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling
2026-01-09 15:24 ` [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
@ 2026-01-10 23:49 ` Michael S. Tsirkin
2026-01-12 7:51 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:49 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:25PM +0100, Eugenio Pérez wrote:
> Next patches introduce more error paths in this function. Refactor it
> so they can be accomodated through gotos.
accommodated
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v6: New in v6.
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 0386434577b5..f7a45f396cf8 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -2171,21 +2171,27 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> dev->bounce_size);
> mutex_unlock(&dev->domain_lock);
> if (!dev->domain) {
> - put_device(&dev->vdev->vdpa.dev);
> - return -ENOMEM;
> + ret = -ENOMEM;
> + goto domain_err;
> }
>
> ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> if (ret) {
> - put_device(&dev->vdev->vdpa.dev);
> - mutex_lock(&dev->domain_lock);
> - vduse_domain_destroy(dev->domain);
> - dev->domain = NULL;
> - mutex_unlock(&dev->domain_lock);
> - return ret;
> + goto register_err;
> }
>
> return 0;
> +
> +register_err:
> + mutex_lock(&dev->domain_lock);
> + vduse_domain_destroy(dev->domain);
> + dev->domain = NULL;
> + mutex_unlock(&dev->domain_lock);
> +
> +domain_err:
> + put_device(&dev->vdev->vdpa.dev);
> +
> + return ret;
> }
>
> static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
> --
> 2.52.0
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling
2026-01-10 23:49 ` Michael S. Tsirkin
@ 2026-01-12 7:51 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 7:51 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:25PM +0100, Eugenio Pérez wrote:
> > Next patches introduce more error paths in this function. Refactor it
> > so they can be accomodated through gotos.
>
> accommodated
>
Fixing in the next version, thanks!
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v6: New in v6.
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 22 ++++++++++++++--------
> > 1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 0386434577b5..f7a45f396cf8 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -2171,21 +2171,27 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > dev->bounce_size);
> > mutex_unlock(&dev->domain_lock);
> > if (!dev->domain) {
> > - put_device(&dev->vdev->vdpa.dev);
> > - return -ENOMEM;
> > + ret = -ENOMEM;
> > + goto domain_err;
> > }
> >
> > ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> > if (ret) {
> > - put_device(&dev->vdev->vdpa.dev);
> > - mutex_lock(&dev->domain_lock);
> > - vduse_domain_destroy(dev->domain);
> > - dev->domain = NULL;
> > - mutex_unlock(&dev->domain_lock);
> > - return ret;
> > + goto register_err;
> > }
> >
> > return 0;
> > +
> > +register_err:
> > + mutex_lock(&dev->domain_lock);
> > + vduse_domain_destroy(dev->domain);
> > + dev->domain = NULL;
> > + mutex_unlock(&dev->domain_lock);
> > +
> > +domain_err:
> > + put_device(&dev->vdev->vdpa.dev);
> > +
> > + return ret;
> > }
> >
> > static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
> > --
> > 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
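The refactor above follows the usual kernel goto-unwind idiom: each failure jumps to a label that undoes only the steps that have already succeeded, and later labels fall through to earlier ones. A toy standalone sketch of that shape, with `malloc`/`free` standing in for the domain creation and the device reference:

```c
#include <errno.h>
#include <stdlib.h>

/* Toy three-step setup mirroring vdpa_dev_add: take a reference, build
 * a domain, register. Error labels unwind in reverse order and fall
 * through to each other. The fail_* flags force the error paths. */
static int dev_add(int fail_domain, int fail_register)
{
	void *ref, *domain;
	int ret;

	ref = malloc(1); /* step 1: e.g. get a device reference */

	domain = fail_domain ? NULL : malloc(1); /* step 2: build domain */
	if (!domain) {
		ret = -ENOMEM;
		goto domain_err;
	}

	if (fail_register) { /* step 3: e.g. _vdpa_register_device() */
		ret = -EBUSY; /* arbitrary registration error */
		goto register_err;
	}

	free(domain); /* toy only: tear everything down on success too */
	free(ref);
	return 0;

register_err:
	free(domain); /* undo step 2 */
domain_err:
	free(ref); /* undo step 1 */
	return ret;
}
```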
* [PATCH v11 08/12] vduse: remove unused vaddr parameter of vduse_domain_free_coherent
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (6 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 07/12] vduse: refactor vdpa_dev_add for goto err handling Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent Eugenio Pérez
` (3 subsequent siblings)
11 siblings, 0 replies; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
We will modify the function in the next patches, so let's clean it up first.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/vdpa_user/iova_domain.c | 3 +--
drivers/vdpa/vdpa_user/iova_domain.h | 3 +--
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
3 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 4352b5cf74f0..309cd5a039d1 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -528,8 +528,7 @@ void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
}
void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
- void *vaddr, dma_addr_t dma_addr,
- unsigned long attrs)
+ dma_addr_t dma_addr, unsigned long attrs)
{
struct iova_domain *iovad = &domain->consistent_iovad;
struct vhost_iotlb_map *map;
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index a923971a64f5..081f06c52cdc 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -70,8 +70,7 @@ void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
gfp_t flag);
void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
- void *vaddr, dma_addr_t dma_addr,
- unsigned long attrs);
+ dma_addr_t dma_addr, unsigned long attrs);
void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain);
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index f7a45f396cf8..82ee476d45e0 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -948,7 +948,7 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
vdev = token.group->dev;
domain = vdev->domain;
- vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs);
+ vduse_domain_free_coherent(domain, size, dma_addr, attrs);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (7 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 08/12] vduse: remove unused vaddr parameter of vduse_domain_free_coherent Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:54 ` Michael S. Tsirkin
2026-01-09 15:24 ` [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls Eugenio Pérez
` (2 subsequent siblings)
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
The function vduse_dev_alloc_coherent will be called under a rwlock in
the next patches. Move the allocation out of the lock to avoid
increasing its failure rate.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v11: Remove duplicated call to free_pages_exact (Jason).
---
drivers/vdpa/vdpa_user/iova_domain.c | 10 ++--------
drivers/vdpa/vdpa_user/iova_domain.h | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 13 +++++++++++--
3 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 309cd5a039d1..3955690696fe 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -495,14 +495,13 @@ void vduse_domain_unmap_page(struct vduse_iova_domain *domain,
void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
size_t size, dma_addr_t *dma_addr,
- gfp_t flag)
+ void *orig)
{
struct iova_domain *iovad = &domain->consistent_iovad;
unsigned long limit = domain->iova_limit;
dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit);
- void *orig = alloc_pages_exact(size, flag);
- if (!iova || !orig)
+ if (!iova)
goto err;
spin_lock(&domain->iotlb_lock);
@@ -519,8 +518,6 @@ void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
return orig;
err:
*dma_addr = DMA_MAPPING_ERROR;
- if (orig)
- free_pages_exact(orig, size);
if (iova)
vduse_domain_free_iova(iovad, iova, size);
@@ -533,7 +530,6 @@ void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
struct iova_domain *iovad = &domain->consistent_iovad;
struct vhost_iotlb_map *map;
struct vdpa_map_file *map_file;
- phys_addr_t pa;
spin_lock(&domain->iotlb_lock);
map = vhost_iotlb_itree_first(domain->iotlb, (u64)dma_addr,
@@ -545,12 +541,10 @@ void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
map_file = (struct vdpa_map_file *)map->opaque;
fput(map_file->file);
kfree(map_file);
- pa = map->addr;
vhost_iotlb_map_free(domain->iotlb, map);
spin_unlock(&domain->iotlb_lock);
vduse_domain_free_iova(iovad, dma_addr, size);
- free_pages_exact(phys_to_virt(pa), size);
}
static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf)
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index 081f06c52cdc..1854fdc25597 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -67,7 +67,7 @@ void vduse_domain_unmap_page(struct vduse_iova_domain *domain,
void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain,
size_t size, dma_addr_t *dma_addr,
- gfp_t flag);
+ void *orig);
void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
dma_addr_t dma_addr, unsigned long attrs);
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 82ee476d45e0..675da1465e0e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -923,16 +923,24 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
if (!token.group)
return NULL;
+ addr = alloc_pages_exact(size, flag);
+ if (!addr)
+ return NULL;
+
vdev = token.group->dev;
domain = vdev->domain;
addr = vduse_domain_alloc_coherent(domain, size,
- (dma_addr_t *)&iova, flag);
+ (dma_addr_t *)&iova, addr);
if (!addr)
- return NULL;
+ goto err;
*dma_addr = (dma_addr_t)iova;
return addr;
+
+err:
+ free_pages_exact(addr, size);
+ return NULL;
}
static void vduse_dev_free_coherent(union virtio_map token, size_t size,
@@ -949,6 +957,7 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
domain = vdev->domain;
vduse_domain_free_coherent(domain, size, dma_addr, attrs);
+ free_pages_exact(vaddr, size);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent
2026-01-09 15:24 ` [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent Eugenio Pérez
@ 2026-01-10 23:54 ` Michael S. Tsirkin
2026-01-12 9:26 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:54 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:27PM +0100, Eugenio Pérez wrote:
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 82ee476d45e0..675da1465e0e 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -923,16 +923,24 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> if (!token.group)
> return NULL;
>
> + addr = alloc_pages_exact(size, flag);
> + if (!addr)
> + return NULL;
> +
So addr has allocated pages here ...
> vdev = token.group->dev;
> domain = vdev->domain;
> addr = vduse_domain_alloc_coherent(domain, size,
> - (dma_addr_t *)&iova, flag);
> + (dma_addr_t *)&iova, addr);
and then is overwritten here ...
> if (!addr)
> - return NULL;
> + goto err;
except on error where we go to err ...
>
> *dma_addr = (dma_addr_t)iova;
>
> return addr;
> +
> +err:
> + free_pages_exact(addr, size);
only to try and free NULL. will leak the original pages, will it not.
> + return NULL;
> }
>
> static void vduse_dev_free_coherent(union virtio_map token, size_t size,
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent
2026-01-10 23:54 ` Michael S. Tsirkin
@ 2026-01-12 9:26 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 9:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:54 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:27PM +0100, Eugenio Pérez wrote:
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 82ee476d45e0..675da1465e0e 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -923,16 +923,24 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > if (!token.group)
> > return NULL;
> >
> > + addr = alloc_pages_exact(size, flag);
> > + if (!addr)
> > + return NULL;
> > +
>
> So addr has allocated pages here ...
>
>
> > vdev = token.group->dev;
> > domain = vdev->domain;
> > addr = vduse_domain_alloc_coherent(domain, size,
> > - (dma_addr_t *)&iova, flag);
> > + (dma_addr_t *)&iova, addr);
>
> and then is overwritten here ...
>
> > if (!addr)
> > - return NULL;
> > + goto err;
>
> except on error where we go to err ...
>
> >
> > *dma_addr = (dma_addr_t)iova;
> >
> > return addr;
> > +
> > +err:
> > + free_pages_exact(addr, size);
>
> only to try and free NULL. will leak the original pages, will it not.
>
Right, I missed that in the conversion. I'll change the function
vduse_domain_alloc_coherent so it returns dma_addr instead of orig,
making all the code simpler. Thanks!
> > + return NULL;
> > }
> >
> > static void vduse_dev_free_coherent(union virtio_map token, size_t size,
>
* [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (8 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 09/12] vduse: take out allocations from vduse_dev_alloc_coherent Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-10 23:56 ` Michael S. Tsirkin
2026-01-16 6:40 ` kernel test robot
2026-01-09 15:24 ` [PATCH v11 11/12] vduse: add vq group asid support Eugenio Pérez
2026-01-09 15:24 ` [PATCH v11 12/12] vduse: bump version number Eugenio Pérez
11 siblings, 2 replies; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
The next patch adds a new ioctl with an ASID member per entry. Abstract
the common logic of these two ioctls so the new one can be built on top
easily.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v11: New in v11
---
drivers/vdpa/vdpa_user/vduse_dev.c | 101 ++++++++++++++++-------------
1 file changed, 55 insertions(+), 46 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 675da1465e0e..bf437816fd7d 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1247,6 +1247,50 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
vq->irq_effective_cpu = curr_cpu;
}
+static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
+ struct vduse_iotlb_entry *entry,
+ struct file **f, uint64_t *capability)
+{
+ int r = -EINVAL;
+ struct vhost_iotlb_map *map;
+ const struct vdpa_map_file *map_file;
+
+ if (entry->start > entry->last)
+ return -EINVAL;
+
+ mutex_lock(&dev->domain_lock);
+ if (!dev->domain)
+ goto out;
+
+ spin_lock(&dev->domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
+ entry->last);
+ if (map) {
+ if (f) {
+ map_file = (struct vdpa_map_file *)map->opaque;
+ *f = get_file(map_file->file);
+ }
+ entry->offset = map_file->offset;
+ entry->start = map->start;
+ entry->last = map->last;
+ entry->perm = map->perm;
+ if (capability) {
+ *capability = 0;
+
+ if (dev->domain->bounce_map && map->start == 0 &&
+ map->last == dev->domain->bounce_size - 1)
+ *capability |= VDUSE_IOVA_CAP_UMEM;
+ }
+
+ r = 0;
+ }
+ spin_unlock(&dev->domain->iotlb_lock);
+
+out:
+ mutex_unlock(&dev->domain_lock);
+ return r;
+}
+
static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -1260,36 +1304,16 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
switch (cmd) {
case VDUSE_IOTLB_GET_FD: {
struct vduse_iotlb_entry entry;
- struct vhost_iotlb_map *map;
- struct vdpa_map_file *map_file;
struct file *f = NULL;
ret = -EFAULT;
if (copy_from_user(&entry, argp, sizeof(entry)))
break;
- ret = -EINVAL;
- if (entry.start > entry.last)
+ ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
+ if (ret)
break;
- mutex_lock(&dev->domain_lock);
- if (!dev->domain) {
- mutex_unlock(&dev->domain_lock);
- break;
- }
- spin_lock(&dev->domain->iotlb_lock);
- map = vhost_iotlb_itree_first(dev->domain->iotlb,
- entry.start, entry.last);
- if (map) {
- map_file = (struct vdpa_map_file *)map->opaque;
- f = get_file(map_file->file);
- entry.offset = map_file->offset;
- entry.start = map->start;
- entry.last = map->last;
- entry.perm = map->perm;
- }
- spin_unlock(&dev->domain->iotlb_lock);
- mutex_unlock(&dev->domain_lock);
ret = -EINVAL;
if (!f)
break;
@@ -1479,41 +1503,26 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
}
case VDUSE_IOTLB_GET_INFO: {
struct vduse_iova_info info;
- struct vhost_iotlb_map *map;
+ struct vduse_iotlb_entry entry;
ret = -EFAULT;
if (copy_from_user(&info, argp, sizeof(info)))
break;
- ret = -EINVAL;
- if (info.start > info.last)
- break;
-
if (!is_mem_zero((const char *)info.reserved,
sizeof(info.reserved)))
break;
- mutex_lock(&dev->domain_lock);
- if (!dev->domain) {
- mutex_unlock(&dev->domain_lock);
- break;
- }
- spin_lock(&dev->domain->iotlb_lock);
- map = vhost_iotlb_itree_first(dev->domain->iotlb,
- info.start, info.last);
- if (map) {
- info.start = map->start;
- info.last = map->last;
- info.capability = 0;
- if (dev->domain->bounce_map && map->start == 0 &&
- map->last == dev->domain->bounce_size - 1)
- info.capability |= VDUSE_IOVA_CAP_UMEM;
- }
- spin_unlock(&dev->domain->iotlb_lock);
- mutex_unlock(&dev->domain_lock);
- if (!map)
+ entry.start = info.start;
+ entry.last = info.last;
+ ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
+ &info.capability);
+ if (ret < 0)
break;
+ info.start = entry.start;
+ info.last = entry.last;
+
ret = -EFAULT;
if (copy_to_user(argp, &info, sizeof(info)))
break;
--
2.52.0
* Re: [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
2026-01-09 15:24 ` [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls Eugenio Pérez
@ 2026-01-10 23:56 ` Michael S. Tsirkin
2026-01-12 10:55 ` Eugenio Perez Martin
2026-01-16 6:40 ` kernel test robot
1 sibling, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-10 23:56 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:28PM +0100, Eugenio Pérez wrote:
> The next patch adds new ioctl with the ASID member per entry. Abstract
> these two so it can be build on top easily.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v11: New in v11
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 101 ++++++++++++++++-------------
> 1 file changed, 55 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 675da1465e0e..bf437816fd7d 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1247,6 +1247,50 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> vq->irq_effective_cpu = curr_cpu;
> }
>
> +static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> + struct vduse_iotlb_entry *entry,
> + struct file **f, uint64_t *capability)
> +{
> + int r = -EINVAL;
> + struct vhost_iotlb_map *map;
> + const struct vdpa_map_file *map_file;
> +
> + if (entry->start > entry->last)
> + return -EINVAL;
> +
> + mutex_lock(&dev->domain_lock);
> + if (!dev->domain)
> + goto out;
> +
> + spin_lock(&dev->domain->iotlb_lock);
> + map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> + entry->last);
> + if (map) {
> + if (f) {
> + map_file = (struct vdpa_map_file *)map->opaque;
map_file assigned value when f != NULL here ...
> + *f = get_file(map_file->file);
> + }
> + entry->offset = map_file->offset;
but dereferenced unconditionally here.
> + entry->start = map->start;
> + entry->last = map->last;
> + entry->perm = map->perm;
> + if (capability) {
> + *capability = 0;
> +
> + if (dev->domain->bounce_map && map->start == 0 &&
> + map->last == dev->domain->bounce_size - 1)
> + *capability |= VDUSE_IOVA_CAP_UMEM;
> + }
> +
> + r = 0;
> + }
> + spin_unlock(&dev->domain->iotlb_lock);
> +
> +out:
> + mutex_unlock(&dev->domain_lock);
> + return r;
> +}
> +
> static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
2026-01-10 23:56 ` Michael S. Tsirkin
@ 2026-01-12 10:55 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 10:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 12:56 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:28PM +0100, Eugenio Pérez wrote:
> > The next patch adds new ioctl with the ASID member per entry. Abstract
> > these two so it can be build on top easily.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v11: New in v11
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 101 ++++++++++++++++-------------
> > 1 file changed, 55 insertions(+), 46 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 675da1465e0e..bf437816fd7d 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -1247,6 +1247,50 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > vq->irq_effective_cpu = curr_cpu;
> > }
> >
> > +static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> > + struct vduse_iotlb_entry *entry,
> > + struct file **f, uint64_t *capability)
> > +{
> > + int r = -EINVAL;
> > + struct vhost_iotlb_map *map;
> > + const struct vdpa_map_file *map_file;
> > +
> > + if (entry->start > entry->last)
> > + return -EINVAL;
> > +
> > + mutex_lock(&dev->domain_lock);
> > + if (!dev->domain)
> > + goto out;
> > +
> > + spin_lock(&dev->domain->iotlb_lock);
> > + map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> > + entry->last);
> > + if (map) {
> > + if (f) {
> > + map_file = (struct vdpa_map_file *)map->opaque;
>
> map_file assigned value when f != NULL here ...
>
> > + *f = get_file(map_file->file);
> > + }
> > + entry->offset = map_file->offset;
>
> but dereferenced unconditionally here.
>
Fixing in the next version, thanks!
> > + entry->start = map->start;
> > + entry->last = map->last;
> > + entry->perm = map->perm;
> > + if (capability) {
> > + *capability = 0;
> > +
> > + if (dev->domain->bounce_map && map->start == 0 &&
> > + map->last == dev->domain->bounce_size - 1)
> > + *capability |= VDUSE_IOVA_CAP_UMEM;
> > + }
> > +
> > + r = 0;
> > + }
> > + spin_unlock(&dev->domain->iotlb_lock);
> > +
> > +out:
> > + mutex_unlock(&dev->domain_lock);
> > + return r;
> > +}
> > +
> > static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > unsigned long arg)
> > {
>
* Re: [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
2026-01-09 15:24 ` [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls Eugenio Pérez
2026-01-10 23:56 ` Michael S. Tsirkin
@ 2026-01-16 6:40 ` kernel test robot
1 sibling, 0 replies; 40+ messages in thread
From: kernel test robot @ 2026-01-16 6:40 UTC (permalink / raw)
To: Eugenio Pérez, Michael S . Tsirkin
Cc: llvm, oe-kbuild-all, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, jasowang, Xuan Zhuo,
Stefano Garzarella, Yongji Xie, Eugenio Pérez
Hi Eugenio,
kernel test robot noticed the following build warnings:
[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.19-rc5 next-20260115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Eugenio-P-rez/vduse-add-v1-API-definition/20260109-233435
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link: https://lore.kernel.org/r/20260109152430.512923-11-eperezma%40redhat.com
patch subject: [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20260116/202601161430.lDfgRE4U-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260116/202601161430.lDfgRE4U-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601161430.lDfgRE4U-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/vdpa/vdpa_user/vduse_dev.c:1269:7: warning: variable 'map_file' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
1269 | if (f) {
| ^
drivers/vdpa/vdpa_user/vduse_dev.c:1273:19: note: uninitialized use occurs here
1273 | entry->offset = map_file->offset;
| ^~~~~~~~
drivers/vdpa/vdpa_user/vduse_dev.c:1269:3: note: remove the 'if' if its condition is always true
1269 | if (f) {
| ^~~~~~
drivers/vdpa/vdpa_user/vduse_dev.c:1256:38: note: initialize the variable 'map_file' to silence this warning
1256 | const struct vdpa_map_file *map_file;
| ^
| = NULL
1 warning generated.
vim +1269 drivers/vdpa/vdpa_user/vduse_dev.c
1249
1250 static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
1251 struct vduse_iotlb_entry *entry,
1252 struct file **f, uint64_t *capability)
1253 {
1254 int r = -EINVAL;
1255 struct vhost_iotlb_map *map;
1256 const struct vdpa_map_file *map_file;
1257
1258 if (entry->start > entry->last)
1259 return -EINVAL;
1260
1261 mutex_lock(&dev->domain_lock);
1262 if (!dev->domain)
1263 goto out;
1264
1265 spin_lock(&dev->domain->iotlb_lock);
1266 map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
1267 entry->last);
1268 if (map) {
> 1269 if (f) {
1270 map_file = (struct vdpa_map_file *)map->opaque;
1271 *f = get_file(map_file->file);
1272 }
1273 entry->offset = map_file->offset;
1274 entry->start = map->start;
1275 entry->last = map->last;
1276 entry->perm = map->perm;
1277 if (capability) {
1278 *capability = 0;
1279
1280 if (dev->domain->bounce_map && map->start == 0 &&
1281 map->last == dev->domain->bounce_size - 1)
1282 *capability |= VDUSE_IOVA_CAP_UMEM;
1283 }
1284
1285 r = 0;
1286 }
1287 spin_unlock(&dev->domain->iotlb_lock);
1288
1289 out:
1290 mutex_unlock(&dev->domain_lock);
1291 return r;
1292 }
1293
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH v11 11/12] vduse: add vq group asid support
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (9 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 10/12] vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-11 0:03 ` Michael S. Tsirkin
2026-01-13 6:23 ` Jason Wang
2026-01-09 15:24 ` [PATCH v11 12/12] vduse: bump version number Eugenio Pérez
11 siblings, 2 replies; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Add support for assigning Address Space Identifiers (ASIDs) to each VQ
group. This enables mapping each group into a distinct memory space.
The vq group to ASID association is now protected by an rwlock. The
mutex domain_lock keeps protecting the domains of all ASIDs, as some
operations, like the one related to the bounce buffer size, still
require locking all the ASIDs.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
Future improvements can include performance optimizations on top, like
moving to RCU or thread-synchronized atomics, or hardening by tracking
the ASID or ASID hashes in unused bits of the DMA address.
Tested virtio_vdpa by manually adding two threads in vduse_set_status:
one of them modifies the vq group 0 ASID and the other maps and unmaps
memory continuously. After a while, the two threads stop and the usual
work continues. Tested with version 0, version 1 with the old ioctl, and
version 1 with the new ioctl.
Tested with vhost_vdpa by migrating a VM while pinging over OVS+VDUSE. A
few workarounds were needed:
* Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
the enable message to the userland device. This will be solved in the
future.
* Share the suspended state between all vhost devices in QEMU:
https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
* Implement a fake VDUSE suspend vdpa operation callback that always
returns true in the kernel. DPDK suspends the device at the first
GET_VRING_BASE.
* Remove the CVQ blocker in ASID.
The driver vhost_vdpa was also tested with version 0, version 1 with the
old ioctl, version 1 with the new ioctl but only one ASID, and version 1
with many ASIDs.
---
v11:
* Remove duplicated free_pages_exact in vduse_domain_free_coherent
(Jason).
* Do not take the vq groups lock if nas == 1.
* Do not reset the vq group ASID in vq reset (Jason). Removed extra
function vduse_set_group_asid_nomsg, not needed anymore.
* Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the
argument didn't match the previous VDUSE_IOTLB_GET_FD.
* Move the asid < dev->nas check to vdpa core.
v10:
* Back to rwlock version so stronger locks are used.
* Take out allocations from rwlock.
* Forbid changing ASID of a vq group after DRIVER_OK (Jason)
* Remove bad fetching again of domain variable in
vduse_dev_max_mapping_size (Yongji).
* Remove unused vdev definition in vdpa map_ops callbacks (kernel test
robot).
v9:
* Replace mutex with rwlock, as the vdpa map_ops can run from atomic
context.
v8:
* Revert the mutex to rwlock change, it needs proper profiling to
justify it.
v7:
* Take write lock in the error path (Jason).
v6:
* Make vdpa_dev_add use gotos for error handling (MST).
* s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
(MST).
* Fix struct name not matching in the doc.
v5:
* Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
ioctl (Jason).
* Properly set domain bounce size to divide equally between nas (Jason).
* Exclude "padding" member from the only >V1 members in
vduse_dev_request.
v4:
* Divide each domain bounce size between the device bounce size (Jason).
* revert unneeded addr = NULL assignment (Jason)
* Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
return; } (Jason)
* Change a bad multiline comment, using @ character instead of * (Jason).
* Consider config->nas == 0 as a fail (Jason).
v3:
* Get the vduse domain through the vduse_as in the map functions
(Jason).
* Squash with the patch creating the vduse_as struct (Jason).
* Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
(Jason)
v2:
* Convert the use of mutex to rwlock.
RFC v3:
* Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
value to reduce memory consumption, but vqs are already limited to
that value and userspace VDUSE is able to allocate that many vqs.
* Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
VDUSE_IOTLB_GET_INFO.
* Use of array_index_nospec in VDUSE device ioctls.
* Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
* Move the umem mutex to asid struct so there is no contention between
ASIDs.
RFC v2:
* Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
part of the struct is the same.
---
drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
include/uapi/linux/vduse.h | 63 ++++-
2 files changed, 333 insertions(+), 122 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index bf437816fd7d..8227b5e9f3f6 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -41,6 +41,7 @@
#define VDUSE_DEV_MAX (1U << MINORBITS)
#define VDUSE_DEV_MAX_GROUPS 0xffff
+#define VDUSE_DEV_MAX_AS 0xffff
#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
#define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
@@ -86,7 +87,15 @@ struct vduse_umem {
struct mm_struct *mm;
};
+struct vduse_as {
+ struct vduse_iova_domain *domain;
+ struct vduse_umem *umem;
+ struct mutex mem_lock;
+};
+
struct vduse_vq_group {
+ rwlock_t as_lock;
+ struct vduse_as *as; /* Protected by as_lock */
struct vduse_dev *dev;
};
@@ -94,7 +103,7 @@ struct vduse_dev {
struct vduse_vdpa *vdev;
struct device *dev;
struct vduse_virtqueue **vqs;
- struct vduse_iova_domain *domain;
+ struct vduse_as *as;
char *name;
struct mutex lock;
spinlock_t msg_lock;
@@ -122,9 +131,8 @@ struct vduse_dev {
u32 vq_num;
u32 vq_align;
u32 ngroups;
- struct vduse_umem *umem;
+ u32 nas;
struct vduse_vq_group *groups;
- struct mutex mem_lock;
unsigned int bounce_size;
struct mutex domain_lock;
};
@@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
return vduse_dev_msg_sync(dev, &msg);
}
-static int vduse_dev_update_iotlb(struct vduse_dev *dev,
+static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
u64 start, u64 last)
{
struct vduse_dev_msg msg = { 0 };
@@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
return -EINVAL;
msg.req.type = VDUSE_UPDATE_IOTLB;
- msg.req.iova.start = start;
- msg.req.iova.last = last;
+ if (dev->api_version < VDUSE_API_VERSION_1) {
+ msg.req.iova.start = start;
+ msg.req.iova.last = last;
+ } else {
+ msg.req.iova_v2.start = start;
+ msg.req.iova_v2.last = last;
+ msg.req.iova_v2.asid = asid;
+ }
return vduse_dev_msg_sync(dev, &msg);
}
@@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
static void vduse_dev_reset(struct vduse_dev *dev)
{
int i;
- struct vduse_iova_domain *domain = dev->domain;
/* The coherent mappings are handled in vduse_dev_free_coherent() */
- if (domain && domain->bounce_map)
- vduse_domain_reset_bounce_map(domain);
+ for (i = 0; i < dev->nas; i++) {
+ struct vduse_iova_domain *domain = dev->as[i].domain;
+
+ if (domain && domain->bounce_map)
+ vduse_domain_reset_bounce_map(domain);
+ }
down_write(&dev->rwsem);
@@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
return ret;
}
+static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
+ unsigned int asid)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ struct vduse_dev_msg msg = { 0 };
+ int r;
+
+ if (dev->api_version < VDUSE_API_VERSION_1)
+ return -EINVAL;
+
+ msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
+ msg.req.vq_group_asid.group = group;
+ msg.req.vq_group_asid.asid = asid;
+
+ r = vduse_dev_msg_sync(dev, &msg);
+ if (r < 0)
+ return r;
+
+ write_lock(&dev->groups[group].as_lock);
+ dev->groups[group].as = &dev->as[asid];
+ write_unlock(&dev->groups[group].as_lock);
+
+ return 0;
+}
+
static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
@@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
int ret;
- ret = vduse_domain_set_map(dev->domain, iotlb);
+ ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
if (ret)
return ret;
- ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+ ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
if (ret) {
- vduse_domain_clear_map(dev->domain, iotlb);
+ vduse_domain_clear_map(dev->as[asid].domain, iotlb);
return ret;
}
@@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.get_vq_affinity = vduse_vdpa_get_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
+ .set_group_asid = vduse_set_group_asid,
.get_vq_map = vduse_get_vq_map,
.free = vduse_vdpa_free,
};
@@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
if (!token.group)
return;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+ domain = token.group->as->domain;
vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
+
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
}
static void vduse_dev_sync_single_for_cpu(union virtio_map token,
dma_addr_t dma_addr, size_t size,
enum dma_data_direction dir)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
if (!token.group)
return;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+ domain = token.group->as->domain;
vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
+
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
}
static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
@@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
+ dma_addr_t r;
if (!token.group)
return DMA_MAPPING_ERROR;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+ domain = token.group->as->domain;
+ r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
- return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
+
+ return r;
}
static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
if (!token.group)
return;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+
+ domain = token.group->as->domain;
+ vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
- return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
}
static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
dma_addr_t *dma_addr, gfp_t flag)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
unsigned long iova;
void *addr;
@@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
if (!addr)
return NULL;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+
+ domain = token.group->as->domain;
addr = vduse_domain_alloc_coherent(domain, size,
(dma_addr_t *)&iova, addr);
if (!addr)
@@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
*dma_addr = (dma_addr_t)iova;
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
+
return addr;
err:
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
free_pages_exact(addr, size);
return NULL;
}
@@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
void *vaddr, dma_addr_t dma_addr,
unsigned long attrs)
{
- struct vduse_dev *vdev;
struct vduse_iova_domain *domain;
if (!token.group)
return;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+ domain = token.group->as->domain;
vduse_domain_free_coherent(domain, size, dma_addr, attrs);
+
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
+
free_pages_exact(vaddr, size);
}
static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
{
- struct vduse_dev *vdev;
- struct vduse_iova_domain *domain;
+ size_t bounce_size;
if (!token.group)
return false;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+
+ bounce_size = token.group->as->domain->bounce_size;
+
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
- return dma_addr < domain->bounce_size;
+ return dma_addr < bounce_size;
}
static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
@@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
static size_t vduse_dev_max_mapping_size(union virtio_map token)
{
- struct vduse_dev *vdev;
- struct vduse_iova_domain *domain;
+ size_t bounce_size;
if (!token.group)
return 0;
- vdev = token.group->dev;
- domain = vdev->domain;
+ if (token.group->dev->nas > 1)
+ read_lock(&token.group->as_lock);
+
+ bounce_size = token.group->as->domain->bounce_size;
+
+ if (token.group->dev->nas > 1)
+ read_unlock(&token.group->as_lock);
- return domain->bounce_size;
+ return bounce_size;
}
static const struct virtio_map_ops vduse_map_ops = {
@@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
return ret;
}
-static int vduse_dev_dereg_umem(struct vduse_dev *dev,
+static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
u64 iova, u64 size)
{
int ret;
- mutex_lock(&dev->mem_lock);
+ mutex_lock(&dev->as[asid].mem_lock);
ret = -ENOENT;
- if (!dev->umem)
+ if (!dev->as[asid].umem)
goto unlock;
ret = -EINVAL;
- if (!dev->domain)
+ if (!dev->as[asid].domain)
goto unlock;
- if (dev->umem->iova != iova || size != dev->domain->bounce_size)
+ if (dev->as[asid].umem->iova != iova ||
+ size != dev->as[asid].domain->bounce_size)
goto unlock;
- vduse_domain_remove_user_bounce_pages(dev->domain);
- unpin_user_pages_dirty_lock(dev->umem->pages,
- dev->umem->npages, true);
- atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
- mmdrop(dev->umem->mm);
- vfree(dev->umem->pages);
- kfree(dev->umem);
- dev->umem = NULL;
+ vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
+ unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
+ dev->as[asid].umem->npages, true);
+ atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
+ mmdrop(dev->as[asid].umem->mm);
+ vfree(dev->as[asid].umem->pages);
+ kfree(dev->as[asid].umem);
+ dev->as[asid].umem = NULL;
ret = 0;
unlock:
- mutex_unlock(&dev->mem_lock);
+ mutex_unlock(&dev->as[asid].mem_lock);
return ret;
}
static int vduse_dev_reg_umem(struct vduse_dev *dev,
- u64 iova, u64 uaddr, u64 size)
+ u32 asid, u64 iova, u64 uaddr, u64 size)
{
struct page **page_list = NULL;
struct vduse_umem *umem = NULL;
@@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
unsigned long npages, lock_limit;
int ret;
- if (!dev->domain || !dev->domain->bounce_map ||
- size != dev->domain->bounce_size ||
+ if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
+ size != dev->as[asid].domain->bounce_size ||
iova != 0 || uaddr & ~PAGE_MASK)
return -EINVAL;
- mutex_lock(&dev->mem_lock);
+ mutex_lock(&dev->as[asid].mem_lock);
ret = -EEXIST;
- if (dev->umem)
+ if (dev->as[asid].umem)
goto unlock;
ret = -ENOMEM;
@@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
goto out;
}
- ret = vduse_domain_add_user_bounce_pages(dev->domain,
+ ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
page_list, pinned);
if (ret)
goto out;
@@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
umem->mm = current->mm;
mmgrab(current->mm);
- dev->umem = umem;
+ dev->as[asid].umem = umem;
out:
if (ret && pinned > 0)
unpin_user_pages(page_list, pinned);
@@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
vfree(page_list);
kfree(umem);
}
- mutex_unlock(&dev->mem_lock);
+ mutex_unlock(&dev->as[asid].mem_lock);
return ret;
}
@@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
}
static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
- struct vduse_iotlb_entry *entry,
+ struct vduse_iotlb_entry_v2 *entry,
struct file **f, uint64_t *capability)
{
+ u32 asid;
int r = -EINVAL;
struct vhost_iotlb_map *map;
const struct vdpa_map_file *map_file;
- if (entry->start > entry->last)
+ if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
return -EINVAL;
+ asid = array_index_nospec(entry->asid, dev->nas);
mutex_lock(&dev->domain_lock);
- if (!dev->domain)
+
+ if (!dev->as[asid].domain)
goto out;
- spin_lock(&dev->domain->iotlb_lock);
- map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
- entry->last);
+ spin_lock(&dev->as[asid].domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
+ entry->v1.start, entry->v1.last);
if (map) {
if (f) {
map_file = (struct vdpa_map_file *)map->opaque;
*f = get_file(map_file->file);
}
- entry->offset = map_file->offset;
- entry->start = map->start;
- entry->last = map->last;
- entry->perm = map->perm;
+ entry->v1.offset = map_file->offset;
+ entry->v1.start = map->start;
+ entry->v1.last = map->last;
+ entry->v1.perm = map->perm;
if (capability) {
*capability = 0;
- if (dev->domain->bounce_map && map->start == 0 &&
- map->last == dev->domain->bounce_size - 1)
+ if (dev->as[asid].domain->bounce_map && map->start == 0 &&
+ map->last == dev->as[asid].domain->bounce_size - 1)
*capability |= VDUSE_IOVA_CAP_UMEM;
}
r = 0;
}
- spin_unlock(&dev->domain->iotlb_lock);
+ spin_unlock(&dev->as[asid].domain->iotlb_lock);
out:
mutex_unlock(&dev->domain_lock);
@@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
return -EPERM;
switch (cmd) {
- case VDUSE_IOTLB_GET_FD: {
- struct vduse_iotlb_entry entry;
+ case VDUSE_IOTLB_GET_FD:
+ case VDUSE_IOTLB_GET_FD2: {
+ struct vduse_iotlb_entry_v2 entry = {0};
struct file *f = NULL;
+ ret = -ENOIOCTLCMD;
+ if (dev->api_version < VDUSE_API_VERSION_1 &&
+ cmd == VDUSE_IOTLB_GET_FD2)
+ break;
+
ret = -EFAULT;
- if (copy_from_user(&entry, argp, sizeof(entry)))
+ if (cmd == VDUSE_IOTLB_GET_FD2) {
+ if (copy_from_user(&entry, argp, sizeof(entry)))
+ break;
+ } else {
+ if (copy_from_user(&entry.v1, argp,
+ sizeof(entry.v1)))
+ break;
+ }
+
+ ret = -EINVAL;
+ if (!is_mem_zero((const char *)entry.reserved,
+ sizeof(entry.reserved)))
break;
ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
@@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (!f)
break;
- ret = -EFAULT;
- if (copy_to_user(argp, &entry, sizeof(entry))) {
+ if (cmd == VDUSE_IOTLB_GET_FD2)
+ ret = copy_to_user(argp, &entry,
+ sizeof(entry));
+ else
+ ret = copy_to_user(argp, &entry.v1,
+ sizeof(entry.v1));
+
+ if (ret) {
+ ret = -EFAULT;
fput(f);
break;
}
- ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
+ ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
fput(f);
break;
}
@@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
}
case VDUSE_IOTLB_REG_UMEM: {
struct vduse_iova_umem umem;
+ u32 asid;
ret = -EFAULT;
if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
ret = -EINVAL;
if (!is_mem_zero((const char *)umem.reserved,
- sizeof(umem.reserved)))
+ sizeof(umem.reserved)) ||
+ (dev->api_version < VDUSE_API_VERSION_1 &&
+ umem.asid != 0) || umem.asid >= dev->nas)
break;
mutex_lock(&dev->domain_lock);
- ret = vduse_dev_reg_umem(dev, umem.iova,
+ asid = array_index_nospec(umem.asid, dev->nas);
+ ret = vduse_dev_reg_umem(dev, asid, umem.iova,
umem.uaddr, umem.size);
mutex_unlock(&dev->domain_lock);
break;
}
case VDUSE_IOTLB_DEREG_UMEM: {
struct vduse_iova_umem umem;
+ u32 asid;
ret = -EFAULT;
if (copy_from_user(&umem, argp, sizeof(umem)))
@@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
ret = -EINVAL;
if (!is_mem_zero((const char *)umem.reserved,
- sizeof(umem.reserved)))
+ sizeof(umem.reserved)) ||
+ (dev->api_version < VDUSE_API_VERSION_1 &&
+ umem.asid != 0) ||
+ umem.asid >= dev->nas)
break;
+
mutex_lock(&dev->domain_lock);
- ret = vduse_dev_dereg_umem(dev, umem.iova,
+ asid = array_index_nospec(umem.asid, dev->nas);
+ ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
umem.size);
mutex_unlock(&dev->domain_lock);
break;
}
case VDUSE_IOTLB_GET_INFO: {
struct vduse_iova_info info;
- struct vduse_iotlb_entry entry;
+ struct vduse_iotlb_entry_v2 entry;
ret = -EFAULT;
if (copy_from_user(&info, argp, sizeof(info)))
@@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
sizeof(info.reserved)))
break;
- entry.start = info.start;
- entry.last = info.last;
+ if (dev->api_version < VDUSE_API_VERSION_1) {
+ if (info.asid)
+ break;
+ } else if (info.asid >= dev->nas)
+ break;
+
+ entry.v1.start = info.start;
+ entry.v1.last = info.last;
+ entry.asid = info.asid;
ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
&info.capability);
if (ret < 0)
break;
- info.start = entry.start;
- info.last = entry.last;
+ info.start = entry.v1.start;
+ info.last = entry.v1.last;
+ info.asid = entry.asid;
ret = -EFAULT;
if (copy_to_user(argp, &info, sizeof(info)))
@@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
struct vduse_dev *dev = file->private_data;
mutex_lock(&dev->domain_lock);
- if (dev->domain)
- vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
+ for (int i = 0; i < dev->nas; i++)
+ if (dev->as[i].domain)
+ vduse_dev_dereg_umem(dev, i, 0,
+ dev->as[i].domain->bounce_size);
mutex_unlock(&dev->domain_lock);
spin_lock(&dev->msg_lock);
/* Make sure the inflight messages can processed after reconncection */
@@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
return NULL;
mutex_init(&dev->lock);
- mutex_init(&dev->mem_lock);
mutex_init(&dev->domain_lock);
spin_lock_init(&dev->msg_lock);
INIT_LIST_HEAD(&dev->send_list);
@@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
idr_remove(&vduse_idr, dev->minor);
kvfree(dev->config);
vduse_dev_deinit_vqs(dev);
- if (dev->domain)
- vduse_domain_destroy(dev->domain);
+ for (int i = 0; i < dev->nas; i++) {
+ if (dev->as[i].domain)
+ vduse_domain_destroy(dev->as[i].domain);
+ }
+ kfree(dev->as);
kfree(dev->name);
kfree(dev->groups);
vduse_dev_destroy(dev);
@@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
sizeof(config->reserved)))
return false;
- if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
+ if (api_version < VDUSE_API_VERSION_1 &&
+ (config->ngroups || config->nas))
return false;
- if (api_version >= VDUSE_API_VERSION_1 &&
- (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
- return false;
+ if (api_version >= VDUSE_API_VERSION_1) {
+ if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
+ return false;
+
+ if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
+ return false;
+ }
if (config->vq_align > PAGE_SIZE)
return false;
@@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
ret = -EPERM;
mutex_lock(&dev->domain_lock);
- if (dev->domain)
+ /* Assuming that if the first domain is allocated, all are allocated */
+ if (dev->as[0].domain)
goto unlock;
ret = kstrtouint(buf, 10, &bounce_size);
@@ -1984,6 +2115,14 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->device_features = config->features;
dev->device_id = config->device_id;
dev->vendor_id = config->vendor_id;
+
+ dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
+ dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
+ if (!dev->as)
+ goto err_as;
+ for (int i = 0; i < dev->nas; i++)
+ mutex_init(&dev->as[i].mem_lock);
+
dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
? 1
: config->ngroups;
@@ -1991,8 +2130,11 @@ static int vduse_create_dev(struct vduse_dev_config *config,
GFP_KERNEL);
if (!dev->groups)
goto err_vq_groups;
- for (u32 i = 0; i < dev->ngroups; ++i)
+ for (u32 i = 0; i < dev->ngroups; ++i) {
dev->groups[i].dev = dev;
+ rwlock_init(&dev->groups[i].as_lock);
+ dev->groups[i].as = &dev->as[0];
+ }
dev->name = kstrdup(config->name, GFP_KERNEL);
if (!dev->name)
@@ -2032,6 +2174,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_str:
kfree(dev->groups);
err_vq_groups:
+ kfree(dev->as);
+err_as:
vduse_dev_destroy(dev);
err:
return ret;
@@ -2155,7 +2299,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
&vduse_vdpa_config_ops, &vduse_map_ops,
- dev->ngroups, 1, name, true);
+ dev->ngroups, dev->nas, name, true);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
@@ -2170,7 +2314,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
const struct vdpa_dev_set_config *config)
{
struct vduse_dev *dev;
- int ret;
+ size_t domain_bounce_size;
+ int ret, i;
mutex_lock(&vduse_lock);
dev = vduse_find_dev(name);
@@ -2184,29 +2329,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
return ret;
mutex_lock(&dev->domain_lock);
- if (!dev->domain)
- dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
- dev->bounce_size);
- mutex_unlock(&dev->domain_lock);
- if (!dev->domain) {
- ret = -ENOMEM;
- goto domain_err;
+ ret = 0;
+
+ domain_bounce_size = dev->bounce_size / dev->nas;
+ for (i = 0; i < dev->nas; ++i) {
+ dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
+ domain_bounce_size);
+ if (!dev->as[i].domain) {
+ ret = -ENOMEM;
+ goto err;
+ }
}
+ mutex_unlock(&dev->domain_lock);
+
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
- if (ret) {
- goto register_err;
- }
+ if (ret)
+ goto err_register;
return 0;
-register_err:
+err_register:
mutex_lock(&dev->domain_lock);
- vduse_domain_destroy(dev->domain);
- dev->domain = NULL;
+
+err:
+ for (int j = 0; j < i; j++) {
+ if (dev->as[j].domain) {
+ vduse_domain_destroy(dev->as[j].domain);
+ dev->as[j].domain = NULL;
+ }
+ }
mutex_unlock(&dev->domain_lock);
-domain_err:
put_device(&dev->vdev->vdpa.dev);
return ret;
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index a3d51cf6df3a..9e423163d819 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -47,7 +47,8 @@ struct vduse_dev_config {
__u32 vq_num;
__u32 vq_align;
__u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
- __u32 reserved[12];
+ __u32 nas; /* if VDUSE_API_VERSION >= 1 */
+ __u32 reserved[11];
__u32 config_size;
__u8 config[];
};
@@ -166,6 +167,16 @@ struct vduse_vq_state_packed {
__u16 last_used_idx;
};
+/**
+ * struct vduse_vq_group_asid - virtqueue group ASID
+ * @group: Index of the virtqueue group
+ * @asid: Address space ID of the group
+ */
+struct vduse_vq_group_asid {
+ __u32 group;
+ __u32 asid;
+};
+
/**
* struct vduse_vq_info - information of a virtqueue
* @index: virtqueue index
@@ -225,6 +236,7 @@ struct vduse_vq_eventfd {
* @uaddr: start address of userspace memory, it must be aligned to page size
* @iova: start of the IOVA region
* @size: size of the IOVA region
+ * @asid: Address space ID of the IOVA region
* @reserved: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
@@ -234,7 +246,8 @@ struct vduse_iova_umem {
__u64 uaddr;
__u64 iova;
__u64 size;
- __u64 reserved[3];
+ __u32 asid;
+ __u32 reserved[5];
};
/* Register userspace memory for IOVA regions */
@@ -248,6 +261,7 @@ struct vduse_iova_umem {
* @start: start of the IOVA region
* @last: last of the IOVA region
* @capability: capability of the IOVA region
+ * @asid: Address space ID of the IOVA region, only if device API version >= 1
* @reserved: for future use, needs to be initialized to zero
*
* Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
@@ -258,7 +272,8 @@ struct vduse_iova_info {
__u64 last;
#define VDUSE_IOVA_CAP_UMEM (1 << 0)
__u64 capability;
- __u64 reserved[3];
+ __u32 asid; /* Only if device API version >= 1 */
+ __u32 reserved[5];
};
/*
@@ -267,6 +282,28 @@ struct vduse_iova_info {
*/
#define VDUSE_IOTLB_GET_INFO _IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
+/**
+ * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
+ *
+ * @v1: the original vduse_iotlb_entry
+ * @asid: address space ID of the IOVA region
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD2 ioctl to find an overlapping IOVA region.
+ */
+struct vduse_iotlb_entry_v2 {
+ struct vduse_iotlb_entry v1;
+ __u32 asid;
+ __u32 reserved[12];
+};
+
+/*
+ * Same as VDUSE_IOTLB_GET_FD but with vduse_iotlb_entry_v2 argument that
+ * supports extra fields.
+ */
+#define VDUSE_IOTLB_GET_FD2 _IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
+
+
/* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
/**
@@ -280,6 +317,7 @@ enum vduse_req_type {
VDUSE_GET_VQ_STATE,
VDUSE_SET_STATUS,
VDUSE_UPDATE_IOTLB,
+ VDUSE_SET_VQ_GROUP_ASID,
};
/**
@@ -314,6 +352,18 @@ struct vduse_iova_range {
__u64 last;
};
+/**
+ * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ * @asid: address space ID of the IOVA range
+ */
+struct vduse_iova_range_v2 {
+ __u64 start;
+ __u64 last;
+ __u32 asid;
+};
+
/**
* struct vduse_dev_request - control request
* @type: request type
@@ -322,6 +372,8 @@ struct vduse_iova_range {
* @vq_state: virtqueue state, only index field is available
* @s: device status
* @iova: IOVA range for updating
+ * @iova_v2: IOVA range for updating if API_VERSION >= 1
+ * @vq_group_asid: ASID of a virtqueue group
* @padding: padding
*
* Structure used by read(2) on /dev/vduse/$NAME.
@@ -334,6 +386,11 @@ struct vduse_dev_request {
struct vduse_vq_state vq_state;
struct vduse_dev_status s;
struct vduse_iova_range iova;
+ /* All the following members except padding exist only if the
+ * VDUSE API version is >= 1.
+ */
+ struct vduse_iova_range_v2 iova_v2;
+ struct vduse_vq_group_asid vq_group_asid;
__u32 padding[32];
};
};
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-09 15:24 ` [PATCH v11 11/12] vduse: add vq group asid support Eugenio Pérez
@ 2026-01-11 0:03 ` Michael S. Tsirkin
2026-01-12 11:56 ` Eugenio Perez Martin
2026-01-13 6:23 ` Jason Wang
1 sibling, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-11 0:03 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Fri, Jan 09, 2026 at 04:24:29PM +0100, Eugenio Pérez wrote:
> Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> group. This enables mapping each group into a distinct memory space.
>
> The vq group to ASID association is now protected by an rwlock, but
> the domain_lock mutex keeps protecting the domains of all ASIDs, as
> some operations, like the ones related to the bounce buffer size,
> still require locking all the ASIDs.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
> ---
> Future improvements can include performance optimizations on top, like
> a move to RCU or thread-synchronized atomics, or hardening by tracking
> the ASID or ASID hashes on unused bits of the DMA address.
>
> Tested virtio_vdpa by manually adding two threads in vduse_set_status:
> one of them modifies the vq group 0 ASID and the other one maps and
> unmaps memory continuously. After a while, the two threads stop and
> the usual work continues. Tested with version 0, version 1 with the
> old ioctl, and version 1 with the new ioctl.
>
> Tested with vhost_vdpa by migrating a VM while running ping over
> OVS+VDUSE. A few workarounds were needed in some parts:
> * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> the enable message to the userland device. This will be solved in the
> future.
> * Share the suspended state between all vhost devices in QEMU:
> https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> * Implement a fake VDUSE suspend vdpa operation callback that always
> returns true in the kernel. DPDK suspends the device at the first
> GET_VRING_BASE.
> * Remove the CVQ blocker in ASID.
>
> The driver vhost_vdpa was also tested with version 0, version 1 with the
> old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> with many ASID.
>
> ---
> v11:
> * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> (Jason).
> * Do not take the vq groups lock if nas == 1.
> * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> function vduse_set_group_asid_nomsg, not needed anymore.
> * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the
> argument didn't match the previous VDUSE_IOTLB_GET_FD.
> * Move the asid < dev->nas check to vdpa core.
>
> v10:
> * Back to rwlock version so stronger locks are used.
> * Take out allocations from rwlock.
> * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> * Remove bad fetching again of domain variable in
> vduse_dev_max_mapping_size (Yongji).
> * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> robot).
>
> v9:
> * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> context.
>
> v8:
> * Revert the mutex to rwlock change, it needs proper profiling to
> justify it.
>
> v7:
> * Take write lock in the error path (Jason).
>
> v6:
> * Make vdpa_dev_add use gotos for error handling (MST).
> * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> (MST).
> * Fix struct name not matching in the doc.
>
> v5:
> * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> ioctl (Jason).
> * Properly set domain bounce size to divide equally between nas (Jason).
> * Exclude "padding" member from the only >V1 members in
> vduse_dev_request.
>
> v4:
> * Divide each domain bounce size between the device bounce size (Jason).
> * revert unneeded addr = NULL assignment (Jason)
> * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> return; } (Jason)
> * Change a bad multiline comment, using @ character instead of * (Jason).
> * Consider config->nas == 0 as a fail (Jason).
>
> v3:
> * Get the vduse domain through the vduse_as in the map functions
> (Jason).
> * Squash with the patch creating the vduse_as struct (Jason).
> * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> (Jason)
>
> v2:
> * Convert the use of mutex to rwlock.
>
> RFC v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> value to reduce memory consumption, but vqs are already limited to
> that value and userspace VDUSE is able to allocate that many vqs.
> * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> VDUSE_IOTLB_GET_INFO.
> * Use of array_index_nospec in VDUSE device ioctls.
> * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> * Move the umem mutex to asid struct so there is no contention between
> ASIDs.
>
> RFC v2:
> * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> part of the struct is the same.
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> include/uapi/linux/vduse.h | 63 ++++-
> 2 files changed, 333 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index bf437816fd7d..8227b5e9f3f6 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -41,6 +41,7 @@
>
> #define VDUSE_DEV_MAX (1U << MINORBITS)
> #define VDUSE_DEV_MAX_GROUPS 0xffff
> +#define VDUSE_DEV_MAX_AS 0xffff
> #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> @@ -86,7 +87,15 @@ struct vduse_umem {
> struct mm_struct *mm;
> };
>
> +struct vduse_as {
> + struct vduse_iova_domain *domain;
> + struct vduse_umem *umem;
> + struct mutex mem_lock;
> +};
> +
> struct vduse_vq_group {
> + rwlock_t as_lock;
> + struct vduse_as *as; /* Protected by as_lock */
> struct vduse_dev *dev;
> };
>
> @@ -94,7 +103,7 @@ struct vduse_dev {
> struct vduse_vdpa *vdev;
> struct device *dev;
> struct vduse_virtqueue **vqs;
> - struct vduse_iova_domain *domain;
> + struct vduse_as *as;
> char *name;
> struct mutex lock;
> spinlock_t msg_lock;
> @@ -122,9 +131,8 @@ struct vduse_dev {
> u32 vq_num;
> u32 vq_align;
> u32 ngroups;
> - struct vduse_umem *umem;
> + u32 nas;
> struct vduse_vq_group *groups;
> - struct mutex mem_lock;
> unsigned int bounce_size;
> struct mutex domain_lock;
> };
> @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> return vduse_dev_msg_sync(dev, &msg);
> }
>
> -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> u64 start, u64 last)
> {
> struct vduse_dev_msg msg = { 0 };
> @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> return -EINVAL;
>
> msg.req.type = VDUSE_UPDATE_IOTLB;
> - msg.req.iova.start = start;
> - msg.req.iova.last = last;
> + if (dev->api_version < VDUSE_API_VERSION_1) {
> + msg.req.iova.start = start;
> + msg.req.iova.last = last;
> + } else {
> + msg.req.iova_v2.start = start;
> + msg.req.iova_v2.last = last;
> + msg.req.iova_v2.asid = asid;
> + }
>
> return vduse_dev_msg_sync(dev, &msg);
> }
> @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> static void vduse_dev_reset(struct vduse_dev *dev)
> {
> int i;
> - struct vduse_iova_domain *domain = dev->domain;
>
> /* The coherent mappings are handled in vduse_dev_free_coherent() */
> - if (domain && domain->bounce_map)
> - vduse_domain_reset_bounce_map(domain);
> + for (i = 0; i < dev->nas; i++) {
> + struct vduse_iova_domain *domain = dev->as[i].domain;
I do not understand the locking here and in many other places.
dev->as is dereferenced here apparently outside as_lock?
> +
> + if (domain && domain->bounce_map)
> + vduse_domain_reset_bounce_map(domain);
> + }
>
> down_write(&dev->rwsem);
>
> @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> return ret;
> }
>
> +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> + unsigned int asid)
> +{
> + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> + struct vduse_dev_msg msg = { 0 };
> + int r;
> +
> + if (dev->api_version < VDUSE_API_VERSION_1)
> + return -EINVAL;
> +
> + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> + msg.req.vq_group_asid.group = group;
> + msg.req.vq_group_asid.asid = asid;
> +
> + r = vduse_dev_msg_sync(dev, &msg);
> + if (r < 0)
> + return r;
> +
> + write_lock(&dev->groups[group].as_lock);
> + dev->groups[group].as = &dev->as[asid];
> + write_unlock(&dev->groups[group].as_lock);
> +
> + return 0;
> +}
> +
> static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> struct vdpa_vq_state *state)
> {
> @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> int ret;
>
> - ret = vduse_domain_set_map(dev->domain, iotlb);
> + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> if (ret)
> return ret;
>
> - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> if (ret) {
> - vduse_domain_clear_map(dev->domain, iotlb);
> + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> return ret;
> }
>
> @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> .reset = vduse_vdpa_reset,
> .set_map = vduse_vdpa_set_map,
> + .set_group_asid = vduse_set_group_asid,
> .get_vq_map = vduse_get_vq_map,
> .free = vduse_vdpa_free,
> };
> @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> dma_addr_t dma_addr, size_t size,
> enum dma_data_direction dir)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
>
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> dma_addr_t dma_addr, size_t size,
> enum dma_data_direction dir)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
>
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> enum dma_data_direction dir,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
> + dma_addr_t r;
>
> if (!token.group)
> return DMA_MAPPING_ERROR;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> + domain = token.group->as->domain;
> + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
>
> - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> + return r;
> }
>
> static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> size_t size, enum dma_data_direction dir,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + domain = token.group->as->domain;
> + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
>
> - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> dma_addr_t *dma_addr, gfp_t flag)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
> unsigned long iova;
> void *addr;
> @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> if (!addr)
> return NULL;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + domain = token.group->as->domain;
> addr = vduse_domain_alloc_coherent(domain, size,
> (dma_addr_t *)&iova, addr);
> if (!addr)
> @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
>
> *dma_addr = (dma_addr_t)iova;
>
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> return addr;
>
> err:
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> free_pages_exact(addr, size);
> return NULL;
> }
> @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> void *vaddr, dma_addr_t dma_addr,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
>
> + domain = token.group->as->domain;
> vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> free_pages_exact(vaddr, size);
> }
>
> static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> {
> - struct vduse_dev *vdev;
> - struct vduse_iova_domain *domain;
> + size_t bounce_size;
>
> if (!token.group)
> return false;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + bounce_size = token.group->as->domain->bounce_size;
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
>
> - return dma_addr < domain->bounce_size;
> + return dma_addr < bounce_size;
> }
>
> static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
>
> static size_t vduse_dev_max_mapping_size(union virtio_map token)
> {
> - struct vduse_dev *vdev;
> - struct vduse_iova_domain *domain;
> + size_t bounce_size;
>
> if (!token.group)
> return 0;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + bounce_size = token.group->as->domain->bounce_size;
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
>
> - return domain->bounce_size;
> + return bounce_size;
> }
>
> static const struct virtio_map_ops vduse_map_ops = {
> @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> return ret;
> }
>
> -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> u64 iova, u64 size)
> {
> int ret;
>
> - mutex_lock(&dev->mem_lock);
> + mutex_lock(&dev->as[asid].mem_lock);
> ret = -ENOENT;
> - if (!dev->umem)
> + if (!dev->as[asid].umem)
> goto unlock;
>
> ret = -EINVAL;
> - if (!dev->domain)
> + if (!dev->as[asid].domain)
> goto unlock;
>
> - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> + if (dev->as[asid].umem->iova != iova ||
> + size != dev->as[asid].domain->bounce_size)
> goto unlock;
>
> - vduse_domain_remove_user_bounce_pages(dev->domain);
> - unpin_user_pages_dirty_lock(dev->umem->pages,
> - dev->umem->npages, true);
> - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> - mmdrop(dev->umem->mm);
> - vfree(dev->umem->pages);
> - kfree(dev->umem);
> - dev->umem = NULL;
> + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> + dev->as[asid].umem->npages, true);
> + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> + mmdrop(dev->as[asid].umem->mm);
> + vfree(dev->as[asid].umem->pages);
> + kfree(dev->as[asid].umem);
> + dev->as[asid].umem = NULL;
> ret = 0;
> unlock:
> - mutex_unlock(&dev->mem_lock);
> + mutex_unlock(&dev->as[asid].mem_lock);
> return ret;
> }
>
> static int vduse_dev_reg_umem(struct vduse_dev *dev,
> - u64 iova, u64 uaddr, u64 size)
> + u32 asid, u64 iova, u64 uaddr, u64 size)
> {
> struct page **page_list = NULL;
> struct vduse_umem *umem = NULL;
> @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> unsigned long npages, lock_limit;
> int ret;
>
> - if (!dev->domain || !dev->domain->bounce_map ||
> - size != dev->domain->bounce_size ||
> + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> + size != dev->as[asid].domain->bounce_size ||
> iova != 0 || uaddr & ~PAGE_MASK)
> return -EINVAL;
>
> - mutex_lock(&dev->mem_lock);
> + mutex_lock(&dev->as[asid].mem_lock);
> ret = -EEXIST;
> - if (dev->umem)
> + if (dev->as[asid].umem)
> goto unlock;
>
> ret = -ENOMEM;
> @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> goto out;
> }
>
> - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> page_list, pinned);
> if (ret)
> goto out;
> @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> umem->mm = current->mm;
> mmgrab(current->mm);
>
> - dev->umem = umem;
> + dev->as[asid].umem = umem;
> out:
> if (ret && pinned > 0)
> unpin_user_pages(page_list, pinned);
> @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> vfree(page_list);
> kfree(umem);
> }
> - mutex_unlock(&dev->mem_lock);
> + mutex_unlock(&dev->as[asid].mem_lock);
> return ret;
> }
>
> @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> }
>
> static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> - struct vduse_iotlb_entry *entry,
> + struct vduse_iotlb_entry_v2 *entry,
> struct file **f, uint64_t *capability)
> {
> + u32 asid;
> int r = -EINVAL;
> struct vhost_iotlb_map *map;
> const struct vdpa_map_file *map_file;
>
> - if (entry->start > entry->last)
> + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> return -EINVAL;
>
> + asid = array_index_nospec(entry->asid, dev->nas);
> mutex_lock(&dev->domain_lock);
> - if (!dev->domain)
> +
> + if (!dev->as[asid].domain)
> goto out;
>
> - spin_lock(&dev->domain->iotlb_lock);
> - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> - entry->last);
> + spin_lock(&dev->as[asid].domain->iotlb_lock);
> + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> + entry->v1.start, entry->v1.last);
> if (map) {
> if (f) {
> map_file = (struct vdpa_map_file *)map->opaque;
> *f = get_file(map_file->file);
> }
> - entry->offset = map_file->offset;
> - entry->start = map->start;
> - entry->last = map->last;
> - entry->perm = map->perm;
> + entry->v1.offset = map_file->offset;
> + entry->v1.start = map->start;
> + entry->v1.last = map->last;
> + entry->v1.perm = map->perm;
> if (capability) {
> *capability = 0;
>
> - if (dev->domain->bounce_map && map->start == 0 &&
> - map->last == dev->domain->bounce_size - 1)
> + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> + map->last == dev->as[asid].domain->bounce_size - 1)
> *capability |= VDUSE_IOVA_CAP_UMEM;
> }
>
> r = 0;
> }
> - spin_unlock(&dev->domain->iotlb_lock);
> + spin_unlock(&dev->as[asid].domain->iotlb_lock);
>
> out:
> mutex_unlock(&dev->domain_lock);
> @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> return -EPERM;
>
> switch (cmd) {
> - case VDUSE_IOTLB_GET_FD: {
> - struct vduse_iotlb_entry entry;
> + case VDUSE_IOTLB_GET_FD:
> + case VDUSE_IOTLB_GET_FD2: {
> + struct vduse_iotlb_entry_v2 entry = {0};
> struct file *f = NULL;
>
> + ret = -ENOIOCTLCMD;
> + if (dev->api_version < VDUSE_API_VERSION_1 &&
> + cmd == VDUSE_IOTLB_GET_FD2)
> + break;
> +
> ret = -EFAULT;
> - if (copy_from_user(&entry, argp, sizeof(entry)))
> + if (cmd == VDUSE_IOTLB_GET_FD2) {
> + if (copy_from_user(&entry, argp, sizeof(entry)))
> + break;
> + } else {
> + if (copy_from_user(&entry.v1, argp,
> + sizeof(entry.v1)))
> + break;
> + }
> +
> + ret = -EINVAL;
> + if (!is_mem_zero((const char *)entry.reserved,
> + sizeof(entry.reserved)))
> break;
>
> ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> if (!f)
> break;
>
> - ret = -EFAULT;
> - if (copy_to_user(argp, &entry, sizeof(entry))) {
> + if (cmd == VDUSE_IOTLB_GET_FD2)
> + ret = copy_to_user(argp, &entry,
> + sizeof(entry));
> + else
> + ret = copy_to_user(argp, &entry.v1,
> + sizeof(entry.v1));
> +
> + if (ret) {
> + ret = -EFAULT;
> fput(f);
> break;
> }
> - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> fput(f);
> break;
> }
> @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> }
> case VDUSE_IOTLB_REG_UMEM: {
> struct vduse_iova_umem umem;
> + u32 asid;
>
> ret = -EFAULT;
> if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
> ret = -EINVAL;
> if (!is_mem_zero((const char *)umem.reserved,
> - sizeof(umem.reserved)))
> + sizeof(umem.reserved)) ||
> + (dev->api_version < VDUSE_API_VERSION_1 &&
> + umem.asid != 0) || umem.asid >= dev->nas)
> break;
>
> mutex_lock(&dev->domain_lock);
> - ret = vduse_dev_reg_umem(dev, umem.iova,
> + asid = array_index_nospec(umem.asid, dev->nas);
> + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> umem.uaddr, umem.size);
> mutex_unlock(&dev->domain_lock);
> break;
> }
> case VDUSE_IOTLB_DEREG_UMEM: {
> struct vduse_iova_umem umem;
> + u32 asid;
>
> ret = -EFAULT;
> if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
> ret = -EINVAL;
> if (!is_mem_zero((const char *)umem.reserved,
> - sizeof(umem.reserved)))
> + sizeof(umem.reserved)) ||
> + (dev->api_version < VDUSE_API_VERSION_1 &&
> + umem.asid != 0) ||
> + umem.asid >= dev->nas)
> break;
> +
> mutex_lock(&dev->domain_lock);
> - ret = vduse_dev_dereg_umem(dev, umem.iova,
> + asid = array_index_nospec(umem.asid, dev->nas);
> + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> umem.size);
> mutex_unlock(&dev->domain_lock);
> break;
> }
> case VDUSE_IOTLB_GET_INFO: {
> struct vduse_iova_info info;
> - struct vduse_iotlb_entry entry;
> + struct vduse_iotlb_entry_v2 entry;
>
> ret = -EFAULT;
> if (copy_from_user(&info, argp, sizeof(info)))
> @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> sizeof(info.reserved)))
> break;
>
> - entry.start = info.start;
> - entry.last = info.last;
> + if (dev->api_version < VDUSE_API_VERSION_1) {
> + if (info.asid)
> + break;
> + } else if (info.asid >= dev->nas)
> + break;
> +
> + entry.v1.start = info.start;
> + entry.v1.last = info.last;
> + entry.asid = info.asid;
> ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> &info.capability);
> if (ret < 0)
> break;
>
> - info.start = entry.start;
> - info.last = entry.last;
> + info.start = entry.v1.start;
> + info.last = entry.v1.last;
> + info.asid = entry.asid;
>
> ret = -EFAULT;
> if (copy_to_user(argp, &info, sizeof(info)))
> @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> struct vduse_dev *dev = file->private_data;
>
> mutex_lock(&dev->domain_lock);
> - if (dev->domain)
> - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> + for (int i = 0; i < dev->nas; i++)
> + if (dev->as[i].domain)
> + vduse_dev_dereg_umem(dev, i, 0,
> + dev->as[i].domain->bounce_size);
> mutex_unlock(&dev->domain_lock);
> spin_lock(&dev->msg_lock);
> /* Make sure the inflight messages can processed after reconncection */
> @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> return NULL;
>
> mutex_init(&dev->lock);
> - mutex_init(&dev->mem_lock);
> mutex_init(&dev->domain_lock);
> spin_lock_init(&dev->msg_lock);
> INIT_LIST_HEAD(&dev->send_list);
> @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> idr_remove(&vduse_idr, dev->minor);
> kvfree(dev->config);
> vduse_dev_deinit_vqs(dev);
> - if (dev->domain)
> - vduse_domain_destroy(dev->domain);
> + for (int i = 0; i < dev->nas; i++) {
> + if (dev->as[i].domain)
> + vduse_domain_destroy(dev->as[i].domain);
> + }
> + kfree(dev->as);
> kfree(dev->name);
> kfree(dev->groups);
> vduse_dev_destroy(dev);
> @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> sizeof(config->reserved)))
> return false;
>
> - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> + if (api_version < VDUSE_API_VERSION_1 &&
> + (config->ngroups || config->nas))
> return false;
>
> - if (api_version >= VDUSE_API_VERSION_1 &&
> - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> - return false;
> + if (api_version >= VDUSE_API_VERSION_1) {
> + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> + return false;
> +
> + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> + return false;
> + }
>
> if (config->vq_align > PAGE_SIZE)
> return false;
> @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
>
> ret = -EPERM;
> mutex_lock(&dev->domain_lock);
> - if (dev->domain)
> + /* Assuming that if the first domain is allocated, all are allocated */
> + if (dev->as[0].domain)
> goto unlock;
>
> ret = kstrtouint(buf, 10, &bounce_size);
> @@ -1984,6 +2115,14 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> dev->device_features = config->features;
> dev->device_id = config->device_id;
> dev->vendor_id = config->vendor_id;
> +
> + dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
> + dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> + if (!dev->as)
> + goto err_as;
> + for (int i = 0; i < dev->nas; i++)
> + mutex_init(&dev->as[i].mem_lock);
> +
> dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
> ? 1
> : config->ngroups;
> @@ -1991,8 +2130,11 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> GFP_KERNEL);
> if (!dev->groups)
> goto err_vq_groups;
> - for (u32 i = 0; i < dev->ngroups; ++i)
> + for (u32 i = 0; i < dev->ngroups; ++i) {
> dev->groups[i].dev = dev;
> + rwlock_init(&dev->groups[i].as_lock);
> + dev->groups[i].as = &dev->as[0];
> + }
>
> dev->name = kstrdup(config->name, GFP_KERNEL);
> if (!dev->name)
> @@ -2032,6 +2174,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> err_str:
> kfree(dev->groups);
> err_vq_groups:
> + kfree(dev->as);
> +err_as:
> vduse_dev_destroy(dev);
> err:
> return ret;
> @@ -2155,7 +2299,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
>
> vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> &vduse_vdpa_config_ops, &vduse_map_ops,
> - dev->ngroups, 1, name, true);
> + dev->ngroups, dev->nas, name, true);
> if (IS_ERR(vdev))
> return PTR_ERR(vdev);
>
> @@ -2170,7 +2314,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> const struct vdpa_dev_set_config *config)
> {
> struct vduse_dev *dev;
> - int ret;
> + size_t domain_bounce_size;
> + int ret, i;
>
> mutex_lock(&vduse_lock);
> dev = vduse_find_dev(name);
> @@ -2184,29 +2329,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> return ret;
>
> mutex_lock(&dev->domain_lock);
> - if (!dev->domain)
> - dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> - dev->bounce_size);
> - mutex_unlock(&dev->domain_lock);
> - if (!dev->domain) {
> - ret = -ENOMEM;
> - goto domain_err;
> + ret = 0;
> +
> + domain_bounce_size = dev->bounce_size / dev->nas;
> + for (i = 0; i < dev->nas; ++i) {
> + dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> + domain_bounce_size);
> + if (!dev->as[i].domain) {
> + ret = -ENOMEM;
> + goto err;
> + }
> }
>
> + mutex_unlock(&dev->domain_lock);
> +
> ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> - if (ret) {
> - goto register_err;
> - }
> + if (ret)
> + goto err_register;
>
> return 0;
>
> -register_err:
> +err_register:
> mutex_lock(&dev->domain_lock);
> - vduse_domain_destroy(dev->domain);
> - dev->domain = NULL;
> +
> +err:
> + for (int j = 0; j < i; j++) {
> + if (dev->as[j].domain) {
> + vduse_domain_destroy(dev->as[j].domain);
> + dev->as[j].domain = NULL;
> + }
> + }
> mutex_unlock(&dev->domain_lock);
>
> -domain_err:
> put_device(&dev->vdev->vdpa.dev);
>
> return ret;
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> index a3d51cf6df3a..9e423163d819 100644
> --- a/include/uapi/linux/vduse.h
> +++ b/include/uapi/linux/vduse.h
> @@ -47,7 +47,8 @@ struct vduse_dev_config {
> __u32 vq_num;
> __u32 vq_align;
> __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> - __u32 reserved[12];
> + __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> + __u32 reserved[11];
> __u32 config_size;
> __u8 config[];
> };
> @@ -166,6 +167,16 @@ struct vduse_vq_state_packed {
> __u16 last_used_idx;
> };
>
> +/**
> + * struct vduse_vq_group_asid - virtqueue group ASID
> + * @group: Index of the virtqueue group
> + * @asid: Address space ID of the group
> + */
> +struct vduse_vq_group_asid {
> + __u32 group;
> + __u32 asid;
> +};
> +
> /**
> * struct vduse_vq_info - information of a virtqueue
> * @index: virtqueue index
> @@ -225,6 +236,7 @@ struct vduse_vq_eventfd {
> * @uaddr: start address of userspace memory, it must be aligned to page size
> * @iova: start of the IOVA region
> * @size: size of the IOVA region
> + * @asid: Address space ID of the IOVA region
> * @reserved: for future use, needs to be initialized to zero
> *
> * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> @@ -234,7 +246,8 @@ struct vduse_iova_umem {
> __u64 uaddr;
> __u64 iova;
> __u64 size;
> - __u64 reserved[3];
> + __u32 asid;
> + __u32 reserved[5];
> };
>
> /* Register userspace memory for IOVA regions */
> @@ -248,6 +261,7 @@ struct vduse_iova_umem {
> * @start: start of the IOVA region
> * @last: last of the IOVA region
> * @capability: capability of the IOVA region
> + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> * @reserved: for future use, needs to be initialized to zero
> *
> * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> @@ -258,7 +272,8 @@ struct vduse_iova_info {
> __u64 last;
> #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> __u64 capability;
> - __u64 reserved[3];
> + __u32 asid; /* Only if device API version >= 1 */
> + __u32 reserved[5];
> };
>
> /*
> @@ -267,6 +282,28 @@ struct vduse_iova_info {
> */
> #define VDUSE_IOTLB_GET_INFO _IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
>
> +/**
> + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
> + *
> + * @v1: the original vduse_iotlb_entry
> + * @asid: address space ID of the IOVA region
> + * @reserver: for future use, needs to be initialized to zero
> + *
> + * Structure used by VDUSE_IOTLB_GET_FD2 ioctl to find an overlapped IOVA region.
> + */
> +struct vduse_iotlb_entry_v2 {
> + struct vduse_iotlb_entry v1;
> + __u32 asid;
> + __u32 reserved[12];
> +};
> +
> +/*
> + * Same as VDUSE_IOTLB_GET_FD but with vduse_iotlb_entry_v2 argument that
> + * support extra fields.
> + */
> +#define VDUSE_IOTLB_GET_FD2 _IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
> +
> +
> /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
>
> /**
> @@ -280,6 +317,7 @@ enum vduse_req_type {
> VDUSE_GET_VQ_STATE,
> VDUSE_SET_STATUS,
> VDUSE_UPDATE_IOTLB,
> + VDUSE_SET_VQ_GROUP_ASID,
> };
>
> /**
> @@ -314,6 +352,18 @@ struct vduse_iova_range {
> __u64 last;
> };
>
> +/**
> + * struct vduse_iova_range - IOVA range [start, last] if API_VERSION >= 1
you mean struct vduse_iova_range_v2 ?
> + * @start: start of the IOVA range
> + * @last: last of the IOVA range
> + * @asid: address space ID of the IOVA range
> + */
> +struct vduse_iova_range_v2 {
> + __u64 start;
> + __u64 last;
> + __u32 asid;
> +};
> +
> /**
> * struct vduse_dev_request - control request
> * @type: request type
> @@ -322,6 +372,8 @@ struct vduse_iova_range {
> * @vq_state: virtqueue state, only index field is available
> * @s: device status
> * @iova: IOVA range for updating
> + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> + * @vq_group_asid: ASID of a virtqueue group
> * @padding: padding
> *
> * Structure used by read(2) on /dev/vduse/$NAME.
> @@ -334,6 +386,11 @@ struct vduse_dev_request {
> struct vduse_vq_state vq_state;
> struct vduse_dev_status s;
> struct vduse_iova_range iova;
> + /* Following members but padding exist only if vduse api
> + * version >= 1
> + */;
> + struct vduse_iova_range_v2 iova_v2;
> + struct vduse_vq_group_asid vq_group_asid;
> __u32 padding[32];
> };
> };
> --
> 2.52.0
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-11 0:03 ` Michael S. Tsirkin
@ 2026-01-12 11:56 ` Eugenio Perez Martin
2026-01-12 12:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-12 11:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Sun, Jan 11, 2026 at 1:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 09, 2026 at 04:24:29PM +0100, Eugenio Pérez wrote:
> > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > group. This enables mapping each group into a distinct memory space.
> >
> > The vq group to ASID association is now protected by a rwlock, but the
> > mutex domain_lock keeps protecting the domains of all ASIDs, as some
> > operations, like the ones related to the bounce buffer size, still
> > require locking all the ASIDs.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >
> > ---
> > Future improvements can include performance optimizations on top, like
> > moving to RCU or thread-synchronized atomics, or hardening by tracking
> > the ASID or ASID hashes on unused bits of the DMA address.
> >
> > Tested virtio_vdpa by manually adding two threads in vduse_set_status:
> > one of them modifies the vq group 0 ASID and the other one maps and
> > unmaps memory continuously. After a while, the two threads stop and the
> > usual work continues. Tested with version 0, version 1 with the old
> > ioctl, and version 1 with the new ioctl.
> >
> > Tested with vhost_vdpa by migrating a VM while running ping on
> > OVS+VDUSE. A few workarounds were needed in some parts:
> > * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> > the enable message to the userland device. This will be solved in the
> > future.
> > * Share the suspended state between all vhost devices in QEMU:
> > https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> > * Implement a fake VDUSE suspend vdpa operation callback that always
> > returns true in the kernel. DPDK suspends the device at the first
> > GET_VRING_BASE.
> > * Remove the CVQ blocker in ASID.
> >
> > The driver vhost_vdpa was also tested with version 0, version 1 with the
> > old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> > with many ASIDs.
> >
> > ---
> > v11:
> > * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> > (Jason).
> > * Do not take the vq groups lock if nas == 1.
> > * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> > function vduse_set_group_asid_nomsg, not needed anymore.
> > * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the
> > argument didn't match the previous VDUSE_IOTLB_GET_FD.
> > * Move the asid < dev->nas check to vdpa core.
> >
> > v10:
> > * Back to rwlock version so stronger locks are used.
> > * Take out allocations from rwlock.
> > * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> > * Remove the redundant re-fetch of the domain variable in
> > vduse_dev_max_mapping_size (Yongji).
> > * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> > robot).
> >
> > v9:
> > * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> > context.
> >
> > v8:
> > * Revert the mutex to rwlock change, it needs proper profiling to
> > justify it.
> >
> > v7:
> > * Take write lock in the error path (Jason).
> >
> > v6:
> > * Make vdpa_dev_add use gotos for error handling (MST).
> > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > (MST).
> > * Fix struct name not matching in the doc.
> >
> > v5:
> > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > ioctl (Jason).
> > * Properly set domain bounce size to divide equally between nas (Jason).
> > * Exclude "padding" member from the only >V1 members in
> > vduse_dev_request.
> >
> > v4:
> > * Divide each domain bounce size between the device bounce size (Jason).
> > * revert unneeded addr = NULL assignment (Jason)
> > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > return; } (Jason)
> > * Change a bad multiline comment, using the @ character instead of * (Jason).
> > * Consider config->nas == 0 as a fail (Jason).
> >
> > v3:
> > * Get the vduse domain through the vduse_as in the map functions
> > (Jason).
> > * Squash with the patch creating the vduse_as struct (Jason).
> > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > (Jason)
> >
> > v2:
> > * Convert the use of mutex to rwlock.
> >
> > RFC v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > value to reduce memory consumption, but vqs are already limited to
> > that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > VDUSE_IOTLB_GET_INFO.
> > * Use of array_index_nospec in VDUSE device ioctls.
> > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > * Move the umem mutex to asid struct so there is no contention between
> > ASIDs.
> >
> > RFC v2:
> > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > part of the struct is the same.
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> > include/uapi/linux/vduse.h | 63 ++++-
> > 2 files changed, 333 insertions(+), 122 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index bf437816fd7d..8227b5e9f3f6 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -41,6 +41,7 @@
> >
> > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > +#define VDUSE_DEV_MAX_AS 0xffff
> > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > @@ -86,7 +87,15 @@ struct vduse_umem {
> > struct mm_struct *mm;
> > };
> >
> > +struct vduse_as {
> > + struct vduse_iova_domain *domain;
> > + struct vduse_umem *umem;
> > + struct mutex mem_lock;
> > +};
> > +
> > struct vduse_vq_group {
> > + rwlock_t as_lock;
> > + struct vduse_as *as; /* Protected by as_lock */
> > struct vduse_dev *dev;
> > };
> >
> > @@ -94,7 +103,7 @@ struct vduse_dev {
> > struct vduse_vdpa *vdev;
> > struct device *dev;
> > struct vduse_virtqueue **vqs;
> > - struct vduse_iova_domain *domain;
> > + struct vduse_as *as;
> > char *name;
> > struct mutex lock;
> > spinlock_t msg_lock;
> > @@ -122,9 +131,8 @@ struct vduse_dev {
> > u32 vq_num;
> > u32 vq_align;
> > u32 ngroups;
> > - struct vduse_umem *umem;
> > + u32 nas;
> > struct vduse_vq_group *groups;
> > - struct mutex mem_lock;
> > unsigned int bounce_size;
> > struct mutex domain_lock;
> > };
> > @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> >
> > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > u64 start, u64 last)
> > {
> > struct vduse_dev_msg msg = { 0 };
> > @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > return -EINVAL;
> >
> > msg.req.type = VDUSE_UPDATE_IOTLB;
> > - msg.req.iova.start = start;
> > - msg.req.iova.last = last;
> > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > + msg.req.iova.start = start;
> > + msg.req.iova.last = last;
> > + } else {
> > + msg.req.iova_v2.start = start;
> > + msg.req.iova_v2.last = last;
> > + msg.req.iova_v2.asid = asid;
> > + }
> >
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> > @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > static void vduse_dev_reset(struct vduse_dev *dev)
> > {
> > int i;
> > - struct vduse_iova_domain *domain = dev->domain;
> >
> > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > - if (domain && domain->bounce_map)
> > - vduse_domain_reset_bounce_map(domain);
> > + for (i = 0; i < dev->nas; i++) {
> > + struct vduse_iova_domain *domain = dev->as[i].domain;
>
> I do not understand the locking here an in many other places.
> dev->as is dereferenced here apparently outside as_lock?
>
The virtqueue group's "as_lock" member protects that group's "as"
pointer. But we're not accessing any vq group's "as" member here; we're
accessing the dev->as[i].domain pointer, which doesn't change during the
whole lifetime of the device.
>
>
> > +
> > + if (domain && domain->bounce_map)
> > + vduse_domain_reset_bounce_map(domain);
> > + }
> >
> > down_write(&dev->rwsem);
> >
> > @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > return ret;
> > }
> >
> > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > + unsigned int asid)
> > +{
> > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > + struct vduse_dev_msg msg = { 0 };
> > + int r;
> > +
> > + if (dev->api_version < VDUSE_API_VERSION_1)
> > + return -EINVAL;
> > +
> > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > + msg.req.vq_group_asid.group = group;
> > + msg.req.vq_group_asid.asid = asid;
> > +
> > + r = vduse_dev_msg_sync(dev, &msg);
> > + if (r < 0)
> > + return r;
> > +
> > + write_lock(&dev->groups[group].as_lock);
> > + dev->groups[group].as = &dev->as[asid];
> > + write_unlock(&dev->groups[group].as_lock);
> > +
> > + return 0;
> > +}
> > +
> > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > struct vdpa_vq_state *state)
> > {
> > @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > int ret;
> >
> > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > if (ret)
> > return ret;
> >
> > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > if (ret) {
> > - vduse_domain_clear_map(dev->domain, iotlb);
> > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > return ret;
> > }
> >
> > @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > .reset = vduse_vdpa_reset,
> > .set_map = vduse_vdpa_set_map,
> > + .set_group_asid = vduse_set_group_asid,
> > .get_vq_map = vduse_get_vq_map,
> > .free = vduse_vdpa_free,
> > };
> > @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > dma_addr_t dma_addr, size_t size,
> > enum dma_data_direction dir)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> >
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > dma_addr_t dma_addr, size_t size,
> > enum dma_data_direction dir)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> >
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > enum dma_data_direction dir,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> > + dma_addr_t r;
> >
> > if (!token.group)
> > return DMA_MAPPING_ERROR;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > + domain = token.group->as->domain;
> > + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> >
> > - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > + return r;
> > }
> >
> > static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > size_t size, enum dma_data_direction dir,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + domain = token.group->as->domain;
> > + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> >
> > - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > dma_addr_t *dma_addr, gfp_t flag)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> > unsigned long iova;
> > void *addr;
> > @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > if (!addr)
> > return NULL;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + domain = token.group->as->domain;
> > addr = vduse_domain_alloc_coherent(domain, size,
> > (dma_addr_t *)&iova, addr);
> > if (!addr)
> > @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> >
> > *dma_addr = (dma_addr_t)iova;
> >
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > return addr;
> >
> > err:
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > free_pages_exact(addr, size);
> > return NULL;
> > }
> > @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > void *vaddr, dma_addr_t dma_addr,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> >
> > + domain = token.group->as->domain;
> > vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > free_pages_exact(vaddr, size);
> > }
> >
> > static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > {
> > - struct vduse_dev *vdev;
> > - struct vduse_iova_domain *domain;
> > + size_t bounce_size;
> >
> > if (!token.group)
> > return false;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + bounce_size = token.group->as->domain->bounce_size;
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> >
> > - return dma_addr < domain->bounce_size;
> > + return dma_addr < bounce_size;
> > }
> >
> > static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> >
> > static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > {
> > - struct vduse_dev *vdev;
> > - struct vduse_iova_domain *domain;
> > + size_t bounce_size;
> >
> > if (!token.group)
> > return 0;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + bounce_size = token.group->as->domain->bounce_size;
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> >
> > - return domain->bounce_size;
> > + return bounce_size;
> > }
> >
> > static const struct virtio_map_ops vduse_map_ops = {
> > @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > return ret;
> > }
> >
> > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > u64 iova, u64 size)
> > {
> > int ret;
> >
> > - mutex_lock(&dev->mem_lock);
> > + mutex_lock(&dev->as[asid].mem_lock);
> > ret = -ENOENT;
> > - if (!dev->umem)
> > + if (!dev->as[asid].umem)
> > goto unlock;
> >
> > ret = -EINVAL;
> > - if (!dev->domain)
> > + if (!dev->as[asid].domain)
> > goto unlock;
> >
> > - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> > + if (dev->as[asid].umem->iova != iova ||
> > + size != dev->as[asid].domain->bounce_size)
> > goto unlock;
> >
> > - vduse_domain_remove_user_bounce_pages(dev->domain);
> > - unpin_user_pages_dirty_lock(dev->umem->pages,
> > - dev->umem->npages, true);
> > - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> > - mmdrop(dev->umem->mm);
> > - vfree(dev->umem->pages);
> > - kfree(dev->umem);
> > - dev->umem = NULL;
> > + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > + dev->as[asid].umem->npages, true);
> > + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > + mmdrop(dev->as[asid].umem->mm);
> > + vfree(dev->as[asid].umem->pages);
> > + kfree(dev->as[asid].umem);
> > + dev->as[asid].umem = NULL;
> > ret = 0;
> > unlock:
> > - mutex_unlock(&dev->mem_lock);
> > + mutex_unlock(&dev->as[asid].mem_lock);
> > return ret;
> > }
> >
> > static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > - u64 iova, u64 uaddr, u64 size)
> > + u32 asid, u64 iova, u64 uaddr, u64 size)
> > {
> > struct page **page_list = NULL;
> > struct vduse_umem *umem = NULL;
> > @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > unsigned long npages, lock_limit;
> > int ret;
> >
> > - if (!dev->domain || !dev->domain->bounce_map ||
> > - size != dev->domain->bounce_size ||
> > + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > + size != dev->as[asid].domain->bounce_size ||
> > iova != 0 || uaddr & ~PAGE_MASK)
> > return -EINVAL;
> >
> > - mutex_lock(&dev->mem_lock);
> > + mutex_lock(&dev->as[asid].mem_lock);
> > ret = -EEXIST;
> > - if (dev->umem)
> > + if (dev->as[asid].umem)
> > goto unlock;
> >
> > ret = -ENOMEM;
> > @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > goto out;
> > }
> >
> > - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> > + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > page_list, pinned);
> > if (ret)
> > goto out;
> > @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > umem->mm = current->mm;
> > mmgrab(current->mm);
> >
> > - dev->umem = umem;
> > + dev->as[asid].umem = umem;
> > out:
> > if (ret && pinned > 0)
> > unpin_user_pages(page_list, pinned);
> > @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > vfree(page_list);
> > kfree(umem);
> > }
> > - mutex_unlock(&dev->mem_lock);
> > + mutex_unlock(&dev->as[asid].mem_lock);
> > return ret;
> > }
> >
> > @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > }
> >
> > static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> > - struct vduse_iotlb_entry *entry,
> > + struct vduse_iotlb_entry_v2 *entry,
> > struct file **f, uint64_t *capability)
> > {
> > + u32 asid;
> > int r = -EINVAL;
> > struct vhost_iotlb_map *map;
> > const struct vdpa_map_file *map_file;
> >
> > - if (entry->start > entry->last)
> > + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> > return -EINVAL;
> >
> > + asid = array_index_nospec(entry->asid, dev->nas);
> > mutex_lock(&dev->domain_lock);
> > - if (!dev->domain)
> > +
> > + if (!dev->as[asid].domain)
> > goto out;
> >
> > - spin_lock(&dev->domain->iotlb_lock);
> > - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> > - entry->last);
> > + spin_lock(&dev->as[asid].domain->iotlb_lock);
> > + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > + entry->v1.start, entry->v1.last);
> > if (map) {
> > if (f) {
> > map_file = (struct vdpa_map_file *)map->opaque;
> > *f = get_file(map_file->file);
> > }
> > - entry->offset = map_file->offset;
> > - entry->start = map->start;
> > - entry->last = map->last;
> > - entry->perm = map->perm;
> > + entry->v1.offset = map_file->offset;
> > + entry->v1.start = map->start;
> > + entry->v1.last = map->last;
> > + entry->v1.perm = map->perm;
> > if (capability) {
> > *capability = 0;
> >
> > - if (dev->domain->bounce_map && map->start == 0 &&
> > - map->last == dev->domain->bounce_size - 1)
> > + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> > + map->last == dev->as[asid].domain->bounce_size - 1)
> > *capability |= VDUSE_IOVA_CAP_UMEM;
> > }
> >
> > r = 0;
> > }
> > - spin_unlock(&dev->domain->iotlb_lock);
> > + spin_unlock(&dev->as[asid].domain->iotlb_lock);
> >
> > out:
> > mutex_unlock(&dev->domain_lock);
> > @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > return -EPERM;
> >
> > switch (cmd) {
> > - case VDUSE_IOTLB_GET_FD: {
> > - struct vduse_iotlb_entry entry;
> > + case VDUSE_IOTLB_GET_FD:
> > + case VDUSE_IOTLB_GET_FD2: {
> > + struct vduse_iotlb_entry_v2 entry = {0};
> > struct file *f = NULL;
> >
> > + ret = -ENOIOCTLCMD;
> > + if (dev->api_version < VDUSE_API_VERSION_1 &&
> > + cmd == VDUSE_IOTLB_GET_FD2)
> > + break;
> > +
> > ret = -EFAULT;
> > - if (copy_from_user(&entry, argp, sizeof(entry)))
> > + if (cmd == VDUSE_IOTLB_GET_FD2) {
> > + if (copy_from_user(&entry, argp, sizeof(entry)))
> > + break;
> > + } else {
> > + if (copy_from_user(&entry.v1, argp,
> > + sizeof(entry.v1)))
> > + break;
> > + }
> > +
> > + ret = -EINVAL;
> > + if (!is_mem_zero((const char *)entry.reserved,
> > + sizeof(entry.reserved)))
> > break;
> >
> > ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> > @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > if (!f)
> > break;
> >
> > - ret = -EFAULT;
> > - if (copy_to_user(argp, &entry, sizeof(entry))) {
> > + if (cmd == VDUSE_IOTLB_GET_FD2)
> > + ret = copy_to_user(argp, &entry,
> > + sizeof(entry));
> > + else
> > + ret = copy_to_user(argp, &entry.v1,
> > + sizeof(entry.v1));
> > +
> > + if (ret) {
> > + ret = -EFAULT;
> > fput(f);
> > break;
> > }
> > - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > fput(f);
> > break;
> > }
> > @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > }
> > case VDUSE_IOTLB_REG_UMEM: {
> > struct vduse_iova_umem umem;
> > + u32 asid;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> > ret = -EINVAL;
> > if (!is_mem_zero((const char *)umem.reserved,
> > - sizeof(umem.reserved)))
> > + sizeof(umem.reserved)) ||
> > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > + umem.asid != 0) || umem.asid >= dev->nas)
> > break;
> >
> > mutex_lock(&dev->domain_lock);
> > - ret = vduse_dev_reg_umem(dev, umem.iova,
> > + asid = array_index_nospec(umem.asid, dev->nas);
> > + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > umem.uaddr, umem.size);
> > mutex_unlock(&dev->domain_lock);
> > break;
> > }
> > case VDUSE_IOTLB_DEREG_UMEM: {
> > struct vduse_iova_umem umem;
> > + u32 asid;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> > ret = -EINVAL;
> > if (!is_mem_zero((const char *)umem.reserved,
> > - sizeof(umem.reserved)))
> > + sizeof(umem.reserved)) ||
> > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > + umem.asid != 0) ||
> > + umem.asid >= dev->nas)
> > break;
> > +
> > mutex_lock(&dev->domain_lock);
> > - ret = vduse_dev_dereg_umem(dev, umem.iova,
> > + asid = array_index_nospec(umem.asid, dev->nas);
> > + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > umem.size);
> > mutex_unlock(&dev->domain_lock);
> > break;
> > }
> > case VDUSE_IOTLB_GET_INFO: {
> > struct vduse_iova_info info;
> > - struct vduse_iotlb_entry entry;
> > + struct vduse_iotlb_entry_v2 entry;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&info, argp, sizeof(info)))
> > @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > sizeof(info.reserved)))
> > break;
> >
> > - entry.start = info.start;
> > - entry.last = info.last;
> > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > + if (info.asid)
> > + break;
> > + } else if (info.asid >= dev->nas)
> > + break;
> > +
> > + entry.v1.start = info.start;
> > + entry.v1.last = info.last;
> > + entry.asid = info.asid;
> > ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> > &info.capability);
> > if (ret < 0)
> > break;
> >
> > - info.start = entry.start;
> > - info.last = entry.last;
> > + info.start = entry.v1.start;
> > + info.last = entry.v1.last;
> > + info.asid = entry.asid;
> >
> > ret = -EFAULT;
> > if (copy_to_user(argp, &info, sizeof(info)))
> > @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > struct vduse_dev *dev = file->private_data;
> >
> > mutex_lock(&dev->domain_lock);
> > - if (dev->domain)
> > - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> > + for (int i = 0; i < dev->nas; i++)
> > + if (dev->as[i].domain)
> > + vduse_dev_dereg_umem(dev, i, 0,
> > + dev->as[i].domain->bounce_size);
> > mutex_unlock(&dev->domain_lock);
> > spin_lock(&dev->msg_lock);
> > /* Make sure the inflight messages can be processed after reconnection */
> > @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > return NULL;
> >
> > mutex_init(&dev->lock);
> > - mutex_init(&dev->mem_lock);
> > mutex_init(&dev->domain_lock);
> > spin_lock_init(&dev->msg_lock);
> > INIT_LIST_HEAD(&dev->send_list);
> > @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> > idr_remove(&vduse_idr, dev->minor);
> > kvfree(dev->config);
> > vduse_dev_deinit_vqs(dev);
> > - if (dev->domain)
> > - vduse_domain_destroy(dev->domain);
> > + for (int i = 0; i < dev->nas; i++) {
> > + if (dev->as[i].domain)
> > + vduse_domain_destroy(dev->as[i].domain);
> > + }
> > + kfree(dev->as);
> > kfree(dev->name);
> > kfree(dev->groups);
> > vduse_dev_destroy(dev);
> > @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > sizeof(config->reserved)))
> > return false;
> >
> > - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > + if (api_version < VDUSE_API_VERSION_1 &&
> > + (config->ngroups || config->nas))
> > return false;
> >
> > - if (api_version >= VDUSE_API_VERSION_1 &&
> > - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> > - return false;
> > + if (api_version >= VDUSE_API_VERSION_1) {
> > + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> > + return false;
> > +
> > + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> > + return false;
> > + }
> >
> > if (config->vq_align > PAGE_SIZE)
> > return false;
> > @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
> >
> > ret = -EPERM;
> > mutex_lock(&dev->domain_lock);
> > - if (dev->domain)
> > + /* Assuming that if the first domain is allocated, all are allocated */
> > + if (dev->as[0].domain)
> > goto unlock;
> >
> > ret = kstrtouint(buf, 10, &bounce_size);
> > @@ -1984,6 +2115,14 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > dev->device_features = config->features;
> > dev->device_id = config->device_id;
> > dev->vendor_id = config->vendor_id;
> > +
> > + dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
> > + dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> > + if (!dev->as)
> > + goto err_as;
> > + for (int i = 0; i < dev->nas; i++)
> > + mutex_init(&dev->as[i].mem_lock);
> > +
> > dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
> > ? 1
> > : config->ngroups;
> > @@ -1991,8 +2130,11 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > GFP_KERNEL);
> > if (!dev->groups)
> > goto err_vq_groups;
> > - for (u32 i = 0; i < dev->ngroups; ++i)
> > + for (u32 i = 0; i < dev->ngroups; ++i) {
> > dev->groups[i].dev = dev;
> > + rwlock_init(&dev->groups[i].as_lock);
> > + dev->groups[i].as = &dev->as[0];
> > + }
> >
> > dev->name = kstrdup(config->name, GFP_KERNEL);
> > if (!dev->name)
> > @@ -2032,6 +2174,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > err_str:
> > kfree(dev->groups);
> > err_vq_groups:
> > + kfree(dev->as);
> > +err_as:
> > vduse_dev_destroy(dev);
> > err:
> > return ret;
> > @@ -2155,7 +2299,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> >
> > vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> > &vduse_vdpa_config_ops, &vduse_map_ops,
> > - dev->ngroups, 1, name, true);
> > + dev->ngroups, dev->nas, name, true);
> > if (IS_ERR(vdev))
> > return PTR_ERR(vdev);
> >
> > @@ -2170,7 +2314,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > const struct vdpa_dev_set_config *config)
> > {
> > struct vduse_dev *dev;
> > - int ret;
> > + size_t domain_bounce_size;
> > + int ret, i;
> >
> > mutex_lock(&vduse_lock);
> > dev = vduse_find_dev(name);
> > @@ -2184,29 +2329,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > return ret;
> >
> > mutex_lock(&dev->domain_lock);
> > - if (!dev->domain)
> > - dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > - dev->bounce_size);
> > - mutex_unlock(&dev->domain_lock);
> > - if (!dev->domain) {
> > - ret = -ENOMEM;
> > - goto domain_err;
> > + ret = 0;
> > +
> > + domain_bounce_size = dev->bounce_size / dev->nas;
> > + for (i = 0; i < dev->nas; ++i) {
> > + dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > + domain_bounce_size);
> > + if (!dev->as[i].domain) {
> > + ret = -ENOMEM;
> > + goto err;
> > + }
> > }
> >
> > + mutex_unlock(&dev->domain_lock);
> > +
> > ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> > - if (ret) {
> > - goto register_err;
> > - }
> > + if (ret)
> > + goto err_register;
> >
> > return 0;
> >
> > -register_err:
> > +err_register:
> > mutex_lock(&dev->domain_lock);
> > - vduse_domain_destroy(dev->domain);
> > - dev->domain = NULL;
> > +
> > +err:
> > + for (int j = 0; j < i; j++) {
> > + if (dev->as[j].domain) {
> > + vduse_domain_destroy(dev->as[j].domain);
> > + dev->as[j].domain = NULL;
> > + }
> > + }
> > mutex_unlock(&dev->domain_lock);
> >
> > -domain_err:
> > put_device(&dev->vdev->vdpa.dev);
> >
> > return ret;
> > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > index a3d51cf6df3a..9e423163d819 100644
> > --- a/include/uapi/linux/vduse.h
> > +++ b/include/uapi/linux/vduse.h
> > @@ -47,7 +47,8 @@ struct vduse_dev_config {
> > __u32 vq_num;
> > __u32 vq_align;
> > __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > - __u32 reserved[12];
> > + __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> > + __u32 reserved[11];
> > __u32 config_size;
> > __u8 config[];
> > };
> > @@ -166,6 +167,16 @@ struct vduse_vq_state_packed {
> > __u16 last_used_idx;
> > };
> >
> > +/**
> > + * struct vduse_vq_group_asid - virtqueue group ASID
> > + * @group: Index of the virtqueue group
> > + * @asid: Address space ID of the group
> > + */
> > +struct vduse_vq_group_asid {
> > + __u32 group;
> > + __u32 asid;
> > +};
> > +
> > /**
> > * struct vduse_vq_info - information of a virtqueue
> > * @index: virtqueue index
> > @@ -225,6 +236,7 @@ struct vduse_vq_eventfd {
> > * @uaddr: start address of userspace memory, it must be aligned to page size
> > * @iova: start of the IOVA region
> > * @size: size of the IOVA region
> > + * @asid: Address space ID of the IOVA region
> > * @reserved: for future use, needs to be initialized to zero
> > *
> > * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> > @@ -234,7 +246,8 @@ struct vduse_iova_umem {
> > __u64 uaddr;
> > __u64 iova;
> > __u64 size;
> > - __u64 reserved[3];
> > + __u32 asid;
> > + __u32 reserved[5];
> > };
> >
> > /* Register userspace memory for IOVA regions */
> > @@ -248,6 +261,7 @@ struct vduse_iova_umem {
> > * @start: start of the IOVA region
> > * @last: last of the IOVA region
> > * @capability: capability of the IOVA region
> > + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> > * @reserved: for future use, needs to be initialized to zero
> > *
> > * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> > @@ -258,7 +272,8 @@ struct vduse_iova_info {
> > __u64 last;
> > #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> > __u64 capability;
> > - __u64 reserved[3];
> > + __u32 asid; /* Only if device API version >= 1 */
> > + __u32 reserved[5];
> > };
> >
> > /*
> > @@ -267,6 +282,28 @@ struct vduse_iova_info {
> > */
> > #define VDUSE_IOTLB_GET_INFO _IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
> >
> > +/**
> > + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
> > + *
> > + * @v1: the original vduse_iotlb_entry
> > + * @asid: address space ID of the IOVA region
> > + * @reserved: for future use, needs to be initialized to zero
> > + *
> > + * Structure used by the VDUSE_IOTLB_GET_FD2 ioctl to find an overlapping IOVA region.
> > + */
> > +struct vduse_iotlb_entry_v2 {
> > + struct vduse_iotlb_entry v1;
> > + __u32 asid;
> > + __u32 reserved[12];
> > +};
> > +
> > +/*
> > + * Same as VDUSE_IOTLB_GET_FD but with a vduse_iotlb_entry_v2 argument that
> > + * supports extra fields.
> > + */
> > +#define VDUSE_IOTLB_GET_FD2 _IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
> > +
> > /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
> >
> > /**
> > @@ -280,6 +317,7 @@ enum vduse_req_type {
> > VDUSE_GET_VQ_STATE,
> > VDUSE_SET_STATUS,
> > VDUSE_UPDATE_IOTLB,
> > + VDUSE_SET_VQ_GROUP_ASID,
> > };
> >
> > /**
> > @@ -314,6 +352,18 @@ struct vduse_iova_range {
> > __u64 last;
> > };
> >
> > +/**
> > + * struct vduse_iova_range - IOVA range [start, last] if API_VERSION >= 1
>
> you mean struct vduse_iova_range_v2?
>
Right, thanks for the catch!
> > + * @start: start of the IOVA range
> > + * @last: last of the IOVA range
> > + * @asid: address space ID of the IOVA range
> > + */
> > +struct vduse_iova_range_v2 {
> > + __u64 start;
> > + __u64 last;
> > + __u32 asid;
> > +};
> > +
> > /**
> > * struct vduse_dev_request - control request
> > * @type: request type
> > @@ -322,6 +372,8 @@ struct vduse_iova_range {
> > * @vq_state: virtqueue state, only index field is available
> > * @s: device status
> > * @iova: IOVA range for updating
> > + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> > + * @vq_group_asid: ASID of a virtqueue group
> > * @padding: padding
> > *
> > * Structure used by read(2) on /dev/vduse/$NAME.
> > @@ -334,6 +386,11 @@ struct vduse_dev_request {
> > struct vduse_vq_state vq_state;
> > struct vduse_dev_status s;
> > struct vduse_iova_range iova;
> > + /* All members below, except padding, exist only if the VDUSE API
> > + * version is >= 1
> > + */
> > + struct vduse_iova_range_v2 iova_v2;
> > + struct vduse_vq_group_asid vq_group_asid;
> > __u32 padding[32];
> > };
> > };
> > --
> > 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-12 11:56 ` Eugenio Perez Martin
@ 2026-01-12 12:00 ` Michael S. Tsirkin
0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2026-01-12 12:00 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie
On Mon, Jan 12, 2026 at 12:56:34PM +0100, Eugenio Perez Martin wrote:
> > > @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > {
> > > int i;
> > > - struct vduse_iova_domain *domain = dev->domain;
> > >
> > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > - if (domain && domain->bounce_map)
> > > - vduse_domain_reset_bounce_map(domain);
> > > + for (i = 0; i < dev->nas; i++) {
> > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> >
> > I do not understand the locking here an in many other places.
> > dev->as is dereferenced here apparently outside as_lock?
> >
>
> The virtqueue group's "as_lock" member protects the virtqueue group's
> "as" pointer. But we're not accessing any vq group's "as" member here;
> we're accessing the dev->as[i].domain pointer, which doesn't change
> during the lifetime of the device.
Ah, got it. Maybe a comment near the field definition would help.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-09 15:24 ` [PATCH v11 11/12] vduse: add vq group asid support Eugenio Pérez
2026-01-11 0:03 ` Michael S. Tsirkin
@ 2026-01-13 6:23 ` Jason Wang
2026-01-13 14:38 ` Eugenio Perez Martin
1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2026-01-13 6:23 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> group. This enables mapping each group into a distinct memory space.
>
> The vq group to ASID association is now protected by an rwlock. But the
> mutex domain_lock keeps protecting the domains of all ASIDs, as some
> operations, like the ones related to the bounce buffer size, still
> require locking all the ASIDs.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
> ---
> Future improvements can include performance optimizations on top, like
> moving to RCU or thread-synchronized atomics, or hardening by tracking
> the ASID or ASID hashes in unused bits of the DMA address.
>
> Tested virtio_vdpa by manually adding two threads in vduse_set_status:
> one of them modifies the vq group 0 ASID and the other maps and unmaps
> memory continuously. After a while, the two threads stop and the usual
> work continues. Tested with version 0, version 1 with the old ioctl, and
> version 1 with the new ioctl.
>
> Tested with vhost_vdpa by migrating a VM while pinging over OVS+VDUSE.
> A few workarounds were needed:
> * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> the enable message to the userland device. This will be solved in the
> future.
> * Share the suspended state between all vhost devices in QEMU:
> https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> * Implement a fake VDUSE suspend vdpa operation callback that always
> * returns true in the kernel. DPDK suspends the device at the first
> GET_VRING_BASE.
> * Remove the CVQ blocker in ASID.
>
> The driver vhost_vdpa was also tested with version 0, version 1 with the
> old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> with many ASID.
I think we'd better update the Documentation/userspace-api/vduse.rst
for the new uAPI.
>
> ---
> v11:
> * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> (Jason).
> * Do not take the vq groups lock if nas == 1.
> * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> function vduse_set_group_asid_nomsg, not needed anymore.
> * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the argument
> didn't match the previous VDUSE_IOTLB_GET_FD.
> * Move the asid < dev->nas check to vdpa core.
>
> v10:
> * Back to rwlock version so stronger locks are used.
> * Take out allocations from rwlock.
> * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> * Remove bad fetching again of domain variable in
> vduse_dev_max_mapping_size (Yongji).
> * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> robot).
>
> v9:
> * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> context.
>
> v8:
> * Revert the mutex to rwlock change, it needs proper profiling to
> justify it.
>
> v7:
> * Take write lock in the error path (Jason).
>
> v6:
> * Make vdpa_dev_add use gotos for error handling (MST).
> * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> (MST).
> * Fix struct name not matching in the doc.
>
> v5:
> * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> ioctl (Jason).
> * Properly set domain bounce size to divide equally between nas (Jason).
> * Exclude "padding" member from the only >V1 members in
> vduse_dev_request.
>
> v4:
> * Divide each domain bounce size between the device bounce size (Jason).
> * revert unneeded addr = NULL assignment (Jason)
> * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> return; } (Jason)
> * Change a bad multiline comment, using the @ character instead of * (Jason).
> * Consider config->nas == 0 as a fail (Jason).
>
> v3:
> * Get the vduse domain through the vduse_as in the map functions
> (Jason).
> * Squash with the patch creating the vduse_as struct (Jason).
> * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> (Jason)
>
> v2:
> * Convert the use of mutex to rwlock.
>
> RFC v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> value to reduce memory consumption, but vqs are already limited to
> that value and userspace VDUSE is able to allocate that many vqs.
> * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> VDUSE_IOTLB_GET_INFO.
> * Use of array_index_nospec in VDUSE device ioctls.
> * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> * Move the umem mutex to asid struct so there is no contention between
> ASIDs.
>
> RFC v2:
> * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> part of the struct is the same.
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> include/uapi/linux/vduse.h | 63 ++++-
> 2 files changed, 333 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index bf437816fd7d..8227b5e9f3f6 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -41,6 +41,7 @@
>
> #define VDUSE_DEV_MAX (1U << MINORBITS)
> #define VDUSE_DEV_MAX_GROUPS 0xffff
> +#define VDUSE_DEV_MAX_AS 0xffff
> #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> @@ -86,7 +87,15 @@ struct vduse_umem {
> struct mm_struct *mm;
> };
>
> +struct vduse_as {
> + struct vduse_iova_domain *domain;
> + struct vduse_umem *umem;
> + struct mutex mem_lock;
> +};
> +
> struct vduse_vq_group {
> + rwlock_t as_lock;
> + struct vduse_as *as; /* Protected by as_lock */
> struct vduse_dev *dev;
> };
>
> @@ -94,7 +103,7 @@ struct vduse_dev {
> struct vduse_vdpa *vdev;
> struct device *dev;
> struct vduse_virtqueue **vqs;
> - struct vduse_iova_domain *domain;
> + struct vduse_as *as;
> char *name;
> struct mutex lock;
> spinlock_t msg_lock;
> @@ -122,9 +131,8 @@ struct vduse_dev {
> u32 vq_num;
> u32 vq_align;
> u32 ngroups;
> - struct vduse_umem *umem;
> + u32 nas;
> struct vduse_vq_group *groups;
> - struct mutex mem_lock;
> unsigned int bounce_size;
> struct mutex domain_lock;
> };
> @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> return vduse_dev_msg_sync(dev, &msg);
> }
>
> -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> u64 start, u64 last)
> {
> struct vduse_dev_msg msg = { 0 };
> @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> return -EINVAL;
>
> msg.req.type = VDUSE_UPDATE_IOTLB;
> - msg.req.iova.start = start;
> - msg.req.iova.last = last;
> + if (dev->api_version < VDUSE_API_VERSION_1) {
> + msg.req.iova.start = start;
> + msg.req.iova.last = last;
> + } else {
> + msg.req.iova_v2.start = start;
> + msg.req.iova_v2.last = last;
> + msg.req.iova_v2.asid = asid;
> + }
>
> return vduse_dev_msg_sync(dev, &msg);
> }
> @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> static void vduse_dev_reset(struct vduse_dev *dev)
> {
> int i;
> - struct vduse_iova_domain *domain = dev->domain;
>
> /* The coherent mappings are handled in vduse_dev_free_coherent() */
> - if (domain && domain->bounce_map)
> - vduse_domain_reset_bounce_map(domain);
> + for (i = 0; i < dev->nas; i++) {
> + struct vduse_iova_domain *domain = dev->as[i].domain;
> +
> + if (domain && domain->bounce_map)
> + vduse_domain_reset_bounce_map(domain);
> + }
>
> down_write(&dev->rwsem);
>
> @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> return ret;
> }
>
> +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> + unsigned int asid)
> +{
> + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> + struct vduse_dev_msg msg = { 0 };
> + int r;
> +
> + if (dev->api_version < VDUSE_API_VERSION_1)
> + return -EINVAL;
> +
> + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> + msg.req.vq_group_asid.group = group;
> + msg.req.vq_group_asid.asid = asid;
> +
> + r = vduse_dev_msg_sync(dev, &msg);
> + if (r < 0)
> + return r;
> +
> + write_lock(&dev->groups[group].as_lock);
> + dev->groups[group].as = &dev->as[asid];
> + write_unlock(&dev->groups[group].as_lock);
> +
> + return 0;
> +}
> +
> static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> struct vdpa_vq_state *state)
> {
> @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> int ret;
>
> - ret = vduse_domain_set_map(dev->domain, iotlb);
> + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> if (ret)
> return ret;
>
> - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> if (ret) {
> - vduse_domain_clear_map(dev->domain, iotlb);
> + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> return ret;
> }
>
> @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> .reset = vduse_vdpa_reset,
> .set_map = vduse_vdpa_set_map,
> + .set_group_asid = vduse_set_group_asid,
> .get_vq_map = vduse_get_vq_map,
> .free = vduse_vdpa_free,
> };
> @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> dma_addr_t dma_addr, size_t size,
> enum dma_data_direction dir)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
I suggest factoring this lock/unlock pattern out into helpers.
>
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> dma_addr_t dma_addr, size_t size,
> enum dma_data_direction dir)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
>
> + domain = token.group->as->domain;
> vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> enum dma_data_direction dir,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
> + dma_addr_t r;
>
> if (!token.group)
> return DMA_MAPPING_ERROR;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> + domain = token.group->as->domain;
> + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
>
> - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> + return r;
> }
>
> static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> size_t size, enum dma_data_direction dir,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + domain = token.group->as->domain;
> + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
>
> - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> }
>
> static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> dma_addr_t *dma_addr, gfp_t flag)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
> unsigned long iova;
> void *addr;
> @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> if (!addr)
> return NULL;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + domain = token.group->as->domain;
> addr = vduse_domain_alloc_coherent(domain, size,
> (dma_addr_t *)&iova, addr);
> if (!addr)
> @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
>
> *dma_addr = (dma_addr_t)iova;
>
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> return addr;
>
> err:
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> free_pages_exact(addr, size);
> return NULL;
> }
> @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> void *vaddr, dma_addr_t dma_addr,
> unsigned long attrs)
> {
> - struct vduse_dev *vdev;
> struct vduse_iova_domain *domain;
>
> if (!token.group)
> return;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
>
> + domain = token.group->as->domain;
> vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
> +
> free_pages_exact(vaddr, size);
> }
>
> static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> {
> - struct vduse_dev *vdev;
> - struct vduse_iova_domain *domain;
> + size_t bounce_size;
>
> if (!token.group)
> return false;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + bounce_size = token.group->as->domain->bounce_size;
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
>
> - return dma_addr < domain->bounce_size;
> + return dma_addr < bounce_size;
> }
>
> static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
>
> static size_t vduse_dev_max_mapping_size(union virtio_map token)
> {
> - struct vduse_dev *vdev;
> - struct vduse_iova_domain *domain;
> + size_t bounce_size;
>
> if (!token.group)
> return 0;
>
> - vdev = token.group->dev;
> - domain = vdev->domain;
> + if (token.group->dev->nas > 1)
> + read_lock(&token.group->as_lock);
> +
> + bounce_size = token.group->as->domain->bounce_size;
> +
> + if (token.group->dev->nas > 1)
> + read_unlock(&token.group->as_lock);
>
> - return domain->bounce_size;
> + return bounce_size;
> }
>
> static const struct virtio_map_ops vduse_map_ops = {
> @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> return ret;
> }
>
> -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> u64 iova, u64 size)
> {
> int ret;
>
> - mutex_lock(&dev->mem_lock);
> + mutex_lock(&dev->as[asid].mem_lock);
> ret = -ENOENT;
> - if (!dev->umem)
> + if (!dev->as[asid].umem)
> goto unlock;
>
> ret = -EINVAL;
> - if (!dev->domain)
> + if (!dev->as[asid].domain)
> goto unlock;
>
> - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> + if (dev->as[asid].umem->iova != iova ||
> + size != dev->as[asid].domain->bounce_size)
> goto unlock;
>
> - vduse_domain_remove_user_bounce_pages(dev->domain);
> - unpin_user_pages_dirty_lock(dev->umem->pages,
> - dev->umem->npages, true);
> - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> - mmdrop(dev->umem->mm);
> - vfree(dev->umem->pages);
> - kfree(dev->umem);
> - dev->umem = NULL;
> + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> + dev->as[asid].umem->npages, true);
> + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> + mmdrop(dev->as[asid].umem->mm);
> + vfree(dev->as[asid].umem->pages);
> + kfree(dev->as[asid].umem);
> + dev->as[asid].umem = NULL;
> ret = 0;
> unlock:
> - mutex_unlock(&dev->mem_lock);
> + mutex_unlock(&dev->as[asid].mem_lock);
> return ret;
> }
>
> static int vduse_dev_reg_umem(struct vduse_dev *dev,
> - u64 iova, u64 uaddr, u64 size)
> + u32 asid, u64 iova, u64 uaddr, u64 size)
> {
> struct page **page_list = NULL;
> struct vduse_umem *umem = NULL;
> @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> unsigned long npages, lock_limit;
> int ret;
>
> - if (!dev->domain || !dev->domain->bounce_map ||
> - size != dev->domain->bounce_size ||
> + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> + size != dev->as[asid].domain->bounce_size ||
> iova != 0 || uaddr & ~PAGE_MASK)
> return -EINVAL;
>
> - mutex_lock(&dev->mem_lock);
> + mutex_lock(&dev->as[asid].mem_lock);
> ret = -EEXIST;
> - if (dev->umem)
> + if (dev->as[asid].umem)
> goto unlock;
>
> ret = -ENOMEM;
> @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> goto out;
> }
>
> - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> page_list, pinned);
> if (ret)
> goto out;
> @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> umem->mm = current->mm;
> mmgrab(current->mm);
>
> - dev->umem = umem;
> + dev->as[asid].umem = umem;
> out:
> if (ret && pinned > 0)
> unpin_user_pages(page_list, pinned);
> @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> vfree(page_list);
> kfree(umem);
> }
> - mutex_unlock(&dev->mem_lock);
> + mutex_unlock(&dev->as[asid].mem_lock);
> return ret;
> }
>
> @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> }
>
> static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> - struct vduse_iotlb_entry *entry,
> + struct vduse_iotlb_entry_v2 *entry,
> struct file **f, uint64_t *capability)
> {
> + u32 asid;
> int r = -EINVAL;
> struct vhost_iotlb_map *map;
> const struct vdpa_map_file *map_file;
>
> - if (entry->start > entry->last)
> + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> return -EINVAL;
>
> + asid = array_index_nospec(entry->asid, dev->nas);
> mutex_lock(&dev->domain_lock);
> - if (!dev->domain)
> +
> + if (!dev->as[asid].domain)
> goto out;
>
> - spin_lock(&dev->domain->iotlb_lock);
> - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> - entry->last);
> + spin_lock(&dev->as[asid].domain->iotlb_lock);
> + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> + entry->v1.start, entry->v1.last);
> if (map) {
> if (f) {
> map_file = (struct vdpa_map_file *)map->opaque;
> *f = get_file(map_file->file);
> }
> - entry->offset = map_file->offset;
> - entry->start = map->start;
> - entry->last = map->last;
> - entry->perm = map->perm;
> + entry->v1.offset = map_file->offset;
> + entry->v1.start = map->start;
> + entry->v1.last = map->last;
> + entry->v1.perm = map->perm;
> if (capability) {
> *capability = 0;
>
> - if (dev->domain->bounce_map && map->start == 0 &&
> - map->last == dev->domain->bounce_size - 1)
> + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> + map->last == dev->as[asid].domain->bounce_size - 1)
> *capability |= VDUSE_IOVA_CAP_UMEM;
> }
>
> r = 0;
> }
> - spin_unlock(&dev->domain->iotlb_lock);
> + spin_unlock(&dev->as[asid].domain->iotlb_lock);
>
> out:
> mutex_unlock(&dev->domain_lock);
> @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> return -EPERM;
>
> switch (cmd) {
> - case VDUSE_IOTLB_GET_FD: {
> - struct vduse_iotlb_entry entry;
> + case VDUSE_IOTLB_GET_FD:
> + case VDUSE_IOTLB_GET_FD2: {
I would rename this to GET_FD_ASID, but I also wonder why a new ioctl
is needed at all. I think we can deduce the argument layout from the
API version; for example, you didn't introduce a VDUSE_IOTLB_REG_UMEM2.
> + struct vduse_iotlb_entry_v2 entry = {0};
> struct file *f = NULL;
>
> + ret = -ENOIOCTLCMD;
> + if (dev->api_version < VDUSE_API_VERSION_1 &&
> + cmd == VDUSE_IOTLB_GET_FD2)
> + break;
> +
> ret = -EFAULT;
> - if (copy_from_user(&entry, argp, sizeof(entry)))
> + if (cmd == VDUSE_IOTLB_GET_FD2) {
> + if (copy_from_user(&entry, argp, sizeof(entry)))
> + break;
> + } else {
> + if (copy_from_user(&entry.v1, argp,
> + sizeof(entry.v1)))
> + break;
> + }
> +
> + ret = -EINVAL;
> + if (!is_mem_zero((const char *)entry.reserved,
> + sizeof(entry.reserved)))
> break;
>
> ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> if (!f)
> break;
>
> - ret = -EFAULT;
> - if (copy_to_user(argp, &entry, sizeof(entry))) {
> + if (cmd == VDUSE_IOTLB_GET_FD2)
> + ret = copy_to_user(argp, &entry,
> + sizeof(entry));
> + else
> + ret = copy_to_user(argp, &entry.v1,
> + sizeof(entry.v1));
> +
> + if (ret) {
> + ret = -EFAULT;
> fput(f);
> break;
> }
> - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> fput(f);
> break;
> }
> @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> }
> case VDUSE_IOTLB_REG_UMEM: {
> struct vduse_iova_umem umem;
> + u32 asid;
>
> ret = -EFAULT;
> if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
> ret = -EINVAL;
> if (!is_mem_zero((const char *)umem.reserved,
> - sizeof(umem.reserved)))
> + sizeof(umem.reserved)) ||
> + (dev->api_version < VDUSE_API_VERSION_1 &&
> + umem.asid != 0) || umem.asid >= dev->nas)
> break;
>
> mutex_lock(&dev->domain_lock);
> - ret = vduse_dev_reg_umem(dev, umem.iova,
> + asid = array_index_nospec(umem.asid, dev->nas);
> + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> umem.uaddr, umem.size);
> mutex_unlock(&dev->domain_lock);
> break;
> }
> case VDUSE_IOTLB_DEREG_UMEM: {
> struct vduse_iova_umem umem;
> + u32 asid;
>
> ret = -EFAULT;
> if (copy_from_user(&umem, argp, sizeof(umem)))
> @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>
> ret = -EINVAL;
> if (!is_mem_zero((const char *)umem.reserved,
> - sizeof(umem.reserved)))
> + sizeof(umem.reserved)) ||
> + (dev->api_version < VDUSE_API_VERSION_1 &&
> + umem.asid != 0) ||
> + umem.asid >= dev->nas)
> break;
> +
> mutex_lock(&dev->domain_lock);
> - ret = vduse_dev_dereg_umem(dev, umem.iova,
> + asid = array_index_nospec(umem.asid, dev->nas);
> + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> umem.size);
> mutex_unlock(&dev->domain_lock);
> break;
> }
> case VDUSE_IOTLB_GET_INFO: {
> struct vduse_iova_info info;
> - struct vduse_iotlb_entry entry;
> + struct vduse_iotlb_entry_v2 entry;
>
> ret = -EFAULT;
> if (copy_from_user(&info, argp, sizeof(info)))
> @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> sizeof(info.reserved)))
> break;
>
> - entry.start = info.start;
> - entry.last = info.last;
> + if (dev->api_version < VDUSE_API_VERSION_1) {
> + if (info.asid)
> + break;
> + } else if (info.asid >= dev->nas)
> + break;
> +
> + entry.v1.start = info.start;
> + entry.v1.last = info.last;
> + entry.asid = info.asid;
> ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> &info.capability);
> if (ret < 0)
> break;
>
> - info.start = entry.start;
> - info.last = entry.last;
> + info.start = entry.v1.start;
> + info.last = entry.v1.last;
> + info.asid = entry.asid;
>
> ret = -EFAULT;
> if (copy_to_user(argp, &info, sizeof(info)))
> @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> struct vduse_dev *dev = file->private_data;
>
> mutex_lock(&dev->domain_lock);
> - if (dev->domain)
> - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> + for (int i = 0; i < dev->nas; i++)
> + if (dev->as[i].domain)
> + vduse_dev_dereg_umem(dev, i, 0,
> + dev->as[i].domain->bounce_size);
> mutex_unlock(&dev->domain_lock);
> spin_lock(&dev->msg_lock);
> /* Make sure the inflight messages can processed after reconncection */
> @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> return NULL;
>
> mutex_init(&dev->lock);
> - mutex_init(&dev->mem_lock);
> mutex_init(&dev->domain_lock);
> spin_lock_init(&dev->msg_lock);
> INIT_LIST_HEAD(&dev->send_list);
> @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> idr_remove(&vduse_idr, dev->minor);
> kvfree(dev->config);
> vduse_dev_deinit_vqs(dev);
> - if (dev->domain)
> - vduse_domain_destroy(dev->domain);
> + for (int i = 0; i < dev->nas; i++) {
> + if (dev->as[i].domain)
> + vduse_domain_destroy(dev->as[i].domain);
> + }
> + kfree(dev->as);
> kfree(dev->name);
> kfree(dev->groups);
> vduse_dev_destroy(dev);
> @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> sizeof(config->reserved)))
> return false;
>
> - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> + if (api_version < VDUSE_API_VERSION_1 &&
> + (config->ngroups || config->nas))
> return false;
>
> - if (api_version >= VDUSE_API_VERSION_1 &&
> - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> - return false;
> + if (api_version >= VDUSE_API_VERSION_1) {
> + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> + return false;
> +
> + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> + return false;
> + }
>
> if (config->vq_align > PAGE_SIZE)
> return false;
> @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
>
> ret = -EPERM;
> mutex_lock(&dev->domain_lock);
> - if (dev->domain)
> + /* Assuming that if the first domain is allocated, all are allocated */
> + if (dev->as[0].domain)
> goto unlock;
Should we update the per-AS bounce size here, and if so, how do we
synchronize that with need_sync()?
>
> ret = kstrtouint(buf, 10, &bounce_size);
> @@ -1984,6 +2115,14 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> dev->device_features = config->features;
> dev->device_id = config->device_id;
> dev->vendor_id = config->vendor_id;
> +
> + dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
> + dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> + if (!dev->as)
> + goto err_as;
> + for (int i = 0; i < dev->nas; i++)
> + mutex_init(&dev->as[i].mem_lock);
> +
> dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
> ? 1
> : config->ngroups;
> @@ -1991,8 +2130,11 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> GFP_KERNEL);
> if (!dev->groups)
> goto err_vq_groups;
> - for (u32 i = 0; i < dev->ngroups; ++i)
> + for (u32 i = 0; i < dev->ngroups; ++i) {
> dev->groups[i].dev = dev;
> + rwlock_init(&dev->groups[i].as_lock);
> + dev->groups[i].as = &dev->as[0];
> + }
>
> dev->name = kstrdup(config->name, GFP_KERNEL);
> if (!dev->name)
> @@ -2032,6 +2174,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> err_str:
> kfree(dev->groups);
> err_vq_groups:
> + kfree(dev->as);
> +err_as:
> vduse_dev_destroy(dev);
> err:
> return ret;
> @@ -2155,7 +2299,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
>
> vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> &vduse_vdpa_config_ops, &vduse_map_ops,
> - dev->ngroups, 1, name, true);
> + dev->ngroups, dev->nas, name, true);
> if (IS_ERR(vdev))
> return PTR_ERR(vdev);
>
> @@ -2170,7 +2314,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> const struct vdpa_dev_set_config *config)
> {
> struct vduse_dev *dev;
> - int ret;
> + size_t domain_bounce_size;
> + int ret, i;
>
> mutex_lock(&vduse_lock);
> dev = vduse_find_dev(name);
> @@ -2184,29 +2329,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> return ret;
>
> mutex_lock(&dev->domain_lock);
> - if (!dev->domain)
> - dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> - dev->bounce_size);
> - mutex_unlock(&dev->domain_lock);
> - if (!dev->domain) {
> - ret = -ENOMEM;
> - goto domain_err;
> + ret = 0;
> +
> + domain_bounce_size = dev->bounce_size / dev->nas;
> + for (i = 0; i < dev->nas; ++i) {
> + dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> + domain_bounce_size);
> + if (!dev->as[i].domain) {
> + ret = -ENOMEM;
> + goto err;
> + }
> }
>
> + mutex_unlock(&dev->domain_lock);
> +
> ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> - if (ret) {
> - goto register_err;
> - }
> + if (ret)
> + goto err_register;
>
> return 0;
>
> -register_err:
> +err_register:
> mutex_lock(&dev->domain_lock);
> - vduse_domain_destroy(dev->domain);
> - dev->domain = NULL;
> +
> +err:
> + for (int j = 0; j < i; j++) {
> + if (dev->as[j].domain) {
> + vduse_domain_destroy(dev->as[j].domain);
> + dev->as[j].domain = NULL;
> + }
> + }
> mutex_unlock(&dev->domain_lock);
>
> -domain_err:
> put_device(&dev->vdev->vdpa.dev);
>
> return ret;
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> index a3d51cf6df3a..9e423163d819 100644
> --- a/include/uapi/linux/vduse.h
> +++ b/include/uapi/linux/vduse.h
> @@ -47,7 +47,8 @@ struct vduse_dev_config {
> __u32 vq_num;
> __u32 vq_align;
> __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> - __u32 reserved[12];
> + __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> + __u32 reserved[11];
> __u32 config_size;
> __u8 config[];
> };
> @@ -166,6 +167,16 @@ struct vduse_vq_state_packed {
> __u16 last_used_idx;
> };
>
> +/**
> + * struct vduse_vq_group_asid - virtqueue group ASID
> + * @group: Index of the virtqueue group
> + * @asid: Address space ID of the group
> + */
> +struct vduse_vq_group_asid {
> + __u32 group;
> + __u32 asid;
> +};
> +
> /**
> * struct vduse_vq_info - information of a virtqueue
> * @index: virtqueue index
> @@ -225,6 +236,7 @@ struct vduse_vq_eventfd {
> * @uaddr: start address of userspace memory, it must be aligned to page size
> * @iova: start of the IOVA region
> * @size: size of the IOVA region
> + * @asid: Address space ID of the IOVA region
> * @reserved: for future use, needs to be initialized to zero
> *
> * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> @@ -234,7 +246,8 @@ struct vduse_iova_umem {
> __u64 uaddr;
> __u64 iova;
> __u64 size;
> - __u64 reserved[3];
> + __u32 asid;
> + __u32 reserved[5];
> };
>
> /* Register userspace memory for IOVA regions */
> @@ -248,6 +261,7 @@ struct vduse_iova_umem {
> * @start: start of the IOVA region
> * @last: last of the IOVA region
> * @capability: capability of the IOVA region
> + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> * @reserved: for future use, needs to be initialized to zero
> *
> * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> @@ -258,7 +272,8 @@ struct vduse_iova_info {
> __u64 last;
> #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> __u64 capability;
> - __u64 reserved[3];
> + __u32 asid; /* Only if device API version >= 1 */
> + __u32 reserved[5];
> };
>
> /*
> @@ -267,6 +282,28 @@ struct vduse_iova_info {
> */
> #define VDUSE_IOTLB_GET_INFO _IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
>
> +/**
> + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
> + *
> + * @v1: the original vduse_iotlb_entry
> + * @asid: address space ID of the IOVA region
> + * @reserved: for future use, needs to be initialized to zero
> + *
> + * Structure used by VDUSE_IOTLB_GET_FD2 ioctl to find an overlapped IOVA region.
> + */
> +struct vduse_iotlb_entry_v2 {
> + struct vduse_iotlb_entry v1;
> + __u32 asid;
> + __u32 reserved[12];
> +};
> +
> +/*
> + * Same as VDUSE_IOTLB_GET_FD but with a vduse_iotlb_entry_v2 argument that
> + * supports extra fields.
> + */
> +#define VDUSE_IOTLB_GET_FD2 _IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
> +
> +
> /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
>
> /**
> @@ -280,6 +317,7 @@ enum vduse_req_type {
> VDUSE_GET_VQ_STATE,
> VDUSE_SET_STATUS,
> VDUSE_UPDATE_IOTLB,
> + VDUSE_SET_VQ_GROUP_ASID,
> };
>
> /**
> @@ -314,6 +352,18 @@ struct vduse_iova_range {
> __u64 last;
> };
>
> +/**
> + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> + * @start: start of the IOVA range
> + * @last: last of the IOVA range
> + * @asid: address space ID of the IOVA range
> + */
> +struct vduse_iova_range_v2 {
> + __u64 start;
> + __u64 last;
> + __u32 asid;
> +};
> +
> /**
> * struct vduse_dev_request - control request
> * @type: request type
> @@ -322,6 +372,8 @@ struct vduse_iova_range {
> * @vq_state: virtqueue state, only index field is available
> * @s: device status
> * @iova: IOVA range for updating
> + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> + * @vq_group_asid: ASID of a virtqueue group
> * @padding: padding
> *
> * Structure used by read(2) on /dev/vduse/$NAME.
> @@ -334,6 +386,11 @@ struct vduse_dev_request {
> struct vduse_vq_state vq_state;
> struct vduse_dev_status s;
> struct vduse_iova_range iova;
> + /* Following members but padding exist only if vduse api
> + * version >= 1
> + */;
Unnecessary trailing ';'.
> + struct vduse_iova_range_v2 iova_v2;
> + struct vduse_vq_group_asid vq_group_asid;
> __u32 padding[32];
> };
> };
> --
> 2.52.0
>
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-13 6:23 ` Jason Wang
@ 2026-01-13 14:38 ` Eugenio Perez Martin
2026-01-14 7:32 ` Jason Wang
0 siblings, 1 reply; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-13 14:38 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Tue, Jan 13, 2026 at 7:23 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > group. This enables mapping each group into a distinct memory space.
> >
> > The vq group to ASID association is protected by a rwlock now. But the
> > mutex domain_lock keeps protecting the domains of all ASIDs, as some
> > operations like the one related with the bounce buffer size still
> > requires to lock all the ASIDs.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >
> > ---
> > Future improvements can include performance optimizations on top, like
> > moving to RCU or thread-synchronized atomics, or hardening by tracking
> > the ASID or ASID hashes in unused bits of the DMA address.
> >
> > Tested virtio_vdpa by manually adding two threads in vduse_set_status:
> > one modifies the vq group 0 ASID while the other maps and unmaps
> > memory continuously. After a while, the two threads stop and the usual
> > work continues. Tested with version 0, version 1 with the old ioctl, and
> > version 1 with the new ioctl.
> >
> > Tested with vhost_vdpa by migrating a VM while pinging over OVS+VDUSE.
> > A few workarounds were needed in some parts:
> > * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> > the enable message to the userland device. This will be solved in the
> > future.
> > * Share the suspended state between all vhost devices in QEMU:
> > https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> > * Implement a fake VDUSE suspend vdpa operation callback that always
> > returns true in the kernel. DPDK suspends the device at the first
> > GET_VRING_BASE.
> > * Remove the CVQ blocker in ASID.
> >
> > The driver vhost_vdpa was also tested with version 0, version 1 with the
> > old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> > with many ASID.
>
> I think we'd better update the Documentation/userspace-api/vduse.rst
> for the new uAPI.
>
Good point! I'll do it for the next version, thanks!
> >
> > ---
> > v11:
> > * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> > (Jason).
> > * Do not take the vq groups lock if nas == 1.
> > * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> > function vduse_set_group_asid_nomsg, not needed anymore.
> > * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the
> > argument didn't match the previous VDUSE_IOTLB_GET_FD.
> > * Move the asid < dev->nas check to vdpa core.
> >
> > v10:
> > * Back to rwlock version so stronger locks are used.
> > * Take out allocations from rwlock.
> > * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> > * Remove bad fetching again of domain variable in
> > vduse_dev_max_mapping_size (Yongji).
> > * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> > robot).
> >
> > v9:
> > * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> > context.
> >
> > v8:
> > * Revert the mutex to rwlock change, it needs proper profiling to
> > justify it.
> >
> > v7:
> > * Take write lock in the error path (Jason).
> >
> > v6:
> > * Make vdpa_dev_add use gotos for error handling (MST).
> > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > (MST).
> > * Fix struct name not matching in the doc.
> >
> > v5:
> > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > ioctl (Jason).
> > * Properly set domain bounce size to divide equally between nas (Jason).
> > * Exclude "padding" member from the only >V1 members in
> > vduse_dev_request.
> >
> > v4:
> > * Divide each domain bounce size between the device bounce size (Jason).
> > * revert unneeded addr = NULL assignment (Jason)
> > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > return; } (Jason)
> > * Change a bad multiline comment, using the @ character instead of * (Jason).
> > * Consider config->nas == 0 as a fail (Jason).
> >
> > v3:
> > * Get the vduse domain through the vduse_as in the map functions
> > (Jason).
> > * Squash with the patch creating the vduse_as struct (Jason).
> > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > (Jason)
> >
> > v2:
> > * Convert the use of mutex to rwlock.
> >
> > RFC v3:
> > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > value to reduce memory consumption, but vqs are already limited to
> > that value and userspace VDUSE is able to allocate that many vqs.
> > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > VDUSE_IOTLB_GET_INFO.
> > * Use of array_index_nospec in VDUSE device ioctls.
> > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > * Move the umem mutex to asid struct so there is no contention between
> > ASIDs.
> >
> > RFC v2:
> > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > part of the struct is the same.
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> > include/uapi/linux/vduse.h | 63 ++++-
> > 2 files changed, 333 insertions(+), 122 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index bf437816fd7d..8227b5e9f3f6 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -41,6 +41,7 @@
> >
> > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > +#define VDUSE_DEV_MAX_AS 0xffff
> > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > @@ -86,7 +87,15 @@ struct vduse_umem {
> > struct mm_struct *mm;
> > };
> >
> > +struct vduse_as {
> > + struct vduse_iova_domain *domain;
> > + struct vduse_umem *umem;
> > + struct mutex mem_lock;
> > +};
> > +
> > struct vduse_vq_group {
> > + rwlock_t as_lock;
> > + struct vduse_as *as; /* Protected by as_lock */
> > struct vduse_dev *dev;
> > };
> >
> > @@ -94,7 +103,7 @@ struct vduse_dev {
> > struct vduse_vdpa *vdev;
> > struct device *dev;
> > struct vduse_virtqueue **vqs;
> > - struct vduse_iova_domain *domain;
> > + struct vduse_as *as;
> > char *name;
> > struct mutex lock;
> > spinlock_t msg_lock;
> > @@ -122,9 +131,8 @@ struct vduse_dev {
> > u32 vq_num;
> > u32 vq_align;
> > u32 ngroups;
> > - struct vduse_umem *umem;
> > + u32 nas;
> > struct vduse_vq_group *groups;
> > - struct mutex mem_lock;
> > unsigned int bounce_size;
> > struct mutex domain_lock;
> > };
> > @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> >
> > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > u64 start, u64 last)
> > {
> > struct vduse_dev_msg msg = { 0 };
> > @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > return -EINVAL;
> >
> > msg.req.type = VDUSE_UPDATE_IOTLB;
> > - msg.req.iova.start = start;
> > - msg.req.iova.last = last;
> > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > + msg.req.iova.start = start;
> > + msg.req.iova.last = last;
> > + } else {
> > + msg.req.iova_v2.start = start;
> > + msg.req.iova_v2.last = last;
> > + msg.req.iova_v2.asid = asid;
> > + }
> >
> > return vduse_dev_msg_sync(dev, &msg);
> > }
> > @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > static void vduse_dev_reset(struct vduse_dev *dev)
> > {
> > int i;
> > - struct vduse_iova_domain *domain = dev->domain;
> >
> > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > - if (domain && domain->bounce_map)
> > - vduse_domain_reset_bounce_map(domain);
> > + for (i = 0; i < dev->nas; i++) {
> > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > +
> > + if (domain && domain->bounce_map)
> > + vduse_domain_reset_bounce_map(domain);
> > + }
> >
> > down_write(&dev->rwsem);
> >
> > @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > return ret;
> > }
> >
> > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > + unsigned int asid)
> > +{
> > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > + struct vduse_dev_msg msg = { 0 };
> > + int r;
> > +
> > + if (dev->api_version < VDUSE_API_VERSION_1)
> > + return -EINVAL;
> > +
> > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > + msg.req.vq_group_asid.group = group;
> > + msg.req.vq_group_asid.asid = asid;
> > +
> > + r = vduse_dev_msg_sync(dev, &msg);
> > + if (r < 0)
> > + return r;
> > +
> > + write_lock(&dev->groups[group].as_lock);
> > + dev->groups[group].as = &dev->as[asid];
> > + write_unlock(&dev->groups[group].as_lock);
> > +
> > + return 0;
> > +}
> > +
> > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > struct vdpa_vq_state *state)
> > {
> > @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > int ret;
> >
> > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > if (ret)
> > return ret;
> >
> > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > if (ret) {
> > - vduse_domain_clear_map(dev->domain, iotlb);
> > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > return ret;
> > }
> >
> > @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > .reset = vduse_vdpa_reset,
> > .set_map = vduse_vdpa_set_map,
> > + .set_group_asid = vduse_set_group_asid,
> > .get_vq_map = vduse_get_vq_map,
> > .free = vduse_vdpa_free,
> > };
> > @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > dma_addr_t dma_addr, size_t size,
> > enum dma_data_direction dir)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
>
> I suggest factoring the lock and unlock out into helpers.
>
I'm moving to scoped guards.
> >
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > dma_addr_t dma_addr, size_t size,
> > enum dma_data_direction dir)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> >
> > + domain = token.group->as->domain;
> > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > enum dma_data_direction dir,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> > + dma_addr_t r;
> >
> > if (!token.group)
> > return DMA_MAPPING_ERROR;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > + domain = token.group->as->domain;
> > + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> >
> > - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > + return r;
> > }
> >
> > static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > size_t size, enum dma_data_direction dir,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + domain = token.group->as->domain;
> > + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> >
> > - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > }
> >
> > static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > dma_addr_t *dma_addr, gfp_t flag)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> > unsigned long iova;
> > void *addr;
> > @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > if (!addr)
> > return NULL;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + domain = token.group->as->domain;
> > addr = vduse_domain_alloc_coherent(domain, size,
> > (dma_addr_t *)&iova, addr);
> > if (!addr)
> > @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> >
> > *dma_addr = (dma_addr_t)iova;
> >
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > return addr;
> >
> > err:
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > free_pages_exact(addr, size);
> > return NULL;
> > }
> > @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > void *vaddr, dma_addr_t dma_addr,
> > unsigned long attrs)
> > {
> > - struct vduse_dev *vdev;
> > struct vduse_iova_domain *domain;
> >
> > if (!token.group)
> > return;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> >
> > + domain = token.group->as->domain;
> > vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> > +
> > free_pages_exact(vaddr, size);
> > }
> >
> > static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > {
> > - struct vduse_dev *vdev;
> > - struct vduse_iova_domain *domain;
> > + size_t bounce_size;
> >
> > if (!token.group)
> > return false;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + bounce_size = token.group->as->domain->bounce_size;
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> >
> > - return dma_addr < domain->bounce_size;
> > + return dma_addr < bounce_size;
> > }
> >
> > static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> >
> > static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > {
> > - struct vduse_dev *vdev;
> > - struct vduse_iova_domain *domain;
> > + size_t bounce_size;
> >
> > if (!token.group)
> > return 0;
> >
> > - vdev = token.group->dev;
> > - domain = vdev->domain;
> > + if (token.group->dev->nas > 1)
> > + read_lock(&token.group->as_lock);
> > +
> > + bounce_size = token.group->as->domain->bounce_size;
> > +
> > + if (token.group->dev->nas > 1)
> > + read_unlock(&token.group->as_lock);
> >
> > - return domain->bounce_size;
> > + return bounce_size;
> > }
> >
> > static const struct virtio_map_ops vduse_map_ops = {
> > @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > return ret;
> > }
> >
> > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > u64 iova, u64 size)
> > {
> > int ret;
> >
> > - mutex_lock(&dev->mem_lock);
> > + mutex_lock(&dev->as[asid].mem_lock);
> > ret = -ENOENT;
> > - if (!dev->umem)
> > + if (!dev->as[asid].umem)
> > goto unlock;
> >
> > ret = -EINVAL;
> > - if (!dev->domain)
> > + if (!dev->as[asid].domain)
> > goto unlock;
> >
> > - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> > + if (dev->as[asid].umem->iova != iova ||
> > + size != dev->as[asid].domain->bounce_size)
> > goto unlock;
> >
> > - vduse_domain_remove_user_bounce_pages(dev->domain);
> > - unpin_user_pages_dirty_lock(dev->umem->pages,
> > - dev->umem->npages, true);
> > - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> > - mmdrop(dev->umem->mm);
> > - vfree(dev->umem->pages);
> > - kfree(dev->umem);
> > - dev->umem = NULL;
> > + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > + dev->as[asid].umem->npages, true);
> > + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > + mmdrop(dev->as[asid].umem->mm);
> > + vfree(dev->as[asid].umem->pages);
> > + kfree(dev->as[asid].umem);
> > + dev->as[asid].umem = NULL;
> > ret = 0;
> > unlock:
> > - mutex_unlock(&dev->mem_lock);
> > + mutex_unlock(&dev->as[asid].mem_lock);
> > return ret;
> > }
> >
> > static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > - u64 iova, u64 uaddr, u64 size)
> > + u32 asid, u64 iova, u64 uaddr, u64 size)
> > {
> > struct page **page_list = NULL;
> > struct vduse_umem *umem = NULL;
> > @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > unsigned long npages, lock_limit;
> > int ret;
> >
> > - if (!dev->domain || !dev->domain->bounce_map ||
> > - size != dev->domain->bounce_size ||
> > + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > + size != dev->as[asid].domain->bounce_size ||
> > iova != 0 || uaddr & ~PAGE_MASK)
> > return -EINVAL;
> >
> > - mutex_lock(&dev->mem_lock);
> > + mutex_lock(&dev->as[asid].mem_lock);
> > ret = -EEXIST;
> > - if (dev->umem)
> > + if (dev->as[asid].umem)
> > goto unlock;
> >
> > ret = -ENOMEM;
> > @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > goto out;
> > }
> >
> > - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> > + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > page_list, pinned);
> > if (ret)
> > goto out;
> > @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > umem->mm = current->mm;
> > mmgrab(current->mm);
> >
> > - dev->umem = umem;
> > + dev->as[asid].umem = umem;
> > out:
> > if (ret && pinned > 0)
> > unpin_user_pages(page_list, pinned);
> > @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > vfree(page_list);
> > kfree(umem);
> > }
> > - mutex_unlock(&dev->mem_lock);
> > + mutex_unlock(&dev->as[asid].mem_lock);
> > return ret;
> > }
> >
> > @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > }
> >
> > static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> > - struct vduse_iotlb_entry *entry,
> > + struct vduse_iotlb_entry_v2 *entry,
> > struct file **f, uint64_t *capability)
> > {
> > + u32 asid;
> > int r = -EINVAL;
> > struct vhost_iotlb_map *map;
> > const struct vdpa_map_file *map_file;
> >
> > - if (entry->start > entry->last)
> > + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> > return -EINVAL;
> >
> > + asid = array_index_nospec(entry->asid, dev->nas);
> > mutex_lock(&dev->domain_lock);
> > - if (!dev->domain)
> > +
> > + if (!dev->as[asid].domain)
> > goto out;
> >
> > - spin_lock(&dev->domain->iotlb_lock);
> > - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> > - entry->last);
> > + spin_lock(&dev->as[asid].domain->iotlb_lock);
> > + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > + entry->v1.start, entry->v1.last);
> > if (map) {
> > if (f) {
> > map_file = (struct vdpa_map_file *)map->opaque;
> > *f = get_file(map_file->file);
> > }
> > - entry->offset = map_file->offset;
> > - entry->start = map->start;
> > - entry->last = map->last;
> > - entry->perm = map->perm;
> > + entry->v1.offset = map_file->offset;
> > + entry->v1.start = map->start;
> > + entry->v1.last = map->last;
> > + entry->v1.perm = map->perm;
> > if (capability) {
> > *capability = 0;
> >
> > - if (dev->domain->bounce_map && map->start == 0 &&
> > - map->last == dev->domain->bounce_size - 1)
> > + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> > + map->last == dev->as[asid].domain->bounce_size - 1)
> > *capability |= VDUSE_IOVA_CAP_UMEM;
> > }
> >
> > r = 0;
> > }
> > - spin_unlock(&dev->domain->iotlb_lock);
> > + spin_unlock(&dev->as[asid].domain->iotlb_lock);
> >
> > out:
> > mutex_unlock(&dev->domain_lock);
> > @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > return -EPERM;
> >
> > switch (cmd) {
> > - case VDUSE_IOTLB_GET_FD: {
> > - struct vduse_iotlb_entry entry;
> > + case VDUSE_IOTLB_GET_FD:
> > + case VDUSE_IOTLB_GET_FD2: {
>
> I would rename this as GET_FD_ASID
That does not scale very well. What if we add more fields to the
reserved members of the ioctl argument?
> but I wonder the reason for the new
> ioctl. I think we can deduce it from the API version,
It's in the changelog but let me know if I should expand there.
The reason for a new ioctl is that the previous one is defined in the
uapi with a struct size that is not able to hold the addition of the
asid. From the documentation [1]:
"the command number encodes the sizeof(data_type) value in a 13-bit or
14-bit integer"
I'm not sure if any sanitizer checks this or if it will be implemented
in the future though. My interpretation of the next section of the
ioctl documentation "Interface versions" also recommends creating a
new versioned ioctl.
Sadly I didn't realize this in the first versions so I introduced it later.
> for example, you
> didn't introduce VDUSE_IOTLB_REG_UMEM2.
>
VDUSE_IOTLB_REG_UMEM already has room for extension in the form of
reserved members that must be zero in the master branch.
> > + struct vduse_iotlb_entry_v2 entry = {0};
> > struct file *f = NULL;
> >
> > + ret = -ENOIOCTLCMD;
> > + if (dev->api_version < VDUSE_API_VERSION_1 &&
> > + cmd == VDUSE_IOTLB_GET_FD2)
> > + break;
> > +
> > ret = -EFAULT;
> > - if (copy_from_user(&entry, argp, sizeof(entry)))
> > + if (cmd == VDUSE_IOTLB_GET_FD2) {
> > + if (copy_from_user(&entry, argp, sizeof(entry)))
> > + break;
> > + } else {
> > + if (copy_from_user(&entry.v1, argp,
> > + sizeof(entry.v1)))
> > + break;
> > + }
> > +
> > + ret = -EINVAL;
> > + if (!is_mem_zero((const char *)entry.reserved,
> > + sizeof(entry.reserved)))
> > break;
> >
> > ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> > @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > if (!f)
> > break;
> >
> > - ret = -EFAULT;
> > - if (copy_to_user(argp, &entry, sizeof(entry))) {
> > + if (cmd == VDUSE_IOTLB_GET_FD2)
> > + ret = copy_to_user(argp, &entry,
> > + sizeof(entry));
> > + else
> > + ret = copy_to_user(argp, &entry.v1,
> > + sizeof(entry.v1));
> > +
> > + if (ret) {
> > + ret = -EFAULT;
> > fput(f);
> > break;
> > }
> > - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > fput(f);
> > break;
> > }
> > @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > }
> > case VDUSE_IOTLB_REG_UMEM: {
> > struct vduse_iova_umem umem;
> > + u32 asid;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> > ret = -EINVAL;
> > if (!is_mem_zero((const char *)umem.reserved,
> > - sizeof(umem.reserved)))
> > + sizeof(umem.reserved)) ||
> > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > + umem.asid != 0) || umem.asid >= dev->nas)
> > break;
> >
> > mutex_lock(&dev->domain_lock);
> > - ret = vduse_dev_reg_umem(dev, umem.iova,
> > + asid = array_index_nospec(umem.asid, dev->nas);
> > + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > umem.uaddr, umem.size);
> > mutex_unlock(&dev->domain_lock);
> > break;
> > }
> > case VDUSE_IOTLB_DEREG_UMEM: {
> > struct vduse_iova_umem umem;
> > + u32 asid;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&umem, argp, sizeof(umem)))
> > @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >
> > ret = -EINVAL;
> > if (!is_mem_zero((const char *)umem.reserved,
> > - sizeof(umem.reserved)))
> > + sizeof(umem.reserved)) ||
> > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > + umem.asid != 0) ||
> > + umem.asid >= dev->nas)
> > break;
> > +
> > mutex_lock(&dev->domain_lock);
> > - ret = vduse_dev_dereg_umem(dev, umem.iova,
> > + asid = array_index_nospec(umem.asid, dev->nas);
> > + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > umem.size);
> > mutex_unlock(&dev->domain_lock);
> > break;
> > }
> > case VDUSE_IOTLB_GET_INFO: {
> > struct vduse_iova_info info;
> > - struct vduse_iotlb_entry entry;
> > + struct vduse_iotlb_entry_v2 entry;
> >
> > ret = -EFAULT;
> > if (copy_from_user(&info, argp, sizeof(info)))
> > @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > sizeof(info.reserved)))
> > break;
> >
> > - entry.start = info.start;
> > - entry.last = info.last;
> > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > + if (info.asid)
> > + break;
> > + } else if (info.asid >= dev->nas)
> > + break;
> > +
> > + entry.v1.start = info.start;
> > + entry.v1.last = info.last;
> > + entry.asid = info.asid;
> > ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> > &info.capability);
> > if (ret < 0)
> > break;
> >
> > - info.start = entry.start;
> > - info.last = entry.last;
> > + info.start = entry.v1.start;
> > + info.last = entry.v1.last;
> > + info.asid = entry.asid;
> >
> > ret = -EFAULT;
> > if (copy_to_user(argp, &info, sizeof(info)))
> > @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > struct vduse_dev *dev = file->private_data;
> >
> > mutex_lock(&dev->domain_lock);
> > - if (dev->domain)
> > - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> > + for (int i = 0; i < dev->nas; i++)
> > + if (dev->as[i].domain)
> > + vduse_dev_dereg_umem(dev, i, 0,
> > + dev->as[i].domain->bounce_size);
> > mutex_unlock(&dev->domain_lock);
> > spin_lock(&dev->msg_lock);
> > /* Make sure the inflight messages can processed after reconncection */
> > @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > return NULL;
> >
> > mutex_init(&dev->lock);
> > - mutex_init(&dev->mem_lock);
> > mutex_init(&dev->domain_lock);
> > spin_lock_init(&dev->msg_lock);
> > INIT_LIST_HEAD(&dev->send_list);
> > @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> > idr_remove(&vduse_idr, dev->minor);
> > kvfree(dev->config);
> > vduse_dev_deinit_vqs(dev);
> > - if (dev->domain)
> > - vduse_domain_destroy(dev->domain);
> > + for (int i = 0; i < dev->nas; i++) {
> > + if (dev->as[i].domain)
> > + vduse_domain_destroy(dev->as[i].domain);
> > + }
> > + kfree(dev->as);
> > kfree(dev->name);
> > kfree(dev->groups);
> > vduse_dev_destroy(dev);
> > @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > sizeof(config->reserved)))
> > return false;
> >
> > - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > + if (api_version < VDUSE_API_VERSION_1 &&
> > + (config->ngroups || config->nas))
> > return false;
> >
> > - if (api_version >= VDUSE_API_VERSION_1 &&
> > - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> > - return false;
> > + if (api_version >= VDUSE_API_VERSION_1) {
> > + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> > + return false;
> > +
> > + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> > + return false;
> > + }
> >
> > if (config->vq_align > PAGE_SIZE)
> > return false;
> > @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
> >
> > ret = -EPERM;
> > mutex_lock(&dev->domain_lock);
> > - if (dev->domain)
> > + /* Assuming that if the first domain is allocated, all are allocated */
> > + if (dev->as[0].domain)
> > goto unlock;
>
> Should we update the per as bounce size here, and if yes, how to
> synchronize with need_sync()?
>
No, the per-ASID bounce size is not allocated yet at this point. It is
stored in dev->as[i].domain, and this conditional checks that the
domain is not allocated [2].
> >
> > ret = kstrtouint(buf, 10, &bounce_size);
> > @@ -1984,6 +2115,14 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > dev->device_features = config->features;
> > dev->device_id = config->device_id;
> > dev->vendor_id = config->vendor_id;
> > +
> > + dev->nas = (dev->api_version < VDUSE_API_VERSION_1) ? 1 : config->nas;
> > + dev->as = kcalloc(dev->nas, sizeof(dev->as[0]), GFP_KERNEL);
> > + if (!dev->as)
> > + goto err_as;
> > + for (int i = 0; i < dev->nas; i++)
> > + mutex_init(&dev->as[i].mem_lock);
> > +
> > dev->ngroups = (dev->api_version < VDUSE_API_VERSION_1)
> > ? 1
> > : config->ngroups;
> > @@ -1991,8 +2130,11 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > GFP_KERNEL);
> > if (!dev->groups)
> > goto err_vq_groups;
> > - for (u32 i = 0; i < dev->ngroups; ++i)
> > + for (u32 i = 0; i < dev->ngroups; ++i) {
> > dev->groups[i].dev = dev;
> > + rwlock_init(&dev->groups[i].as_lock);
> > + dev->groups[i].as = &dev->as[0];
> > + }
> >
> > dev->name = kstrdup(config->name, GFP_KERNEL);
> > if (!dev->name)
> > @@ -2032,6 +2174,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> > err_str:
> > kfree(dev->groups);
> > err_vq_groups:
> > + kfree(dev->as);
> > +err_as:
> > vduse_dev_destroy(dev);
> > err:
> > return ret;
> > @@ -2155,7 +2299,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> >
> > vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> > &vduse_vdpa_config_ops, &vduse_map_ops,
> > - dev->ngroups, 1, name, true);
> > + dev->ngroups, dev->nas, name, true);
> > if (IS_ERR(vdev))
> > return PTR_ERR(vdev);
> >
> > @@ -2170,7 +2314,8 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > const struct vdpa_dev_set_config *config)
> > {
> > struct vduse_dev *dev;
> > - int ret;
> > + size_t domain_bounce_size;
> > + int ret, i;
> >
> > mutex_lock(&vduse_lock);
> > dev = vduse_find_dev(name);
> > @@ -2184,29 +2329,38 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> > return ret;
> >
> > mutex_lock(&dev->domain_lock);
> > - if (!dev->domain)
> > - dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > - dev->bounce_size);
> > - mutex_unlock(&dev->domain_lock);
> > - if (!dev->domain) {
> > - ret = -ENOMEM;
> > - goto domain_err;
> > + ret = 0;
> > +
> > + domain_bounce_size = dev->bounce_size / dev->nas;
> > + for (i = 0; i < dev->nas; ++i) {
> > + dev->as[i].domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
> > + domain_bounce_size);
> > + if (!dev->as[i].domain) {
> > + ret = -ENOMEM;
> > + goto err;
> > + }
> > }
> >
> > + mutex_unlock(&dev->domain_lock);
> > +
> > ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
> > - if (ret) {
> > - goto register_err;
> > - }
> > + if (ret)
> > + goto err_register;
> >
> > return 0;
> >
> > -register_err:
> > +err_register:
> > mutex_lock(&dev->domain_lock);
> > - vduse_domain_destroy(dev->domain);
> > - dev->domain = NULL;
> > +
> > +err:
> > + for (int j = 0; j < i; j++) {
> > + if (dev->as[j].domain) {
> > + vduse_domain_destroy(dev->as[j].domain);
> > + dev->as[j].domain = NULL;
> > + }
> > + }
> > mutex_unlock(&dev->domain_lock);
> >
> > -domain_err:
> > put_device(&dev->vdev->vdpa.dev);
> >
> > return ret;
> > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > index a3d51cf6df3a..9e423163d819 100644
> > --- a/include/uapi/linux/vduse.h
> > +++ b/include/uapi/linux/vduse.h
> > @@ -47,7 +47,8 @@ struct vduse_dev_config {
> > __u32 vq_num;
> > __u32 vq_align;
> > __u32 ngroups; /* if VDUSE_API_VERSION >= 1 */
> > - __u32 reserved[12];
> > + __u32 nas; /* if VDUSE_API_VERSION >= 1 */
> > + __u32 reserved[11];
> > __u32 config_size;
> > __u8 config[];
> > };
> > @@ -166,6 +167,16 @@ struct vduse_vq_state_packed {
> > __u16 last_used_idx;
> > };
> >
> > +/**
> > + * struct vduse_vq_group_asid - virtqueue group ASID
> > + * @group: Index of the virtqueue group
> > + * @asid: Address space ID of the group
> > + */
> > +struct vduse_vq_group_asid {
> > + __u32 group;
> > + __u32 asid;
> > +};
> > +
> > /**
> > * struct vduse_vq_info - information of a virtqueue
> > * @index: virtqueue index
> > @@ -225,6 +236,7 @@ struct vduse_vq_eventfd {
> > * @uaddr: start address of userspace memory, it must be aligned to page size
> > * @iova: start of the IOVA region
> > * @size: size of the IOVA region
> > + * @asid: Address space ID of the IOVA region
> > * @reserved: for future use, needs to be initialized to zero
> > *
> > * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
> > @@ -234,7 +246,8 @@ struct vduse_iova_umem {
> > __u64 uaddr;
> > __u64 iova;
> > __u64 size;
> > - __u64 reserved[3];
> > + __u32 asid;
> > + __u32 reserved[5];
> > };
> >
> > /* Register userspace memory for IOVA regions */
> > @@ -248,6 +261,7 @@ struct vduse_iova_umem {
> > * @start: start of the IOVA region
> > * @last: last of the IOVA region
> > * @capability: capability of the IOVA region
> > + * @asid: Address space ID of the IOVA region, only if device API version >= 1
> > * @reserved: for future use, needs to be initialized to zero
> > *
> > * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get information of
> > @@ -258,7 +272,8 @@ struct vduse_iova_info {
> > __u64 last;
> > #define VDUSE_IOVA_CAP_UMEM (1 << 0)
> > __u64 capability;
> > - __u64 reserved[3];
> > + __u32 asid; /* Only if device API version >= 1 */
> > + __u32 reserved[5];
> > };
> >
> > /*
> > @@ -267,6 +282,28 @@ struct vduse_iova_info {
> > */
> > #define VDUSE_IOTLB_GET_INFO _IOWR(VDUSE_BASE, 0x1a, struct vduse_iova_info)
> >
> > +/**
> > + * struct vduse_iotlb_entry_v2 - entry of IOTLB to describe one IOVA region
> > + *
> > + * @v1: the original vduse_iotlb_entry
> > + * @asid: address space ID of the IOVA region
> > > + * @reserved: for future use, needs to be initialized to zero
> > + *
> > + * Structure used by VDUSE_IOTLB_GET_FD2 ioctl to find an overlapped IOVA region.
> > + */
> > +struct vduse_iotlb_entry_v2 {
> > + struct vduse_iotlb_entry v1;
> > + __u32 asid;
> > + __u32 reserved[12];
> > +};
> > +
> > +/*
> > + * Same as VDUSE_IOTLB_GET_FD but with vduse_iotlb_entry_v2 argument that
> > > + * supports extra fields.
> > + */
> > +#define VDUSE_IOTLB_GET_FD2 _IOWR(VDUSE_BASE, 0x1b, struct vduse_iotlb_entry_v2)
> > +
> > +
> > /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
> >
> > /**
> > @@ -280,6 +317,7 @@ enum vduse_req_type {
> > VDUSE_GET_VQ_STATE,
> > VDUSE_SET_STATUS,
> > VDUSE_UPDATE_IOTLB,
> > + VDUSE_SET_VQ_GROUP_ASID,
> > };
> >
> > /**
> > @@ -314,6 +352,18 @@ struct vduse_iova_range {
> > __u64 last;
> > };
> >
> > +/**
> > > + * struct vduse_iova_range_v2 - IOVA range [start, last] if API_VERSION >= 1
> > + * @start: start of the IOVA range
> > + * @last: last of the IOVA range
> > + * @asid: address space ID of the IOVA range
> > + */
> > +struct vduse_iova_range_v2 {
> > + __u64 start;
> > + __u64 last;
> > + __u32 asid;
> > +};
> > +
> > /**
> > * struct vduse_dev_request - control request
> > * @type: request type
> > @@ -322,6 +372,8 @@ struct vduse_iova_range {
> > * @vq_state: virtqueue state, only index field is available
> > * @s: device status
> > * @iova: IOVA range for updating
> > + * @iova_v2: IOVA range for updating if API_VERSION >= 1
> > + * @vq_group_asid: ASID of a virtqueue group
> > * @padding: padding
> > *
> > * Structure used by read(2) on /dev/vduse/$NAME.
> > @@ -334,6 +386,11 @@ struct vduse_dev_request {
> > struct vduse_vq_state vq_state;
> > struct vduse_dev_status s;
> > struct vduse_iova_range iova;
> > + /* Following members but padding exist only if vduse api
> > + * version >= 1
> > + */;
>
> Unnecessary trailing ';'.
>
Ouch, fixing, thanks!
[1] https://docs.kernel.org/driver-api/ioctl.html
[2] https://lore.kernel.org/all/CACGkMEu1Lxun4aPza+kw7KrmLZJkfmW22KQNxbch+YhiTMkJZg@mail.gmail.com/
> > + struct vduse_iova_range_v2 iova_v2;
> > + struct vduse_vq_group_asid vq_group_asid;
> > __u32 padding[32];
> > };
> > };
> > --
> > 2.52.0
> >
>
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-13 14:38 ` Eugenio Perez Martin
@ 2026-01-14 7:32 ` Jason Wang
2026-01-14 8:21 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2026-01-14 7:32 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Tue, Jan 13, 2026 at 10:39 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jan 13, 2026 at 7:23 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > group. This enables mapping each group into a distinct memory space.
> > >
> > > The vq group to ASID association is protected by a rwlock now. But the
> > > mutex domain_lock keeps protecting the domains of all ASIDs, as some
> > > operations like the one related with the bounce buffer size still
> > > requires to lock all the ASIDs.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >
> > > ---
> > > Future improvements can include performance optimizations on top, like
> > > moving to RCU or thread-synchronized atomics, or hardening by tracking
> > > the ASID or ASID hashes in unused bits of the DMA address.
> > >
> > > Tested virtio_vdpa by adding manually two threads in vduse_set_status:
> > > one of them modifies the vq group 0 ASID and the other one map and unmap
> > > memory continuously. After a while, the two threads stop and the usual
> > > work continues. Tested with version 0, version 1 with the old ioctl,
> > > and version 1 with the new ioctl.
> > >
> > > Tested with vhost_vdpa by migrating a VM while ping on OVS+VDUSE. A few
> > > workarounds were needed in some parts:
> > > * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> > > the enable message to the userland device. This will be solved in the
> > > future.
> > > * Share the suspended state between all vhost devices in QEMU:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> > > * Implement a fake VDUSE suspend vdpa operation callback that always
> > > returns true in the kernel. DPDK suspend the device at the first
> > > GET_VRING_BASE.
> > > * Remove the CVQ blocker in ASID.
> > >
> > > The driver vhost_vdpa was also tested with version 0, version 1 with the
> > > old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> > > with many ASID.
> >
> > I think we'd better update the Documentation/userspace-api/vduse.rst
> > for the new uAPI.
> >
>
> Good point! I'll do it for the next version, thanks!
>
> > >
> > > ---
> > > v11:
> > > * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> > > (Jason).
> > > * Do not take the vq groups lock if nas == 1.
> > > * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> > > function vduse_set_group_asid_nomsg, not needed anymore.
> > > * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as argument
> > > didn't match the previous VDUSE_IOTLB_GET_FD.
> > > * Move the asid < dev->nas check to vdpa core.
> > >
> > > v10:
> > > * Back to rwlock version so stronger locks are used.
> > > * Take out allocations from rwlock.
> > > * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> > > * Remove bad fetching again of domain variable in
> > > vduse_dev_max_mapping_size (Yongji).
> > > * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> > > robot).
> > >
> > > v9:
> > > * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> > > context.
> > >
> > > v8:
> > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > justify it.
> > >
> > > v7:
> > > * Take write lock in the error path (Jason).
> > >
> > > v6:
> > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > (MST).
> > > * Fix struct name not matching in the doc.
> > >
> > > v5:
> > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > ioctl (Jason).
> > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > * Exclude "padding" member from the only >V1 members in
> > > vduse_dev_request.
> > >
> > > v4:
> > > * Divide each domain bounce size between the device bounce size (Jason).
> > > * revert unneeded addr = NULL assignment (Jason)
> > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > return; } (Jason)
> > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > * Consider config->nas == 0 as a fail (Jason).
> > >
> > > v3:
> > > * Get the vduse domain through the vduse_as in the map functions
> > > (Jason).
> > > * Squash with the patch creating the vduse_as struct (Jason).
> > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > (Jason)
> > >
> > > v2:
> > > * Convert the use of mutex to rwlock.
> > >
> > > RFC v3:
> > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > value to reduce memory consumption, but vqs are already limited to
> > > that value and userspace VDUSE is able to allocate that many vqs.
> > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > VDUSE_IOTLB_GET_INFO.
> > > * Use of array_index_nospec in VDUSE device ioctls.
> > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > * Move the umem mutex to asid struct so there is no contention between
> > > ASIDs.
> > >
> > > RFC v2:
> > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > part of the struct is the same.
> > > ---
> > > drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> > > include/uapi/linux/vduse.h | 63 ++++-
> > > 2 files changed, 333 insertions(+), 122 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index bf437816fd7d..8227b5e9f3f6 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -41,6 +41,7 @@
> > >
> > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > @@ -86,7 +87,15 @@ struct vduse_umem {
> > > struct mm_struct *mm;
> > > };
> > >
> > > +struct vduse_as {
> > > + struct vduse_iova_domain *domain;
> > > + struct vduse_umem *umem;
> > > + struct mutex mem_lock;
> > > +};
> > > +
> > > struct vduse_vq_group {
> > > + rwlock_t as_lock;
> > > + struct vduse_as *as; /* Protected by as_lock */
> > > struct vduse_dev *dev;
> > > };
> > >
> > > @@ -94,7 +103,7 @@ struct vduse_dev {
> > > struct vduse_vdpa *vdev;
> > > struct device *dev;
> > > struct vduse_virtqueue **vqs;
> > > - struct vduse_iova_domain *domain;
> > > + struct vduse_as *as;
> > > char *name;
> > > struct mutex lock;
> > > spinlock_t msg_lock;
> > > @@ -122,9 +131,8 @@ struct vduse_dev {
> > > u32 vq_num;
> > > u32 vq_align;
> > > u32 ngroups;
> > > - struct vduse_umem *umem;
> > > + u32 nas;
> > > struct vduse_vq_group *groups;
> > > - struct mutex mem_lock;
> > > unsigned int bounce_size;
> > > struct mutex domain_lock;
> > > };
> > > @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > return vduse_dev_msg_sync(dev, &msg);
> > > }
> > >
> > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > u64 start, u64 last)
> > > {
> > > struct vduse_dev_msg msg = { 0 };
> > > @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > return -EINVAL;
> > >
> > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > - msg.req.iova.start = start;
> > > - msg.req.iova.last = last;
> > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > + msg.req.iova.start = start;
> > > + msg.req.iova.last = last;
> > > + } else {
> > > + msg.req.iova_v2.start = start;
> > > + msg.req.iova_v2.last = last;
> > > + msg.req.iova_v2.asid = asid;
> > > + }
> > >
> > > return vduse_dev_msg_sync(dev, &msg);
> > > }
> > > @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > {
> > > int i;
> > > - struct vduse_iova_domain *domain = dev->domain;
> > >
> > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > - if (domain && domain->bounce_map)
> > > - vduse_domain_reset_bounce_map(domain);
> > > + for (i = 0; i < dev->nas; i++) {
> > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > +
> > > + if (domain && domain->bounce_map)
> > > + vduse_domain_reset_bounce_map(domain);
> > > + }
> > >
> > > down_write(&dev->rwsem);
> > >
> > > @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > return ret;
> > > }
> > >
> > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > + unsigned int asid)
> > > +{
> > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > + struct vduse_dev_msg msg = { 0 };
> > > + int r;
> > > +
> > > + if (dev->api_version < VDUSE_API_VERSION_1)
> > > + return -EINVAL;
> > > +
> > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > + msg.req.vq_group_asid.group = group;
> > > + msg.req.vq_group_asid.asid = asid;
> > > +
> > > + r = vduse_dev_msg_sync(dev, &msg);
> > > + if (r < 0)
> > > + return r;
> > > +
> > > + write_lock(&dev->groups[group].as_lock);
> > > + dev->groups[group].as = &dev->as[asid];
> > > + write_unlock(&dev->groups[group].as_lock);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > struct vdpa_vq_state *state)
> > > {
> > > @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > int ret;
> > >
> > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > if (ret)
> > > return ret;
> > >
> > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > if (ret) {
> > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > return ret;
> > > }
> > >
> > > @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > .reset = vduse_vdpa_reset,
> > > .set_map = vduse_vdpa_set_map,
> > > + .set_group_asid = vduse_set_group_asid,
> > > .get_vq_map = vduse_get_vq_map,
> > > .free = vduse_vdpa_free,
> > > };
> > > @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > dma_addr_t dma_addr, size_t size,
> > > enum dma_data_direction dir)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > >
> > > if (!token.group)
> > > return;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> >
> > I suggest to factor this and unlock out into helpers.
> >
>
> I'm moving to scoped guards.
>
> > >
> > > + domain = token.group->as->domain;
> > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > +
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > }
> > >
> > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > dma_addr_t dma_addr, size_t size,
> > > enum dma_data_direction dir)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > >
> > > if (!token.group)
> > > return;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > >
> > > + domain = token.group->as->domain;
> > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > +
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > }
> > >
> > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > enum dma_data_direction dir,
> > > unsigned long attrs)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > > + dma_addr_t r;
> > >
> > > if (!token.group)
> > > return DMA_MAPPING_ERROR;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > > + domain = token.group->as->domain;
> > > + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > >
> > > - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > +
> > > + return r;
> > > }
> > >
> > > static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > > size_t size, enum dma_data_direction dir,
> > > unsigned long attrs)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > >
> > > if (!token.group)
> > > return;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > > +
> > > + domain = token.group->as->domain;
> > > + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > >
> > > - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > }
> > >
> > > static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > dma_addr_t *dma_addr, gfp_t flag)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > > unsigned long iova;
> > > void *addr;
> > > @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > if (!addr)
> > > return NULL;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > > +
> > > + domain = token.group->as->domain;
> > > addr = vduse_domain_alloc_coherent(domain, size,
> > > (dma_addr_t *)&iova, addr);
> > > if (!addr)
> > > @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > >
> > > *dma_addr = (dma_addr_t)iova;
> > >
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > +
> > > return addr;
> > >
> > > err:
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > free_pages_exact(addr, size);
> > > return NULL;
> > > }
> > > @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > > void *vaddr, dma_addr_t dma_addr,
> > > unsigned long attrs)
> > > {
> > > - struct vduse_dev *vdev;
> > > struct vduse_iova_domain *domain;
> > >
> > > if (!token.group)
> > > return;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > >
> > > + domain = token.group->as->domain;
> > > vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> > > +
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > > +
> > > free_pages_exact(vaddr, size);
> > > }
> > >
> > > static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > > {
> > > - struct vduse_dev *vdev;
> > > - struct vduse_iova_domain *domain;
> > > + size_t bounce_size;
> > >
> > > if (!token.group)
> > > return false;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > > +
> > > + bounce_size = token.group->as->domain->bounce_size;
> > > +
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > >
> > > - return dma_addr < domain->bounce_size;
> > > + return dma_addr < bounce_size;
> > > }
> > >
> > > static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > > @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > >
> > > static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > > {
> > > - struct vduse_dev *vdev;
> > > - struct vduse_iova_domain *domain;
> > > + size_t bounce_size;
> > >
> > > if (!token.group)
> > > return 0;
> > >
> > > - vdev = token.group->dev;
> > > - domain = vdev->domain;
> > > + if (token.group->dev->nas > 1)
> > > + read_lock(&token.group->as_lock);
> > > +
> > > + bounce_size = token.group->as->domain->bounce_size;
> > > +
> > > + if (token.group->dev->nas > 1)
> > > + read_unlock(&token.group->as_lock);
> > >
> > > - return domain->bounce_size;
> > > + return bounce_size;
> > > }
> > >
> > > static const struct virtio_map_ops vduse_map_ops = {
> > > @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > return ret;
> > > }
> > >
> > > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > > u64 iova, u64 size)
> > > {
> > > int ret;
> > >
> > > - mutex_lock(&dev->mem_lock);
> > > + mutex_lock(&dev->as[asid].mem_lock);
> > > ret = -ENOENT;
> > > - if (!dev->umem)
> > > + if (!dev->as[asid].umem)
> > > goto unlock;
> > >
> > > ret = -EINVAL;
> > > - if (!dev->domain)
> > > + if (!dev->as[asid].domain)
> > > goto unlock;
> > >
> > > - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> > > + if (dev->as[asid].umem->iova != iova ||
> > > + size != dev->as[asid].domain->bounce_size)
> > > goto unlock;
> > >
> > > - vduse_domain_remove_user_bounce_pages(dev->domain);
> > > - unpin_user_pages_dirty_lock(dev->umem->pages,
> > > - dev->umem->npages, true);
> > > - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> > > - mmdrop(dev->umem->mm);
> > > - vfree(dev->umem->pages);
> > > - kfree(dev->umem);
> > > - dev->umem = NULL;
> > > + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > > + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > > + dev->as[asid].umem->npages, true);
> > > + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > > + mmdrop(dev->as[asid].umem->mm);
> > > + vfree(dev->as[asid].umem->pages);
> > > + kfree(dev->as[asid].umem);
> > > + dev->as[asid].umem = NULL;
> > > ret = 0;
> > > unlock:
> > > - mutex_unlock(&dev->mem_lock);
> > > + mutex_unlock(&dev->as[asid].mem_lock);
> > > return ret;
> > > }
> > >
> > > static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > - u64 iova, u64 uaddr, u64 size)
> > > + u32 asid, u64 iova, u64 uaddr, u64 size)
> > > {
> > > struct page **page_list = NULL;
> > > struct vduse_umem *umem = NULL;
> > > @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > unsigned long npages, lock_limit;
> > > int ret;
> > >
> > > - if (!dev->domain || !dev->domain->bounce_map ||
> > > - size != dev->domain->bounce_size ||
> > > + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > > + size != dev->as[asid].domain->bounce_size ||
> > > iova != 0 || uaddr & ~PAGE_MASK)
> > > return -EINVAL;
> > >
> > > - mutex_lock(&dev->mem_lock);
> > > + mutex_lock(&dev->as[asid].mem_lock);
> > > ret = -EEXIST;
> > > - if (dev->umem)
> > > + if (dev->as[asid].umem)
> > > goto unlock;
> > >
> > > ret = -ENOMEM;
> > > @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > goto out;
> > > }
> > >
> > > - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> > > + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > > page_list, pinned);
> > > if (ret)
> > > goto out;
> > > @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > umem->mm = current->mm;
> > > mmgrab(current->mm);
> > >
> > > - dev->umem = umem;
> > > + dev->as[asid].umem = umem;
> > > out:
> > > if (ret && pinned > 0)
> > > unpin_user_pages(page_list, pinned);
> > > @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > vfree(page_list);
> > > kfree(umem);
> > > }
> > > - mutex_unlock(&dev->mem_lock);
> > > + mutex_unlock(&dev->as[asid].mem_lock);
> > > return ret;
> > > }
> > >
> > > @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > > }
> > >
> > > static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> > > - struct vduse_iotlb_entry *entry,
> > > + struct vduse_iotlb_entry_v2 *entry,
> > > struct file **f, uint64_t *capability)
> > > {
> > > + u32 asid;
> > > int r = -EINVAL;
> > > struct vhost_iotlb_map *map;
> > > const struct vdpa_map_file *map_file;
> > >
> > > - if (entry->start > entry->last)
> > > + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> > > return -EINVAL;
> > >
> > > + asid = array_index_nospec(entry->asid, dev->nas);
> > > mutex_lock(&dev->domain_lock);
> > > - if (!dev->domain)
> > > +
> > > + if (!dev->as[asid].domain)
> > > goto out;
> > >
> > > - spin_lock(&dev->domain->iotlb_lock);
> > > - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> > > - entry->last);
> > > + spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > > + entry->v1.start, entry->v1.last);
> > > if (map) {
> > > if (f) {
> > > map_file = (struct vdpa_map_file *)map->opaque;
> > > *f = get_file(map_file->file);
> > > }
> > > - entry->offset = map_file->offset;
> > > - entry->start = map->start;
> > > - entry->last = map->last;
> > > - entry->perm = map->perm;
> > > + entry->v1.offset = map_file->offset;
> > > + entry->v1.start = map->start;
> > > + entry->v1.last = map->last;
> > > + entry->v1.perm = map->perm;
> > > if (capability) {
> > > *capability = 0;
> > >
> > > - if (dev->domain->bounce_map && map->start == 0 &&
> > > - map->last == dev->domain->bounce_size - 1)
> > > + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> > > + map->last == dev->as[asid].domain->bounce_size - 1)
> > > *capability |= VDUSE_IOVA_CAP_UMEM;
> > > }
> > >
> > > r = 0;
> > > }
> > > - spin_unlock(&dev->domain->iotlb_lock);
> > > + spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > >
> > > out:
> > > mutex_unlock(&dev->domain_lock);
> > > @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > return -EPERM;
> > >
> > > switch (cmd) {
> > > - case VDUSE_IOTLB_GET_FD: {
> > > - struct vduse_iotlb_entry entry;
> > > + case VDUSE_IOTLB_GET_FD:
> > > + case VDUSE_IOTLB_GET_FD2: {
> >
> > I would rename this as GET_FD_ASID
>
> It does not scale very well. What if we add more fields to the
> reserved members of the ioctl argument?
A new ioctl then? Or did you mean we can tell which fields are meaningful
from the API version, or something else?
>
> > but I wonder the reason for the new
> > ioctl. I think we can deduce it from the API version,
>
> It's in the changelog but let me know if I should expand there.
>
> The reason for a new ioctl is that the previous one is defined in the
> uapi with a struct size that is not able to hold the addition of the
> asid. From the documentation [1]:
>
> "the command number encodes the sizeof(data_type) value in a 13-bit or
> 14-bit integer"
>
> I'm not sure if any sanitizer checks this or if it will be implemented
> in the future though. My interpretation of the next section of the
> ioctl documentation "Interface versions" also recommends creating a
> new versioned ioctl.
>
> Sadly I didn't realize this in the first versions so I introduced it later.
You're right.
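For reference, the size encoding is easy to see with the standard ioctl macros. A small userspace sketch below; the type 'V', the command numbers, and the struct layouts are made up for illustration and are not the real vduse.h definitions:

```c
#include <stdint.h>
#include <linux/ioctl.h>	/* _IOWR, _IOC_SIZE */

/* Hypothetical stand-ins for the v1/v2 argument structs. */
struct demo_entry_v1 { uint64_t start, last, offset; uint8_t perm; };
struct demo_entry_v2 {
	struct demo_entry_v1 v1;
	uint32_t asid;
	uint32_t reserved[5];
};

/*
 * The command number bakes sizeof(arg) into its size field, so growing the
 * struct changes the command value itself: reusing the old number with a
 * bigger struct would lie about the encoded size. Hence a new ioctl.
 */
#define DEMO_GET_FD	_IOWR('V', 0x10, struct demo_entry_v1)
#define DEMO_GET_FD2	_IOWR('V', 0x1c, struct demo_entry_v2)
```

_IOC_SIZE() recovers the encoded size, which is what a sanitizer or the kernel's compat layer could check against.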
>
> > for example, you
> > didn't introduce VDUSE_IOTLB_REG_UMEM2.
> >
>
> VDUSE_IOTLB_REG_UMEM already has room for extension in the form of
> reserved members that must be zero in the master branch.
Right.
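The reserved-must-be-zero convention mentioned above can be sketched in userspace like this. is_mem_zero mirrors the simple byte scan used in vduse_dev.c; the struct and the validate helper are illustrative, not the real uapi layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_mem_zero(const char *ptr, size_t size)
{
	for (size_t i = 0; i < size; i++)
		if (ptr[i])
			return false;
	return true;
}

struct demo_umem {
	uint64_t iova, uaddr, size;
	uint32_t asid;
	uint32_t reserved[5];	/* must stay zero until a later API version uses it */
};

/* Returns 0 on success, -1 (standing in for -EINVAL) if reserved is set.
 * Old kernels reject nonzero reserved bytes, so new fields can be added
 * later without a new ioctl number. */
static int validate_umem(const struct demo_umem *umem)
{
	if (!is_mem_zero((const char *)umem->reserved, sizeof(umem->reserved)))
		return -1;
	return 0;
}

static int demo_zeroed(void)
{
	struct demo_umem u = {0};
	return validate_umem(&u);
}

static int demo_dirty(void)
{
	struct demo_umem u = {0};
	u.reserved[2] = 1;
	return validate_umem(&u);
}
```

This is why VDUSE_IOTLB_REG_UMEM could grow an asid field in place while VDUSE_IOTLB_GET_FD could not.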
>
> > > + struct vduse_iotlb_entry_v2 entry = {0};
> > > struct file *f = NULL;
> > >
> > > + ret = -ENOIOCTLCMD;
> > > + if (dev->api_version < VDUSE_API_VERSION_1 &&
> > > + cmd == VDUSE_IOTLB_GET_FD2)
> > > + break;
> > > +
> > > ret = -EFAULT;
> > > - if (copy_from_user(&entry, argp, sizeof(entry)))
> > > + if (cmd == VDUSE_IOTLB_GET_FD2) {
> > > + if (copy_from_user(&entry, argp, sizeof(entry)))
> > > + break;
> > > + } else {
> > > + if (copy_from_user(&entry.v1, argp,
> > > + sizeof(entry.v1)))
> > > + break;
> > > + }
> > > +
> > > + ret = -EINVAL;
> > > + if (!is_mem_zero((const char *)entry.reserved,
> > > + sizeof(entry.reserved)))
> > > break;
> > >
> > > ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> > > @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > if (!f)
> > > break;
> > >
> > > - ret = -EFAULT;
> > > - if (copy_to_user(argp, &entry, sizeof(entry))) {
> > > + if (cmd == VDUSE_IOTLB_GET_FD2)
> > > + ret = copy_to_user(argp, &entry,
> > > + sizeof(entry));
> > > + else
> > > + ret = copy_to_user(argp, &entry.v1,
> > > + sizeof(entry.v1));
> > > +
> > > + if (ret) {
> > > + ret = -EFAULT;
> > > fput(f);
> > > break;
> > > }
> > > - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > > + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > > fput(f);
> > > break;
> > > }
> > > @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > }
> > > case VDUSE_IOTLB_REG_UMEM: {
> > > struct vduse_iova_umem umem;
> > > + u32 asid;
> > >
> > > ret = -EFAULT;
> > > if (copy_from_user(&umem, argp, sizeof(umem)))
> > > @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >
> > > ret = -EINVAL;
> > > if (!is_mem_zero((const char *)umem.reserved,
> > > - sizeof(umem.reserved)))
> > > + sizeof(umem.reserved)) ||
> > > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > > + umem.asid != 0) || umem.asid >= dev->nas)
> > > break;
> > >
> > > mutex_lock(&dev->domain_lock);
> > > - ret = vduse_dev_reg_umem(dev, umem.iova,
> > > + asid = array_index_nospec(umem.asid, dev->nas);
> > > + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > > umem.uaddr, umem.size);
> > > mutex_unlock(&dev->domain_lock);
> > > break;
> > > }
> > > case VDUSE_IOTLB_DEREG_UMEM: {
> > > struct vduse_iova_umem umem;
> > > + u32 asid;
> > >
> > > ret = -EFAULT;
> > > if (copy_from_user(&umem, argp, sizeof(umem)))
> > > @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >
> > > ret = -EINVAL;
> > > if (!is_mem_zero((const char *)umem.reserved,
> > > - sizeof(umem.reserved)))
> > > + sizeof(umem.reserved)) ||
> > > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > > + umem.asid != 0) ||
> > > + umem.asid >= dev->nas)
> > > break;
> > > +
> > > mutex_lock(&dev->domain_lock);
> > > - ret = vduse_dev_dereg_umem(dev, umem.iova,
> > > + asid = array_index_nospec(umem.asid, dev->nas);
> > > + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > > umem.size);
> > > mutex_unlock(&dev->domain_lock);
> > > break;
> > > }
> > > case VDUSE_IOTLB_GET_INFO: {
> > > struct vduse_iova_info info;
> > > - struct vduse_iotlb_entry entry;
> > > + struct vduse_iotlb_entry_v2 entry;
> > >
> > > ret = -EFAULT;
> > > if (copy_from_user(&info, argp, sizeof(info)))
> > > @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > sizeof(info.reserved)))
> > > break;
> > >
> > > - entry.start = info.start;
> > > - entry.last = info.last;
> > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > + if (info.asid)
> > > + break;
> > > + } else if (info.asid >= dev->nas)
> > > + break;
> > > +
> > > + entry.v1.start = info.start;
> > > + entry.v1.last = info.last;
> > > + entry.asid = info.asid;
> > > ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> > > &info.capability);
> > > if (ret < 0)
> > > break;
> > >
> > > - info.start = entry.start;
> > > - info.last = entry.last;
> > > + info.start = entry.v1.start;
> > > + info.last = entry.v1.last;
> > > + info.asid = entry.asid;
> > >
> > > ret = -EFAULT;
> > > if (copy_to_user(argp, &info, sizeof(info)))
> > > @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > > struct vduse_dev *dev = file->private_data;
> > >
> > > mutex_lock(&dev->domain_lock);
> > > - if (dev->domain)
> > > - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> > > + for (int i = 0; i < dev->nas; i++)
> > > + if (dev->as[i].domain)
> > > + vduse_dev_dereg_umem(dev, i, 0,
> > > + dev->as[i].domain->bounce_size);
> > > mutex_unlock(&dev->domain_lock);
> > > spin_lock(&dev->msg_lock);
> > > /* Make sure the inflight messages can processed after reconncection */
> > > @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > > return NULL;
> > >
> > > mutex_init(&dev->lock);
> > > - mutex_init(&dev->mem_lock);
> > > mutex_init(&dev->domain_lock);
> > > spin_lock_init(&dev->msg_lock);
> > > INIT_LIST_HEAD(&dev->send_list);
> > > @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> > > idr_remove(&vduse_idr, dev->minor);
> > > kvfree(dev->config);
> > > vduse_dev_deinit_vqs(dev);
> > > - if (dev->domain)
> > > - vduse_domain_destroy(dev->domain);
> > > + for (int i = 0; i < dev->nas; i++) {
> > > + if (dev->as[i].domain)
> > > + vduse_domain_destroy(dev->as[i].domain);
> > > + }
> > > + kfree(dev->as);
> > > kfree(dev->name);
> > > kfree(dev->groups);
> > > vduse_dev_destroy(dev);
> > > @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > > sizeof(config->reserved)))
> > > return false;
> > >
> > > - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > > + if (api_version < VDUSE_API_VERSION_1 &&
> > > + (config->ngroups || config->nas))
> > > return false;
> > >
> > > - if (api_version >= VDUSE_API_VERSION_1 &&
> > > - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> > > - return false;
> > > + if (api_version >= VDUSE_API_VERSION_1) {
> > > + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> > > + return false;
> > > +
> > > + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> > > + return false;
> > > + }
> > >
> > > if (config->vq_align > PAGE_SIZE)
> > > return false;
> > > @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
> > >
> > > ret = -EPERM;
> > > mutex_lock(&dev->domain_lock);
> > > - if (dev->domain)
> > > + /* Assuming that if the first domain is allocated, all are allocated */
> > > + if (dev->as[0].domain)
> > > goto unlock;
> >
> > Should we update the per as bounce size here, and if yes, how to
> > synchronize with need_sync()?
> >
>
> No, the per-AS bounce size is still not allocated. It is stored in
> dev->as[i].domain, and we check that it is not allocated in this
> conditional [2].
Exactly.
Thanks
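As a footnote on the bounce-size discussion above: the v4/v5 changelog entries describe dividing the device bounce size equally between the address spaces. A sketch of that arithmetic, with the clamp to the minimum bounce size being my assumption rather than something confirmed by the patch:

```c
#include <stddef.h>

#define DEMO_MIN_BOUNCE_SIZE (1024 * 1024)	/* mirrors VDUSE_MIN_BOUNCE_SIZE */

/* Hypothetical helper: split the device-wide bounce size across nas
 * address spaces, never dropping below the per-domain minimum. */
static size_t per_as_bounce_size(size_t dev_bounce_size, unsigned int nas)
{
	size_t per_as = dev_bounce_size / nas;

	return per_as < DEMO_MIN_BOUNCE_SIZE ? DEMO_MIN_BOUNCE_SIZE : per_as;
}
```

So with the default 64 MiB device bounce size and two ASIDs, each domain would get 32 MiB.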
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v11 11/12] vduse: add vq group asid support
2026-01-14 7:32 ` Jason Wang
@ 2026-01-14 8:21 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-14 8:21 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Wed, Jan 14, 2026 at 8:33 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jan 13, 2026 at 10:39 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Jan 13, 2026 at 7:23 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> > > > group. This enables mapping each group into a distinct memory space.
> > > >
> > > > The vq group to ASID association is protected by a rwlock now. But the
> > > > mutex domain_lock keeps protecting the domains of all ASIDs, as some
> > > > operations like the one related with the bounce buffer size still
> > > > requires to lock all the ASIDs.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >
> > > > ---
> > > > Future improvements can include performance optimizations on top, like
> > > > moving to RCU or thread-synchronized atomics, or hardening by tracking
> > > > the ASID or ASID hashes in unused bits of the DMA address.
> > > >
> > > > Tested virtio_vdpa by adding manually two threads in vduse_set_status:
> > > > one of them modifies the vq group 0 ASID and the other one map and unmap
> > > > memory continuously. After a while, the two threads stop and the usual
> > > > work continues. Tested with version 0, version 1 with the old ioctl, and
> > > > version 1 with the new ioctl.
> > > >
> > > > Tested with vhost_vdpa by migrating a VM while pinging over OVS+VDUSE. A
> > > > few workarounds were needed in some parts:
> > > > * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
> > > > the enable message to the userland device. This will be solved in the
> > > > future.
> > > > * Share the suspended state between all vhost devices in QEMU:
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> > > > * Implement a fake VDUSE suspend vdpa operation callback that always
> > > > returns true in the kernel. DPDK suspends the device at the first
> > > > GET_VRING_BASE.
> > > > * Remove the CVQ blocker in ASID.
> > > >
> > > > The driver vhost_vdpa was also tested with version 0, version 1 with the
> > > > old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> > > > with many ASID.
> > >
> > > I think we'd better update the Documentation/userspace-api/vduse.rst
> > > for the new uAPI.
> > >
> >
> > Good point! I'll do it for the next version, thanks!
> >
> > > >
> > > > ---
> > > > v11:
> > > > * Remove duplicated free_pages_exact in vduse_domain_free_coherent
> > > > (Jason).
> > > > * Do not take the vq groups lock if nas == 1.
> > > > * Do not reset the vq group ASID in vq reset (Jason). Removed extra
> > > > function vduse_set_group_asid_nomsg, not needed anymore.
> > > > * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as argument
> > > > didn't match the previous VDUSE_IOTLB_GET_FD.
> > > > * Move the asid < dev->nas check to vdpa core.
> > > >
> > > > v10:
> > > > * Back to rwlock version so stronger locks are used.
> > > > * Take out allocations from rwlock.
> > > > * Forbid changing ASID of a vq group after DRIVER_OK (Jason)
> > > > * Remove bad fetching again of domain variable in
> > > > vduse_dev_max_mapping_size (Yongji).
> > > > * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
> > > > robot).
> > > >
> > > > v9:
> > > > * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
> > > > context.
> > > >
> > > > v8:
> > > > * Revert the mutex to rwlock change, it needs proper profiling to
> > > > justify it.
> > > >
> > > > v7:
> > > > * Take write lock in the error path (Jason).
> > > >
> > > > v6:
> > > > * Make vdpa_dev_add use gotos for error handling (MST).
> > > > * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
> > > > (MST).
> > > > * Fix struct name not matching in the doc.
> > > >
> > > > v5:
> > > > * Properly return errno if copy_to_user returns >0 in VDUSE_IOTLB_GET_FD
> > > > ioctl (Jason).
> > > > * Properly set domain bounce size to divide equally between nas (Jason).
> > > > * Exclude "padding" member from the only >V1 members in
> > > > vduse_dev_request.
> > > >
> > > > v4:
> > > > * Divide each domain bounce size between the device bounce size (Jason).
> > > > * revert unneeded addr = NULL assignment (Jason)
> > > > * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
> > > > return; } (Jason)
> > > > * Change a bad multiline comment, using @ character instead of * (Jason).
> > > > * Consider config->nas == 0 as a fail (Jason).
> > > >
> > > > v3:
> > > > * Get the vduse domain through the vduse_as in the map functions
> > > > (Jason).
> > > > * Squash with the patch creating the vduse_as struct (Jason).
> > > > * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
> > > > (Jason)
> > > >
> > > > v2:
> > > > * Convert the use of mutex to rwlock.
> > > >
> > > > RFC v3:
> > > > * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
> > > > value to reduce memory consumption, but vqs are already limited to
> > > > that value and userspace VDUSE is able to allocate that many vqs.
> > > > * Remove TODO about merging VDUSE_IOTLB_GET_FD ioctl with
> > > > VDUSE_IOTLB_GET_INFO.
> > > > * Use of array_index_nospec in VDUSE device ioctls.
> > > > * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> > > > * Move the umem mutex to asid struct so there is no contention between
> > > > ASIDs.
> > > >
> > > > RFC v2:
> > > > * Make iotlb entry the last one of vduse_iotlb_entry_v2 so the first
> > > > part of the struct is the same.
> > > > ---
> > > > drivers/vdpa/vdpa_user/vduse_dev.c | 392 ++++++++++++++++++++---------
> > > > include/uapi/linux/vduse.h | 63 ++++-
> > > > 2 files changed, 333 insertions(+), 122 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > index bf437816fd7d..8227b5e9f3f6 100644
> > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > @@ -41,6 +41,7 @@
> > > >
> > > > #define VDUSE_DEV_MAX (1U << MINORBITS)
> > > > #define VDUSE_DEV_MAX_GROUPS 0xffff
> > > > +#define VDUSE_DEV_MAX_AS 0xffff
> > > > #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
> > > > #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
> > > > #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> > > > @@ -86,7 +87,15 @@ struct vduse_umem {
> > > > struct mm_struct *mm;
> > > > };
> > > >
> > > > +struct vduse_as {
> > > > + struct vduse_iova_domain *domain;
> > > > + struct vduse_umem *umem;
> > > > + struct mutex mem_lock;
> > > > +};
> > > > +
> > > > struct vduse_vq_group {
> > > > + rwlock_t as_lock;
> > > > + struct vduse_as *as; /* Protected by as_lock */
> > > > struct vduse_dev *dev;
> > > > };
> > > >
> > > > @@ -94,7 +103,7 @@ struct vduse_dev {
> > > > struct vduse_vdpa *vdev;
> > > > struct device *dev;
> > > > struct vduse_virtqueue **vqs;
> > > > - struct vduse_iova_domain *domain;
> > > > + struct vduse_as *as;
> > > > char *name;
> > > > struct mutex lock;
> > > > spinlock_t msg_lock;
> > > > @@ -122,9 +131,8 @@ struct vduse_dev {
> > > > u32 vq_num;
> > > > u32 vq_align;
> > > > u32 ngroups;
> > > > - struct vduse_umem *umem;
> > > > + u32 nas;
> > > > struct vduse_vq_group *groups;
> > > > - struct mutex mem_lock;
> > > > unsigned int bounce_size;
> > > > struct mutex domain_lock;
> > > > };
> > > > @@ -314,7 +322,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > > > return vduse_dev_msg_sync(dev, &msg);
> > > > }
> > > >
> > > > -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
> > > > u64 start, u64 last)
> > > > {
> > > > struct vduse_dev_msg msg = { 0 };
> > > > @@ -323,8 +331,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> > > > return -EINVAL;
> > > >
> > > > msg.req.type = VDUSE_UPDATE_IOTLB;
> > > > - msg.req.iova.start = start;
> > > > - msg.req.iova.last = last;
> > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > + msg.req.iova.start = start;
> > > > + msg.req.iova.last = last;
> > > > + } else {
> > > > + msg.req.iova_v2.start = start;
> > > > + msg.req.iova_v2.last = last;
> > > > + msg.req.iova_v2.asid = asid;
> > > > + }
> > > >
> > > > return vduse_dev_msg_sync(dev, &msg);
> > > > }
> > > > @@ -439,11 +453,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
> > > > static void vduse_dev_reset(struct vduse_dev *dev)
> > > > {
> > > > int i;
> > > > - struct vduse_iova_domain *domain = dev->domain;
> > > >
> > > > /* The coherent mappings are handled in vduse_dev_free_coherent() */
> > > > - if (domain && domain->bounce_map)
> > > > - vduse_domain_reset_bounce_map(domain);
> > > > + for (i = 0; i < dev->nas; i++) {
> > > > + struct vduse_iova_domain *domain = dev->as[i].domain;
> > > > +
> > > > + if (domain && domain->bounce_map)
> > > > + vduse_domain_reset_bounce_map(domain);
> > > > + }
> > > >
> > > > down_write(&dev->rwsem);
> > > >
> > > > @@ -622,6 +639,31 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
> > > > return ret;
> > > > }
> > > >
> > > > +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> > > > + unsigned int asid)
> > > > +{
> > > > + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > + struct vduse_dev_msg msg = { 0 };
> > > > + int r;
> > > > +
> > > > + if (dev->api_version < VDUSE_API_VERSION_1)
> > > > + return -EINVAL;
> > > > +
> > > > + msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> > > > + msg.req.vq_group_asid.group = group;
> > > > + msg.req.vq_group_asid.asid = asid;
> > > > +
> > > > + r = vduse_dev_msg_sync(dev, &msg);
> > > > + if (r < 0)
> > > > + return r;
> > > > +
> > > > + write_lock(&dev->groups[group].as_lock);
> > > > + dev->groups[group].as = &dev->as[asid];
> > > > + write_unlock(&dev->groups[group].as_lock);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > > > struct vdpa_vq_state *state)
> > > > {
> > > > @@ -793,13 +835,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> > > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > > int ret;
> > > >
> > > > - ret = vduse_domain_set_map(dev->domain, iotlb);
> > > > + ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
> > > > if (ret)
> > > > return ret;
> > > >
> > > > - ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> > > > + ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
> > > > if (ret) {
> > > > - vduse_domain_clear_map(dev->domain, iotlb);
> > > > + vduse_domain_clear_map(dev->as[asid].domain, iotlb);
> > > > return ret;
> > > > }
> > > >
> > > > @@ -842,6 +884,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> > > > .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> > > > .reset = vduse_vdpa_reset,
> > > > .set_map = vduse_vdpa_set_map,
> > > > + .set_group_asid = vduse_set_group_asid,
> > > > .get_vq_map = vduse_get_vq_map,
> > > > .free = vduse_vdpa_free,
> > > > };
> > > > @@ -850,32 +893,38 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
> > > > dma_addr_t dma_addr, size_t size,
> > > > enum dma_data_direction dir)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > >
> > > > if (!token.group)
> > > > return;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > >
> > > I suggest to factor this and unlock out into helpers.
> > >
> >
> > I'm moving to scoped guards.
> >
> > > >
> > > > + domain = token.group->as->domain;
> > > > vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
> > > > +
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > }
> > > >
> > > > static void vduse_dev_sync_single_for_cpu(union virtio_map token,
> > > > dma_addr_t dma_addr, size_t size,
> > > > enum dma_data_direction dir)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > >
> > > > if (!token.group)
> > > > return;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > >
> > > > + domain = token.group->as->domain;
> > > > vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
> > > > +
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > }
> > > >
> > > > static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > @@ -883,38 +932,45 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
> > > > enum dma_data_direction dir,
> > > > unsigned long attrs)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > > + dma_addr_t r;
> > > >
> > > > if (!token.group)
> > > > return DMA_MAPPING_ERROR;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > > + domain = token.group->as->domain;
> > > > + r = vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > > >
> > > > - return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > +
> > > > + return r;
> > > > }
> > > >
> > > > static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
> > > > size_t size, enum dma_data_direction dir,
> > > > unsigned long attrs)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > >
> > > > if (!token.group)
> > > > return;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > > +
> > > > + domain = token.group->as->domain;
> > > > + vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > > >
> > > > - return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > }
> > > >
> > > > static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > > dma_addr_t *dma_addr, gfp_t flag)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > > unsigned long iova;
> > > > void *addr;
> > > > @@ -927,8 +983,10 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > > if (!addr)
> > > > return NULL;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > > +
> > > > + domain = token.group->as->domain;
> > > > addr = vduse_domain_alloc_coherent(domain, size,
> > > > (dma_addr_t *)&iova, addr);
> > > > if (!addr)
> > > > @@ -936,9 +994,14 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
> > > >
> > > > *dma_addr = (dma_addr_t)iova;
> > > >
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > +
> > > > return addr;
> > > >
> > > > err:
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > free_pages_exact(addr, size);
> > > > return NULL;
> > > > }
> > > > @@ -947,31 +1010,39 @@ static void vduse_dev_free_coherent(union virtio_map token, size_t size,
> > > > void *vaddr, dma_addr_t dma_addr,
> > > > unsigned long attrs)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > struct vduse_iova_domain *domain;
> > > >
> > > > if (!token.group)
> > > > return;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > >
> > > > + domain = token.group->as->domain;
> > > > vduse_domain_free_coherent(domain, size, dma_addr, attrs);
> > > > +
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > > +
> > > > free_pages_exact(vaddr, size);
> > > > }
> > > >
> > > > static bool vduse_dev_need_sync(union virtio_map token, dma_addr_t dma_addr)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > - struct vduse_iova_domain *domain;
> > > > + size_t bounce_size;
> > > >
> > > > if (!token.group)
> > > > return false;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > > +
> > > > + bounce_size = token.group->as->domain->bounce_size;
> > > > +
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > >
> > > > - return dma_addr < domain->bounce_size;
> > > > + return dma_addr < bounce_size;
> > > > }
> > > >
> > > > static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > > > @@ -983,16 +1054,20 @@ static int vduse_dev_mapping_error(union virtio_map token, dma_addr_t dma_addr)
> > > >
> > > > static size_t vduse_dev_max_mapping_size(union virtio_map token)
> > > > {
> > > > - struct vduse_dev *vdev;
> > > > - struct vduse_iova_domain *domain;
> > > > + size_t bounce_size;
> > > >
> > > > if (!token.group)
> > > > return 0;
> > > >
> > > > - vdev = token.group->dev;
> > > > - domain = vdev->domain;
> > > > + if (token.group->dev->nas > 1)
> > > > + read_lock(&token.group->as_lock);
> > > > +
> > > > + bounce_size = token.group->as->domain->bounce_size;
> > > > +
> > > > + if (token.group->dev->nas > 1)
> > > > + read_unlock(&token.group->as_lock);
> > > >
> > > > - return domain->bounce_size;
> > > > + return bounce_size;
> > > > }
> > > >
> > > > static const struct virtio_map_ops vduse_map_ops = {
> > > > @@ -1132,39 +1207,40 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > > return ret;
> > > > }
> > > >
> > > > -static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > > +static int vduse_dev_dereg_umem(struct vduse_dev *dev, u32 asid,
> > > > u64 iova, u64 size)
> > > > {
> > > > int ret;
> > > >
> > > > - mutex_lock(&dev->mem_lock);
> > > > + mutex_lock(&dev->as[asid].mem_lock);
> > > > ret = -ENOENT;
> > > > - if (!dev->umem)
> > > > + if (!dev->as[asid].umem)
> > > > goto unlock;
> > > >
> > > > ret = -EINVAL;
> > > > - if (!dev->domain)
> > > > + if (!dev->as[asid].domain)
> > > > goto unlock;
> > > >
> > > > - if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> > > > + if (dev->as[asid].umem->iova != iova ||
> > > > + size != dev->as[asid].domain->bounce_size)
> > > > goto unlock;
> > > >
> > > > - vduse_domain_remove_user_bounce_pages(dev->domain);
> > > > - unpin_user_pages_dirty_lock(dev->umem->pages,
> > > > - dev->umem->npages, true);
> > > > - atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> > > > - mmdrop(dev->umem->mm);
> > > > - vfree(dev->umem->pages);
> > > > - kfree(dev->umem);
> > > > - dev->umem = NULL;
> > > > + vduse_domain_remove_user_bounce_pages(dev->as[asid].domain);
> > > > + unpin_user_pages_dirty_lock(dev->as[asid].umem->pages,
> > > > + dev->as[asid].umem->npages, true);
> > > > + atomic64_sub(dev->as[asid].umem->npages, &dev->as[asid].umem->mm->pinned_vm);
> > > > + mmdrop(dev->as[asid].umem->mm);
> > > > + vfree(dev->as[asid].umem->pages);
> > > > + kfree(dev->as[asid].umem);
> > > > + dev->as[asid].umem = NULL;
> > > > ret = 0;
> > > > unlock:
> > > > - mutex_unlock(&dev->mem_lock);
> > > > + mutex_unlock(&dev->as[asid].mem_lock);
> > > > return ret;
> > > > }
> > > >
> > > > static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > - u64 iova, u64 uaddr, u64 size)
> > > > + u32 asid, u64 iova, u64 uaddr, u64 size)
> > > > {
> > > > struct page **page_list = NULL;
> > > > struct vduse_umem *umem = NULL;
> > > > @@ -1172,14 +1248,14 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > unsigned long npages, lock_limit;
> > > > int ret;
> > > >
> > > > - if (!dev->domain || !dev->domain->bounce_map ||
> > > > - size != dev->domain->bounce_size ||
> > > > + if (!dev->as[asid].domain || !dev->as[asid].domain->bounce_map ||
> > > > + size != dev->as[asid].domain->bounce_size ||
> > > > iova != 0 || uaddr & ~PAGE_MASK)
> > > > return -EINVAL;
> > > >
> > > > - mutex_lock(&dev->mem_lock);
> > > > + mutex_lock(&dev->as[asid].mem_lock);
> > > > ret = -EEXIST;
> > > > - if (dev->umem)
> > > > + if (dev->as[asid].umem)
> > > > goto unlock;
> > > >
> > > > ret = -ENOMEM;
> > > > @@ -1203,7 +1279,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > goto out;
> > > > }
> > > >
> > > > - ret = vduse_domain_add_user_bounce_pages(dev->domain,
> > > > + ret = vduse_domain_add_user_bounce_pages(dev->as[asid].domain,
> > > > page_list, pinned);
> > > > if (ret)
> > > > goto out;
> > > > @@ -1216,7 +1292,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > umem->mm = current->mm;
> > > > mmgrab(current->mm);
> > > >
> > > > - dev->umem = umem;
> > > > + dev->as[asid].umem = umem;
> > > > out:
> > > > if (ret && pinned > 0)
> > > > unpin_user_pages(page_list, pinned);
> > > > @@ -1227,7 +1303,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> > > > vfree(page_list);
> > > > kfree(umem);
> > > > }
> > > > - mutex_unlock(&dev->mem_lock);
> > > > + mutex_unlock(&dev->as[asid].mem_lock);
> > > > return ret;
> > > > }
> > > >
> > > > @@ -1248,43 +1324,46 @@ static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > > > }
> > > >
> > > > static int vduse_dev_iotlb_entry(struct vduse_dev *dev,
> > > > - struct vduse_iotlb_entry *entry,
> > > > + struct vduse_iotlb_entry_v2 *entry,
> > > > struct file **f, uint64_t *capability)
> > > > {
> > > > + u32 asid;
> > > > int r = -EINVAL;
> > > > struct vhost_iotlb_map *map;
> > > > const struct vdpa_map_file *map_file;
> > > >
> > > > - if (entry->start > entry->last)
> > > > + if (entry->v1.start > entry->v1.last || entry->asid >= dev->nas)
> > > > return -EINVAL;
> > > >
> > > > + asid = array_index_nospec(entry->asid, dev->nas);
> > > > mutex_lock(&dev->domain_lock);
> > > > - if (!dev->domain)
> > > > +
> > > > + if (!dev->as[asid].domain)
> > > > goto out;
> > > >
> > > > - spin_lock(&dev->domain->iotlb_lock);
> > > > - map = vhost_iotlb_itree_first(dev->domain->iotlb, entry->start,
> > > > - entry->last);
> > > > + spin_lock(&dev->as[asid].domain->iotlb_lock);
> > > > + map = vhost_iotlb_itree_first(dev->as[asid].domain->iotlb,
> > > > + entry->v1.start, entry->v1.last);
> > > > if (map) {
> > > > if (f) {
> > > > map_file = (struct vdpa_map_file *)map->opaque;
> > > > *f = get_file(map_file->file);
> > > > }
> > > > - entry->offset = map_file->offset;
> > > > - entry->start = map->start;
> > > > - entry->last = map->last;
> > > > - entry->perm = map->perm;
> > > > + entry->v1.offset = map_file->offset;
> > > > + entry->v1.start = map->start;
> > > > + entry->v1.last = map->last;
> > > > + entry->v1.perm = map->perm;
> > > > if (capability) {
> > > > *capability = 0;
> > > >
> > > > - if (dev->domain->bounce_map && map->start == 0 &&
> > > > - map->last == dev->domain->bounce_size - 1)
> > > > + if (dev->as[asid].domain->bounce_map && map->start == 0 &&
> > > > + map->last == dev->as[asid].domain->bounce_size - 1)
> > > > *capability |= VDUSE_IOVA_CAP_UMEM;
> > > > }
> > > >
> > > > r = 0;
> > > > }
> > > > - spin_unlock(&dev->domain->iotlb_lock);
> > > > + spin_unlock(&dev->as[asid].domain->iotlb_lock);
> > > >
> > > > out:
> > > > mutex_unlock(&dev->domain_lock);
> > > > @@ -1302,12 +1381,29 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > > return -EPERM;
> > > >
> > > > switch (cmd) {
> > > > - case VDUSE_IOTLB_GET_FD: {
> > > > - struct vduse_iotlb_entry entry;
> > > > + case VDUSE_IOTLB_GET_FD:
> > > > + case VDUSE_IOTLB_GET_FD2: {
> > >
> > > I would rename this as GET_FD_ASID
> >
> > It does not scale very well. What if we add more fields to the
> > reserved members of the ioctl argument?
>
> New ioctl then? Or you meant we can know which fields are meaningful
> according to the api version or other?
>
(I think it is clear from the rest of the mail, but I'll make my point
explicit here in case I'm missing anything.)
I meant that having N iotlb entry properties and one ioctl per
combination of properties does not scale very well.
If we add another property, that makes 3 ioctls already. Say we also
want to return the iotlb capability too: that requires GET_FD_ASID,
GET_FD_CAPABILITY, and GET_FD_ASID_CAPABILITY to cover both of them.
Add one more property and that makes 15, and so on.
Compared with that approach, VDUSE_IOTLB_REG_UMEM did the right thing
from the beginning: it reserves space so new fields can be added in the
future based on conditions (API version, feature flags, ioctl argument
flags...).
> >
> > > but I wonder the reason for the new
> > > ioctl. I think we can deduce it from the API version,
> >
> > It's in the changelog but let me know if I should expand there.
> >
> > The reason for a new ioctl is that the previous one is defined in the
> > uapi with a struct size that is not able to hold the addition of the
> > asid. From the documentation [1]:
> >
> > "the command number encodes the sizeof(data_type) value in a 13-bit or
> > 14-bit integer"
> >
> > I'm not sure if any sanitizer checks this or if it will be implemented
> > in the future though. My interpretation of the next section of the
> > ioctl documentation "Interface versions" also recommends creating a
> > new versioned ioctl.
> >
> > Sadly I didn't realize this in the first versions so I introduced it later.
>
> You're right.
>
> >
> > > for example, you
> > > didn't introduce VDUSE_IOTLB_REG_UMEM2.
> > >
> >
> > VDUSE_IOTLB_REG_UMEM already has room for extension in the form of
> > reserved members that must be zero in the master branch.
>
> Right.
>
> >
> > > > + struct vduse_iotlb_entry_v2 entry = {0};
> > > > struct file *f = NULL;
> > > >
> > > > + ret = -ENOIOCTLCMD;
> > > > + if (dev->api_version < VDUSE_API_VERSION_1 &&
> > > > + cmd == VDUSE_IOTLB_GET_FD2)
> > > > + break;
> > > > +
> > > > ret = -EFAULT;
> > > > - if (copy_from_user(&entry, argp, sizeof(entry)))
> > > > + if (cmd == VDUSE_IOTLB_GET_FD2) {
> > > > + if (copy_from_user(&entry, argp, sizeof(entry)))
> > > > + break;
> > > > + } else {
> > > > + if (copy_from_user(&entry.v1, argp,
> > > > + sizeof(entry.v1)))
> > > > + break;
> > > > + }
> > > > +
> > > > + ret = -EINVAL;
> > > > + if (!is_mem_zero((const char *)entry.reserved,
> > > > + sizeof(entry.reserved)))
> > > > break;
> > > >
> > > > ret = vduse_dev_iotlb_entry(dev, &entry, &f, NULL);
> > > > @@ -1318,12 +1414,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > > if (!f)
> > > > break;
> > > >
> > > > - ret = -EFAULT;
> > > > - if (copy_to_user(argp, &entry, sizeof(entry))) {
> > > > + if (cmd == VDUSE_IOTLB_GET_FD2)
> > > > + ret = copy_to_user(argp, &entry,
> > > > + sizeof(entry));
> > > > + else
> > > > + ret = copy_to_user(argp, &entry.v1,
> > > > + sizeof(entry.v1));
> > > > +
> > > > + if (ret) {
> > > > + ret = -EFAULT;
> > > > fput(f);
> > > > break;
> > > > }
> > > > - ret = receive_fd(f, NULL, perm_to_file_flags(entry.perm));
> > > > + ret = receive_fd(f, NULL, perm_to_file_flags(entry.v1.perm));
> > > > fput(f);
> > > > break;
> > > > }
> > > > @@ -1468,6 +1571,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > > }
> > > > case VDUSE_IOTLB_REG_UMEM: {
> > > > struct vduse_iova_umem umem;
> > > > + u32 asid;
> > > >
> > > > ret = -EFAULT;
> > > > if (copy_from_user(&umem, argp, sizeof(umem)))
> > > > @@ -1475,17 +1579,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >
> > > > ret = -EINVAL;
> > > > if (!is_mem_zero((const char *)umem.reserved,
> > > > - sizeof(umem.reserved)))
> > > > + sizeof(umem.reserved)) ||
> > > > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > > > + umem.asid != 0) || umem.asid >= dev->nas)
> > > > break;
> > > >
> > > > mutex_lock(&dev->domain_lock);
> > > > - ret = vduse_dev_reg_umem(dev, umem.iova,
> > > > + asid = array_index_nospec(umem.asid, dev->nas);
> > > > + ret = vduse_dev_reg_umem(dev, asid, umem.iova,
> > > > umem.uaddr, umem.size);
> > > > mutex_unlock(&dev->domain_lock);
> > > > break;
> > > > }
> > > > case VDUSE_IOTLB_DEREG_UMEM: {
> > > > struct vduse_iova_umem umem;
> > > > + u32 asid;
> > > >
> > > > ret = -EFAULT;
> > > > if (copy_from_user(&umem, argp, sizeof(umem)))
> > > > @@ -1493,17 +1601,22 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >
> > > > ret = -EINVAL;
> > > > if (!is_mem_zero((const char *)umem.reserved,
> > > > - sizeof(umem.reserved)))
> > > > + sizeof(umem.reserved)) ||
> > > > + (dev->api_version < VDUSE_API_VERSION_1 &&
> > > > + umem.asid != 0) ||
> > > > + umem.asid >= dev->nas)
> > > > break;
> > > > +
> > > > mutex_lock(&dev->domain_lock);
> > > > - ret = vduse_dev_dereg_umem(dev, umem.iova,
> > > > + asid = array_index_nospec(umem.asid, dev->nas);
> > > > + ret = vduse_dev_dereg_umem(dev, asid, umem.iova,
> > > > umem.size);
> > > > mutex_unlock(&dev->domain_lock);
> > > > break;
> > > > }
> > > > case VDUSE_IOTLB_GET_INFO: {
> > > > struct vduse_iova_info info;
> > > > - struct vduse_iotlb_entry entry;
> > > > + struct vduse_iotlb_entry_v2 entry;
> > > >
> > > > ret = -EFAULT;
> > > > if (copy_from_user(&info, argp, sizeof(info)))
> > > > @@ -1513,15 +1626,23 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > > > sizeof(info.reserved)))
> > > > break;
> > > >
> > > > - entry.start = info.start;
> > > > - entry.last = info.last;
> > > > + if (dev->api_version < VDUSE_API_VERSION_1) {
> > > > + if (info.asid)
> > > > + break;
> > > > + } else if (info.asid >= dev->nas)
> > > > + break;
> > > > +
> > > > + entry.v1.start = info.start;
> > > > + entry.v1.last = info.last;
> > > > + entry.asid = info.asid;
> > > > ret = vduse_dev_iotlb_entry(dev, &entry, NULL,
> > > > &info.capability);
> > > > if (ret < 0)
> > > > break;
> > > >
> > > > - info.start = entry.start;
> > > > - info.last = entry.last;
> > > > + info.start = entry.v1.start;
> > > > + info.last = entry.v1.last;
> > > > + info.asid = entry.asid;
> > > >
> > > > ret = -EFAULT;
> > > > if (copy_to_user(argp, &info, sizeof(info)))
> > > > @@ -1543,8 +1664,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
> > > > struct vduse_dev *dev = file->private_data;
> > > >
> > > > mutex_lock(&dev->domain_lock);
> > > > - if (dev->domain)
> > > > - vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
> > > > + for (int i = 0; i < dev->nas; i++)
> > > > + if (dev->as[i].domain)
> > > > + vduse_dev_dereg_umem(dev, i, 0,
> > > > + dev->as[i].domain->bounce_size);
> > > > mutex_unlock(&dev->domain_lock);
> > > > spin_lock(&dev->msg_lock);
> > > > /* Make sure the inflight messages can processed after reconncection */
> > > > @@ -1763,7 +1886,6 @@ static struct vduse_dev *vduse_dev_create(void)
> > > > return NULL;
> > > >
> > > > mutex_init(&dev->lock);
> > > > - mutex_init(&dev->mem_lock);
> > > > mutex_init(&dev->domain_lock);
> > > > spin_lock_init(&dev->msg_lock);
> > > > INIT_LIST_HEAD(&dev->send_list);
> > > > @@ -1814,8 +1936,11 @@ static int vduse_destroy_dev(char *name)
> > > > idr_remove(&vduse_idr, dev->minor);
> > > > kvfree(dev->config);
> > > > vduse_dev_deinit_vqs(dev);
> > > > - if (dev->domain)
> > > > - vduse_domain_destroy(dev->domain);
> > > > + for (int i = 0; i < dev->nas; i++) {
> > > > + if (dev->as[i].domain)
> > > > + vduse_domain_destroy(dev->as[i].domain);
> > > > + }
> > > > + kfree(dev->as);
> > > > kfree(dev->name);
> > > > kfree(dev->groups);
> > > > vduse_dev_destroy(dev);
> > > > @@ -1862,12 +1987,17 @@ static bool vduse_validate_config(struct vduse_dev_config *config,
> > > > sizeof(config->reserved)))
> > > > return false;
> > > >
> > > > - if (api_version < VDUSE_API_VERSION_1 && config->ngroups)
> > > > + if (api_version < VDUSE_API_VERSION_1 &&
> > > > + (config->ngroups || config->nas))
> > > > return false;
> > > >
> > > > - if (api_version >= VDUSE_API_VERSION_1 &&
> > > > - (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS))
> > > > - return false;
> > > > + if (api_version >= VDUSE_API_VERSION_1) {
> > > > + if (!config->ngroups || config->ngroups > VDUSE_DEV_MAX_GROUPS)
> > > > + return false;
> > > > +
> > > > + if (!config->nas || config->nas > VDUSE_DEV_MAX_AS)
> > > > + return false;
> > > > + }
> > > >
> > > > if (config->vq_align > PAGE_SIZE)
> > > > return false;
> > > > @@ -1932,7 +2062,8 @@ static ssize_t bounce_size_store(struct device *device,
> > > >
> > > > ret = -EPERM;
> > > > mutex_lock(&dev->domain_lock);
> > > > - if (dev->domain)
> > > > + /* Assuming that if the first domain is allocated, all are allocated */
> > > > + if (dev->as[0].domain)
> > > > goto unlock;
> > >
> > > Should we update the per as bounce size here, and if yes, how to
> > > synchronize with need_sync()?
> > >
> >
> > No, the per-AS bounce size is still not allocated. It is stored in
> > dev->as[i].domain, and we check that it is not allocated in this
> > conditional [2].
>
> Exactly.
>
> Thanks
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v11 12/12] vduse: bump version number
2026-01-09 15:24 [PATCH v11 00/12] Add multiple address spaces support to VDUSE Eugenio Pérez
` (10 preceding siblings ...)
2026-01-09 15:24 ` [PATCH v11 11/12] vduse: add vq group asid support Eugenio Pérez
@ 2026-01-09 15:24 ` Eugenio Pérez
2026-01-13 6:25 ` Jason Wang
11 siblings, 1 reply; 40+ messages in thread
From: Eugenio Pérez @ 2026-01-09 15:24 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-kernel, virtualization, Maxime Coquelin, Laurent Vivier,
Cindy Lu, jasowang, Xuan Zhuo, Stefano Garzarella, Yongji Xie,
Eugenio Pérez
Finalize the series by advertising VDUSE API v1 support to userspace.
Now that all required infrastructure for v1 (ASIDs, VQ groups,
update_iotlb_v2) is in place, VDUSE devices can opt in to the new
features.
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 8227b5e9f3f6..5ad0ba1392f3 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -2201,7 +2201,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
break;
ret = -EINVAL;
- if (api_version > VDUSE_API_VERSION)
+ if (api_version > VDUSE_API_VERSION_1)
break;
ret = 0;
@@ -2268,7 +2268,7 @@ static int vduse_open(struct inode *inode, struct file *file)
if (!control)
return -ENOMEM;
- control->api_version = VDUSE_API_VERSION;
+ control->api_version = VDUSE_API_VERSION_1;
file->private_data = control;
return 0;
--
2.52.0
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v11 12/12] vduse: bump version number
2026-01-09 15:24 ` [PATCH v11 12/12] vduse: bump version number Eugenio Pérez
@ 2026-01-13 6:25 ` Jason Wang
2026-01-13 15:09 ` Eugenio Perez Martin
0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2026-01-13 6:25 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Finalize the series by advertising VDUSE API v1 support to userspace.
>
> Now that all required infrastructure for v1 (ASIDs, VQ groups,
> update_iotlb_v2) is in place, VDUSE devices can opt in to the new
> features.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 8227b5e9f3f6..5ad0ba1392f3 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -2201,7 +2201,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
> break;
>
> ret = -EINVAL;
> - if (api_version > VDUSE_API_VERSION)
> + if (api_version > VDUSE_API_VERSION_1)
> break;
>
> ret = 0;
> @@ -2268,7 +2268,7 @@ static int vduse_open(struct inode *inode, struct file *file)
> if (!control)
> return -ENOMEM;
>
> - control->api_version = VDUSE_API_VERSION;
> + control->api_version = VDUSE_API_VERSION_1;
This can break the "legacy" userspace that doesn't call VDUSE_SET_API_VERSION?
Thanks
> file->private_data = control;
>
> return 0;
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v11 12/12] vduse: bump version number
2026-01-13 6:25 ` Jason Wang
@ 2026-01-13 15:09 ` Eugenio Perez Martin
0 siblings, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2026-01-13 15:09 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Maxime Coquelin, Laurent Vivier, Cindy Lu, Xuan Zhuo,
Stefano Garzarella, Yongji Xie
On Tue, Jan 13, 2026 at 7:25 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jan 9, 2026 at 11:25 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Finalize the series by advertising VDUSE API v1 support to userspace.
> >
> > Now that all required infrastructure for v1 (ASIDs, VQ groups,
> > update_iotlb_v2) is in place, VDUSE devices can opt in to the new
> > features.
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 8227b5e9f3f6..5ad0ba1392f3 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -2201,7 +2201,7 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
> > break;
> >
> > ret = -EINVAL;
> > - if (api_version > VDUSE_API_VERSION)
> > + if (api_version > VDUSE_API_VERSION_1)
> > break;
> >
> > ret = 0;
> > @@ -2268,7 +2268,7 @@ static int vduse_open(struct inode *inode, struct file *file)
> > if (!control)
> > return -ENOMEM;
> >
> > - control->api_version = VDUSE_API_VERSION;
> > + control->api_version = VDUSE_API_VERSION_1;
>
> This can break the "legacy" userspace that doesn't call VDUSE_SET_API_VERSION?
>
> Thanks
>
It can result in a userland-visible change, yes. I'll protect it for the
next version, thanks for pointing it out!
I don't think this can actually "break" any valid userspace VDUSE
device: the only change userland can observe is if it calls
VDUSE_IOTLB_GET_FD2, sets the vq group in VDUSE_VQ_SETUP, sets the asid
in VDUSE_IOTLB_GET_INFO, or similar. Today the userland device gets
-ENOIOCTLCMD or -EINVAL for those, and with this patch something
"valid" suddenly happens instead. But it is a userland-visible change
anyway, so I'll protect it for the next version.
Thanks!
^ permalink raw reply [flat|nested] 40+ messages in thread