* [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset
@ 2023-09-09 13:31 Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-09 13:31 UTC (permalink / raw)
To: eperezma, jasowang, mst, xuanzhuo, dtatulea; +Cc: virtualization
To reduce the needlessly high setup and teardown cost of
iotlb mappings during live migration, it is crucial to
decouple the vhost-vdpa iotlb abstraction from the virtio
device life cycle, i.e. iotlb mappings should be left
intact across virtio device reset [1]. For this to work, a
parent device with an on-chip IOMMU should implement a
separate .reset_map() operation callback that restores the
1:1 DMA mapping without having to resort to the .reset()
callback, which is mainly used to reset virtio-specific
device state.
The new .reset_map() callback is invoked only when the
vhost-vdpa driver is about to be removed and detached from
the vdpa bus, so that other vdpa bus drivers, e.g.
virtio-vdpa, can start with a 1:1 DMA mapping when they
are attached. For context, these on-chip IOMMU parent
devices create the 1:1 DMA mapping at vdpa device add,
and they implicitly destroy it when the first .set_map
or .dma_map callback is invoked.
[1] Reducing vdpa migration downtime because of memory pin / maps
https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
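The intended life cycle can be illustrated with a small userspace model. This is only a sketch under assumptions: the model_* names and the two-state enum are hypothetical and are not part of the series; they merely mirror the behaviour described above (1:1 mapping at device add, replaced on the first .set_map/.dma_map, preserved across .reset, restored by .reset_map).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the mapping state of an on-chip IOMMU parent. */
enum map_state {
	MAP_IDENTITY,	/* 1:1 DMA mapping created at vdpa device add */
	MAP_USER,	/* mappings installed through .set_map/.dma_map */
};

struct model_dev {
	enum map_state map;
	bool driver_ok;	/* stands in for virtio-specific device state */
};

static void model_dev_add(struct model_dev *d)
{
	d->map = MAP_IDENTITY;
	d->driver_ok = false;
}

/* The first .set_map/.dma_map implicitly replaces the 1:1 mapping. */
static void model_set_map(struct model_dev *d)
{
	d->map = MAP_USER;
}

/* With this series, .reset only clears virtio state; mappings stay. */
static void model_reset(struct model_dev *d)
{
	d->driver_ok = false;
}

/* .reset_map restores the 1:1 mapping when vhost-vdpa detaches. */
static void model_reset_map(struct model_dev *d)
{
	d->map = MAP_IDENTITY;
}

static void model_demo(void)
{
	struct model_dev d;

	model_dev_add(&d);
	assert(d.map == MAP_IDENTITY);

	model_set_map(&d);
	d.driver_ok = true;

	model_reset(&d);		/* virtio device reset */
	assert(d.map == MAP_USER);	/* mapping survived the reset */
	assert(!d.driver_ok);

	model_reset_map(&d);		/* vhost-vdpa is being removed */
	assert(d.map == MAP_IDENTITY);
}
```

The point of the split is visible in model_reset(): it no longer touches the mapping state at all, which is what avoids the repeated map/unmap cost during migration.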
---
RFC v2:
- rebased on top of the "[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group" series:
https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei.liu@oracle.com/
---
Si-Wei Liu (4):
vdpa: introduce .reset_map operation callback
vdpa/mlx5: implement .reset_map driver op
vhost-vdpa: should restore 1:1 dma mapping before detaching driver
vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 70 +++++++++++++++++++++++---------------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 +++++++---
drivers/vhost/vdpa.c | 32 ++++++++++++++++-
include/linux/vdpa.h | 7 ++++
include/uapi/linux/vhost_types.h | 2 ++
6 files changed, 96 insertions(+), 34 deletions(-)
--
1.8.3.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback
2023-09-09 13:31 [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
@ 2023-09-09 13:31 ` Si-Wei Liu
2023-09-11 3:42 ` Jason Wang
2023-09-09 13:31 ` [PATCH RFC v2 2/4] vdpa/mlx5: implement .reset_map driver op Si-Wei Liu
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-09 13:31 UTC (permalink / raw)
To: eperezma, jasowang, mst, xuanzhuo, dtatulea; +Cc: virtualization
A parent driver with an on-chip IOMMU can use this callback to
restore its memory mapping to the initial state.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
include/linux/vdpa.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 17a4efa..daecf55 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -324,6 +324,12 @@ struct vdpa_map_file {
* @iova: iova to be unmapped
* @size: size of the area
* Returns integer: success (0) or error (< 0)
+ * @reset_map: Reset device memory mapping (optional)
+ * Needed for devices that use device-specific
+ * DMA translation (on-chip IOMMU)
+ * @vdev: vdpa device
+ * @asid: address space identifier
+ * Returns integer: success (0) or error (< 0)
* @get_vq_dma_dev: Get the dma device for a specific
* virtqueue (optional)
* @vdev: vdpa device
@@ -401,6 +407,7 @@ struct vdpa_config_ops {
u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
u64 iova, u64 size);
+ int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
--
1.8.3.1
* [PATCH RFC v2 2/4] vdpa/mlx5: implement .reset_map driver op
2023-09-09 13:31 [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
@ 2023-09-09 13:31 ` Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit Si-Wei Liu
3 siblings, 0 replies; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-09 13:31 UTC (permalink / raw)
To: eperezma, jasowang, mst, xuanzhuo, dtatulea; +Cc: virtualization
Today, mlx5_vdpa starts by preallocating a 1:1 DMA mapping at
device creation time, and this 1:1 mapping is implicitly
destroyed when the first .set_map call is invoked. Every time
the .reset callback is invoked, any mapping left behind is
dropped and the device is reset back to the initial 1:1 DMA mapping.
In order to reduce excessive memory mapping cost during live
migration, it is desirable to decouple the vhost-vdpa iotlb
abstraction from the virtio device life cycle, i.e. mappings
should be left intact across virtio device reset. Leverage the
.reset_map callback to reset the memory mapping, so that the
device .reset routine no longer has to clean up memory
mappings.
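The decision logic of the new reset path can be sketched in isolation. The following is a simplified userspace model, not the driver code: struct model_mr and the model_* helpers are hypothetical, and the can_identity flag merely stands in for the umem_uid_0 capability check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the memory-region state of the parent. */
struct model_mr {
	bool initialized;	/* some MR (DMA or user) is set up */
	bool user_mr;		/* MR was built from a user-supplied iotlb */
	bool can_identity;	/* stands in for the umem_uid_0 capability */
};

static void model_destroy_mr(struct model_mr *mr)
{
	mr->initialized = false;
	mr->user_mr = false;
}

static void model_create_identity_mr(struct model_mr *mr)
{
	mr->initialized = true;
	mr->user_mr = false;
}

/* Mirrors the shape of the reset path: no-op unless a user MR exists. */
static int model_reset_mr(struct model_mr *mr, unsigned int asid)
{
	if (asid != 0)		/* the patch only acts on ASID 0 */
		return 0;
	if (!mr->user_mr)	/* already in the initial state */
		return 0;
	model_destroy_mr(mr);
	if (mr->can_identity)
		model_create_identity_mr(mr);
	return 0;
}

static void model_reset_mr_demo(void)
{
	struct model_mr mr = {
		.initialized = true,
		.user_mr = true,
		.can_identity = true,
	};

	assert(model_reset_mr(&mr, 1) == 0);	/* other ASIDs: untouched */
	assert(mr.user_mr);

	assert(model_reset_mr(&mr, 0) == 0);	/* ASID 0: back to 1:1 */
	assert(mr.initialized && !mr.user_mr);
}
```

Unlike the real mlx5_vdpa_reset_mr(), the model omits locking and cannot fail when recreating the identity MR; it only shows which cases are no-ops.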
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
RFC v1 -> v2:
- fix error path when both CVQ and DVQ fall in same asid
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 70 +++++++++++++++++++++++---------------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 +++++++---
3 files changed, 56 insertions(+), 33 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index b53420e..5c9a25a 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -123,6 +123,7 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
unsigned int asid);
void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
#define mlx5_vdpa_warn(__dev, format, ...) \
dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 5a1971fc..ec2c7b4e1 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -489,21 +489,15 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr
}
}
-static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev)
{
- if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
- return;
-
prune_iotlb(mvdev);
}
-static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_mr *mr = &mvdev->mr;
- if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
- return;
-
if (!mr->initialized)
return;
@@ -521,8 +515,10 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
mutex_lock(&mr->mkey_mtx);
- _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
- _mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
+ if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid)
+ _mlx5_vdpa_destroy_dvq_mr(mvdev);
+ if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid)
+ _mlx5_vdpa_destroy_cvq_mr(mvdev);
mutex_unlock(&mr->mkey_mtx);
}
@@ -534,25 +530,17 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
}
static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
- struct vhost_iotlb *iotlb,
- unsigned int asid)
+ struct vhost_iotlb *iotlb)
{
- if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
- return 0;
-
return dup_iotlb(mvdev, iotlb);
}
static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev,
- struct vhost_iotlb *iotlb,
- unsigned int asid)
+ struct vhost_iotlb *iotlb)
{
struct mlx5_vdpa_mr *mr = &mvdev->mr;
int err;
- if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
- return 0;
-
if (mr->initialized)
return 0;
@@ -574,18 +562,22 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
{
int err;
- err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
- if (err)
- return err;
-
- err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb, asid);
- if (err)
- goto out_err;
+ if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid) {
+ err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb);
+ if (err)
+ return err;
+ }
+ if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid) {
+ err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb);
+ if (err)
+ goto out_err;
+ }
return 0;
out_err:
- _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
+ if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid)
+ _mlx5_vdpa_destroy_dvq_mr(mvdev);
return err;
}
@@ -601,6 +593,28 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
return err;
}
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ int err = 0;
+
+ if (asid != 0)
+ return 0;
+
+ mutex_lock(&mr->mkey_mtx);
+ if (!mr->user_mr)
+ goto out;
+ _mlx5_vdpa_destroy_dvq_mr(mvdev);
+ if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
+ err = _mlx5_vdpa_create_dvq_mr(mvdev, NULL);
+ if (err)
+ mlx5_vdpa_warn(mvdev, "create DMA MR failed\n");
+ }
+out:
+ mutex_unlock(&mr->mkey_mtx);
+ return err;
+}
+
int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
bool *change_map, unsigned int asid)
{
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 37be945..3cb5db6 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2824,7 +2824,6 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
unregister_link_notifier(ndev);
teardown_driver(ndev);
clear_vqs_ready(ndev);
- mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = 0;
@@ -2835,10 +2834,6 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
init_group_to_asid_map(mvdev);
++mvdev->generation;
- if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
- if (mlx5_vdpa_create_mr(mvdev, NULL, 0))
- mlx5_vdpa_warn(mvdev, "create MR failed\n");
- }
up_write(&ndev->reslock);
return 0;
@@ -2903,6 +2898,18 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
return err;
}
+static int mlx5_vdpa_reset_map(struct vdpa_device *vdev, unsigned int asid)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
+
+ down_write(&ndev->reslock);
+ err = mlx5_vdpa_reset_mr(mvdev, asid);
+ up_write(&ndev->reslock);
+ return err;
+}
+
static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
@@ -3162,6 +3169,7 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
.set_config = mlx5_vdpa_set_config,
.get_generation = mlx5_vdpa_get_generation,
.set_map = mlx5_vdpa_set_map,
+ .reset_map = mlx5_vdpa_reset_map,
.set_group_asid = mlx5_set_group_asid,
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
--
1.8.3.1
* [PATCH RFC v2 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver
2023-09-09 13:31 [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 2/4] vdpa/mlx5: implement .reset_map driver op Si-Wei Liu
@ 2023-09-09 13:31 ` Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit Si-Wei Liu
3 siblings, 0 replies; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-09 13:31 UTC (permalink / raw)
To: eperezma, jasowang, mst, xuanzhuo, dtatulea; +Cc: virtualization
Devices with an on-chip IOMMU may need to restore the iotlb to a 1:1
identity mapping from IOVA to PA. Before vhost-vdpa goes away, give
them a chance to clean up and reset the iotlb back to 1:1 identity
mapping mode. This is done so that any vdpa bus driver may start with
a 1:1 identity mapping by default.
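The teardown order can be sketched with mock types. Everything named model_* below is hypothetical and only mirrors the pattern in this patch: unmap the whole range first, then call the optional callback if the parent provides one.

```c
#include <assert.h>
#include <stddef.h>

struct model_vdpa;

struct model_ops {
	/* Optional: NULL for parents that rely on the platform IOMMU. */
	int (*reset_map)(struct model_vdpa *vdpa, unsigned int asid);
};

struct model_vdpa {
	const struct model_ops *ops;
	int unmap_calls;
	int reset_map_calls;
};

/* Stands in for the whole-range unmap done when an AS is removed. */
static void model_unmap_all(struct model_vdpa *vdpa, unsigned int asid)
{
	(void)asid;
	vdpa->unmap_calls++;
}

/* A parent with an on-chip IOMMU would restore its 1:1 mapping here. */
static int model_parent_reset_map(struct model_vdpa *vdpa, unsigned int asid)
{
	(void)asid;
	vdpa->reset_map_calls++;
	return 0;
}

static void model_remove_as(struct model_vdpa *vdpa, unsigned int asid)
{
	model_unmap_all(vdpa, asid);
	if (vdpa->ops->reset_map)	/* callback is optional */
		vdpa->ops->reset_map(vdpa, asid);
}

static const struct model_ops model_onchip_ops = {
	.reset_map = model_parent_reset_map,
};
static const struct model_ops model_platform_ops = {
	.reset_map = NULL,
};

static void model_remove_as_demo(void)
{
	struct model_vdpa onchip = { .ops = &model_onchip_ops };
	struct model_vdpa platform = { .ops = &model_platform_ops };

	model_remove_as(&onchip, 0);
	assert(onchip.unmap_calls == 1 && onchip.reset_map_calls == 1);

	model_remove_as(&platform, 0);
	assert(platform.unmap_calls == 1 && platform.reset_map_calls == 0);
}
```

Note that, as in the patch, the return value of the callback is ignored at this point; whether a failure should be reported is left open here.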
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
drivers/vhost/vdpa.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index eabac06..71fbd559 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
return vhost_vdpa_alloc_as(v, asid);
}
+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (ops->reset_map)
+ ops->reset_map(vdpa, asid);
+}
+
static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
{
struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,14 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
hlist_del(&as->hash_link);
vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+ /*
+ * Devices with on-chip IOMMU need to restore iotlb
+ * to 1:1 identity mapping before vhost-vdpa is going
+ * to be removed and detached from the device. Give
+ * them a chance to do so, as this cannot be done
+ * efficiently via the whole-range unmap call above.
+ */
+ vhost_vdpa_reset_map(v, asid);
kfree(as);
return 0;
--
1.8.3.1
* [PATCH RFC v2 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
2023-09-09 13:31 [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
` (2 preceding siblings ...)
2023-09-09 13:31 ` [PATCH RFC v2 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver Si-Wei Liu
@ 2023-09-09 13:31 ` Si-Wei Liu
3 siblings, 0 replies; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-09 13:31 UTC (permalink / raw)
To: eperezma, jasowang, mst, xuanzhuo, dtatulea; +Cc: virtualization
Userspace needs this feature flag to tell whether the vhost-vdpa
iotlb in the kernel supports persistent IOTLB mappings across
device reset. There are two cases in which the backend may
advertise this feature bit:
- parent device that has to work with platform IOMMU
- parent device with on-chip IOMMU that has the expected
.reset_map support in driver
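The two cases above reduce to a single predicate, which can be sketched in userspace as follows. The model_* names and the boolean fields are hypothetical stand-ins for the presence of the corresponding parent ops; the bit numbers are the ones used in this patch. The sketch assumes each unsupported feature is rejected by its own check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_DESC_ASID	6	/* bit numbers as in the patch */
#define MODEL_F_IOTLB_PERSIST	7
#define MODEL_BIT(n)		(UINT64_C(1) << (n))

/* Hypothetical summary of which ops a parent driver provides. */
struct model_parent {
	bool has_set_map;
	bool has_dma_map;
	bool has_reset_map;
	bool has_desc_group;
};

/* Platform IOMMU (no vendor mapping ops), or on-chip with .reset_map. */
static bool model_has_persistent_map(const struct model_parent *p)
{
	return (!p->has_set_map && !p->has_dma_map) || p->has_reset_map;
}

static uint64_t model_get_backend_features(const struct model_parent *p)
{
	uint64_t features = 0;

	if (p->has_desc_group)
		features |= MODEL_BIT(MODEL_F_DESC_ASID);
	if (model_has_persistent_map(p))
		features |= MODEL_BIT(MODEL_F_IOTLB_PERSIST);
	return features;
}

/* Each unsupported feature is rejected by its own, independent check. */
static int model_set_backend_features(const struct model_parent *p,
				      uint64_t features)
{
	if ((features & MODEL_BIT(MODEL_F_DESC_ASID)) && !p->has_desc_group)
		return -EOPNOTSUPP;
	if ((features & MODEL_BIT(MODEL_F_IOTLB_PERSIST)) &&
	    !model_has_persistent_map(p))
		return -EOPNOTSUPP;
	return 0;
}

static void model_features_demo(void)
{
	const struct model_parent platform = { 0 };
	const struct model_parent legacy = { .has_set_map = true };
	const struct model_parent onchip = {
		.has_set_map = true,
		.has_reset_map = true,
	};
	const uint64_t persist = MODEL_BIT(MODEL_F_IOTLB_PERSIST);

	assert(model_get_backend_features(&platform) & persist);
	assert(!(model_get_backend_features(&legacy) & persist));
	assert(model_get_backend_features(&onchip) & persist);

	assert(model_set_backend_features(&legacy, persist) == -EOPNOTSUPP);
	assert(model_set_backend_features(&onchip, persist) == 0);
}
```

A parent that implements .set_map but not .reset_map (the "legacy" case in the demo) neither advertises the bit nor accepts it from userspace.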
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
drivers/vhost/vdpa.c | 15 ++++++++++++++-
include/uapi/linux/vhost_types.h | 2 ++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 71fbd559..bbb1092 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -414,6 +414,14 @@ static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v)
return ops->get_vq_desc_group;
}
+static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return (!ops->set_map && !ops->dma_map) || ops->reset_map;
+}
+
static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
{
struct vdpa_device *vdpa = v->vdpa;
@@ -716,7 +724,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
- BIT_ULL(VHOST_BACKEND_F_RESUME)))
+ BIT_ULL(VHOST_BACKEND_F_RESUME) |
+ BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
!vhost_vdpa_can_suspend(v))
@@ -729,6 +738,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
return -EINVAL;
if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
!vhost_vdpa_has_desc_group(v))
+ if ((features & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)) &&
+ !vhost_vdpa_has_persistent_map(v))
return -EOPNOTSUPP;
vhost_set_backend_features(&v->vdev, features);
return 0;
@@ -785,6 +796,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
if (vhost_vdpa_has_desc_group(v))
features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
+ if (vhost_vdpa_has_persistent_map(v))
+ features |= BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST);
if (copy_to_user(featurep, &features, sizeof(features)))
r = -EFAULT;
break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 6acc604..0fdb6f0 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -186,5 +186,7 @@ struct vhost_vdpa_iova_range {
* buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
*/
#define VHOST_BACKEND_F_DESC_ASID 0x6
+/* IOTLB doesn't flush memory mapping across device reset */
+#define VHOST_BACKEND_F_IOTLB_PERSIST 0x7
#endif
--
1.8.3.1
* Re: [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback
2023-09-09 13:31 ` [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
@ 2023-09-11 3:42 ` Jason Wang
2023-09-11 23:31 ` Si-Wei Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2023-09-11 3:42 UTC (permalink / raw)
To: Si-Wei Liu; +Cc: eperezma, virtualization, xuanzhuo, mst
Hi Si-Wei:
On Sat, Sep 9, 2023 at 9:34 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> On-chip IOMMU parent driver could use it to restore memory mapping
> to the initial state.
As discussed before, the on-chip IOMMU is a hardware detail that needs
to be hidden by the vDPA bus.
Exposing this will complicate the implementation of bus drivers.
Thanks
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
> include/linux/vdpa.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 17a4efa..daecf55 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -324,6 +324,12 @@ struct vdpa_map_file {
> * @iova: iova to be unmapped
> * @size: size of the area
> * Returns integer: success (0) or error (< 0)
> + * @reset_map: Reset device memory mapping (optional)
> + * Needed for device that using device
> + * specific DMA translation (on-chip IOMMU)
> + * @vdev: vdpa device
> + * @asid: address space identifier
> + * Returns integer: success (0) or error (< 0)
> * @get_vq_dma_dev: Get the dma device for a specific
> * virtqueue (optional)
> * @vdev: vdpa device
> @@ -401,6 +407,7 @@ struct vdpa_config_ops {
> u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
> u64 iova, u64 size);
> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
> unsigned int asid);
> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> --
> 1.8.3.1
>
* Re: [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback
2023-09-11 3:42 ` Jason Wang
@ 2023-09-11 23:31 ` Si-Wei Liu
2023-09-12 6:23 ` Jason Wang
0 siblings, 1 reply; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-11 23:31 UTC (permalink / raw)
To: Jason Wang; +Cc: eperezma, virtualization, xuanzhuo, mst
Hi Jason,
On 9/10/2023 8:42 PM, Jason Wang wrote:
> Hi Si-Wei:
>
> On Sat, Sep 9, 2023 at 9:34 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> On-chip IOMMU parent driver could use it to restore memory mapping
>> to the initial state.
> As discussed before. On-chip IOMMU is the hardware details that need
> to be hidden by the vDPA bus.
I guess today this is already exposed to the bus driver layer, e.g.
vhost_vdpa_map() can call into the .dma_map, .set_map, or
iommu_map() flavors depending on the specific hardware IOMMU
implementation underneath? Specifically, "struct iommu_domain *domain"
is now part of "struct vhost_vdpa" in an individual bus driver
(vhost-vdpa), rather than being wrapped under the vdpa core
"struct vdpa_device" as a vdpa device level object. Do we know why
the hardware details can be exposed to bus callers like
vhost_vdpa_map and vhost_vdpa_general_unmap, while this is prohibited
in other, similar cases? Or is there a boundary in between
that I was not aware of?
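To make the point concrete, the dispatch I am referring to can be modelled roughly as below. This is a userspace sketch only: the model_* names are hypothetical, and the precedence of .dma_map over .set_map is my assumption for illustration rather than something established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

enum model_map_path {
	MODEL_PATH_DMA_MAP,	/* parent's .dma_map: vendor-specific mapping */
	MODEL_PATH_SET_MAP,	/* parent's .set_map: vendor-specific mapping */
	MODEL_PATH_IOMMU_MAP,	/* platform IOMMU via iommu_map() */
};

/* Hypothetical summary of which mapping ops the parent provides. */
struct model_parent_ops {
	bool has_dma_map;
	bool has_set_map;
};

/* The bus driver already picks a path based on what the parent offers. */
static enum model_map_path
model_pick_map_path(const struct model_parent_ops *ops)
{
	if (ops->has_dma_map)
		return MODEL_PATH_DMA_MAP;
	if (ops->has_set_map)
		return MODEL_PATH_SET_MAP;
	return MODEL_PATH_IOMMU_MAP;
}

static void model_pick_demo(void)
{
	const struct model_parent_ops vendor_incremental = { .has_dma_map = true };
	const struct model_parent_ops vendor_batched = { .has_set_map = true };
	const struct model_parent_ops platform = { 0 };

	assert(model_pick_map_path(&vendor_incremental) == MODEL_PATH_DMA_MAP);
	assert(model_pick_map_path(&vendor_batched) == MODEL_PATH_SET_MAP);
	assert(model_pick_map_path(&platform) == MODEL_PATH_IOMMU_MAP);
}
```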
I think there is a more fundamental question I don't quite understand:
is adding an extra API for the on-chip IOMMU itself the issue, or is it
just that you don't like the way the IOMMU model gets exposed via this
specific .reset_map API? For the platform IOMMU case, there is an
internal distinction between the 1:1 identity (passthrough) mode and
the DMA page mapping mode, and this distinction is exposed and
propagated through the IOMMU API, e.g. iommu_domain_alloc() and
iommu_attach_device() are called explicitly from
vhost_vdpa_alloc_domain() by vhost-vdpa (and the opposite from within
vhost_vdpa_free_domain), while virtio-vdpa doesn't call any IOMMU
API at all and simply inherits whatever the default IOMMU
domain has. Ideally we can and should do pretty much the same for the
on-chip IOMMU, but I don't think there's a clean way to make the
vhost-vdpa case distinguishable from the virtio-vdpa case without
introducing any driver API. I'm afraid it was just a hack to hide the
distinction needed by vdpa bus users, e.g. deep inside
vdpa_reset(), if not introducing any new driver API is the goal here...
> Exposing this will complicate the implementation of bus drivers.
As said above, this distinction is needed by bus drivers, and it's
already made for the platform IOMMU via the IOMMU API. I can drop the
.reset_map API and add another set of similar driver APIs to mimic
iommu_domain_alloc/iommu_domain_free, but doing this will complicate the
parent driver's implementation. While .reset_map is the simplest
approach I can think of for the parent, I can do it the other
way if you're fine with it. Let me know how it sounds.
Thanks,
-Siwei
>
> Thanks
>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>> include/linux/vdpa.h | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> index 17a4efa..daecf55 100644
>> --- a/include/linux/vdpa.h
>> +++ b/include/linux/vdpa.h
>> @@ -324,6 +324,12 @@ struct vdpa_map_file {
>> * @iova: iova to be unmapped
>> * @size: size of the area
>> * Returns integer: success (0) or error (< 0)
>> + * @reset_map: Reset device memory mapping (optional)
>> + * Needed for device that using device
>> + * specific DMA translation (on-chip IOMMU)
>> + * @vdev: vdpa device
>> + * @asid: address space identifier
>> + * Returns integer: success (0) or error (< 0)
>> * @get_vq_dma_dev: Get the dma device for a specific
>> * virtqueue (optional)
>> * @vdev: vdpa device
>> @@ -401,6 +407,7 @@ struct vdpa_config_ops {
>> u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
>> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
>> u64 iova, u64 size);
>> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
>> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>> unsigned int asid);
>> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
>> --
>> 1.8.3.1
>>
* Re: [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback
2023-09-11 23:31 ` Si-Wei Liu
@ 2023-09-12 6:23 ` Jason Wang
2023-09-15 5:38 ` Si-Wei Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2023-09-12 6:23 UTC (permalink / raw)
To: Si-Wei Liu; +Cc: eperezma, virtualization, xuanzhuo, mst
On Tue, Sep 12, 2023 at 7:31 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Jason,
>
> On 9/10/2023 8:42 PM, Jason Wang wrote:
> > Hi Si-Wei:
> >
> > On Sat, Sep 9, 2023 at 9:34 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> On-chip IOMMU parent driver could use it to restore memory mapping
> >> to the initial state.
> > As discussed before. On-chip IOMMU is the hardware details that need
> > to be hidden by the vDPA bus.
> I guess today this is exposed to the bus driver layer already, for e.g.
> vhost_vdpa_map() can call into the .dma_map, or .set_map, or
> iommu_map() flavors depending on the specific hardware IOMMU
> implementation underneath? Specifically, "struct iommu_domain *domain"
> is now part of "struct vhost_vdpa" at an individual bus driver
> (vhost-vdpa), rather than being wrapped around under the vdpa core
> "struct vdpa_device" as vdpa device level object. Do we know for what
> reason the hardware details could be exposed to bus callers like
> vhost_vdpa_map and vhost_vdpa_general_unmap, while it's prohibited for
> other similar cases on the other hand? Or is there a boundary in between
> I was not aware of?
Let me try to explain:
set_map(), dma_map() and dma_unmap() are used for parent-specific
mappings. It means the parents want to do vendor-specific setup for
the mapping. The abstraction of translation is still one dimension
(though the actual implementation in the parent could be two
dimensions). So it's not necessarily the on-chip stuff (see the
example of VDUSE).
That means we never expose two-dimension mappings (like on-chip)
beyond the bus. So it's not one dimension vs two dimensions but
platform-specific mappings vs vendor-specific mappings.
>
> I think a more fundamental question I don't quite understand, is adding
> an extra API to on-chip IOMMU itself an issue, or just that you don't
> like the way how the IOMMU model gets exposed via this specific API of
> .reset_map?
Adding an extra API for the on-chip IOMMU, since the on-chip logic
should be hidden by the bus unless we want to introduce the
two-dimension abstraction (which seems to be overkill).
> For the platform IOMMU case, internally there exists
> distinction between the 1:1 identity (passthrough) mode and DMA page
> mapping mode, and this distinction is somehow getting exposed and
> propagated through the IOMMU API - for e.g. iommu_domain_alloc() and
> iommu_attach_device() are being called explicitly from
> vhost_vdpa_alloc_domain() by vhost-vdpa (and the opposite from within
> vhost_vdpa_free_domain), while for virtio-vdpa it doesn't call any IOMMU
> API at all on the other hand
It's the way the kernel manages DMA mappings. For a userspace driver
via vhost-vDPA, it needs to call IOMMU APIs. And for a kernel driver
via virtio-vDPA, DMA API is used (via the dma_dev exposed through
virtio_vdpa). DMA API may decide to call IOMMU API if IOMMU is enabled
but not in passthrough mode.
> - which is to inherit what default IOMMU
> domain has.
Yes, but it's not necessarily a 1:1 (identity) mapping; it really
depends on the configuration. (And there could even be a swiotlb
layer in the middle.)
> Ideally for on-chip IOMMU we can and should do pretty much
> the same, but I don't think there's a clean way without introducing any
> driver API to make vhost-vdpa case distinguish from the virtio-vdpa
> case. I'm afraid to say that it was just a hack to hide the necessary
> distinction needed by vdpa bus users for e.g. in the deep of
> vdpa_reset(), if not introducing any new driver API is the goal here...
So reset_map() is fine if it is not defined just for on-chip. For
example, does VDUSE need to implement it or not?
>
> > Exposing this will complicate the implementation of bus drivers.
> As said above, this distinction is needed by bus drivers, and it's
> already done by platform IOMMU via IOMMU API. I can drop the .reset_map
> API while add another set of similar driver API to mimic
> iommu_domain_alloc/iommu_domain_free, but doing this will complicate the
> parent driver's implementation on the other hand.
I'm not sure I understand the issue. But something like PD
allocation/free in RDMA?
> While .reset_map is
> what I can think of to be the simplest for parent, I can do the other
> way if you're fine with it. Let me know how it sounds.
I think what I still don't understand is: how is reset_map() related
to persistent IOTLB? I guess it's a must but I still didn't figure out
why.
Thanks
>
> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> ---
> >> include/linux/vdpa.h | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> >> index 17a4efa..daecf55 100644
> >> --- a/include/linux/vdpa.h
> >> +++ b/include/linux/vdpa.h
> >> @@ -324,6 +324,12 @@ struct vdpa_map_file {
> >> * @iova: iova to be unmapped
> >> * @size: size of the area
> >> * Returns integer: success (0) or error (< 0)
> >> + * @reset_map: Reset device memory mapping (optional)
> >> + * Needed for device that using device
> >> + * specific DMA translation (on-chip IOMMU)
> >> + * @vdev: vdpa device
> >> + * @asid: address space identifier
> >> + * Returns integer: success (0) or error (< 0)
> >> * @get_vq_dma_dev: Get the dma device for a specific
> >> * virtqueue (optional)
> >> * @vdev: vdpa device
> >> @@ -401,6 +407,7 @@ struct vdpa_config_ops {
> >> u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
> >> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
> >> u64 iova, u64 size);
> >> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
> >> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
> >> unsigned int asid);
> >> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> >> --
> >> 1.8.3.1
> >>
>
* Re: [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback
2023-09-12 6:23 ` Jason Wang
@ 2023-09-15 5:38 ` Si-Wei Liu
0 siblings, 0 replies; 9+ messages in thread
From: Si-Wei Liu @ 2023-09-15 5:38 UTC (permalink / raw)
To: Jason Wang; +Cc: eperezma, virtualization, xuanzhuo, mst
On 9/11/2023 11:23 PM, Jason Wang wrote:
> On Tue, Sep 12, 2023 at 7:31 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Hi Jason,
>>
>> On 9/10/2023 8:42 PM, Jason Wang wrote:
>>> Hi Si-Wei:
>>>
>>> On Sat, Sep 9, 2023 at 9:34 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> On-chip IOMMU parent driver could use it to restore memory mapping
>>>> to the initial state.
>>> As discussed before. On-chip IOMMU is the hardware details that need
>>> to be hidden by the vDPA bus.
>> I guess today this is exposed to the bus driver layer already, for e.g.
>> vhost_vdpa_map() can call into the .dma_map, or .set_map, or
>> iommu_map() flavors depending on the specific hardware IOMMU
>> implementation underneath? Specifically, "struct iommu_domain *domain"
>> is now part of "struct vhost_vdpa" at an individual bus driver
>> (vhost-vdpa), rather than being wrapped around under the vdpa core
>> "struct vdpa_device" as vdpa device level object. Do we know for what
>> reason the hardware details could be exposed to bus callers like
>> vhost_vdpa_map and vhost_vdpa_general_unmap, while it's prohibited for
>> other similar cases on the other hand? Or is there a boundary in between
>> I was not aware of?
> Let me try to explain:
>
> set_map(), dma_map(), dma_unmap() are used for parent specific
> mappings. It means the parents want to do vendor specific setup for
> the mapping. The abstraction of translation is still one dimension
> (though the actual implementation in the parent could be two
> dimensions). So it's not necessarily the on-chip stuff (see the
> example of the VDUSE).
>
> That means we never expose two dimension mappings like (on-chip)
> beyond the bus. So it's not one dimension vs two dimensions but the
> platform specific mappings vs vendor specific mappings.
OK, I think I saw "on-chip" used interchangeably with vendor specific
means of mapping, even for VDUSE. I think we both agreed that it's too
complex to expose the details of the two-dimension model and that we
should try to avoid it (I thought on-chip didn't imply two-dimension, just
the vendor specific part). That's the reason why I hid this special detail
under a simple .reset_map interface, such that we could easily decouple
mapping from the virtio life cycle (device reset).
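To illustrate the decoupling described above, here is a minimal userspace
sketch. Everything in it is a toy stand-in: the toy_* types and functions are
hypothetical and are not the kernel's struct vdpa_device or
struct vdpa_config_ops. Only the shape of the .reset_map signature (device
plus asid, returning 0 or a negative error) follows the patch. It assumes a
parent whose .reset touches virtio device state only, and a bus driver that
calls the optional .reset_map when it detaches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel structures. */
struct toy_vdpa_device;

struct toy_config_ops {
	/* Resets virtio device state only. */
	int (*reset)(struct toy_vdpa_device *vdev);
	/* Optional: restores the parent's initial mapping for one asid. */
	int (*reset_map)(struct toy_vdpa_device *vdev, unsigned int asid);
};

struct toy_vdpa_device {
	const struct toy_config_ops *ops;
	int status;          /* stand-in for the virtio device status */
	int custom_mappings; /* mappings installed by the bus driver */
	int default_map;     /* 1 while the parent's initial mapping is active */
};

/* Device reset: clears status, deliberately leaves mappings intact. */
static int toy_parent_reset(struct toy_vdpa_device *vdev)
{
	vdev->status = 0;
	return 0;
}

/* Mapping reset: drops custom mappings and restores the initial state. */
static int toy_parent_reset_map(struct toy_vdpa_device *vdev, unsigned int asid)
{
	(void)asid; /* this toy parent has a single address space */
	vdev->custom_mappings = 0;
	vdev->default_map = 1;
	return 0;
}

static const struct toy_config_ops toy_parent_ops = {
	.reset = toy_parent_reset,
	.reset_map = toy_parent_reset_map,
};

/* Bus driver side: on detach, reset the mapping only if the parent opts in. */
static int toy_bus_detach(struct toy_vdpa_device *vdev, unsigned int asid)
{
	assert(vdev != NULL && vdev->ops != NULL);
	if (!vdev->ops->reset_map)
		return 0; /* op is optional; nothing to restore */
	return vdev->ops->reset_map(vdev, asid);
}
```

In this sketch a device reset leaves custom_mappings untouched, and only
toy_bus_detach() brings the parent back to its initial mapping.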
>
>> I think a more fundamental question I don't quite understand, is adding
>> an extra API to on-chip IOMMU itself an issue, or just that you don't
>> like the way how the IOMMU model gets exposed via this specific API of
>> .reset_map?
> extra API to on-chip IOMMU, since the on-chip logics should be hidden
> by the bus unless we want to introduce the two dimensions abstraction
> (which seems to be an overkill).
Thanks for clarifying your concern. I will rephrase "on-chip" to
"vendor specific" and try to avoid mentioning the two-dimension aspect
of the API.
>
>> For the platform IOMMU case, internally there exists
>> distinction between the 1:1 identity (passthrough) mode and DMA page
>> mapping mode, and this distinction is somehow getting exposed and
>> propagated through the IOMMU API - for e.g. iommu_domain_alloc() and
>> iommu_attach_device() are being called explicitly from
>> vhost_vdpa_alloc_domain() by vhost-vdpa (and the opposite from within
>> vhost_vdpa_free_domain), while for virtio-vdpa it doesn't call any IOMMU
>> API at all on the other hand
> It's the way the kernel manages DMA mappings. For a userspace driver
> via vhost-vDPA, it needs to call IOMMU APIs. And for a kernel driver
> via virtio-vDPA, DMA API is used (via the dma_dev exposed through
> virtio_vdpa). DMA API may decide to call IOMMU API if IOMMU is enabled
> but not in passthrough mode.
Right, I think what I meant is that a distinction in mapping requirements
exists between the two bus drivers, vhost-vdpa and virtio-vdpa. It's
impossible to hide every detail (identity, swiotlb, dmar) that sits under
the cover of the DMA API by simply using the IOMMU API abstraction. The
same applies to how the one-dimension oriented vendor specific API
(.dma_map/.set_map, I mean) can't cover all cases of potentially
multi-dimensional mapping requirements from virtio-vdpa (which uses the
feature rich DMA API instead of the simpler, lower level, page mapping
based IOMMU API). On the other hand, I now get that you may want to
understand why .reset_map is required and which part of the userspace
functionality won't work without it.
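For reference, the three mapping flavors mentioned earlier in this thread
(.dma_map, .set_map, or the platform IOMMU via iommu_map()) can be pictured
with the toy dispatcher below. This is a simplified userspace sketch, not the
actual vhost_vdpa_map() code: the toy_* names, the enum, and the reduced
callback signatures are all made up for illustration, and the order of the
checks is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three flavors a bus driver may end up using. */
enum toy_map_path {
	TOY_PATH_DMA_MAP,        /* vendor specific, per-range mapping */
	TOY_PATH_SET_MAP,        /* vendor specific, whole-table update */
	TOY_PATH_PLATFORM_IOMMU, /* neither op present: platform IOMMU API */
};

/* Toy ops with simplified signatures; NOT struct vdpa_config_ops. */
struct toy_map_ops {
	int (*dma_map)(unsigned long long iova, unsigned long long size,
		       unsigned long long pa);
	int (*set_map)(void);
};

/* Stub parent callbacks so the dispatcher can be exercised. */
static int toy_dma_map_stub(unsigned long long iova, unsigned long long size,
			    unsigned long long pa)
{
	(void)iova; (void)size; (void)pa;
	return 0;
}

static int toy_set_map_stub(void)
{
	return 0;
}

/* Pick the mapping flavor from the ops the parent provides. */
static enum toy_map_path toy_pick_map_path(const struct toy_map_ops *ops)
{
	assert(ops != NULL);
	if (ops->dma_map)
		return TOY_PATH_DMA_MAP;
	if (ops->set_map)
		return TOY_PATH_SET_MAP;
	return TOY_PATH_PLATFORM_IOMMU;
}
```

The point of the sketch is only that the choice between vendor specific and
platform specific mapping is already visible to the bus driver through which
ops the parent provides.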
>
>> - which is to inherit what default IOMMU
>> domain has.
> Yes, but it's not a 1:1 (identity) mapping, it really depends on the
> configuration. (And there could even be a swiotlb layer in the
> middle).
Yes, that's why I said it inherits the configuration of the default
domain, which could vary, versus one-dimension.
>
>> Ideally for on-chip IOMMU we can and should do pretty much
>> the same, but I don't think there's a clean way without introducing any
>> driver API to make vhost-vdpa case distinguish from the virtio-vdpa
>> case. I'm afraid to say that it was just a hack to hide the necessary
>> distinction needed by vdpa bus users for e.g. in the deep of
>> vdpa_reset(), if not introducing any new driver API is the goal here...
> So reset_map() is fine if it is not defined just for on-chip. For
> example, does VDUSE need to implement it or not?
If "on-chip" in what you said means "two-dimension" or "identity
mapping" in this context, I would say that's definitely not the intent.
Instead, it's the best I can think of that avoids exposing that
part of the specifics.
VDUSE can implement it if it has a similar requirement of resetting the
mapping to the default state for bus driver users like vhost-vdpa. This
is left up to the VDUSE owner to decide, though from what I've gathered
so far it doesn't seem to have to do so for the moment, as it explicitly
uses DMA ops to implement a swiotlb-like bouncing mechanism, which works
more closely with the DMA API. And its vhost-vdpa usage seems to be
based not on page pinning but on shared memory? That would lose the
performance reason to decouple mapping.
>
>>> Exposing this will complicate the implementation of bus drivers.
>> As said above, this distinction is needed by bus drivers, and it's
>> already done by platform IOMMU via IOMMU API. I can drop the .reset_map
>> API while add another set of similar driver API to mimic
>> iommu_domain_alloc/iommu_domain_free, but doing this will complicate the
>> parent driver's implementation on the other hand.
> I'm not sure I understand the issue. But something like PD
> allocation/free in RDMA?
Never mind, this would just expose more implementation details on the
two-dimension (or maybe multi-dimensional) mapping model and also
introduces more complexity. Certainly not something I'd advocate for.
>> While .reset_map is
>> what I can think of to be the simplest for parent, I can do the other
>> way if you're fine with it. Let me know how it sounds.
> I think what I still don't understand is: how is reset_map() related
> to persistent IOTLB? I guess it's a must but I still didn't figure out
> why.
Hope my follow-up responses to patches 2 and 4 clarified it? If not, let
me know which part I may be missing.
Thanks,
-Siwei
>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>> Thanks
>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> ---
>>>> include/linux/vdpa.h | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>>>> index 17a4efa..daecf55 100644
>>>> --- a/include/linux/vdpa.h
>>>> +++ b/include/linux/vdpa.h
>>>> @@ -324,6 +324,12 @@ struct vdpa_map_file {
>>>> * @iova: iova to be unmapped
>>>> * @size: size of the area
>>>> * Returns integer: success (0) or error (< 0)
>>>> + * @reset_map: Reset device memory mapping (optional)
>>>> + * Needed for device that using device
>>>> + * specific DMA translation (on-chip IOMMU)
>>>> + * @vdev: vdpa device
>>>> + * @asid: address space identifier
>>>> + * Returns integer: success (0) or error (< 0)
>>>> * @get_vq_dma_dev: Get the dma device for a specific
>>>> * virtqueue (optional)
>>>> * @vdev: vdpa device
>>>> @@ -401,6 +407,7 @@ struct vdpa_config_ops {
>>>> u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
>>>> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
>>>> u64 iova, u64 size);
>>>> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
>>>> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>>>> unsigned int asid);
>>>> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
>>>> --
>>>> 1.8.3.1
>>>>
Thread overview: 9+ messages
2023-09-09 13:31 [PATCH RFC v2 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
2023-09-11 3:42 ` Jason Wang
2023-09-11 23:31 ` Si-Wei Liu
2023-09-12 6:23 ` Jason Wang
2023-09-15 5:38 ` Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 2/4] vdpa/mlx5: implement .reset_map driver op Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver Si-Wei Liu
2023-09-09 13:31 ` [PATCH RFC v2 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit Si-Wei Liu