* [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-19 14:09 ` Cédric Le Goater
` (2 more replies)
2024-07-19 12:04 ` [PATCH v5 02/13] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
` (14 subsequent siblings)
15 siblings, 3 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to skip initialization of the HostIOMMUDevice for mdev,
extract the checks that validate if a device is an mdev into helpers.
A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev
to check if it's mdev or not.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 2 ++
hw/vfio/helpers.c | 14 ++++++++++++++
hw/vfio/pci.c | 12 +++---------
3 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e8ddf92bb185..98acae8c1c97 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -116,6 +116,7 @@ typedef struct VFIODevice {
DeviceState *dev;
int fd;
int type;
+ bool mdev;
bool reset_works;
bool needs_reset;
bool no_mmap;
@@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
void vfio_region_finalize(VFIORegion *region);
void vfio_reset_handler(void *opaque);
struct vfio_device_info *vfio_get_device_info(int fd);
+bool vfio_device_is_mdev(VFIODevice *vbasedev);
bool vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void vfio_detach_device(VFIODevice *vbasedev);
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b14edd46edc9..7e23e9080c9d 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -675,3 +675,17 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
}
+
+bool vfio_device_is_mdev(VFIODevice *vbasedev)
+{
+ g_autofree char *subsys = NULL;
+ g_autofree char *tmp = NULL;
+
+ if (!vbasedev->sysfsdev) {
+ return false;
+ }
+
+ tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
+ subsys = realpath(tmp, NULL);
+ return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e03d9f3ba546..b34e91468a53 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2963,12 +2963,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
ERRP_GUARD();
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
- char *subsys;
int i, ret;
- bool is_mdev;
char uuid[UUID_STR_LEN];
g_autofree char *name = NULL;
- g_autofree char *tmp = NULL;
if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -2997,14 +2994,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
* stays in sync with the active working set of the guest driver. Prevent
* the x-balloon-allowed option unless this is minimally an mdev device.
*/
- tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
- subsys = realpath(tmp, NULL);
- is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
- free(subsys);
+ vbasedev->mdev = vfio_device_is_mdev(vbasedev);
- trace_vfio_mdev(vbasedev->name, is_mdev);
+ trace_vfio_mdev(vbasedev->name, vbasedev->mdev);
- if (vbasedev->ram_block_discard_allowed && !is_mdev) {
+ if (vbasedev->ram_block_discard_allowed && !vbasedev->mdev) {
error_setg(errp, "x-balloon-allowed only potentially compatible "
"with mdev devices");
goto error;
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper
2024-07-19 12:04 ` [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper Joao Martins
@ 2024-07-19 14:09 ` Cédric Le Goater
2024-07-22 5:13 ` Duan, Zhenzhong
2024-07-23 7:00 ` Eric Auger
2 siblings, 0 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-19 14:09 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:04, Joao Martins wrote:
> In preparation to skip initialization of the HostIOMMUDevice for mdev,
> extract the checks that validate if a device is an mdev into helpers.
>
> A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev
> to check if it's mdev or not.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/helpers.c | 14 ++++++++++++++
> hw/vfio/pci.c | 12 +++---------
> 3 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e8ddf92bb185..98acae8c1c97 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -116,6 +116,7 @@ typedef struct VFIODevice {
> DeviceState *dev;
> int fd;
> int type;
> + bool mdev;
> bool reset_works;
> bool needs_reset;
> bool no_mmap;
> @@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
> void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
> +bool vfio_device_is_mdev(VFIODevice *vbasedev);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index b14edd46edc9..7e23e9080c9d 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -675,3 +675,17 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
>
> return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
> }
> +
> +bool vfio_device_is_mdev(VFIODevice *vbasedev)
> +{
> + g_autofree char *subsys = NULL;
> + g_autofree char *tmp = NULL;
> +
> + if (!vbasedev->sysfsdev) {
> + return false;
> + }
> +
> + tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> + subsys = realpath(tmp, NULL);
> + return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e03d9f3ba546..b34e91468a53 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2963,12 +2963,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> ERRP_GUARD();
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> - char *subsys;
> int i, ret;
> - bool is_mdev;
> char uuid[UUID_STR_LEN];
> g_autofree char *name = NULL;
> - g_autofree char *tmp = NULL;
>
> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> @@ -2997,14 +2994,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> * stays in sync with the active working set of the guest driver. Prevent
> * the x-balloon-allowed option unless this is minimally an mdev device.
> */
> - tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> - subsys = realpath(tmp, NULL);
> - is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> - free(subsys);
> + vbasedev->mdev = vfio_device_is_mdev(vbasedev);
>
> - trace_vfio_mdev(vbasedev->name, is_mdev);
> + trace_vfio_mdev(vbasedev->name, vbasedev->mdev);
>
> - if (vbasedev->ram_block_discard_allowed && !is_mdev) {
> + if (vbasedev->ram_block_discard_allowed && !vbasedev->mdev) {
> error_setg(errp, "x-balloon-allowed only potentially compatible "
> "with mdev devices");
> goto error;
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper
2024-07-19 12:04 ` [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper Joao Martins
2024-07-19 14:09 ` Cédric Le Goater
@ 2024-07-22 5:13 ` Duan, Zhenzhong
2024-07-23 7:00 ` Eric Auger
2 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 5:13 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Friday, July 19, 2024 8:05 PM
>To: qemu-devel@nongnu.org
>Cc: Liu, Yi L <yi.l.liu@intel.com>; Eric Auger <eric.auger@redhat.com>; Duan,
>Zhenzhong <zhenzhong.duan@intel.com>; Alex Williamson
><alex.williamson@redhat.com>; Cedric Le Goater <clg@redhat.com>; Jason
>Gunthorpe <jgg@nvidia.com>; Avihai Horon <avihaih@nvidia.com>; Joao
>Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper
>
>In preparation to skip initialization of the HostIOMMUDevice for mdev,
>extract the checks that validate if a device is an mdev into helpers.
>
>A vfio_device_is_mdev() is created, and subsystems consult
>VFIODevice::mdev
>to check if it's mdev or not.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/helpers.c | 14 ++++++++++++++
> hw/vfio/pci.c | 12 +++---------
> 3 files changed, 19 insertions(+), 9 deletions(-)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index e8ddf92bb185..98acae8c1c97 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -116,6 +116,7 @@ typedef struct VFIODevice {
> DeviceState *dev;
> int fd;
> int type;
>+ bool mdev;
> bool reset_works;
> bool needs_reset;
> bool no_mmap;
>@@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
> void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
>+bool vfio_device_is_mdev(VFIODevice *vbasedev);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
>diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>index b14edd46edc9..7e23e9080c9d 100644
>--- a/hw/vfio/helpers.c
>+++ b/hw/vfio/helpers.c
>@@ -675,3 +675,17 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
>
> return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
> }
>+
>+bool vfio_device_is_mdev(VFIODevice *vbasedev)
>+{
>+ g_autofree char *subsys = NULL;
>+ g_autofree char *tmp = NULL;
>+
>+ if (!vbasedev->sysfsdev) {
>+ return false;
>+ }
>+
>+ tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
>+ subsys = realpath(tmp, NULL);
>+ return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>+}
>diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>index e03d9f3ba546..b34e91468a53 100644
>--- a/hw/vfio/pci.c
>+++ b/hw/vfio/pci.c
>@@ -2963,12 +2963,9 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
> ERRP_GUARD();
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
>- char *subsys;
> int i, ret;
>- bool is_mdev;
> char uuid[UUID_STR_LEN];
> g_autofree char *name = NULL;
>- g_autofree char *tmp = NULL;
>
> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
>@@ -2997,14 +2994,11 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
> * stays in sync with the active working set of the guest driver. Prevent
> * the x-balloon-allowed option unless this is minimally an mdev device.
> */
>- tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
>- subsys = realpath(tmp, NULL);
>- is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
>- free(subsys);
>+ vbasedev->mdev = vfio_device_is_mdev(vbasedev);
>
>- trace_vfio_mdev(vbasedev->name, is_mdev);
>+ trace_vfio_mdev(vbasedev->name, vbasedev->mdev);
>
>- if (vbasedev->ram_block_discard_allowed && !is_mdev) {
>+ if (vbasedev->ram_block_discard_allowed && !vbasedev->mdev) {
> error_setg(errp, "x-balloon-allowed only potentially compatible "
> "with mdev devices");
> goto error;
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper
2024-07-19 12:04 ` [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper Joao Martins
2024-07-19 14:09 ` Cédric Le Goater
2024-07-22 5:13 ` Duan, Zhenzhong
@ 2024-07-23 7:00 ` Eric Auger
2 siblings, 0 replies; 53+ messages in thread
From: Eric Auger @ 2024-07-23 7:00 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:04, Joao Martins wrote:
> In preparation to skip initialization of the HostIOMMUDevice for mdev,
> extract the checks that validate if a device is an mdev into helpers.
>
> A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev
> to check if it's mdev or not.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/helpers.c | 14 ++++++++++++++
> hw/vfio/pci.c | 12 +++---------
> 3 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e8ddf92bb185..98acae8c1c97 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -116,6 +116,7 @@ typedef struct VFIODevice {
> DeviceState *dev;
> int fd;
> int type;
> + bool mdev;
> bool reset_works;
> bool needs_reset;
> bool no_mmap;
> @@ -231,6 +232,7 @@ void vfio_region_exit(VFIORegion *region);
> void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
> +bool vfio_device_is_mdev(VFIODevice *vbasedev);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index b14edd46edc9..7e23e9080c9d 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -675,3 +675,17 @@ int vfio_device_get_aw_bits(VFIODevice *vdev)
>
> return HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX;
> }
> +
> +bool vfio_device_is_mdev(VFIODevice *vbasedev)
> +{
> + g_autofree char *subsys = NULL;
> + g_autofree char *tmp = NULL;
> +
> + if (!vbasedev->sysfsdev) {
> + return false;
> + }
> +
> + tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> + subsys = realpath(tmp, NULL);
> + return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e03d9f3ba546..b34e91468a53 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2963,12 +2963,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> ERRP_GUARD();
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> - char *subsys;
> int i, ret;
> - bool is_mdev;
> char uuid[UUID_STR_LEN];
> g_autofree char *name = NULL;
> - g_autofree char *tmp = NULL;
>
> if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> @@ -2997,14 +2994,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> * stays in sync with the active working set of the guest driver. Prevent
> * the x-balloon-allowed option unless this is minimally an mdev device.
> */
> - tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
> - subsys = realpath(tmp, NULL);
> - is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> - free(subsys);
> + vbasedev->mdev = vfio_device_is_mdev(vbasedev);
>
> - trace_vfio_mdev(vbasedev->name, is_mdev);
> + trace_vfio_mdev(vbasedev->name, vbasedev->mdev);
>
> - if (vbasedev->ram_block_discard_allowed && !is_mdev) {
> + if (vbasedev->ram_block_discard_allowed && !vbasedev->mdev) {
> error_setg(errp, "x-balloon-allowed only potentially compatible "
> "with mdev devices");
> goto error;
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 02/13] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-07-19 12:04 ` [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-19 12:04 ` [PATCH v5 03/13] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
` (13 subsequent siblings)
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
mdevs aren't "physical" devices and when asking for backing IOMMU info, it
fails the entire provisioning of the guest. Fix that by skipping
HostIOMMUDevice initialization in the presence of mdevs, and skip setting
an iommu device when it is known to be an mdev.
Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/common.c | 4 ++++
hw/vfio/pci.c | 11 ++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd396..b0beed44116e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1556,6 +1556,10 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
return false;
}
+ if (vbasedev->mdev) {
+ return true;
+ }
+
hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
object_unref(hiod);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b34e91468a53..265d3cb82ffc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3115,7 +3115,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_bars_register(vdev);
- if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
+ if (!vbasedev->mdev &&
+ !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
error_prepend(errp, "Failed to set iommu_device: ");
goto out_teardown;
}
@@ -3238,7 +3239,9 @@ out_deregister:
timer_free(vdev->intx.mmap_timer);
}
out_unset_idev:
- pci_device_unset_iommu_device(pdev);
+ if (!vbasedev->mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
out_teardown:
vfio_teardown_msi(vdev);
vfio_bars_exit(vdev);
@@ -3283,7 +3286,9 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
vfio_migration_exit(vbasedev);
- pci_device_unset_iommu_device(pdev);
+ if (!vbasedev->mdev) {
+ pci_device_unset_iommu_device(pdev);
+ }
}
static void vfio_pci_reset(DeviceState *dev)
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v5 03/13] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-07-19 12:04 ` [PATCH v5 01/13] vfio/pci: Extract mdev check into an helper Joao Martins
2024-07-19 12:04 ` [PATCH v5 02/13] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-19 12:04 ` [PATCH v5 04/13] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
` (12 subsequent siblings)
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The helper will be able to fetch vendor agnostic IOMMU capabilities
supported both by hardware and software. Right now it is only iommu dirty
tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
include/sysemu/iommufd.h | 2 +-
backends/iommufd.c | 4 +++-
hw/vfio/iommufd.c | 4 +++-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9edfec604595..57d502a1c79a 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -49,7 +49,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, ram_addr_t size);
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp);
+ uint64_t *caps, Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 84fefbc9ee7a..2b3d51af26d2 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -210,7 +210,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- Error **errp)
+ uint64_t *caps, Error **errp)
{
struct iommu_hw_info info = {
.size = sizeof(info),
@@ -226,6 +226,8 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
g_assert(type);
*type = info.out_data_type;
+ g_assert(caps);
+ *caps = info.out_capabilities;
return true;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e60386..604eaa4d9a5d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -628,11 +628,13 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
union {
struct iommu_hw_info_vtd vtd;
} data;
+ uint64_t hw_caps;
hiod->agent = opaque;
if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data), errp)) {
+ &type, &data, sizeof(data),
+ &hw_caps, errp)) {
return false;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v5 04/13] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (2 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 03/13] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-19 12:04 ` [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation Joao Martins
` (11 subsequent siblings)
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to implement auto domains have the attach function
return the errno it got during domain attach instead of a bool.
-EINVAL is tracked to track domain incompatibilities, and decide whether
to create a new IOMMU domain.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/iommufd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 604eaa4d9a5d..077dea8f1b64 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -172,7 +172,7 @@ out:
return ret;
}
-static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
+static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
Error **errp)
{
int iommufd = vbasedev->iommufd->fd;
@@ -187,12 +187,12 @@ static bool iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
error_setg_errno(errp, errno,
"[iommufd=%d] error attach %s (%d) to id=%d",
iommufd, vbasedev->name, vbasedev->fd, id);
- return false;
+ return -errno;
}
trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
vbasedev->fd, id);
- return true;
+ return 0;
}
static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
@@ -216,7 +216,7 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
- return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+ return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (3 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 04/13] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 5:16 ` Duan, Zhenzhong
2024-07-19 12:04 ` [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
` (10 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
There's generally two modes of operation for IOMMUFD:
1) The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
to VFIO and mainly performs IOAS_MAP and UNMAP.
2) The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.
For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.
Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.
To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimicking kernel
iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
it falls back to IOAS attach.
The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here it is not used in this way given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' of type1 approach that we have
been using so far between container vs device dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 9 ++++
include/sysemu/iommufd.h | 5 +++
backends/iommufd.c | 30 +++++++++++++
hw/vfio/iommufd.c | 84 +++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
5 files changed, 129 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 98acae8c1c97..1a96678f8c38 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
typedef struct IOMMUFDBackend IOMMUFDBackend;
+typedef struct VFIOIOASHwpt {
+ uint32_t hwpt_id;
+ QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
typedef struct VFIOIOMMUFDContainer {
VFIOContainerBase bcontainer;
IOMMUFDBackend *be;
uint32_t ioas_id;
+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
} VFIOIOMMUFDContainer;
OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
@@ -135,6 +142,8 @@ typedef struct VFIODevice {
HostIOMMUDevice *hiod;
int devid;
IOMMUFDBackend *iommufd;
+ VFIOIOASHwpt *hwpt;
+ QLIST_ENTRY(VFIODevice) hwpt_next;
} VFIODevice;
struct VFIODeviceOps {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 57d502a1c79a..e917e7591d05 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp);
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2b3d51af26d2..a94d3b90c05c 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
return ret;
}
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp)
+{
+ int ret, fd = be->fd;
+ struct iommu_hwpt_alloc alloc_hwpt = {
+ .size = sizeof(struct iommu_hwpt_alloc),
+ .flags = flags,
+ .dev_id = dev_id,
+ .pt_id = pt_id,
+ .data_type = data_type,
+ .data_len = data_len,
+ .data_uptr = (uintptr_t)data_ptr,
+ };
+
+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
+ data_len, (uintptr_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+ if (ret) {
+ error_setg_errno(errp, errno, "Failed to allocate hwpt");
+ return false;
+ }
+
+ *out_hwpt = alloc_hwpt.out_hwpt_id;
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 077dea8f1b64..545f4a404125 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -212,10 +212,88 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
return true;
}
+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container,
+ Error **errp)
+{
+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ uint32_t flags = 0;
+ VFIOIOASHwpt *hwpt;
+ uint32_t hwpt_id;
+ int ret;
+
+ /* Try to find a domain */
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ /* -EINVAL means the domain is incompatible with the device. */
+ if (ret == -EINVAL) {
+ /*
+ * It is an expected failure and it just means we will try
+ * another domain, or create one if no existing compatible
+ * domain is found. Hence why the error is discarded below.
+ */
+ error_free(*errp);
+ *errp = NULL;
+ continue;
+ }
+
+ return false;
+ } else {
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ return true;
+ }
+ }
+
+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+ container->ioas_id, flags,
+ IOMMU_HWPT_DATA_NONE, 0, NULL,
+ &hwpt_id, errp)) {
+ return false;
+ }
+
+ hwpt = g_malloc0(sizeof(*hwpt));
+ hwpt->hwpt_id = hwpt_id;
+ QLIST_INIT(&hwpt->device_list);
+
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ return false;
+ }
+
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ return true;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container)
+{
+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+ QLIST_REMOVE(vbasedev, hwpt_next);
+ vbasedev->hwpt = NULL;
+
+ if (QLIST_EMPTY(&hwpt->device_list)) {
+ QLIST_REMOVE(hwpt, next);
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ }
+}
+
static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
+ /* mdevs aren't physical devices and will fail with auto domains */
+ if (!vbasedev->mdev) {
+ return iommufd_cdev_autodomains_get(vbasedev, container, errp);
+ }
+
return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
@@ -227,6 +305,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
error_report_err(err);
}
+
+ if (vbasedev->hwpt) {
+ iommufd_cdev_autodomains_put(vbasedev, container);
+ }
+
}
static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container)
@@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
+ QLIST_INIT(&container->hwpt_list);
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
diff --git a/backends/trace-events b/backends/trace-events
index 211e6f374adc..4d8ac02fe7d6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* RE: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-19 12:04 ` [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-22 5:16 ` Duan, Zhenzhong
2024-07-22 8:50 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 5:16 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
>
>There's generally two modes of operation for IOMMUFD:
>
>1) The simple user API which intends to perform relatively simple things
>with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
>to VFIO and mainly performs IOAS_MAP and UNMAP.
>
>2) The native IOMMUFD API where you have fine grained control of the
>IOMMU domain and model it accordingly. This is where most new feature
>are being steered to.
>
>For dirty tracking 2) is required, as it needs to ensure that
>the stage-2/parent IOMMU domain will only attach devices
>that support dirty tracking (so far it is all homogeneous in x86, likely
>not the case for smmuv3). Such invariant on dirty tracking provides a
>useful guarantee to VMMs that will refuse incompatible device
>attachments for IOMMU domains.
>
>Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>responsible for creating an IOMMU domain. This is contrast to the
>'simple API' where the IOMMU domain is created by IOMMUFD
>automatically
>when it attaches to VFIO (usually referred as autodomains) but it has
>the needed handling for mdevs.
>
>To support dirty tracking with the advanced IOMMUFD API, it needs
>similar logic, where IOMMU domains are created and devices attached to
>compatible domains. Essentially mimicking kernel
>iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU
>domain
>it falls back to IOAS attach.
>
>The auto domain logic allows different IOMMU domains to be created when
>DMA dirty tracking is not desired (and VF can provide it), and others where
>it is. Here it is not used in this way given how VFIODevice migration
>state is initialized after the device attachment. But such mixed mode of
>IOMMU dirty tracking + device dirty tracking is an improvement that can
>be added on. Keep the 'all of nothing' of type1 approach that we have
>been using so far between container vs device dirty tracking.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/hw/vfio/vfio-common.h | 9 ++++
> include/sysemu/iommufd.h | 5 +++
> backends/iommufd.c | 30 +++++++++++++
> hw/vfio/iommufd.c | 84
>+++++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 5 files changed, 129 insertions(+)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index 98acae8c1c97..1a96678f8c38 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
>
>+typedef struct VFIOIOASHwpt {
>+ uint32_t hwpt_id;
>+ QLIST_HEAD(, VFIODevice) device_list;
>+ QLIST_ENTRY(VFIOIOASHwpt) next;
>+} VFIOIOASHwpt;
>+
> typedef struct VFIOIOMMUFDContainer {
> VFIOContainerBase bcontainer;
> IOMMUFDBackend *be;
> uint32_t ioas_id;
>+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> } VFIOIOMMUFDContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>VFIO_IOMMU_IOMMUFD);
>@@ -135,6 +142,8 @@ typedef struct VFIODevice {
> HostIOMMUDevice *hiod;
> int devid;
> IOMMUFDBackend *iommufd;
>+ VFIOIOASHwpt *hwpt;
>+ QLIST_ENTRY(VFIODevice) hwpt_next;
> } VFIODevice;
>
> struct VFIODeviceOps {
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index 57d502a1c79a..e917e7591d05 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -50,6 +50,11 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp);
>+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt,
>+ Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 2b3d51af26d2..a94d3b90c05c 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -208,6 +208,36 @@ int
>iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> return ret;
> }
>
>+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>dev_id,
>+ uint32_t pt_id, uint32_t flags,
>+ uint32_t data_type, uint32_t data_len,
>+ void *data_ptr, uint32_t *out_hwpt,
>+ Error **errp)
>+{
>+ int ret, fd = be->fd;
>+ struct iommu_hwpt_alloc alloc_hwpt = {
>+ .size = sizeof(struct iommu_hwpt_alloc),
>+ .flags = flags,
>+ .dev_id = dev_id,
>+ .pt_id = pt_id,
>+ .data_type = data_type,
>+ .data_len = data_len,
>+ .data_uptr = (uintptr_t)data_ptr,
>+ };
>+
>+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>+ data_len, (uintptr_t)data_ptr,
>+ alloc_hwpt.out_hwpt_id, ret);
>+ if (ret) {
>+ error_setg_errno(errp, errno, "Failed to allocate hwpt");
>+ return false;
>+ }
>+
>+ *out_hwpt = alloc_hwpt.out_hwpt_id;
>+ return true;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 077dea8f1b64..545f4a404125 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -212,10 +212,88 @@ static bool
>iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
> return true;
> }
>
>+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container,
>+ Error **errp)
>+{
>+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
>+ uint32_t flags = 0;
>+ VFIOIOASHwpt *hwpt;
>+ uint32_t hwpt_id;
>+ int ret;
>+
>+ /* Try to find a domain */
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>+ if (ret) {
>+ /* -EINVAL means the domain is incompatible with the device. */
>+ if (ret == -EINVAL) {
>+ /*
>+ * It is an expected failure and it just means we will try
>+ * another domain, or create one if no existing compatible
>+ * domain is found. Hence why the error is discarded below.
>+ */
>+ error_free(*errp);
>+ *errp = NULL;
>+ continue;
>+ }
>+
>+ return false;
>+ } else {
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ return true;
>+ }
>+ }
>+
>+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>+ container->ioas_id, flags,
>+ IOMMU_HWPT_DATA_NONE, 0, NULL,
>+ &hwpt_id, errp)) {
>+ return false;
>+ }
>+
>+ hwpt = g_malloc0(sizeof(*hwpt));
>+ hwpt->hwpt_id = hwpt_id;
>+ QLIST_INIT(&hwpt->device_list);
>+
>+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>+ if (ret) {
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ return false;
>+ }
>+
>+ vbasedev->hwpt = hwpt;
>+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>+ return true;
>+}
>+
>+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>+ VFIOIOMMUFDContainer *container)
>+{
>+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>+
>+ QLIST_REMOVE(vbasedev, hwpt_next);
>+ vbasedev->hwpt = NULL;
>+
>+ if (QLIST_EMPTY(&hwpt->device_list)) {
>+ QLIST_REMOVE(hwpt, next);
>+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>+ g_free(hwpt);
>+ }
>+}
Looks the detach flow is still missed?
>+
> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
> VFIOIOMMUFDContainer *container,
> Error **errp)
> {
>+ /* mdevs aren't physical devices and will fail with auto domains */
>+ if (!vbasedev->mdev) {
>+ return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>+ }
>+
> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>errp);
> }
>
>@@ -227,6 +305,11 @@ static void
>iommufd_cdev_detach_container(VFIODevice *vbasedev,
> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
Shouldn't we check mdev before calling this?
> error_report_err(err);
> }
>+
>+ if (vbasedev->hwpt) {
>+ iommufd_cdev_autodomains_put(vbasedev, container);
>+ }
>+
> }
>
> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer
>*container)
>@@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name,
>VFIODevice *vbasedev,
> container =
>VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
> container->be = vbasedev->iommufd;
> container->ioas_id = ioas_id;
>+ QLIST_INIT(&container->hwpt_list);
This can be in ::instance_init().
Thanks
Zhenzhong
>
> bcontainer = &container->bcontainer;
> vfio_address_space_insert(space, bcontainer);
>diff --git a/backends/trace-events b/backends/trace-events
>index 211e6f374adc..4d8ac02fe7d6 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>ioas, uint64_t iova, uint64_t size
> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>size=0x%"PRIx64" (%d)"
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>ioas=%d"
>+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-22 5:16 ` Duan, Zhenzhong
@ 2024-07-22 8:50 ` Joao Martins
2024-07-22 14:21 ` Cédric Le Goater
2024-07-23 4:36 ` Duan, Zhenzhong
0 siblings, 2 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-22 8:50 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 06:16, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
>>
>> There's generally two modes of operation for IOMMUFD:
>>
>> 1) The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
>> to VFIO and mainly performs IOAS_MAP and UNMAP.
>>
>> 2) The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>> responsible for creating an IOMMU domain. This is contrast to the
>> 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically
>> when it attaches to VFIO (usually referred as autodomains) but it has
>> the needed handling for mdevs.
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimicking kernel
>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU
>> domain
>> it falls back to IOAS attach.
>>
>> The auto domain logic allows different IOMMU domains to be created when
>> DMA dirty tracking is not desired (and VF can provide it), and others where
>> it is. Here it is not used in this way given how VFIODevice migration
>> state is initialized after the device attachment. But such mixed mode of
>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>> be added on. Keep the 'all of nothing' of type1 approach that we have
>> been using so far between container vs device dirty tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 9 ++++
>> include/sysemu/iommufd.h | 5 +++
>> backends/iommufd.c | 30 +++++++++++++
>> hw/vfio/iommufd.c | 84
>> +++++++++++++++++++++++++++++++++++
>> backends/trace-events | 1 +
>> 5 files changed, 129 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index 98acae8c1c97..1a96678f8c38 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>
>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> +typedef struct VFIOIOASHwpt {
>> + uint32_t hwpt_id;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> typedef struct VFIOIOMMUFDContainer {
>> VFIOContainerBase bcontainer;
>> IOMMUFDBackend *be;
>> uint32_t ioas_id;
>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> } VFIOIOMMUFDContainer;
>>
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>> VFIO_IOMMU_IOMMUFD);
>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>> HostIOMMUDevice *hiod;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + VFIOIOASHwpt *hwpt;
>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>> } VFIODevice;
>>
>> struct VFIODeviceOps {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 57d502a1c79a..e917e7591d05 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -50,6 +50,11 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp);
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp);
>>
>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>> #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 2b3d51af26d2..a94d3b90c05c 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -208,6 +208,36 @@ int
>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> return ret;
>> }
>>
>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>> dev_id,
>> + uint32_t pt_id, uint32_t flags,
>> + uint32_t data_type, uint32_t data_len,
>> + void *data_ptr, uint32_t *out_hwpt,
>> + Error **errp)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_hwpt_alloc alloc_hwpt = {
>> + .size = sizeof(struct iommu_hwpt_alloc),
>> + .flags = flags,
>> + .dev_id = dev_id,
>> + .pt_id = pt_id,
>> + .data_type = data_type,
>> + .data_len = data_len,
>> + .data_uptr = (uintptr_t)data_ptr,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>> + data_len, (uintptr_t)data_ptr,
>> + alloc_hwpt.out_hwpt_id, ret);
>> + if (ret) {
>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>> + return false;
>> + }
>> +
>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>> + return true;
>> +}
>> +
>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>> devid,
>> uint32_t *type, void *data, uint32_t len,
>> uint64_t *caps, Error **errp)
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 077dea8f1b64..545f4a404125 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -212,10 +212,88 @@ static bool
>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>> return true;
>> }
>>
>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> + uint32_t flags = 0;
>> + VFIOIOASHwpt *hwpt;
>> + uint32_t hwpt_id;
>> + int ret;
>> +
>> + /* Try to find a domain */
>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>> errp);
>> + if (ret) {
>> + /* -EINVAL means the domain is incompatible with the device. */
>> + if (ret == -EINVAL) {
>> + /*
>> + * It is an expected failure and it just means we will try
>> + * another domain, or create one if no existing compatible
>> + * domain is found. Hence why the error is discarded below.
>> + */
>> + error_free(*errp);
>> + *errp = NULL;
>> + continue;
>> + }
>> +
>> + return false;
>> + } else {
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + return true;
>> + }
>> + }
>> +
>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> + container->ioas_id, flags,
>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>> + &hwpt_id, errp)) {
>> + return false;
>> + }
>> +
>> + hwpt = g_malloc0(sizeof(*hwpt));
>> + hwpt->hwpt_id = hwpt_id;
>> + QLIST_INIT(&hwpt->device_list);
>> +
>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> + if (ret) {
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + return false;
>> + }
>> +
>> + vbasedev->hwpt = hwpt;
>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + return true;
>> +}
>> +
>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container)
>> +{
>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +
>> + QLIST_REMOVE(vbasedev, hwpt_next);
>> + vbasedev->hwpt = NULL;
>> +
>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>> + QLIST_REMOVE(hwpt, next);
>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> + g_free(hwpt);
>> + }
>> +}
>
> Looks the detach flow is still missed?
>
I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call and
is there for when there's no device attached to the hwpt to actually free the
hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed below (...)
>> +
>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> VFIOIOMMUFDContainer *container,
>> Error **errp)
>> {
>> + /* mdevs aren't physical devices and will fail with auto domains */
>> + if (!vbasedev->mdev) {
>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>> + }
>> +
>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>> errp);
>> }
>>
>> @@ -227,6 +305,11 @@ static void
>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>
> Shouldn't we check mdev before calling this?
>
(...) here. Detach needs to be called for both, and keep in mind that this
doesn't a pt_id, as the ioctl detaches from whatever domain or emulated idea of
it (for mdev) that it has previously been called IOMMUFD_ATTACH with.
We also call this with mdev we just don't call it with a hwpt_id but rather use
autodomains (and it doesn't actually allocate a hw domain)
>> error_report_err(err);
>> }
>> +
>> + if (vbasedev->hwpt) {
>> + iommufd_cdev_autodomains_put(vbasedev, container);
>> + }
>> +
>> }
>>
>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer
>> *container)
>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>> container =
>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>> container->be = vbasedev->iommufd;
>> container->ioas_id = ioas_id;
>> + QLIST_INIT(&container->hwpt_list);
>
> This can be in ::instance_init().
>
But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is where all
IOMMUFD container stuff is taking place aiui.
> Thanks
> Zhenzhong
>
>>
>> bcontainer = &container->bcontainer;
>> vfio_address_space_insert(space, bcontainer);
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 211e6f374adc..4d8ac02fe7d6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>> ioas, uint64_t iova, uint64_t size
>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>> size=0x%"PRIx64" (%d)"
>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>> ioas=%d"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>> (%d)"
>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-22 8:50 ` Joao Martins
@ 2024-07-22 14:21 ` Cédric Le Goater
2024-07-23 2:36 ` Duan, Zhenzhong
2024-07-23 4:36 ` Duan, Zhenzhong
1 sibling, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 14:21 UTC (permalink / raw)
To: Joao Martins, Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
On 7/22/24 10:50, Joao Martins wrote:
> On 22/07/2024 06:16, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
>>>
>>> There's generally two modes of operation for IOMMUFD:
>>>
>>> 1) The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
>>> to VFIO and mainly performs IOAS_MAP and UNMAP.
>>>
>>> 2) The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new feature
>>> are being steered to.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>> responsible for creating an IOMMU domain. This is contrast to the
>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically
>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>> the needed handling for mdevs.
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains. Essentially mimicking kernel
>>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU
>>> domain
>>> it falls back to IOAS attach.
>>>
>>> The auto domain logic allows different IOMMU domains to be created when
>>> DMA dirty tracking is not desired (and VF can provide it), and others where
>>> it is. Here it is not used in this way given how VFIODevice migration
>>> state is initialized after the device attachment. But such mixed mode of
>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>> been using so far between container vs device dirty tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 9 ++++
>>> include/sysemu/iommufd.h | 5 +++
>>> backends/iommufd.c | 30 +++++++++++++
>>> hw/vfio/iommufd.c | 84
>>> +++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 5 files changed, 129 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>> index 98acae8c1c97..1a96678f8c38 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>
>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> +typedef struct VFIOIOASHwpt {
>>> + uint32_t hwpt_id;
>>> + QLIST_HEAD(, VFIODevice) device_list;
>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>> +} VFIOIOASHwpt;
>>> +
>>> typedef struct VFIOIOMMUFDContainer {
>>> VFIOContainerBase bcontainer;
>>> IOMMUFDBackend *be;
>>> uint32_t ioas_id;
>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>> } VFIOIOMMUFDContainer;
>>>
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>> VFIO_IOMMU_IOMMUFD);
>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>> HostIOMMUDevice *hiod;
>>> int devid;
>>> IOMMUFDBackend *iommufd;
>>> + VFIOIOASHwpt *hwpt;
>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 57d502a1c79a..e917e7591d05 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -50,6 +50,11 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp);
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 2b3d51af26d2..a94d3b90c05c 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -208,6 +208,36 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>> return ret;
>>> }
>>>
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>> + .flags = flags,
>>> + .dev_id = dev_id,
>>> + .pt_id = pt_id,
>>> + .data_type = data_type,
>>> + .data_len = data_len,
>>> + .data_uptr = (uintptr_t)data_ptr,
>>> + };
>>> +
>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
>>> + data_len, (uintptr_t)data_ptr,
>>> + alloc_hwpt.out_hwpt_id, ret);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>> + return false;
>>> + }
>>> +
>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 077dea8f1b64..545f4a404125 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,10 +212,88 @@ static bool
>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>> return true;
>>> }
>>>
>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container,
>>> + Error **errp)
>>> +{
>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>> + uint32_t flags = 0;
>>> + VFIOIOASHwpt *hwpt;
>>> + uint32_t hwpt_id;
>>> + int ret;
>>> +
>>> + /* Try to find a domain */
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>> errp);
>>> + if (ret) {
>>> + /* -EINVAL means the domain is incompatible with the device. */
>>> + if (ret == -EINVAL) {
>>> + /*
>>> + * It is an expected failure and it just means we will try
>>> + * another domain, or create one if no existing compatible
>>> + * domain is found. Hence why the error is discarded below.
>>> + */
>>> + error_free(*errp);
>>> + *errp = NULL;
>>> + continue;
>>> + }
>>> +
>>> + return false;
>>> + } else {
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + return true;
>>> + }
>>> + }
>>> +
>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> + container->ioas_id, flags,
>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> + &hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>> + hwpt->hwpt_id = hwpt_id;
>>> + QLIST_INIT(&hwpt->device_list);
>>> +
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> + if (ret) {
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + return false;
>>> + }
>>> +
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + return true;
>>> +}
>>> +
>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container)
>>> +{
>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>> +
>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>> + vbasedev->hwpt = NULL;
>>> +
>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>> + QLIST_REMOVE(hwpt, next);
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + }
>>> +}
>>
>> Looks the detach flow is still missed?
>>
>
>
> I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call and
> is there for when there's no device attached to the hwpt to actually free the
> hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed below (...)
>
>>> +
>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>> VFIOIOMMUFDContainer *container,
>>> Error **errp)
>>> {
>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>> + if (!vbasedev->mdev) {
>>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>>> + }
>>> +
>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id,
>>> errp);
>>> }
>>>
>>> @@ -227,6 +305,11 @@ static void
>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>
>> Shouldn't we check mdev before calling this?
>>
> (...) here. Detach needs to be called for both, and keep in mind that this
> doesn't a pt_id, as the ioctl detaches from whatever domain or emulated idea of
> it (for mdev) that it has previously been called IOMMUFD_ATTACH with.
>
> We also call this with mdev we just don't call it with a hwpt_id but rather use
> autodomains (and it doesn't actually allocate a hw domain)
>
>>> error_report_err(err);
>>> }
>>> +
>>> + if (vbasedev->hwpt) {
>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>> + }
>>> +
>>> }
>>>
>>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer
>>> *container)
>>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>> container =
>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>> container->be = vbasedev->iommufd;
>>> container->ioas_id = ioas_id;
>>> + QLIST_INIT(&container->hwpt_list);
>>
>> This can be in ::instance_init().
>>
> But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is where all
> IOMMUFD container stuff is taking place aiui.
We can add an .instance_init() handler later on. It would be cleaner I agree
but it shouldn't be a reason to block the series.
Zhenzhong,
Did Joao address your concerns ?
Thanks,
C.
>> Thanks
>> Zhenzhong
>>
>>>
>>> bcontainer = &container->bcontainer;
>>> vfio_address_space_insert(space, bcontainer);
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t
>>> ioas, uint64_t iova, uint64_t size
>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>>> size=0x%"PRIx64" (%d)"
>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>>> ioas=%d"
>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>>> (%d)"
>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>>> id=%d (%d)"
>>> --
>>> 2.17.2
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-22 14:21 ` Cédric Le Goater
@ 2024-07-23 2:36 ` Duan, Zhenzhong
0 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-23 2:36 UTC (permalink / raw)
To: Cédric Le Goater, Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Jason Gunthorpe,
Avihai Horon
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain
>creation
>
>On 7/22/24 10:50, Joao Martins wrote:
>> On 22/07/2024 06:16, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain
>creation
>>>>
>>>> There's generally two modes of operation for IOMMUFD:
>>>>
>>>> 1) The simple user API which intends to perform relatively simple things
>>>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and
>attaches
>>>> to VFIO and mainly performs IOAS_MAP and UNMAP.
>>>>
>>>> 2) The native IOMMUFD API where you have fine grained control of the
>>>> IOMMU domain and model it accordingly. This is where most new
>feature
>>>> are being steered to.
>>>>
>>>> For dirty tracking 2) is required, as it needs to ensure that
>>>> the stage-2/parent IOMMU domain will only attach devices
>>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>>> useful guarantee to VMMs that will refuse incompatible device
>>>> attachments for IOMMU domains.
>>>>
>>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>>> responsible for creating an IOMMU domain. This is contrast to the
>>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>>> automatically
>>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>>> the needed handling for mdevs.
>>>>
>>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>>> similar logic, where IOMMU domains are created and devices attached
>to
>>>> compatible domains. Essentially mimicking kernel
>>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>IOMMU
>>>> domain
>>>> it falls back to IOAS attach.
>>>>
>>>> The auto domain logic allows different IOMMU domains to be created
>when
>>>> DMA dirty tracking is not desired (and VF can provide it), and others
>where
>>>> it is. Here it is not used in this way given how VFIODevice migration
>>>> state is initialized after the device attachment. But such mixed mode of
>>>> IOMMU dirty tracking + device dirty tracking is an improvement that
>can
>>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>>> been using so far between container vs device dirty tracking.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> include/hw/vfio/vfio-common.h | 9 ++++
>>>> include/sysemu/iommufd.h | 5 +++
>>>> backends/iommufd.c | 30 +++++++++++++
>>>> hw/vfio/iommufd.c | 84
>>>> +++++++++++++++++++++++++++++++++++
>>>> backends/trace-events | 1 +
>>>> 5 files changed, 129 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>> common.h
>>>> index 98acae8c1c97..1a96678f8c38 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>>
>>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>
>>>> +typedef struct VFIOIOASHwpt {
>>>> + uint32_t hwpt_id;
>>>> + QLIST_HEAD(, VFIODevice) device_list;
>>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>>> +} VFIOIOASHwpt;
>>>> +
>>>> typedef struct VFIOIOMMUFDContainer {
>>>> VFIOContainerBase bcontainer;
>>>> IOMMUFDBackend *be;
>>>> uint32_t ioas_id;
>>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>>> } VFIOIOMMUFDContainer;
>>>>
>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>>> VFIO_IOMMU_IOMMUFD);
>>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>>> HostIOMMUDevice *hiod;
>>>> int devid;
>>>> IOMMUFDBackend *iommufd;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>>> } VFIODevice;
>>>>
>>>> struct VFIODeviceOps {
>>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>>> index 57d502a1c79a..e917e7591d05 100644
>>>> --- a/include/sysemu/iommufd.h
>>>> +++ b/include/sysemu/iommufd.h
>>>> @@ -50,6 +50,11 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>>> devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp);
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>>> dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp);
>>>>
>>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>>> #endif
>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>> index 2b3d51af26d2..a94d3b90c05c 100644
>>>> --- a/backends/iommufd.c
>>>> +++ b/backends/iommufd.c
>>>> @@ -208,6 +208,36 @@ int
>>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>>> return ret;
>>>> }
>>>>
>>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>>> dev_id,
>>>> + uint32_t pt_id, uint32_t flags,
>>>> + uint32_t data_type, uint32_t data_len,
>>>> + void *data_ptr, uint32_t *out_hwpt,
>>>> + Error **errp)
>>>> +{
>>>> + int ret, fd = be->fd;
>>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>>> + .flags = flags,
>>>> + .dev_id = dev_id,
>>>> + .pt_id = pt_id,
>>>> + .data_type = data_type,
>>>> + .data_len = data_len,
>>>> + .data_uptr = (uintptr_t)data_ptr,
>>>> + };
>>>> +
>>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>data_type,
>>>> + data_len, (uintptr_t)data_ptr,
>>>> + alloc_hwpt.out_hwpt_id, ret);
>>>> + if (ret) {
>>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>>> + return false;
>>>> + }
>>>> +
>>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>>> + return true;
>>>> +}
>>>> +
>>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>>> devid,
>>>> uint32_t *type, void *data, uint32_t len,
>>>> uint64_t *caps, Error **errp)
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 077dea8f1b64..545f4a404125 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -212,10 +212,88 @@ static bool
>>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>>> return true;
>>>> }
>>>>
>>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> + VFIOIOMMUFDContainer *container,
>>>> + Error **errp)
>>>> +{
>>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>>> + uint32_t flags = 0;
>>>> + VFIOIOASHwpt *hwpt;
>>>> + uint32_t hwpt_id;
>>>> + int ret;
>>>> +
>>>> + /* Try to find a domain */
>>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>>> errp);
>>>> + if (ret) {
>>>> + /* -EINVAL means the domain is incompatible with the device.
>*/
>>>> + if (ret == -EINVAL) {
>>>> + /*
>>>> + * It is an expected failure and it just means we will try
>>>> + * another domain, or create one if no existing compatible
>>>> + * domain is found. Hence why the error is discarded below.
>>>> + */
>>>> + error_free(*errp);
>>>> + *errp = NULL;
>>>> + continue;
>>>> + }
>>>> +
>>>> + return false;
>>>> + } else {
>>>> + vbasedev->hwpt = hwpt;
>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>> + return true;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>> + container->ioas_id, flags,
>>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>> + &hwpt_id, errp)) {
>>>> + return false;
>>>> + }
>>>> +
>>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>>> + hwpt->hwpt_id = hwpt_id;
>>>> + QLIST_INIT(&hwpt->device_list);
>>>> +
>>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>>>> + if (ret) {
>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>> + g_free(hwpt);
>>>> + return false;
>>>> + }
>>>> +
>>>> + vbasedev->hwpt = hwpt;
>>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>> + return true;
>>>> +}
>>>> +
>>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>>> + VFIOIOMMUFDContainer *container)
>>>> +{
>>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>>> +
>>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>>> + vbasedev->hwpt = NULL;
>>>> +
>>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>>> + QLIST_REMOVE(hwpt, next);
>>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>>> + g_free(hwpt);
>>>> + }
>>>> +}
>>>
>>> Looks the detach flow is still missed?
>>>
>>
>>
>> I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call
>and
>> is there for when there's no device attached to the hwpt to actually free
>the
>> hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed
>below (...)
>>
>>>> +
>>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>>> VFIOIOMMUFDContainer *container,
>>>> Error **errp)
>>>> {
>>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>>> + if (!vbasedev->mdev) {
>>>> + return iommufd_cdev_autodomains_get(vbasedev, container,
>errp);
>>>> + }
>>>> +
>>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>ioas_id,
>>>> errp);
>>>> }
>>>>
>>>> @@ -227,6 +305,11 @@ static void
>>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>>
>>> Shouldn't we check mdev before calling this?
>>>
>> (...) here. Detach needs to be called for both, and keep in mind that this
>> doesn't a pt_id, as the ioctl detaches from whatever domain or emulated
>idea of
>> it (for mdev) that it has previously been called IOMMUFD_ATTACH with.
>>
>> We also call this with mdev we just don't call it with a hwpt_id but rather
>use
>> autodomains (and it doesn't actually allocate a hw domain)
>>
>>>> error_report_err(err);
>>>> }
>>>> +
>>>> + if (vbasedev->hwpt) {
>>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>>> + }
>>>> +
>>>> }
>>>>
>>>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer
>>>> *container)
>>>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char
>*name,
>>>> VFIODevice *vbasedev,
>>>> container =
>>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>>> container->be = vbasedev->iommufd;
>>>> container->ioas_id = ioas_id;
>>>> + QLIST_INIT(&container->hwpt_list);
>>>
>>> This can be in ::instance_init().
>>>
>> But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is
>where all
>> IOMMUFD container stuff is taking place aiui.
>
>We can add an .instance_init() handler later on. It would be cleaner I agree
>but it shouldn't be a reason to block the series.
Yes, it's minor.
>
>Zhenzhong,
>
>Did Joao address your concerns ?
Sure.
Thanks
Zhenzhong
>
>Thanks,
>
>C.
>
>
>
>
>>> Thanks
>>> Zhenzhong
>>>
>>>>
>>>> bcontainer = &container->bcontainer;
>>>> vfio_address_space_insert(space, bcontainer);
>>>> diff --git a/backends/trace-events b/backends/trace-events
>>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>>> --- a/backends/trace-events
>>>> +++ b/backends/trace-events
>>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd,
>uint32_t
>>>> ioas, uint64_t iova, uint64_t size
>>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>>>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>>>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t
>iova,
>>>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>>>> size=0x%"PRIx64" (%d)"
>>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) "
>iommufd=%d
>>>> ioas=%d"
>>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>uint32_t
>>>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t
>data_ptr,
>>>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>>>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64"
>out_hwpt=%u
>>>> (%d)"
>>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) "
>iommufd=%d
>>>> id=%d (%d)"
>>>> --
>>>> 2.17.2
>>>
>>
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation
2024-07-22 8:50 ` Joao Martins
2024-07-22 14:21 ` Cédric Le Goater
@ 2024-07-23 4:36 ` Duan, Zhenzhong
1 sibling, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-23 4:36 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain
>creation
>
>On 22/07/2024 06:16, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain
>creation
>>>
>>> There's generally two modes of operation for IOMMUFD:
>>>
>>> 1) The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and
>attaches
>>> to VFIO and mainly performs IOAS_MAP and UNMAP.
>>>
>>> 2) The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new feature
>>> are being steered to.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is
>>> responsible for creating an IOMMU domain. This is contrast to the
>>> 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically
>>> when it attaches to VFIO (usually referred as autodomains) but it has
>>> the needed handling for mdevs.
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains. Essentially mimicking kernel
>>> iommufd_device_auto_get_domain(). With mdevs given there's no
>IOMMU
>>> domain
>>> it falls back to IOAS attach.
>>>
>>> The auto domain logic allows different IOMMU domains to be created
>when
>>> DMA dirty tracking is not desired (and VF can provide it), and others
>where
>>> it is. Here it is not used in this way given how VFIODevice migration
>>> state is initialized after the device attachment. But such mixed mode of
>>> IOMMU dirty tracking + device dirty tracking is an improvement that can
>>> be added on. Keep the 'all of nothing' of type1 approach that we have
>>> been using so far between container vs device dirty tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 9 ++++
>>> include/sysemu/iommufd.h | 5 +++
>>> backends/iommufd.c | 30 +++++++++++++
>>> hw/vfio/iommufd.c | 84
>>> +++++++++++++++++++++++++++++++++++
>>> backends/trace-events | 1 +
>>> 5 files changed, 129 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>> index 98acae8c1c97..1a96678f8c38 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
>>>
>>> typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> +typedef struct VFIOIOASHwpt {
>>> + uint32_t hwpt_id;
>>> + QLIST_HEAD(, VFIODevice) device_list;
>>> + QLIST_ENTRY(VFIOIOASHwpt) next;
>>> +} VFIOIOASHwpt;
>>> +
>>> typedef struct VFIOIOMMUFDContainer {
>>> VFIOContainerBase bcontainer;
>>> IOMMUFDBackend *be;
>>> uint32_t ioas_id;
>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>>> } VFIOIOMMUFDContainer;
>>>
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer,
>>> VFIO_IOMMU_IOMMUFD);
>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice {
>>> HostIOMMUDevice *hiod;
>>> int devid;
>>> IOMMUFDBackend *iommufd;
>>> + VFIOIOASHwpt *hwpt;
>>> + QLIST_ENTRY(VFIODevice) hwpt_next;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 57d502a1c79a..e917e7591d05 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -50,6 +50,11 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp);
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp);
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>> TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>> #endif
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 2b3d51af26d2..a94d3b90c05c 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -208,6 +208,36 @@ int
>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t
>ioas_id,
>>> return ret;
>>> }
>>>
>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t
>>> dev_id,
>>> + uint32_t pt_id, uint32_t flags,
>>> + uint32_t data_type, uint32_t data_len,
>>> + void *data_ptr, uint32_t *out_hwpt,
>>> + Error **errp)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_hwpt_alloc alloc_hwpt = {
>>> + .size = sizeof(struct iommu_hwpt_alloc),
>>> + .flags = flags,
>>> + .dev_id = dev_id,
>>> + .pt_id = pt_id,
>>> + .data_type = data_type,
>>> + .data_len = data_len,
>>> + .data_uptr = (uintptr_t)data_ptr,
>>> + };
>>> +
>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags,
>data_type,
>>> + data_len, (uintptr_t)data_ptr,
>>> + alloc_hwpt.out_hwpt_id, ret);
>>> + if (ret) {
>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt");
>>> + return false;
>>> + }
>>> +
>>> + *out_hwpt = alloc_hwpt.out_hwpt_id;
>>> + return true;
>>> +}
>>> +
>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be,
>uint32_t
>>> devid,
>>> uint32_t *type, void *data, uint32_t len,
>>> uint64_t *caps, Error **errp)
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 077dea8f1b64..545f4a404125 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -212,10 +212,88 @@ static bool
>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>>> return true;
>>> }
>>>
>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container,
>>> + Error **errp)
>>> +{
>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
>>> + uint32_t flags = 0;
>>> + VFIOIOASHwpt *hwpt;
>>> + uint32_t hwpt_id;
>>> + int ret;
>>> +
>>> + /* Try to find a domain */
>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>>> errp);
>>> + if (ret) {
>>> + /* -EINVAL means the domain is incompatible with the device. */
>>> + if (ret == -EINVAL) {
>>> + /*
>>> + * It is an expected failure and it just means we will try
>>> + * another domain, or create one if no existing compatible
>>> + * domain is found. Hence why the error is discarded below.
>>> + */
>>> + error_free(*errp);
>>> + *errp = NULL;
>>> + continue;
>>> + }
>>> +
>>> + return false;
>>> + } else {
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + return true;
>>> + }
>>> + }
>>> +
>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> + container->ioas_id, flags,
>>> + IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> + &hwpt_id, errp)) {
>>> + return false;
>>> + }
>>> +
>>> + hwpt = g_malloc0(sizeof(*hwpt));
>>> + hwpt->hwpt_id = hwpt_id;
>>> + QLIST_INIT(&hwpt->device_list);
>>> +
>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>>> + if (ret) {
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + return false;
>>> + }
>>> +
>>> + vbasedev->hwpt = hwpt;
>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + return true;
>>> +}
>>> +
>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>>> + VFIOIOMMUFDContainer *container)
>>> +{
>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>>> +
>>> + QLIST_REMOVE(vbasedev, hwpt_next);
>>> + vbasedev->hwpt = NULL;
>>> +
>>> + if (QLIST_EMPTY(&hwpt->device_list)) {
>>> + QLIST_REMOVE(hwpt, next);
>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>>> + g_free(hwpt);
>>> + }
>>> +}
>>
>> Looks the detach flow is still missed?
>>
>
>
>I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call
>and
>is there for when there's no device attached to the hwpt to actually free the
>hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed
>below (...)
>
>>> +
>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>> VFIOIOMMUFDContainer *container,
>>> Error **errp)
>>> {
>>> + /* mdevs aren't physical devices and will fail with auto domains */
>>> + if (!vbasedev->mdev) {
>>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp);
>>> + }
>>> +
>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container-
>>ioas_id,
>>> errp);
>>> }
>>>
>>> @@ -227,6 +305,11 @@ static void
>>> iommufd_cdev_detach_container(VFIODevice *vbasedev,
>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>>
>> Shouldn't we check mdev before calling this?
>>
>(...) here. Detach needs to be called for both, and keep in mind that this
>doesn't a pt_id, as the ioctl detaches from whatever domain or emulated
>idea of
>it (for mdev) that it has previously been called IOMMUFD_ATTACH with.
>
>We also call this with mdev we just don't call it with a hwpt_id but rather
>use
>autodomains (and it doesn't actually allocate a hw domain)
Yeah, you are right, no problem here.
>
>>> error_report_err(err);
>>> }
>>> +
>>> + if (vbasedev->hwpt) {
>>> + iommufd_cdev_autodomains_put(vbasedev, container);
>>> + }
>>> +
>>> }
>>>
>>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer
>>> *container)
>>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char
>*name,
>>> VFIODevice *vbasedev,
>>> container =
>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>> container->be = vbasedev->iommufd;
>>> container->ioas_id = ioas_id;
>>> + QLIST_INIT(&container->hwpt_list);
>>
>> This can be in ::instance_init().
>>
>But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is
>where all
>IOMMUFD container stuff is taking place aiui.
OK.
Thanks
Zhenzhong
>
>> Thanks
>> Zhenzhong
>>
>>>
>>> bcontainer = &container->bcontainer;
>>> vfio_address_space_insert(space, bcontainer);
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index 211e6f374adc..4d8ac02fe7d6 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd,
>uint32_t
>>> ioas, uint64_t iova, uint64_t size
>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas,
>>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping:
>>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t
>iova,
>>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64"
>>> size=0x%"PRIx64" (%d)"
>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) "
>iommufd=%d
>>> ioas=%d"
>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t
>data_ptr,
>>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>>> (%d)"
>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) "
>iommufd=%d
>>> id=%d (%d)"
>>> --
>>> 2.17.2
>>
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (4 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 5:22 ` Duan, Zhenzhong
2024-07-19 12:04 ` [PATCH v5 07/13] vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps Joao Martins
` (9 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Remove caps::aw_bits which requires the bcontainer::iova_ranges being
initialized after device is actually attached. Instead defer that to
.get_cap() and call vfio_device_get_aw_bits() directly.
This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().
Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com
---
include/sysemu/host_iommu_device.h | 3 ---
backends/iommufd.c | 3 ++-
hw/vfio/container.c | 5 +----
hw/vfio/iommufd.c | 1 -
4 files changed, 3 insertions(+), 9 deletions(-)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index ee6c813c8b22..cdeeccec7671 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -19,12 +19,9 @@
* struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
*
* @type: host platform IOMMU type.
- *
- * @aw_bits: host IOMMU address width. 0xff if no limitation.
*/
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
- uint8_t aw_bits;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
diff --git a/backends/iommufd.c b/backends/iommufd.c
index a94d3b90c05c..58032e588f49 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -18,6 +18,7 @@
#include "qemu/error-report.h"
#include "monitor/monitor.h"
#include "trace.h"
+#include "hw/vfio/vfio-common.h"
#include <sys/ioctl.h>
#include <linux/iommufd.h>
@@ -270,7 +271,7 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
return caps->type;
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return caps->aw_bits;
+ return vfio_device_get_aw_bits(hiod->agent);
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 88ede913d6f7..c27f448ba26e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1144,7 +1144,6 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
VFIODevice *vdev = opaque;
hiod->name = g_strdup(vdev->name);
- hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
hiod->agent = opaque;
return true;
@@ -1153,11 +1152,9 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
Error **errp)
{
- HostIOMMUDeviceCaps *caps = &hiod->caps;
-
switch (cap) {
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return caps->aw_bits;
+ return vfio_device_get_aw_bits(hiod->agent);
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 545f4a404125..028533bc39b9 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -724,7 +724,6 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->name = g_strdup(vdev->name);
caps->type = type;
- caps->aw_bits = vfio_device_get_aw_bits(vdev);
return true;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* RE: [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-19 12:04 ` [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
@ 2024-07-22 5:22 ` Duan, Zhenzhong
2024-07-22 8:53 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 5:22 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
>
>Remove caps::aw_bits which requires the bcontainer::iova_ranges being
>initialized after device is actually attached. Instead defer that to
>.get_cap() and call vfio_device_get_aw_bits() directly.
>
>This is in preparation for HostIOMMUDevice::realize() being called early
>during attach_device().
>
>Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>Reviewed-by: Cédric Le Goater <clg@redhat.com
>---
> include/sysemu/host_iommu_device.h | 3 ---
> backends/iommufd.c | 3 ++-
> hw/vfio/container.c | 5 +----
> hw/vfio/iommufd.c | 1 -
> 4 files changed, 3 insertions(+), 9 deletions(-)
>
>diff --git a/include/sysemu/host_iommu_device.h
>b/include/sysemu/host_iommu_device.h
>index ee6c813c8b22..cdeeccec7671 100644
>--- a/include/sysemu/host_iommu_device.h
>+++ b/include/sysemu/host_iommu_device.h
>@@ -19,12 +19,9 @@
> * struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
> *
> * @type: host platform IOMMU type.
>- *
>- * @aw_bits: host IOMMU address width. 0xff if no limitation.
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
>- uint8_t aw_bits;
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index a94d3b90c05c..58032e588f49 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -18,6 +18,7 @@
> #include "qemu/error-report.h"
> #include "monitor/monitor.h"
> #include "trace.h"
>+#include "hw/vfio/vfio-common.h"
> #include <sys/ioctl.h>
> #include <linux/iommufd.h>
>
>@@ -270,7 +271,7 @@ static int
>hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
> return caps->type;
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>- return caps->aw_bits;
>+ return vfio_device_get_aw_bits(hiod->agent);
I just realized there is an open here. hiod->agent is not necessarily VFIO device, can be VDPA device.
May need a bit more work on this.
Thanks
Zhenzhong
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
>diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>index 88ede913d6f7..c27f448ba26e 100644
>--- a/hw/vfio/container.c
>+++ b/hw/vfio/container.c
>@@ -1144,7 +1144,6 @@ static bool
>hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> VFIODevice *vdev = opaque;
>
> hiod->name = g_strdup(vdev->name);
>- hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
> hiod->agent = opaque;
>
> return true;
>@@ -1153,11 +1152,9 @@ static bool
>hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
> Error **errp)
> {
>- HostIOMMUDeviceCaps *caps = &hiod->caps;
>-
> switch (cap) {
> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>- return caps->aw_bits;
>+ return vfio_device_get_aw_bits(hiod->agent);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 545f4a404125..028533bc39b9 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -724,7 +724,6 @@ static bool
>hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
>- caps->aw_bits = vfio_device_get_aw_bits(vdev);
>
> return true;
> }
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-22 5:22 ` Duan, Zhenzhong
@ 2024-07-22 8:53 ` Joao Martins
2024-07-23 5:30 ` Duan, Zhenzhong
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 8:53 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 06:22, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
>>
>> Remove caps::aw_bits which requires the bcontainer::iova_ranges being
>> initialized after device is actually attached. Instead defer that to
>> .get_cap() and call vfio_device_get_aw_bits() directly.
>>
>> This is in preparation for HostIOMMUDevice::realize() being called early
>> during attach_device().
>>
>> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com
>> ---
>> include/sysemu/host_iommu_device.h | 3 ---
>> backends/iommufd.c | 3 ++-
>> hw/vfio/container.c | 5 +----
>> hw/vfio/iommufd.c | 1 -
>> 4 files changed, 3 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/sysemu/host_iommu_device.h
>> b/include/sysemu/host_iommu_device.h
>> index ee6c813c8b22..cdeeccec7671 100644
>> --- a/include/sysemu/host_iommu_device.h
>> +++ b/include/sysemu/host_iommu_device.h
>> @@ -19,12 +19,9 @@
>> * struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
>> *
>> * @type: host platform IOMMU type.
>> - *
>> - * @aw_bits: host IOMMU address width. 0xff if no limitation.
>> */
>> typedef struct HostIOMMUDeviceCaps {
>> uint32_t type;
>> - uint8_t aw_bits;
>> } HostIOMMUDeviceCaps;
>>
>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index a94d3b90c05c..58032e588f49 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -18,6 +18,7 @@
>> #include "qemu/error-report.h"
>> #include "monitor/monitor.h"
>> #include "trace.h"
>> +#include "hw/vfio/vfio-common.h"
>> #include <sys/ioctl.h>
>> #include <linux/iommufd.h>
>>
>> @@ -270,7 +271,7 @@ static int
>> hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
>> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
>> return caps->type;
>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>> - return caps->aw_bits;
>> + return vfio_device_get_aw_bits(hiod->agent);
>
> I just realized there is an open here. hiod->agent is not necessarily VFIO device, can be VDPA device.
> May need a bit more work on this.
>
Broadly speaking I agree, that this needs some sort of IOMMUDevice structure
with a agent type that it needs to abstract from instead of an opaque object.
But feels unrelated to this patch exactly, as the existing code was already
making assumptions that ::opaque is a VFIODevice.
> Thanks
> Zhenzhong
>
>> default:
>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>> return -EINVAL;
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 88ede913d6f7..c27f448ba26e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -1144,7 +1144,6 @@ static bool
>> hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>> VFIODevice *vdev = opaque;
>>
>> hiod->name = g_strdup(vdev->name);
>> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
>> hiod->agent = opaque;
>>
>> return true;
>> @@ -1153,11 +1152,9 @@ static bool
>> hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>> Error **errp)
>> {
>> - HostIOMMUDeviceCaps *caps = &hiod->caps;
>> -
>> switch (cap) {
>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>> - return caps->aw_bits;
>> + return vfio_device_get_aw_bits(hiod->agent);
>> default:
>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>> return -EINVAL;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 545f4a404125..028533bc39b9 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -724,7 +724,6 @@ static bool
>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>
>> hiod->name = g_strdup(vdev->name);
>> caps->type = type;
>> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>>
>> return true;
>> }
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits
2024-07-22 8:53 ` Joao Martins
@ 2024-07-23 5:30 ` Duan, Zhenzhong
0 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-23 5:30 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v5 06/13] vfio/{iommufd,container}: Remove
>caps::aw_bits
>
>On 22/07/2024 06:22, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v5 06/13] vfio/{iommufd,container}: Remove
>caps::aw_bits
>>>
>>> Remove caps::aw_bits which requires the bcontainer::iova_ranges being
>>> initialized after device is actually attached. Instead defer that to
>>> .get_cap() and call vfio_device_get_aw_bits() directly.
>>>
>>> This is in preparation for HostIOMMUDevice::realize() being called early
>>> during attach_device().
>>>
>>> Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Reviewed-by: Cédric Le Goater <clg@redhat.com
>>> ---
>>> include/sysemu/host_iommu_device.h | 3 ---
>>> backends/iommufd.c | 3 ++-
>>> hw/vfio/container.c | 5 +----
>>> hw/vfio/iommufd.c | 1 -
>>> 4 files changed, 3 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/sysemu/host_iommu_device.h
>>> b/include/sysemu/host_iommu_device.h
>>> index ee6c813c8b22..cdeeccec7671 100644
>>> --- a/include/sysemu/host_iommu_device.h
>>> +++ b/include/sysemu/host_iommu_device.h
>>> @@ -19,12 +19,9 @@
>>> * struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
>>> *
>>> * @type: host platform IOMMU type.
>>> - *
>>> - * @aw_bits: host IOMMU address width. 0xff if no limitation.
>>> */
>>> typedef struct HostIOMMUDeviceCaps {
>>> uint32_t type;
>>> - uint8_t aw_bits;
>>> } HostIOMMUDeviceCaps;
>>>
>>> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index a94d3b90c05c..58032e588f49 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -18,6 +18,7 @@
>>> #include "qemu/error-report.h"
>>> #include "monitor/monitor.h"
>>> #include "trace.h"
>>> +#include "hw/vfio/vfio-common.h"
>>> #include <sys/ioctl.h>
>>> #include <linux/iommufd.h>
>>>
>>> @@ -270,7 +271,7 @@ static int
>>> hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
>>> case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
>>> return caps->type;
>>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>>> - return caps->aw_bits;
>>> + return vfio_device_get_aw_bits(hiod->agent);
>>
>> I just realized there is an open here. hiod->agent is not necessarily VFIO
>device, can be VDPA device.
>> May need a bit more work on this.
>>
>
>Broadly speaking I agree, that this needs some sort of IOMMUDevice
>structure
>with a agent type that it needs to abstract from instead of an opaque object.
>
>But feels unrelated to this patch exactly, as the existing code was already
>making assumptions that ::opaque is a VFIODevice.
Currently only VFIODevice is supported, so hiod->agent can only points to a VFIODevice.
In future, when VDPA is supported, hiod->agent can point to some kind of VDPADevice structure after ::realize() initialize it.
But I'm ok to leave it to VDPA to fix this as for now hiod->agent only points to VFIODevice.
Thanks
Zhenzhong
>
>> Thanks
>> Zhenzhong
>>
>>> default:
>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>> return -EINVAL;
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 88ede913d6f7..c27f448ba26e 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -1144,7 +1144,6 @@ static bool
>>> hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>> VFIODevice *vdev = opaque;
>>>
>>> hiod->name = g_strdup(vdev->name);
>>> - hiod->caps.aw_bits = vfio_device_get_aw_bits(vdev);
>>> hiod->agent = opaque;
>>>
>>> return true;
>>> @@ -1153,11 +1152,9 @@ static bool
>>> hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>> static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>> Error **errp)
>>> {
>>> - HostIOMMUDeviceCaps *caps = &hiod->caps;
>>> -
>>> switch (cap) {
>>> case HOST_IOMMU_DEVICE_CAP_AW_BITS:
>>> - return caps->aw_bits;
>>> + return vfio_device_get_aw_bits(hiod->agent);
>>> default:
>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>> return -EINVAL;
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 545f4a404125..028533bc39b9 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -724,7 +724,6 @@ static bool
>>> hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>>>
>>> hiod->name = g_strdup(vdev->name);
>>> caps->type = type;
>>> - caps->aw_bits = vfio_device_get_aw_bits(vdev);
>>>
>>> return true;
>>> }
>>> --
>>> 2.17.2
>>
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 07/13] vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (5 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 06/13] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 14:06 ` Cédric Le Goater
2024-07-19 12:04 ` [PATCH v5 08/13] vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() Joao Martins via
` (8 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Store the value of @caps returned by iommufd_backend_get_device_info()
in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is
whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING).
This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/host_iommu_device.h | 4 ++++
hw/vfio/iommufd.c | 1 +
2 files changed, 5 insertions(+)
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
index cdeeccec7671..fd03ce766522 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -19,9 +19,13 @@
* struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
*
* @type: host platform IOMMU type.
+ *
+ * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
+ * the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
*/
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
+ uint64_t hw_caps;
} HostIOMMUDeviceCaps;
#define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 028533bc39b9..7a10b1e90a6f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -724,6 +724,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
hiod->name = g_strdup(vdev->name);
caps->type = type;
+ caps->hw_caps = hw_caps;
return true;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 07/13] vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
2024-07-19 12:04 ` [PATCH v5 07/13] vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps Joao Martins
@ 2024-07-22 14:06 ` Cédric Le Goater
0 siblings, 0 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 14:06 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:04, Joao Martins wrote:
> Store the value of @caps returned by iommufd_backend_get_device_info()
> in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is
> whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING).
>
> This is in preparation for HostIOMMUDevice::realize() being called early
> during attach_device().
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/sysemu/host_iommu_device.h | 4 ++++
> hw/vfio/iommufd.c | 1 +
> 2 files changed, 5 insertions(+)
>
> diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
> index cdeeccec7671..fd03ce766522 100644
> --- a/include/sysemu/host_iommu_device.h
> +++ b/include/sysemu/host_iommu_device.h
> @@ -19,9 +19,13 @@
> * struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
> *
> * @type: host platform IOMMU type.
> + *
> + * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
> + * the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
> */
> typedef struct HostIOMMUDeviceCaps {
> uint32_t type;
> + uint64_t hw_caps;
> } HostIOMMUDeviceCaps;
>
> #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 028533bc39b9..7a10b1e90a6f 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -724,6 +724,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
>
> hiod->name = g_strdup(vdev->name);
> caps->type = type;
> + caps->hw_caps = hw_caps;
>
> return true;
> }
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 08/13] vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (6 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 07/13] vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps Joao Martins
@ 2024-07-19 12:04 ` Joao Martins via
2024-07-19 14:10 ` [PATCH v5 08/13] vfio/{iommufd,container}: " Cédric Le Goater
2024-07-22 5:32 ` Duan, Zhenzhong
2024-07-19 12:04 ` [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
` (7 subsequent siblings)
15 siblings, 2 replies; 53+ messages in thread
From: Joao Martins via @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
behind the device supports dirty tracking.
Note: The HostIOMMUDevice data from legacy backend is static and doesn't
need any information from the (type1-iommu) backend to be initialized.
In contrast however, the IOMMUFD HostIOMMUDevice data requires the
iommufd FD to be connected and having a devid to be able to successfully
GET_HW_INFO. This means vfio_device_hiod_realize() is called in
different places within the backend .attach_device() implementation.
Suggested-by: Cédric Le Goater <clg@redhat.cm>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/common.c | 16 ++++++----------
hw/vfio/container.c | 4 ++++
hw/vfio/helpers.c | 11 +++++++++++
hw/vfio/iommufd.c | 4 ++++
5 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1a96678f8c38..4e44b26d3c45 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
void vfio_reset_handler(void *opaque);
struct vfio_device_info *vfio_get_device_info(int fd);
bool vfio_device_is_mdev(VFIODevice *vbasedev);
+bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
bool vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void vfio_detach_device(VFIODevice *vbasedev);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b0beed44116e..cc14f0e3fe24 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
{
const VFIOIOMMUClass *ops =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
- HostIOMMUDevice *hiod;
+ HostIOMMUDevice *hiod = NULL;
if (vbasedev->iommufd) {
ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
@@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
assert(ops);
- if (!ops->attach_device(name, vbasedev, as, errp)) {
- return false;
- }
- if (vbasedev->mdev) {
- return true;
+ if (!vbasedev->mdev) {
+ hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
+ vbasedev->hiod = hiod;
}
- hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
- if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
+ if (!ops->attach_device(name, vbasedev, as, errp)) {
object_unref(hiod);
- ops->detach_device(vbasedev);
+ vbasedev->hiod = NULL;
return false;
}
- vbasedev->hiod = hiod;
return true;
}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c27f448ba26e..adb302216e23 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -917,6 +917,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
trace_vfio_attach_device(vbasedev->name, groupid);
+ if (!vfio_device_hiod_realize(vbasedev, errp)) {
+ return false;
+ }
+
group = vfio_get_group(groupid, as, errp);
if (!group) {
return false;
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 7e23e9080c9d..ea15c79db0a3 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
subsys = realpath(tmp, NULL);
return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
}
+
+bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
+{
+ HostIOMMUDevice *hiod = vbasedev->hiod;
+
+ if (!hiod) {
+ return true;
+ }
+
+ return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7a10b1e90a6f..bb44d948c735 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
space = vfio_get_address_space(as);
+ if (!vfio_device_hiod_realize(vbasedev, errp)) {
+ return false;
+ }
+
/* try to attach to an existing container in this space */
QLIST_FOREACH(bcontainer, &space->containers, next) {
container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 08/13] vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during attach_device()
2024-07-19 12:04 ` [PATCH v5 08/13] vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() Joao Martins via
@ 2024-07-19 14:10 ` Cédric Le Goater
2024-07-22 5:32 ` Duan, Zhenzhong
1 sibling, 0 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-19 14:10 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:04, Joao Martins wrote:
> Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
> before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
> of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
> behind the device supports dirty tracking.
>
> Note: The HostIOMMUDevice data from legacy backend is static and doesn't
> need any information from the (type1-iommu) backend to be initialized.
> In contrast however, the IOMMUFD HostIOMMUDevice data requires the
> iommufd FD to be connected and having a devid to be able to successfully
> GET_HW_INFO. This means vfio_device_hiod_realize() is called in
> different places within the backend .attach_device() implementation.
>
> Suggested-by: Cédric Le Goater <clg@redhat.cm>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/common.c | 16 ++++++----------
> hw/vfio/container.c | 4 ++++
> hw/vfio/helpers.c | 11 +++++++++++
> hw/vfio/iommufd.c | 4 ++++
> 5 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 1a96678f8c38..4e44b26d3c45 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
> bool vfio_device_is_mdev(VFIODevice *vbasedev);
> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b0beed44116e..cc14f0e3fe24 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> {
> const VFIOIOMMUClass *ops =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> - HostIOMMUDevice *hiod;
> + HostIOMMUDevice *hiod = NULL;
>
> if (vbasedev->iommufd) {
> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>
> assert(ops);
>
> - if (!ops->attach_device(name, vbasedev, as, errp)) {
> - return false;
> - }
>
> - if (vbasedev->mdev) {
> - return true;
> + if (!vbasedev->mdev) {
> + hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> + vbasedev->hiod = hiod;
> }
>
> - hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
> - if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp)) {
> + if (!ops->attach_device(name, vbasedev, as, errp)) {
> object_unref(hiod);
> - ops->detach_device(vbasedev);
> + vbasedev->hiod = NULL;
> return false;
> }
> - vbasedev->hiod = hiod;
>
> return true;
> }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c27f448ba26e..adb302216e23 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -917,6 +917,10 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>
> trace_vfio_attach_device(vbasedev->name, groupid);
>
> + if (!vfio_device_hiod_realize(vbasedev, errp)) {
> + return false;
> + }
> +
> group = vfio_get_group(groupid, as, errp);
> if (!group) {
> return false;
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index 7e23e9080c9d..ea15c79db0a3 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
> subsys = realpath(tmp, NULL);
> return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> }
> +
> +bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
> +{
> + HostIOMMUDevice *hiod = vbasedev->hiod;
> +
> + if (!hiod) {
> + return true;
> + }
> +
> + return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 7a10b1e90a6f..bb44d948c735 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>
> space = vfio_get_address_space(as);
>
> + if (!vfio_device_hiod_realize(vbasedev, errp)) {
> + return false;
> + }
> +
> /* try to attach to an existing container in this space */
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 08/13] vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during attach_device()
2024-07-19 12:04 ` [PATCH v5 08/13] vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() Joao Martins via
2024-07-19 14:10 ` [PATCH v5 08/13] vfio/{iommufd,container}: " Cédric Le Goater
@ 2024-07-22 5:32 ` Duan, Zhenzhong
1 sibling, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 5:32 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 08/13] vfio/{iommufd,container}: Invoke
>HostIOMMUDevice::realize() during attach_device()
>
>Move the HostIOMMUDevice::realize() to be invoked during the attach of
>the device
>before we allocate IOMMUFD hardware pagetable objects (HWPT). This
>allows the use
>of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if
>the IOMMU
>behind the device supports dirty tracking.
>
>Note: The HostIOMMUDevice data from legacy backend is static and doesn't
>need any information from the (type1-iommu) backend to be initialized.
>In contrast however, the IOMMUFD HostIOMMUDevice data requires the
>iommufd FD to be connected and having a devid to be able to successfully
>GET_HW_INFO. This means vfio_device_hiod_realize() is called in
>different places within the backend .attach_device() implementation.
>
>Suggested-by: Cédric Le Goater <clg@redhat.cm>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/common.c | 16 ++++++----------
> hw/vfio/container.c | 4 ++++
> hw/vfio/helpers.c | 11 +++++++++++
> hw/vfio/iommufd.c | 4 ++++
> 5 files changed, 26 insertions(+), 10 deletions(-)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index 1a96678f8c38..4e44b26d3c45 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -242,6 +242,7 @@ void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> struct vfio_device_info *vfio_get_device_info(int fd);
> bool vfio_device_is_mdev(VFIODevice *vbasedev);
>+bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>index b0beed44116e..cc14f0e3fe24 100644
>--- a/hw/vfio/common.c
>+++ b/hw/vfio/common.c
>@@ -1544,7 +1544,7 @@ bool vfio_attach_device(char *name, VFIODevice
>*vbasedev,
> {
> const VFIOIOMMUClass *ops =
>
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>- HostIOMMUDevice *hiod;
>+ HostIOMMUDevice *hiod = NULL;
>
> if (vbasedev->iommufd) {
> ops =
>VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUF
>D));
>@@ -1552,21 +1552,17 @@ bool vfio_attach_device(char *name,
>VFIODevice *vbasedev,
>
> assert(ops);
>
>- if (!ops->attach_device(name, vbasedev, as, errp)) {
>- return false;
>- }
>
>- if (vbasedev->mdev) {
>- return true;
>+ if (!vbasedev->mdev) {
>+ hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>+ vbasedev->hiod = hiod;
> }
>
>- hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
>- if (!HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp)) {
>+ if (!ops->attach_device(name, vbasedev, as, errp)) {
> object_unref(hiod);
>- ops->detach_device(vbasedev);
>+ vbasedev->hiod = NULL;
> return false;
> }
>- vbasedev->hiod = hiod;
>
> return true;
> }
>diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>index c27f448ba26e..adb302216e23 100644
>--- a/hw/vfio/container.c
>+++ b/hw/vfio/container.c
>@@ -917,6 +917,10 @@ static bool vfio_legacy_attach_device(const char
>*name, VFIODevice *vbasedev,
>
> trace_vfio_attach_device(vbasedev->name, groupid);
>
>+ if (!vfio_device_hiod_realize(vbasedev, errp)) {
>+ return false;
>+ }
>+
> group = vfio_get_group(groupid, as, errp);
> if (!group) {
> return false;
>diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>index 7e23e9080c9d..ea15c79db0a3 100644
>--- a/hw/vfio/helpers.c
>+++ b/hw/vfio/helpers.c
>@@ -689,3 +689,14 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
> subsys = realpath(tmp, NULL);
> return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
> }
>+
>+bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>+{
>+ HostIOMMUDevice *hiod = vbasedev->hiod;
>+
>+ if (!hiod) {
>+ return true;
>+ }
>+
>+ return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp);
>+}
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 7a10b1e90a6f..bb44d948c735 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -403,6 +403,10 @@ static bool iommufd_cdev_attach(const char
>*name, VFIODevice *vbasedev,
>
> space = vfio_get_address_space(as);
>
>+ if (!vfio_device_hiod_realize(vbasedev, errp)) {
>+ return false;
>+ }
>+
> /* try to attach to an existing container in this space */
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOIOMMUFDContainer,
>bcontainer);
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (7 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 08/13] vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() Joao Martins via
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 6:05 ` Duan, Zhenzhong
2024-07-19 12:04 ` [PATCH v5 10/13] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
` (6 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
In preparation to using the dirty tracking UAPI, probe whether the IOMMU
supports dirty tracking. This is done via the data stored in
hiod::caps::hw_caps initialized from GET_HW_INFO.
Qemu doesn't know if VF dirty tracking is supported when allocating
hardware pagetable in iommufd_cdev_autodomains_get(). This is because
VFIODevice migration state hasn't been initialized *yet* hence it can't pick
between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
even if later on VFIOMigration decides to use VF dirty tracking instead.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/iommufd.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 4e44b26d3c45..7e530c7869dc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
typedef struct VFIOIOASHwpt {
uint32_t hwpt_id;
+ uint32_t hwpt_flags;
QLIST_HEAD(, VFIODevice) device_list;
QLIST_ENTRY(VFIOIOASHwpt) next;
} VFIOIOASHwpt;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index bb44d948c735..2e5c207bbca0 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -110,6 +110,11 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
iommufd_backend_disconnect(vbasedev->iommufd);
}
+static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+{
+ return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -246,6 +251,17 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
}
}
+ /*
+ * This is quite early and VFIO Migration state isn't yet fully
+ * initialized, thus rely only on IOMMU hardware capabilities as to
+ * whether IOMMU dirty tracking is going to be requested. Later
+ * vfio_migration_realize() may decide to use VF dirty tracking
+ * instead.
+ */
+ if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
+ flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+ }
+
if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
container->ioas_id, flags,
IOMMU_HWPT_DATA_NONE, 0, NULL,
@@ -255,6 +271,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
hwpt = g_malloc0(sizeof(*hwpt));
hwpt->hwpt_id = hwpt_id;
+ hwpt->hwpt_flags = flags;
QLIST_INIT(&hwpt->device_list);
ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
@@ -267,6 +284,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
vbasedev->hwpt = hwpt;
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ container->bcontainer.dirty_pages_supported |=
+ iommufd_hwpt_dirty_tracking(hwpt);
return true;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* RE: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-19 12:04 ` [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-22 6:05 ` Duan, Zhenzhong
2024-07-22 8:58 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 6:05 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty
>tracking capability
>
>In preparation to using the dirty tracking UAPI, probe whether the IOMMU
>supports dirty tracking. This is done via the data stored in
>hiod::caps::hw_caps initialized from GET_HW_INFO.
>
>Qemu doesn't know if VF dirty tracking is supported when allocating
>hardware pagetable in iommufd_cdev_autodomains_get(). This is because
>VFIODevice migration state hasn't been initialized *yet* hence it can't pick
>between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
>dirty tracking it always creates HWPTs with
>IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>even if later on VFIOMigration decides to use VF dirty tracking instead.
I thought there is no overhead for HWPT with IOMMU_HWPT_ALLOC_DIRTY_TRACKING vs. HWPT without IOMMU_HWPT_ALLOC_DIRTY_TRACKING if we don't enable dirty tracking. Right?
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/iommufd.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>index 4e44b26d3c45..7e530c7869dc 100644
>--- a/include/hw/vfio/vfio-common.h
>+++ b/include/hw/vfio/vfio-common.h
>@@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> typedef struct VFIOIOASHwpt {
> uint32_t hwpt_id;
>+ uint32_t hwpt_flags;
> QLIST_HEAD(, VFIODevice) device_list;
> QLIST_ENTRY(VFIOIOASHwpt) next;
> } VFIOIOASHwpt;
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index bb44d948c735..2e5c207bbca0 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -110,6 +110,11 @@ static void
>iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
>+static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>+{
>+ return hwpt && hwpt->hwpt_flags &
>IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>+}
>+
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
>@@ -246,6 +251,17 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> }
> }
>
>+ /*
>+ * This is quite early and VFIO Migration state isn't yet fully
>+ * initialized, thus rely only on IOMMU hardware capabilities as to
>+ * whether IOMMU dirty tracking is going to be requested. Later
>+ * vfio_migration_realize() may decide to use VF dirty tracking
>+ * instead.
>+ */
>+ if (vbasedev->hiod->caps.hw_caps &
>IOMMU_HW_CAP_DIRTY_TRACKING) {
Looks there is still reference to hw_caps, then would suggest to bring back the NEW CAP.
>+ flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>+ }
>+
> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> container->ioas_id, flags,
> IOMMU_HWPT_DATA_NONE, 0, NULL,
>@@ -255,6 +271,7 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>
> hwpt = g_malloc0(sizeof(*hwpt));
> hwpt->hwpt_id = hwpt_id;
>+ hwpt->hwpt_flags = flags;
> QLIST_INIT(&hwpt->device_list);
>
> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>@@ -267,6 +284,8 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> vbasedev->hwpt = hwpt;
> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>+ container->bcontainer.dirty_pages_supported |=
>+ iommufd_hwpt_dirty_tracking(hwpt);
If there is at least one hwpt without dirty tracking, shouldn't we make bcontainer.dirty_pages_supported false?
Thanks
Zhenzhong
> return true;
> }
>
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-22 6:05 ` Duan, Zhenzhong
@ 2024-07-22 8:58 ` Joao Martins
2024-07-22 14:09 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 8:58 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 07:05, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty
>> tracking capability
>>
>> In preparation to using the dirty tracking UAPI, probe whether the IOMMU
>> supports dirty tracking. This is done via the data stored in
>> hiod::caps::hw_caps initialized from GET_HW_INFO.
>>
>> Qemu doesn't know if VF dirty tracking is supported when allocating
>> hardware pagetable in iommufd_cdev_autodomains_get(). This is because
>> VFIODevice migration state hasn't been initialized *yet* hence it can't pick
>> between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
>> dirty tracking it always creates HWPTs with
>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>> even if later on VFIOMigration decides to use VF dirty tracking instead.
>
> I thought there is no overhead for HWPT with IOMMU_HWPT_ALLOC_DIRTY_TRACKING vs. HWPT without IOMMU_HWPT_ALLOC_DIRTY_TRACKING if we don't enable dirty tracking. Right?
>
Correct.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/iommufd.c | 19 +++++++++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index 4e44b26d3c45..7e530c7869dc 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>>
>> typedef struct VFIOIOASHwpt {
>> uint32_t hwpt_id;
>> + uint32_t hwpt_flags;
>> QLIST_HEAD(, VFIODevice) device_list;
>> QLIST_ENTRY(VFIOIOASHwpt) next;
>> } VFIOIOASHwpt;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index bb44d948c735..2e5c207bbca0 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -110,6 +110,11 @@ static void
>> iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>> iommufd_backend_disconnect(vbasedev->iommufd);
>> }
>>
>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +{
>> + return hwpt && hwpt->hwpt_flags &
>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> +}
>> +
>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>> {
>> ERRP_GUARD();
>> @@ -246,6 +251,17 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> }
>> }
>>
>> + /*
>> + * This is quite early and VFIO Migration state isn't yet fully
>> + * initialized, thus rely only on IOMMU hardware capabilities as to
>> + * whether IOMMU dirty tracking is going to be requested. Later
>> + * vfio_migration_realize() may decide to use VF dirty tracking
>> + * instead.
>> + */
>> + if (vbasedev->hiod->caps.hw_caps &
>> IOMMU_HW_CAP_DIRTY_TRACKING) {
>
> Looks there is still reference to hw_caps, then would suggest to bring back the NEW CAP.
>
Ah, but below helper is checking for GET_HW_INFO stuff, and not hwpt flags
gioven that we haven't allocated a hwpt yet.
While I could place this check into a helper it would only have an user. I will
need below helper iommufd_hwpt_dirty_tracking() in another patch, so this is a
bit of a one off check only (unless we want a new helper for cosmetic purposes)
>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> + }
>> +
>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> container->ioas_id, flags,
>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>> @@ -255,6 +271,7 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>
>> hwpt = g_malloc0(sizeof(*hwpt));
>> hwpt->hwpt_id = hwpt_id;
>> + hwpt->hwpt_flags = flags;
>> QLIST_INIT(&hwpt->device_list);
>>
>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>> @@ -267,6 +284,8 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> vbasedev->hwpt = hwpt;
>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> + container->bcontainer.dirty_pages_supported |=
>> + iommufd_hwpt_dirty_tracking(hwpt);
>
> If there is at least one hwpt without dirty tracking, shouldn't we make bcontainer.dirty_pages_supported false?
>
> Thanks
> Zhenzhong
>
>> return true;
>> }
>>
>> --
>> 2.17.2
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-22 8:58 ` Joao Martins
@ 2024-07-22 14:09 ` Joao Martins
2024-07-22 14:13 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 14:09 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 09:58, Joao Martins wrote:
> On 22/07/2024 07:05, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty
>>> tracking capability
>>>
>>> In preparation to using the dirty tracking UAPI, probe whether the IOMMU
>>> supports dirty tracking. This is done via the data stored in
>>> hiod::caps::hw_caps initialized from GET_HW_INFO.
>>>
>>> Qemu doesn't know if VF dirty tracking is supported when allocating
>>> hardware pagetable in iommufd_cdev_autodomains_get(). This is because
>>> VFIODevice migration state hasn't been initialized *yet* hence it can't pick
>>> between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
>>> dirty tracking it always creates HWPTs with
>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>>> even if later on VFIOMigration decides to use VF dirty tracking instead.
>>
>> I thought there is no overhead for HWPT with IOMMU_HWPT_ALLOC_DIRTY_TRACKING vs. HWPT without IOMMU_HWPT_ALLOC_DIRTY_TRACKING if we don't enable dirty tracking. Right?
>>
>
> Correct.
>
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 1 +
>>> hw/vfio/iommufd.c | 19 +++++++++++++++++++
>>> 2 files changed, 20 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>> common.h
>>> index 4e44b26d3c45..7e530c7869dc 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>
>>> typedef struct VFIOIOASHwpt {
>>> uint32_t hwpt_id;
>>> + uint32_t hwpt_flags;
>>> QLIST_HEAD(, VFIODevice) device_list;
>>> QLIST_ENTRY(VFIOIOASHwpt) next;
>>> } VFIOIOASHwpt;
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index bb44d948c735..2e5c207bbca0 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -110,6 +110,11 @@ static void
>>> iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>> }
>>>
>>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> +{
>>> + return hwpt && hwpt->hwpt_flags &
>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> +}
>>> +
>>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>> {
>>> ERRP_GUARD();
>>> @@ -246,6 +251,17 @@ static bool
>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> }
>>> }
>>>
>>> + /*
>>> + * This is quite early and VFIO Migration state isn't yet fully
>>> + * initialized, thus rely only on IOMMU hardware capabilities as to
>>> + * whether IOMMU dirty tracking is going to be requested. Later
>>> + * vfio_migration_realize() may decide to use VF dirty tracking
>>> + * instead.
>>> + */
>>> + if (vbasedev->hiod->caps.hw_caps &
>>> IOMMU_HW_CAP_DIRTY_TRACKING) {
>>
>> Looks there is still reference to hw_caps, then would suggest to bring back the NEW CAP.
>>
> Ah, but below helper is checking for GET_HW_INFO stuff, and not hwpt flags
> gioven that we haven't allocated a hwpt yet.
>
> While I could place this check into a helper it would only have an user. I will
> need below helper iommufd_hwpt_dirty_tracking() in another patch, so this is a
> bit of a one off check only (unless we want a new helper for cosmetic purposes)
>
>>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> + }
>>> +
>>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> container->ioas_id, flags,
>>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>>> @@ -255,6 +271,7 @@ static bool
>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>
>>> hwpt = g_malloc0(sizeof(*hwpt));
>>> hwpt->hwpt_id = hwpt_id;
>>> + hwpt->hwpt_flags = flags;
>>> QLIST_INIT(&hwpt->device_list);
>>>
>>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>> @@ -267,6 +284,8 @@ static bool
>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> vbasedev->hwpt = hwpt;
>>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> + container->bcontainer.dirty_pages_supported |=
>>> + iommufd_hwpt_dirty_tracking(hwpt);
>>
>> If there is at least one hwpt without dirty tracking, shouldn't we make bcontainer.dirty_pages_supported false?
>>
Missed this comment. We could set to false but the generic container abstraction
is utilizing this to let the ioctls() of the individual backend to go through to
the defined callback, and that's why I set to true.
At that is really the only effect of this patch. By the time we reach to patch
12 (which is what really enables live migration with IOMMU automatically), the
IOMMUFD dirty tracking is only called 1) when not one of the VF doesn't support
device dirty tracking [only if you're using IOMMUFD backend], and finally 2)
that no VF/mdev has added the migration blocker which essentially looks at the
HWPT flags (as opposed to the container attribute).
Joao
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-22 14:09 ` Joao Martins
@ 2024-07-22 14:13 ` Joao Martins
2024-07-23 3:07 ` Duan, Zhenzhong
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 14:13 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 15:09, Joao Martins wrote:
> On 22/07/2024 09:58, Joao Martins wrote:
>> On 22/07/2024 07:05, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Subject: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty
>>>> tracking capability
>>>>
>>>> In preparation to using the dirty tracking UAPI, probe whether the IOMMU
>>>> supports dirty tracking. This is done via the data stored in
>>>> hiod::caps::hw_caps initialized from GET_HW_INFO.
>>>>
>>>> Qemu doesn't know if VF dirty tracking is supported when allocating
>>>> hardware pagetable in iommufd_cdev_autodomains_get(). This is because
>>>> VFIODevice migration state hasn't been initialized *yet* hence it can't pick
>>>> between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
>>>> dirty tracking it always creates HWPTs with
>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>>>> even if later on VFIOMigration decides to use VF dirty tracking instead.
>>>
>>> I thought there is no overhead for HWPT with IOMMU_HWPT_ALLOC_DIRTY_TRACKING vs. HWPT without IOMMU_HWPT_ALLOC_DIRTY_TRACKING if we don't enable dirty tracking. Right?
>>>
>>
>> Correct.
>>
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> include/hw/vfio/vfio-common.h | 1 +
>>>> hw/vfio/iommufd.c | 19 +++++++++++++++++++
>>>> 2 files changed, 20 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>> common.h
>>>> index 4e44b26d3c45..7e530c7869dc 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend IOMMUFDBackend;
>>>>
>>>> typedef struct VFIOIOASHwpt {
>>>> uint32_t hwpt_id;
>>>> + uint32_t hwpt_flags;
>>>> QLIST_HEAD(, VFIODevice) device_list;
>>>> QLIST_ENTRY(VFIOIOASHwpt) next;
>>>> } VFIOIOASHwpt;
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index bb44d948c735..2e5c207bbca0 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -110,6 +110,11 @@ static void
>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>> }
>>>>
>>>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>> +{
>>>> + return hwpt && hwpt->hwpt_flags &
>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>> +}
>>>> +
>>>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>>> {
>>>> ERRP_GUARD();
>>>> @@ -246,6 +251,17 @@ static bool
>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> }
>>>> }
>>>>
>>>> + /*
>>>> + * This is quite early and VFIO Migration state isn't yet fully
>>>> + * initialized, thus rely only on IOMMU hardware capabilities as to
>>>> + * whether IOMMU dirty tracking is going to be requested. Later
>>>> + * vfio_migration_realize() may decide to use VF dirty tracking
>>>> + * instead.
>>>> + */
>>>> + if (vbasedev->hiod->caps.hw_caps &
>>>> IOMMU_HW_CAP_DIRTY_TRACKING) {
>>>
>>> Looks there is still reference to hw_caps, then would suggest to bring back the NEW CAP.
>>>
>> Ah, but below helper is checking for GET_HW_INFO stuff, and not hwpt flags
>> given that we haven't allocated a hwpt yet.
>>
>> While I could place this check into a helper it would only have an user. I will
>> need below helper iommufd_hwpt_dirty_tracking() in another patch, so this is a
>> bit of a one off check only (unless we want a new helper for cosmetic purposes)
>>
>>>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>> + }
>>>> +
>>>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>> container->ioas_id, flags,
>>>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>> @@ -255,6 +271,7 @@ static bool
>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>
>>>> hwpt = g_malloc0(sizeof(*hwpt));
>>>> hwpt->hwpt_id = hwpt_id;
>>>> + hwpt->hwpt_flags = flags;
>>>> QLIST_INIT(&hwpt->device_list);
>>>>
>>>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
>>>> @@ -267,6 +284,8 @@ static bool
>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> vbasedev->hwpt = hwpt;
>>>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>> + container->bcontainer.dirty_pages_supported |=
>>>> + iommufd_hwpt_dirty_tracking(hwpt);
>>>
>>> If there is at least one hwpt without dirty tracking, shouldn't we make bcontainer.dirty_pages_supported false?
>>>
>
> Missed this comment. We could set to false but the generic container abstraction
> is utilizing this to let the ioctls() of the individual backend to go through to
> the defined callback, and that's why I set to true.
>
Let me rephrase, I meant: "(...) utilizing this to let the individual backend
container callbacks of dirty tracking to go through, and that's why I set to true."
> And that is really the only effect of this patch. By the time we reach to patch
> 12 (which is what really enables live migration with IOMMU automatically), the
> IOMMUFD dirty tracking is only called 1) when not one of the VF doesn't support
> device dirty tracking [only if you're using IOMMUFD backend], and finally 2)
> that no VF/mdev has added the migration blocker which essentially looks at the
> HWPT flags (as opposed to the container attribute).
>
> Joao
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability
2024-07-22 14:13 ` Joao Martins
@ 2024-07-23 3:07 ` Duan, Zhenzhong
0 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-23 3:07 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty
>tracking capability
>
>On 22/07/2024 15:09, Joao Martins wrote:
>> On 22/07/2024 09:58, Joao Martins wrote:
>>> On 22/07/2024 07:05, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>> Subject: [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt
>dirty
>>>>> tracking capability
>>>>>
>>>>> In preparation to using the dirty tracking UAPI, probe whether the
>IOMMU
>>>>> supports dirty tracking. This is done via the data stored in
>>>>> hiod::caps::hw_caps initialized from GET_HW_INFO.
>>>>>
>>>>> Qemu doesn't know if VF dirty tracking is supported when allocating
>>>>> hardware pagetable in iommufd_cdev_autodomains_get(). This is
>because
>>>>> VFIODevice migration state hasn't been initialized *yet* hence it can't
>pick
>>>>> between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU
>supports
>>>>> dirty tracking it always creates HWPTs with
>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>>>>> even if later on VFIOMigration decides to use VF dirty tracking instead.
>>>>
>>>> I thought there is no overhead for HWPT with
>IOMMU_HWPT_ALLOC_DIRTY_TRACKING vs. HWPT without
>IOMMU_HWPT_ALLOC_DIRTY_TRACKING if we don't enable dirty tracking.
>Right?
>>>>
>>>
>>> Correct.
>>>
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>> hw/vfio/iommufd.c | 19 +++++++++++++++++++
>>>>> 2 files changed, 20 insertions(+)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>>>>> common.h
>>>>> index 4e44b26d3c45..7e530c7869dc 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -97,6 +97,7 @@ typedef struct IOMMUFDBackend
>IOMMUFDBackend;
>>>>>
>>>>> typedef struct VFIOIOASHwpt {
>>>>> uint32_t hwpt_id;
>>>>> + uint32_t hwpt_flags;
>>>>> QLIST_HEAD(, VFIODevice) device_list;
>>>>> QLIST_ENTRY(VFIOIOASHwpt) next;
>>>>> } VFIOIOASHwpt;
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index bb44d948c735..2e5c207bbca0 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -110,6 +110,11 @@ static void
>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>> }
>>>>>
>>>>> +static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>> +{
>>>>> + return hwpt && hwpt->hwpt_flags &
>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>> +}
>>>>> +
>>>>> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>>>> {
>>>>> ERRP_GUARD();
>>>>> @@ -246,6 +251,17 @@ static bool
>>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>> }
>>>>> }
>>>>>
>>>>> + /*
>>>>> + * This is quite early and VFIO Migration state isn't yet fully
>>>>> + * initialized, thus rely only on IOMMU hardware capabilities as to
>>>>> + * whether IOMMU dirty tracking is going to be requested. Later
>>>>> + * vfio_migration_realize() may decide to use VF dirty tracking
>>>>> + * instead.
>>>>> + */
>>>>> + if (vbasedev->hiod->caps.hw_caps &
>>>>> IOMMU_HW_CAP_DIRTY_TRACKING) {
>>>>
>>>> Looks there is still reference to hw_caps, then would suggest to bring
>back the NEW CAP.
>>>>
>>> Ah, but below helper is checking for GET_HW_INFO stuff, and not hwpt
>flags
>>> given that we haven't allocated a hwpt yet.
>>>
>>> While I could place this check into a helper it would only have an user. I
>will
>>> need below helper iommufd_hwpt_dirty_tracking() in another patch, so
>this is a
>>> bit of a one off check only (unless we want a new helper for cosmetic
>purposes)
>>>
>>>>> + flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>> + }
>>>>> +
>>>>> if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>> container->ioas_id, flags,
>>>>> IOMMU_HWPT_DATA_NONE, 0, NULL,
>>>>> @@ -255,6 +271,7 @@ static bool
>>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>>
>>>>> hwpt = g_malloc0(sizeof(*hwpt));
>>>>> hwpt->hwpt_id = hwpt_id;
>>>>> + hwpt->hwpt_flags = flags;
>>>>> QLIST_INIT(&hwpt->device_list);
>>>>>
>>>>> ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>errp);
>>>>> @@ -267,6 +284,8 @@ static bool
>>>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>> vbasedev->hwpt = hwpt;
>>>>> QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>>> QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>>>> + container->bcontainer.dirty_pages_supported |=
>>>>> + iommufd_hwpt_dirty_tracking(hwpt);
>>>>
>>>> If there is at least one hwpt without dirty tracking, shouldn't we make
>bcontainer.dirty_pages_supported false?
>>>>
>>
>> Missed this comment. We could set to false but the generic container
>abstraction
>> is utilizing this to let the ioctls() of the individual backend to go through to
>> the defined callback, and that's why I set to true.
>>
>Let me rephrase, I meant: "(...) utilizing this to let the individual backend
>container callbacks of dirty tracking to go through, and that's why I set to
>true."
Not quite get.
If there is at least one hwpt not supporting dirty tracking, we can presume all dirty, no need to go through individual backend callbacks?
>
>> And that is really the only effect of this patch. By the time we reach to
>patch
>> 12 (which is what really enables live migration with IOMMU automatically),
>the
>> IOMMUFD dirty tracking is only called 1) when not one of the VF doesn't
>support
>> device dirty tracking [only if you're using IOMMUFD backend], and finally 2)
>> that no VF/mdev has added the migration blocker which essentially looks
>at the
>> HWPT flags (as opposed to the container attribute).
>>
>> Joao
>>
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 10/13] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (8 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 09/13] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 6:15 ` Duan, Zhenzhong
2024-07-19 12:04 ` [PATCH v5 11/13] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
` (5 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. The ioctl is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/sysemu/iommufd.h | 2 ++
backends/iommufd.c | 23 +++++++++++++++++++++++
hw/vfio/iommufd.c | 32 ++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
4 files changed, 58 insertions(+)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index e917e7591d05..6fb412f61144 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -55,6 +55,8 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t data_type, uint32_t data_len,
void *data_ptr, uint32_t *out_hwpt,
Error **errp);
+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+ bool start, Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 58032e588f49..1ae4751a1b2c 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -239,6 +239,29 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
return true;
}
+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
+ uint32_t hwpt_id, bool start,
+ Error **errp)
+{
+ int ret;
+ struct iommu_hwpt_set_dirty_tracking set_dirty = {
+ .size = sizeof(set_dirty),
+ .hwpt_id = hwpt_id,
+ .flags = start ? IOMMU_HWPT_DIRTY_TRACKING_ENABLE : 0,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
+ trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno : 0);
+ if (ret) {
+ error_setg_errno(errp, errno,
+ "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
+ hwpt_id);
+ return false;
+ }
+
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 2e5c207bbca0..7137faaf4540 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -115,6 +115,37 @@ static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+ bool start, Error **errp)
+{
+ const VFIOIOMMUFDContainer *container =
+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ VFIOIOASHwpt *hwpt;
+
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ if (!iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, start, errp)) {
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+ iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, !start, NULL);
+ }
+ return -EINVAL;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -724,6 +755,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+ vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 4d8ac02fe7d6..28aca3b859d4 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* RE: [PATCH v5 10/13] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
2024-07-19 12:04 ` [PATCH v5 10/13] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-22 6:15 ` Duan, Zhenzhong
0 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 6:15 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 10/13] vfio/iommufd: Implement
>VFIOIOMMUClass::set_dirty_tracking support
>
>ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
>enables or disables dirty page tracking. The ioctl is used if the hwpt
>has been created with dirty tracking supported domain (stored in
>hwpt::flags) and it is called on the whole list of iommu domains.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> include/sysemu/iommufd.h | 2 ++
> backends/iommufd.c | 23 +++++++++++++++++++++++
> hw/vfio/iommufd.c | 32 ++++++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 58 insertions(+)
>
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index e917e7591d05..6fb412f61144 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -55,6 +55,8 @@ bool
>iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> uint32_t data_type, uint32_t data_len,
> void *data_ptr, uint32_t *out_hwpt,
> Error **errp);
>+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>uint32_t hwpt_id,
>+ bool start, Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 58032e588f49..1ae4751a1b2c 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -239,6 +239,29 @@ bool
>iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> return true;
> }
>
>+bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>+ uint32_t hwpt_id, bool start,
>+ Error **errp)
>+{
>+ int ret;
>+ struct iommu_hwpt_set_dirty_tracking set_dirty = {
>+ .size = sizeof(set_dirty),
>+ .hwpt_id = hwpt_id,
>+ .flags = start ? IOMMU_HWPT_DIRTY_TRACKING_ENABLE : 0,
>+ };
>+
>+ ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
>+ trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret ? errno :
>0);
>+ if (ret) {
>+ error_setg_errno(errp, errno,
>+ "IOMMU_HWPT_SET_DIRTY_TRACKING(hwpt_id %u) failed",
>+ hwpt_id);
>+ return false;
>+ }
>+
>+ return true;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 2e5c207bbca0..7137faaf4540 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -115,6 +115,37 @@ static bool
>iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> return hwpt && hwpt->hwpt_flags &
>IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> }
>
>+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase
>*bcontainer,
>+ bool start, Error **errp)
>+{
>+ const VFIOIOMMUFDContainer *container =
>+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>+ VFIOIOASHwpt *hwpt;
>+
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>+ continue;
>+ }
>+
>+ if (!iommufd_backend_set_dirty_tracking(container->be,
>+ hwpt->hwpt_id, start, errp)) {
>+ goto err;
>+ }
>+ }
>+
>+ return 0;
>+
>+err:
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>+ continue;
>+ }
>+ iommufd_backend_set_dirty_tracking(container->be,
>+ hwpt->hwpt_id, !start, NULL);
>+ }
Not sure if deserved to optimize a bit with breaking out from the failing hwpt.
With or without that,
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>+ return -EINVAL;
>+}
>+
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
>@@ -724,6 +755,7 @@ static void
>vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->attach_device = iommufd_cdev_attach;
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>+ vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void
>*opaque,
>diff --git a/backends/trace-events b/backends/trace-events
>index 4d8ac02fe7d6..28aca3b859d4 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd,
>uint32_t ioas, uint64_t iova, uint64_t si
> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d
>ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start,
>int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 11/13] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (9 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 10/13] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-07-19 12:04 ` Joao Martins
2024-07-22 6:16 ` Duan, Zhenzhong
2024-07-19 12:05 ` [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
` (4 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:04 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.
A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.co>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
include/sysemu/iommufd.h | 4 ++++
backends/iommufd.c | 29 +++++++++++++++++++++++++++++
hw/vfio/iommufd.c | 28 ++++++++++++++++++++++++++++
backends/trace-events | 1 +
4 files changed, 62 insertions(+)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 6fb412f61144..4c4886c7787b 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -57,6 +57,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
Error **errp);
bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start, Error **errp);
+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data,
+ Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 1ae4751a1b2c..bd4fd49d2536 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -262,6 +262,35 @@ bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
return true;
}
+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
+ uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data,
+ Error **errp)
+{
+ int ret;
+ struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+ .size = sizeof(get_dirty_bitmap),
+ .hwpt_id = hwpt_id,
+ .iova = iova,
+ .length = size,
+ .page_size = page_size,
+ .data = (uintptr_t)data,
+ };
+
+ ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
+ trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+ page_size, ret ? errno : 0);
+ if (ret) {
+ error_setg_errno(errp, errno,
+ "IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"HWADDR_PRIx
+ " size: 0x"RAM_ADDR_FMT") failed", iova, size);
+ return false;
+ }
+
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7137faaf4540..7dd5d43ce06a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
#include "pci.h"
+#include "exec/ram_addr.h"
static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
@@ -146,6 +147,32 @@ err:
return -EINVAL;
}
+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+ VFIOBitmap *vbmap, hwaddr iova,
+ hwaddr size, Error **errp)
+{
+ VFIOIOMMUFDContainer *container = container_of(bcontainer,
+ VFIOIOMMUFDContainer,
+ bcontainer);
+ unsigned long page_size = qemu_real_host_page_size();
+ VFIOIOASHwpt *hwpt;
+
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
+ continue;
+ }
+
+ if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+ iova, size, page_size,
+ (uint64_t *)vbmap->bitmap,
+ errp)) {
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
{
ERRP_GUARD();
@@ -756,6 +783,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+ vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/backends/trace-events b/backends/trace-events
index 28aca3b859d4..40811a316215 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* RE: [PATCH v5 11/13] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
2024-07-19 12:04 ` [PATCH v5 11/13] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-22 6:16 ` Duan, Zhenzhong
0 siblings, 0 replies; 53+ messages in thread
From: Duan, Zhenzhong @ 2024-07-22 6:16 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
Jason Gunthorpe, Avihai Horon
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH v5 11/13] vfio/iommufd: Implement
>VFIOIOMMUClass::query_dirty_bitmap support
>
>ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>that fetches the bitmap that tells what was dirty in an IOVA
>range.
>
>A single bitmap is allocated and used across all the hwpts
>sharing an IOAS which is then used in log_sync() to set Qemu
>global bitmaps.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>Reviewed-by: Cédric Le Goater <clg@redhat.co>
>Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
>---
> include/sysemu/iommufd.h | 4 ++++
> backends/iommufd.c | 29 +++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 28 ++++++++++++++++++++++++++++
> backends/trace-events | 1 +
> 4 files changed, 62 insertions(+)
>
>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index 6fb412f61144..4c4886c7787b 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -57,6 +57,10 @@ bool
>iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
> Error **errp);
> bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
>uint32_t hwpt_id,
> bool start, Error **errp);
>+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
>uint32_t hwpt_id,
>+ uint64_t iova, ram_addr_t size,
>+ uint64_t page_size, uint64_t *data,
>+ Error **errp);
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
> #endif
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index 1ae4751a1b2c..bd4fd49d2536 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -262,6 +262,35 @@ bool
>iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be,
> return true;
> }
>
>+bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
>+ uint32_t hwpt_id,
>+ uint64_t iova, ram_addr_t size,
>+ uint64_t page_size, uint64_t *data,
>+ Error **errp)
>+{
>+ int ret;
>+ struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>+ .size = sizeof(get_dirty_bitmap),
>+ .hwpt_id = hwpt_id,
>+ .iova = iova,
>+ .length = size,
>+ .page_size = page_size,
>+ .data = (uintptr_t)data,
>+ };
>+
>+ ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP,
>&get_dirty_bitmap);
>+ trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>+ page_size, ret ? errno : 0);
>+ if (ret) {
>+ error_setg_errno(errp, errno,
>+ "IOMMU_HWPT_GET_DIRTY_BITMAP (iova:
>0x%"HWADDR_PRIx
>+ " size: 0x"RAM_ADDR_FMT") failed", iova, size);
>+ return false;
>+ }
>+
>+ return true;
>+}
>+
> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t
>devid,
> uint32_t *type, void *data, uint32_t len,
> uint64_t *caps, Error **errp)
>diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>index 7137faaf4540..7dd5d43ce06a 100644
>--- a/hw/vfio/iommufd.c
>+++ b/hw/vfio/iommufd.c
>@@ -25,6 +25,7 @@
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
>+#include "exec/ram_addr.h"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer,
>hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
>@@ -146,6 +147,32 @@ err:
> return -EINVAL;
> }
>
>+static int iommufd_query_dirty_bitmap(const VFIOContainerBase
>*bcontainer,
>+ VFIOBitmap *vbmap, hwaddr iova,
>+ hwaddr size, Error **errp)
>+{
>+ VFIOIOMMUFDContainer *container = container_of(bcontainer,
>+ VFIOIOMMUFDContainer,
>+ bcontainer);
>+ unsigned long page_size = qemu_real_host_page_size();
>+ VFIOIOASHwpt *hwpt;
>+
>+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>+ if (!iommufd_hwpt_dirty_tracking(hwpt)) {
>+ continue;
>+ }
>+
>+ if (!iommufd_backend_get_dirty_bitmap(container->be, hwpt-
>>hwpt_id,
>+ iova, size, page_size,
>+ (uint64_t *)vbmap->bitmap,
>+ errp)) {
>+ return -EINVAL;
>+ }
>+ }
>+
>+ return 0;
>+}
>+
> static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> {
> ERRP_GUARD();
>@@ -756,6 +783,7 @@ static void
>vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
> vioc->detach_device = iommufd_cdev_detach;
> vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
> vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>+ vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
> };
>
> static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void
>*opaque,
>diff --git a/backends/trace-events b/backends/trace-events
>index 28aca3b859d4..40811a316215 100644
>--- a/backends/trace-events
>+++ b/backends/trace-events
>@@ -17,3 +17,4 @@ iommufd_backend_alloc_ioas(int iommufd, uint32_t
>ioas) " iommufd=%d ioas=%d"
> iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t
>pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr,
>uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u
>flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u
>(%d)"
> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
> iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int
>ret) " iommufd=%d hwpt=%u enable=%d (%d)"
>+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id,
>uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d
>hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64"
>(%d)"
>--
>2.17.2
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (10 preceding siblings ...)
2024-07-19 12:04 ` [PATCH v5 11/13] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-07-19 12:05 ` Joao Martins
2024-07-19 14:17 ` Cédric Le Goater
2024-07-19 12:05 ` [PATCH v5 13/13] vfio/common: Allow disabling device dirty page tracking Joao Martins
` (3 subsequent siblings)
15 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:05 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.
For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.
So starting with IOMMU dirty tracking it can use to accomodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those too.
While at it change the error messages to mention IOMMU dirty tracking as
well.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/iommufd.c | 2 +-
hw/vfio/migration.c | 11 ++++++-----
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7e530c7869dc..00b9e933449e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
uint64_t size, ram_addr_t ram_addr, Error **errp);
+bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
/* Returns 0 on success, or a negative errno. */
bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7dd5d43ce06a..a998e8578552 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
iommufd_backend_disconnect(vbasedev->iommufd);
}
-static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
{
return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34d4be2ce1b1..63ffa46c9652 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported) {
+ if (!vbasedev->dirty_pages_supported &&
+ !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
- "%s: VFIO device doesn't support device dirty tracking",
- vbasedev->name);
+ "%s: VFIO device doesn't support device and "
+ "IOMMU dirty tracking", vbasedev->name);
goto add_blocker;
}
- warn_report("%s: VFIO device doesn't support device dirty tracking",
- vbasedev->name);
+ warn_report("%s: VFIO device doesn't support device and "
+ "IOMMU dirty tracking", vbasedev->name);
}
ret = vfio_block_multiple_devices_migration(vbasedev, errp);
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 12:05 ` [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-19 14:17 ` Cédric Le Goater
2024-07-19 14:24 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-19 14:17 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:05, Joao Martins wrote:
> By default VFIO migration is set to auto, which will support live
> migration if the migration capability is set *and* also dirty page
> tracking is supported.
>
> For testing purposes one can force enable without dirty page tracking
> via enable-migration=on, but that option is generally left for testing
> purposes.
>
> So starting with IOMMU dirty tracking it can use to accomodate the lack of
> VF dirty page tracking allowing us to minimize the VF requirements for
> migration and thus enabling migration by default for those too.
>
> While at it change the error messages to mention IOMMU dirty tracking as
> well.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/iommufd.c | 2 +-
> hw/vfio/migration.c | 11 ++++++-----
> 3 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7e530c7869dc..00b9e933449e 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr, Error **errp);
> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>
> /* Returns 0 on success, or a negative errno. */
> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 7dd5d43ce06a..a998e8578552 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> iommufd_backend_disconnect(vbasedev->iommufd);
> }
>
> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> {
> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> }
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34d4be2ce1b1..63ffa46c9652 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> return !vfio_block_migration(vbasedev, err, errp);
> }
>
> - if (!vbasedev->dirty_pages_supported) {
> + if (!vbasedev->dirty_pages_supported &&
> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
Some platforms do not have IOMMUFD support and this call will need
some kind of abstract wrapper to reflect dirty tracking support in
the IOMMU backend.
Thanks,
C.
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> - "%s: VFIO device doesn't support device dirty tracking",
> - vbasedev->name);
> + "%s: VFIO device doesn't support device and "
> + "IOMMU dirty tracking", vbasedev->name);
> goto add_blocker;
> }
>
> - warn_report("%s: VFIO device doesn't support device dirty tracking",
> - vbasedev->name);
> + warn_report("%s: VFIO device doesn't support device and "
> + "IOMMU dirty tracking", vbasedev->name);
> }
>
> ret = vfio_block_multiple_devices_migration(vbasedev, errp);
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 14:17 ` Cédric Le Goater
@ 2024-07-19 14:24 ` Joao Martins
2024-07-19 15:32 ` Joao Martins
2024-07-19 17:26 ` Joao Martins
0 siblings, 2 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 14:24 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 19/07/2024 15:17, Cédric Le Goater wrote:
> On 7/19/24 14:05, Joao Martins wrote:
>> By default VFIO migration is set to auto, which will support live
>> migration if the migration capability is set *and* also dirty page
>> tracking is supported.
>>
>> For testing purposes one can force enable without dirty page tracking
>> via enable-migration=on, but that option is generally left for testing
>> purposes.
>>
>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>> VF dirty page tracking allowing us to minimize the VF requirements for
>> migration and thus enabling migration by default for those too.
>>
>> While at it change the error messages to mention IOMMU dirty tracking as
>> well.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/iommufd.c | 2 +-
>> hw/vfio/migration.c | 11 ++++++-----
>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 7e530c7869dc..00b9e933449e 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>> VFIOContainerBase *bcontainer,
>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>> /* Returns 0 on success, or a negative errno. */
>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 7dd5d43ce06a..a998e8578552 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>> *vbasedev)
>> iommufd_backend_disconnect(vbasedev->iommufd);
>> }
>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> {
>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> }
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34d4be2ce1b1..63ffa46c9652 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>> Error **errp)
>> return !vfio_block_migration(vbasedev, err, errp);
>> }
>> - if (!vbasedev->dirty_pages_supported) {
>> + if (!vbasedev->dirty_pages_supported &&
>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>
>
> Some platforms do not have IOMMUFD support and this call will need
> some kind of abstract wrapper to reflect dirty tracking support in
> the IOMMU backend.
>
This was actually on purpose because only IOMMUFD presents a view of hardware
whereas type1 supporting dirty page tracking is not used as means to 'migration
is supported'.
The hwpt is nil in type1 and the helper checks that, so it should return false.
> Thanks,
>
> C.
>
>
>
>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>> error_setg(&err,
>> - "%s: VFIO device doesn't support device dirty tracking",
>> - vbasedev->name);
>> + "%s: VFIO device doesn't support device and "
>> + "IOMMU dirty tracking", vbasedev->name);
>> goto add_blocker;
>> }
>> - warn_report("%s: VFIO device doesn't support device dirty tracking",
>> - vbasedev->name);
>> + warn_report("%s: VFIO device doesn't support device and "
>> + "IOMMU dirty tracking", vbasedev->name);
>> }
>> ret = vfio_block_multiple_devices_migration(vbasedev, errp);
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 14:24 ` Joao Martins
@ 2024-07-19 15:32 ` Joao Martins
2024-07-19 17:26 ` Joao Martins
1 sibling, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 15:32 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 19/07/2024 15:24, Joao Martins wrote:
> On 19/07/2024 15:17, Cédric Le Goater wrote:
>> On 7/19/24 14:05, Joao Martins wrote:
>>> By default VFIO migration is set to auto, which will support live
>>> migration if the migration capability is set *and* also dirty page
>>> tracking is supported.
>>>
>>> For testing purposes one can force enable without dirty page tracking
>>> via enable-migration=on, but that option is generally left for testing
>>> purposes.
>>>
>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>> migration and thus enabling migration by default for those too.
>>>
>>> While at it change the error messages to mention IOMMU dirty tracking as
>>> well.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 1 +
>>> hw/vfio/iommufd.c | 2 +-
>>> hw/vfio/migration.c | 11 ++++++-----
>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 7e530c7869dc..00b9e933449e 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>> VFIOContainerBase *bcontainer,
>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>> /* Returns 0 on success, or a negative errno. */
>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 7dd5d43ce06a..a998e8578552 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>>> *vbasedev)
>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>> }
>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> {
>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> }
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>> Error **errp)
>>> return !vfio_block_migration(vbasedev, err, errp);
>>> }
>>> - if (!vbasedev->dirty_pages_supported) {
>>> + if (!vbasedev->dirty_pages_supported &&
>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>
>>
>> Some platforms do not have IOMMUFD support and this call will need
>> some kind of abstract wrapper to reflect dirty tracking support in
>> the IOMMU backend.
>>
>
> This was actually on purpose because only IOMMUFD presents a view of hardware
> whereas type1 supporting dirty page tracking is not used as means to 'migration
> is supported'.
>
> The hwpt is nil in type1 and the helper checks that, so it should return false.
>
Unless of course I misunderstood you.
This check is IOMMUFD specific, because it's one the mirroring hw support and
can be used to unblock live migration. Since initial VFIO live migration support
that type1 dirty tracking wasn't included in the checks that allow live
migration to occur. Another way of saying this is that: with type1 even if
dirty_pages_supported is true, we always add a live migration blocker in case
device doesn't have dirty page tracking. The change above is just meant to use
IOMMUFD dirty tracking which is hardware dependent and not block live migration.
>> Thanks,
>>
>> C.
>>
>>
>>
>>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>>> error_setg(&err,
>>> - "%s: VFIO device doesn't support device dirty tracking",
>>> - vbasedev->name);
>>> + "%s: VFIO device doesn't support device and "
>>> + "IOMMU dirty tracking", vbasedev->name);
>>> goto add_blocker;
>>> }
>>> - warn_report("%s: VFIO device doesn't support device dirty tracking",
>>> - vbasedev->name);
>>> + warn_report("%s: VFIO device doesn't support device and "
>>> + "IOMMU dirty tracking", vbasedev->name);
>>> }
>>> ret = vfio_block_multiple_devices_migration(vbasedev, errp);
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 14:24 ` Joao Martins
2024-07-19 15:32 ` Joao Martins
@ 2024-07-19 17:26 ` Joao Martins
2024-07-22 14:53 ` Cédric Le Goater
1 sibling, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-19 17:26 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 19/07/2024 15:24, Joao Martins wrote:
> On 19/07/2024 15:17, Cédric Le Goater wrote:
>> On 7/19/24 14:05, Joao Martins wrote:
>>> By default VFIO migration is set to auto, which will support live
>>> migration if the migration capability is set *and* also dirty page
>>> tracking is supported.
>>>
>>> For testing purposes one can force enable without dirty page tracking
>>> via enable-migration=on, but that option is generally left for testing
>>> purposes.
>>>
>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>> migration and thus enabling migration by default for those too.
>>>
>>> While at it change the error messages to mention IOMMU dirty tracking as
>>> well.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 1 +
>>> hw/vfio/iommufd.c | 2 +-
>>> hw/vfio/migration.c | 11 ++++++-----
>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 7e530c7869dc..00b9e933449e 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>> VFIOContainerBase *bcontainer,
>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>> /* Returns 0 on success, or a negative errno. */
>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 7dd5d43ce06a..a998e8578552 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>>> *vbasedev)
>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>> }
>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> {
>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> }
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>> Error **errp)
>>> return !vfio_block_migration(vbasedev, err, errp);
>>> }
>>> - if (!vbasedev->dirty_pages_supported) {
>>> + if (!vbasedev->dirty_pages_supported &&
>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>
>>
>> Some platforms do not have IOMMUFD support and this call will need
>> some kind of abstract wrapper to reflect dirty tracking support in
>> the IOMMU backend.
>>
>
> This was actually on purpose because only IOMMUFD presents a view of hardware
> whereas type1 supporting dirty page tracking is not used as means to 'migration
> is supported'.
>
> The hwpt is nil in type1 and the helper checks that, so it should return false.
>
Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
consider. Maybe this would be a elegant way to address it? Looks to pass my
build with CONFIG_IOMMUFD=n
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 61dd48e79b71..422ad4a5bdd1 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase
*bcontainer,
VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
uint64_t size, ram_addr_t ram_addr, Error **errp);
+#ifdef CONFIG_IOMMUFD
bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
+#else
+static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+{
+ return false;
+}
+#endif
/* Returns 0 on success, or a negative errno. */
bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 17:26 ` Joao Martins
@ 2024-07-22 14:53 ` Cédric Le Goater
2024-07-22 15:01 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 14:53 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 19:26, Joao Martins wrote:
> On 19/07/2024 15:24, Joao Martins wrote:
>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>> On 7/19/24 14:05, Joao Martins wrote:
>>>> By default VFIO migration is set to auto, which will support live
>>>> migration if the migration capability is set *and* also dirty page
>>>> tracking is supported.
>>>>
>>>> For testing purposes one can force enable without dirty page tracking
>>>> via enable-migration=on, but that option is generally left for testing
>>>> purposes.
>>>>
>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>> migration and thus enabling migration by default for those too.
>>>>
>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>> well.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> include/hw/vfio/vfio-common.h | 1 +
>>>> hw/vfio/iommufd.c | 2 +-
>>>> hw/vfio/migration.c | 11 ++++++-----
>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>> index 7e530c7869dc..00b9e933449e 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>> VFIOContainerBase *bcontainer,
>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>> /* Returns 0 on success, or a negative errno. */
>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>> *vbasedev)
>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>> }
>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>> {
>>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>> }
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>> Error **errp)
>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>> }
>>>> - if (!vbasedev->dirty_pages_supported) {
>>>> + if (!vbasedev->dirty_pages_supported &&
>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>
>>>
>>> Some platforms do not have IOMMUFD support and this call will need
>>> some kind of abstract wrapper to reflect dirty tracking support in
>>> the IOMMU backend.
>>>
>>
>> This was actually on purpose because only IOMMUFD presents a view of hardware
>> whereas type1 supporting dirty page tracking is not used as means to 'migration
>> is supported'.
>>
>> The hwpt is nil in type1 and the helper checks that, so it should return false.
>>
>
> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
> consider. Maybe this would be a elegant way to address it? Looks to pass my
> build with CONFIG_IOMMUFD=n
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 61dd48e79b71..422ad4a5bdd1 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase
> *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr, Error **errp);
> +#ifdef CONFIG_IOMMUFD
> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
> +#else
> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
> +{
> + return false;
> +}
> +#endif
>
> /* Returns 0 on success, or a negative errno. */
> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>
hmm, no. You will need to introduce a new Host IOMMU device capability,
something like :
HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
Then, introduce an helper routine to check the capability :
return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
and replace the iommufd_hwpt_dirty_tracking call with it.
Yeah I know, it's cumbersome but it's cleaner !
That's not a major problem in the series. I can address it at the end
to avoid a resend. First, let's get a R-b on all other patches.
Thanks,
C.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 14:53 ` Cédric Le Goater
@ 2024-07-22 15:01 ` Joao Martins
2024-07-22 15:13 ` Cédric Le Goater
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 15:01 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 15:53, Cédric Le Goater wrote:
> On 7/19/24 19:26, Joao Martins wrote:
>> On 19/07/2024 15:24, Joao Martins wrote:
>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>> By default VFIO migration is set to auto, which will support live
>>>>> migration if the migration capability is set *and* also dirty page
>>>>> tracking is supported.
>>>>>
>>>>> For testing purposes one can force enable without dirty page tracking
>>>>> via enable-migration=on, but that option is generally left for testing
>>>>> purposes.
>>>>>
>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>> migration and thus enabling migration by default for those too.
>>>>>
>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>> well.
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>> hw/vfio/iommufd.c | 2 +-
>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>> VFIOContainerBase *bcontainer,
>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>> iova,
>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>> **errp);
>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>> /* Returns 0 on success, or a negative errno. */
>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>> --- a/hw/vfio/iommufd.c
>>>>> +++ b/hw/vfio/iommufd.c
>>>>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>> *vbasedev)
>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>> }
>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>> {
>>>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>> }
>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>> --- a/hw/vfio/migration.c
>>>>> +++ b/hw/vfio/migration.c
>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>> Error **errp)
>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>> }
>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>
>>>>
>>>> Some platforms do not have IOMMUFD support and this call will need
>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>> the IOMMU backend.
>>>>
>>>
>>> This was actually on purpose because only IOMMUFD presents a view of hardware
>>> whereas type1 supporting dirty page tracking is not used as means to 'migration
>>> is supported'.
>>>
>>> The hwpt is nil in type1 and the helper checks that, so it should return false.
>>>
>>
>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>> build with CONFIG_IOMMUFD=n
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 61dd48e79b71..422ad4a5bdd1 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase
>> *bcontainer,
>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>> +#ifdef CONFIG_IOMMUFD
>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>> +#else
>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>> +{
>> + return false;
>> +}
>> +#endif
>>
>> /* Returns 0 on success, or a negative errno. */
>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>
>
> hmm, no. You will need to introduce a new Host IOMMU device capability,
> something like :
>
> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>
> Then, introduce an helper routine to check the capability :
>
> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>
> and replace the iommufd_hwpt_dirty_tracking call with it.
>
> Yeah I know, it's cumbersome but it's cleaner !
>
Funny you mention it, because that's what I did in v3:
https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
But it was suggested to drop (I am assuming to avoid complexity)
> That's not a major problem in the series. I can address it at the end
> to avoid a resend. First, let's get a R-b on all other patches.
>
> Thanks,
>
> C.
>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 15:01 ` Joao Martins
@ 2024-07-22 15:13 ` Cédric Le Goater
2024-07-22 15:42 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 15:13 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/22/24 17:01, Joao Martins wrote:
> On 22/07/2024 15:53, Cédric Le Goater wrote:
>> On 7/19/24 19:26, Joao Martins wrote:
>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>> tracking is supported.
>>>>>>
>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>> purposes.
>>>>>>
>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>> migration and thus enabling migration by default for those too.
>>>>>>
>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>> well.
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>> VFIOContainerBase *bcontainer,
>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>> iova,
>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>> **errp);
>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>> --- a/hw/vfio/iommufd.c
>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>> @@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>> *vbasedev)
>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>> }
>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>> {
>>>>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>> }
>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>> --- a/hw/vfio/migration.c
>>>>>> +++ b/hw/vfio/migration.c
>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>> Error **errp)
>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>> }
>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>
>>>>>
>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>> the IOMMU backend.
>>>>>
>>>>
>>>> This was actually on purpose because only IOMMUFD presents a view of hardware
>>>> whereas type1 supporting dirty page tracking is not used as means to 'migration
>>>> is supported'.
>>>>
>>>> The hwpt is nil in type1 and the helper checks that, so it should return false.
>>>>
>>>
>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>> build with CONFIG_IOMMUFD=n
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase
>>> *bcontainer,
>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>> +#ifdef CONFIG_IOMMUFD
>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>> +#else
>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>> +{
>>> + return false;
>>> +}
>>> +#endif
>>>
>>> /* Returns 0 on success, or a negative errno. */
>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>
>>
>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>> something like :
>>
>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>
>> Then, introduce an helper routine to check the capability :
>>
>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>
>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>
>> Yeah I know, it's cumbersome but it's cleaner !
>>
>
> Funny you mention it, because that's what I did in v3:
>
> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>
> But it was suggested to drop (I am assuming to avoid complexity)
my bad if I did :/
we will need an helper such as :
bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
{
HostIOMMUDevice *hiod = vbasedev->hiod ;
HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
return hiodc->get_cap &&
hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL) == 1;
}
and something like,
static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
Error **errp)
{
switch (cap) {
case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
}
}
Feel free to propose your own implementation,
Thanks,
C.
>
>> That's not a major problem in the series. I can address it at the end
>> to avoid a resend. First, let's get a R-b on all other patches.
>>
>> Thanks,
>>
>> C.
>>
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 15:13 ` Cédric Le Goater
@ 2024-07-22 15:42 ` Joao Martins
2024-07-22 15:58 ` Cédric Le Goater
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 15:42 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 16:13, Cédric Le Goater wrote:
> On 7/22/24 17:01, Joao Martins wrote:
>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>> On 7/19/24 19:26, Joao Martins wrote:
>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>> tracking is supported.
>>>>>>>
>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>> purposes.
>>>>>>>
>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>
>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>> well.
>>>>>>>
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> ---
>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>> **errp);
>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>>> iova,
>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>> **errp);
>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>> *vbasedev)
>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>> }
>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>> {
>>>>>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>> }
>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>> --- a/hw/vfio/migration.c
>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>>> Error **errp)
>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>> }
>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>
>>>>>>
>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>> the IOMMU backend.
>>>>>>
>>>>>
>>>>> This was actually on purpose because only IOMMUFD presents a view of hardware
>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>> 'migration
>>>>> is supported'.
>>>>>
>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>> false.
>>>>>
>>>>
>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>> build with CONFIG_IOMMUFD=n
>>>>
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>> VFIOContainerBase
>>>> *bcontainer,
>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>> iova,
>>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>>> +#ifdef CONFIG_IOMMUFD
>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>> +#else
>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>> +{
>>>> + return false;
>>>> +}
>>>> +#endif
>>>>
>>>> /* Returns 0 on success, or a negative errno. */
>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>
>>>
>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>> something like :
>>>
>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>
>>> Then, introduce an helper routine to check the capability :
>>>
>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>
>>> Yeah I know, it's cumbersome but it's cleaner !
>>>
>>
>> Funny you mention it, because that's what I did in v3:
>>
>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>
>> But it was suggested to drop (I am assuming to avoid complexity)
>
> my bad if I did :/
>
No worries it is all part of review -- I think Zhenzhong proposed with good
intentions, and I probably didn't think too hard about the consequences on
layering with the HIOD.
> we will need an helper such as :
>
> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
> {
> HostIOMMUDevice *hiod = vbasedev->hiod ;
> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>
> return hiodc->get_cap &&
> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL) == 1;
> }
>
> and something like,
>
> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
> Error **errp)
> {
> switch (cap) {
> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
> default:
> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
> return -EINVAL;
> }
> }
>
> Feel free to propose your own implementation,
>
Actually it's close to what I had in v3 link, except the new helper (the name
vfio_device_dirty_tracking is a bit misleading I would call it
vfio_device_iommu_dirty_tracking)
I can follow-up with this improvement in case this gets merged as is, or include
it in the next version if you prefer to adjourn this series into 9.2 (given the
lack of time to get everything right).
Joao
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 15:42 ` Joao Martins
@ 2024-07-22 15:58 ` Cédric Le Goater
2024-07-22 16:29 ` Joao Martins
0 siblings, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 15:58 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/22/24 17:42, Joao Martins wrote:
> On 22/07/2024 16:13, Cédric Le Goater wrote:
>> On 7/22/24 17:01, Joao Martins wrote:
>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>> tracking is supported.
>>>>>>>>
>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>> purposes.
>>>>>>>>
>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>
>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> ---
>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>> **errp);
>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>>>> iova,
>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>> **errp);
>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>> *vbasedev)
>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>> }
>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>> {
>>>>>>>> return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>> }
>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>>>> Error **errp)
>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>> }
>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>
>>>>>>>
>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>> the IOMMU backend.
>>>>>>>
>>>>>>
>>>>>> This was actually on purpose because only IOMMUFD presents a view of hardware
>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>> 'migration
>>>>>> is supported'.
>>>>>>
>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>> false.
>>>>>>
>>>>>
>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>> build with CONFIG_IOMMUFD=n
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>> VFIOContainerBase
>>>>> *bcontainer,
>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>> iova,
>>>>> uint64_t size, ram_addr_t ram_addr, Error **errp);
>>>>> +#ifdef CONFIG_IOMMUFD
>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>> +#else
>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>> +{
>>>>> + return false;
>>>>> +}
>>>>> +#endif
>>>>>
>>>>> /* Returns 0 on success, or a negative errno. */
>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>
>>>>
>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>> something like :
>>>>
>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>
>>>> Then, introduce an helper routine to check the capability :
>>>>
>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>
>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>
>>>
>>> Funny you mention it, because that's what I did in v3:
>>>
>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>
>>> But it was suggested to drop (I am assuming to avoid complexity)
>>
>> my bad if I did :/
>>
>
> No worries it is all part of review -- I think Zhenzhong proposed with good
> intentions, and I probably didn't think too hard about the consequences on
> layering with the HIOD.
>
>> we will need an helper such as :
>>
>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>> {
>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>
>> return hiodc->get_cap &&
>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL) == 1;
>> }
>>
>> and something like,
>>
>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>> Error **errp)
>> {
>> switch (cap) {
>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>> default:
>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>> return -EINVAL;
>> }
>> }
>>
>> Feel free to propose your own implementation,
>>
>
> Actually it's close to what I had in v3 link, except the new helper (the name
> vfio_device_dirty_tracking is a bit misleading I would call it
> vfio_device_iommu_dirty_tracking)
Let's call it vfio_device_iommu_dirty_tracking.
> I can follow-up with this improvement in case this gets merged as is,
I can't merge as is since it break compiles (I am excluding the v5.1 patch).
Which means I would prefer a v6 please.
> or include
> it in the next version if you prefer to adjourn this series into 9.2 (given the
> lack of time to get everything right).
There aren't many open questions left.
* PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
Eric acked the changes.
* PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
I think it's minor. I would also feel more confortable if ZhenZhong
acked the changes.
* PATCH 12 needs the fix we have been talking about.
* PATCH 13 is for dev/debug.
What's important is to avoid introducing regressions in the current behavior,
that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
Thanks,
C.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 15:58 ` Cédric Le Goater
@ 2024-07-22 16:29 ` Joao Martins
2024-07-22 17:04 ` Cédric Le Goater
0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 16:29 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 16:58, Cédric Le Goater wrote:
> On 7/22/24 17:42, Joao Martins wrote:
>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>> On 7/22/24 17:01, Joao Martins wrote:
>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>> tracking is supported.
>>>>>>>>>
>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>> purposes.
>>>>>>>>>
>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>
>>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>> ---
>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>> **errp);
>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>> uint64_t
>>>>>>>>> iova,
>>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>>> **errp);
>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>> *vbasedev)
>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>> }
>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>> {
>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>> }
>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>>>>> Error **errp)
>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>> }
>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>
>>>>>>>>
>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>> the IOMMU backend.
>>>>>>>>
>>>>>>>
>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>> hardware
>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>> 'migration
>>>>>>> is supported'.
>>>>>>>
>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>> false.
>>>>>>>
>>>>>>
>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>
>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>> VFIOContainerBase
>>>>>> *bcontainer,
>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>> **errp);
>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>> iova,
>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>> **errp);
>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>> +#else
>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>> +{
>>>>>> + return false;
>>>>>> +}
>>>>>> +#endif
>>>>>>
>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>
>>>>>
>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>> something like :
>>>>>
>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>
>>>>> Then, introduce an helper routine to check the capability :
>>>>>
>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>
>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>
>>>>
>>>> Funny you mention it, because that's what I did in v3:
>>>>
>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>
>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>
>>> my bad if I did :/
>>>
>>
>> No worries it is all part of review -- I think Zhenzhong proposed with good
>> intentions, and I probably didn't think too hard about the consequences on
>> layering with the HIOD.
>>
>>> we will need an helper such as :
>>>
>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>> {
>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>
>>> return hiodc->get_cap &&
>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>> == 1;
>>> }
>>>
>>> and something like,
>>>
>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>> Error **errp)
>>> {
>>> switch (cap) {
>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>> default:
>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>> return -EINVAL;
>>> }
>>> }
>>>
>>> Feel free to propose your own implementation,
>>>
>>
>> Actually it's close to what I had in v3 link, except the new helper (the name
>> vfio_device_dirty_tracking is a bit misleading I would call it
>> vfio_device_iommu_dirty_tracking)
>
> Let's call it vfio_device_iommu_dirty_tracking.
>
I thinking about this and I am not that sure it makes sense. That is the
.get_cap() stuff.
Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
that matters for patch 12 is after the device is attached ... hence we gotta
look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
in the device pagetable.
I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
just as hacky given that I am testing its enablement of the hardware pagetable
(HWPT), and not asking a HIOD capability.
e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
attach_device() flow[*], but not for vfio_migration_realize() flow.
[*] though feels unneeded as we only have a local callsite, not external user so
far.
Which would technically make v5.1 patch a more correct right check, perhaps with
better layering/naming.
>> I can follow-up with this improvement in case this gets merged as is,
>
> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
> Which means I would prefer a v6 please.
>
Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
compilation issue and all that remained were acks.
>> or include
>> it in the next version if you prefer to adjourn this series into 9.2 (given the
>> lack of time to get everything right).
>
> There aren't many open questions left.
>
> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
> Eric acked the changes.
> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
> I think it's minor. I would also feel more confortable if ZhenZhong
> acked the changes.
I guess you meant patch 6 and not 9.
> * PATCH 12 needs the fix we have been talking about.
> * PATCH 13 is for dev/debug.
>
>
> What's important is to avoid introducing regressions in the current behavior,
> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
OK
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 16:29 ` Joao Martins
@ 2024-07-22 17:04 ` Cédric Le Goater
2024-07-22 17:15 ` Cédric Le Goater
2024-07-22 18:01 ` Joao Martins
0 siblings, 2 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 17:04 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/22/24 18:29, Joao Martins wrote:
> On 22/07/2024 16:58, Cédric Le Goater wrote:
>> On 7/22/24 17:42, Joao Martins wrote:
>>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>>> On 7/22/24 17:01, Joao Martins wrote:
>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>>> tracking is supported.
>>>>>>>>>>
>>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>>> purposes.
>>>>>>>>>>
>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>>
>>>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>>>> well.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>>> ---
>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>>> **errp);
>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>>> uint64_t
>>>>>>>>>> iova,
>>>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>>>> **errp);
>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>>> *vbasedev)
>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>>> }
>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>> {
>>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>>> }
>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>>>>>> Error **errp)
>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>>> }
>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>>> the IOMMU backend.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>>> hardware
>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>>> 'migration
>>>>>>>> is supported'.
>>>>>>>>
>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>>> false.
>>>>>>>>
>>>>>>>
>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>>
>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>> VFIOContainerBase
>>>>>>> *bcontainer,
>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>> **errp);
>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>>> iova,
>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>> **errp);
>>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>> +#else
>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>> +{
>>>>>>> + return false;
>>>>>>> +}
>>>>>>> +#endif
>>>>>>>
>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>
>>>>>>
>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>>> something like :
>>>>>>
>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>>
>>>>>> Then, introduce an helper routine to check the capability :
>>>>>>
>>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>>
>>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>>
>>>>>
>>>>> Funny you mention it, because that's what I did in v3:
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>>
>>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>>
>>>> my bad if I did :/
>>>>
>>>
>>> No worries it is all part of review -- I think Zhenzhong proposed with good
>>> intentions, and I probably didn't think too hard about the consequences on
>>> layering with the HIOD.
>>>
>>>> we will need an helper such as :
>>>>
>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>>> {
>>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>
>>>> return hiodc->get_cap &&
>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>>> == 1;
>>>> }
>>>>
>>>> and something like,
>>>>
>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>>> Error **errp)
>>>> {
>>>> switch (cap) {
>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>>> default:
>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>>> return -EINVAL;
>>>> }
>>>> }
>>>>
>>>> Feel free to propose your own implementation,
>>>>
>>>
>>> Actually it's close to what I had in v3 link, except the new helper (the name
>>> vfio_device_dirty_tracking is a bit misleading I would call it
>>> vfio_device_iommu_dirty_tracking)
>>
>> Let's call it vfio_device_iommu_dirty_tracking.
>>
>
> I thinking about this and I am not that sure it makes sense. That is the
> .get_cap() stuff.
>
> Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
> that matters for patch 12 is after the device is attached ... hence we gotta
> look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
> in the device pagetable.
>
> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
> just as hacky given that I am testing its enablement of the hardware pagetable
> (HWPT), and not asking a HIOD capability.
arf. yes.
> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
> attach_device() flow[*], but not for vfio_migration_realize() flow.
>
> [*] though feels unneeded as we only have a local callsite, not external user so
> far.
>
> Which would technically make v5.1 patch a more correct right check, perhaps with
> better layering/naming.
The quick fix (plan B if needed) would be :
@@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice *
}
if ((!vbasedev->dirty_pages_supported ||
- vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
- !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF)
+#ifdef CONFIG_IOMMUFD
+ && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)
+#endif
+ ) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
"%s: VFIO device doesn't support device and "
I would prefer to avoid the common component to reference IOMMUFD
directly. The only exception today is the use of the vbasedev->iommufd
pointer which is treated as opaque.
I guess a simple approach would be to store the result of
iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute
of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ?
if not, let's merge v5 (with more acks) and the fix of plan B.
>>> I can follow-up with this improvement in case this gets merged as is,
>>
>> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
>> Which means I would prefer a v6 please.
>>
>
> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
> compilation issue and all that remained were acks.
v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone.
>>> or include
>>> it in the next version if you prefer to adjourn this series into 9.2 (given the
>>> lack of time to get everything right).
>>
>> There aren't many open questions left.
>>
>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
>> Eric acked the changes.
>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
>> I think it's minor. I would also feel more confortable if ZhenZhong
>> acked the changes.
>
> I guess you meant patch 6 and not 9.
yes.
Thanks,
C.
>
>> * PATCH 12 needs the fix we have been talking about.
>> * PATCH 13 is for dev/debug.
>>
>>
>> What's important is to avoid introducing regressions in the current behavior,
>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
>
> OK
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 17:04 ` Cédric Le Goater
@ 2024-07-22 17:15 ` Cédric Le Goater
2024-07-22 18:08 ` Joao Martins
2024-07-22 18:01 ` Joao Martins
1 sibling, 1 reply; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 17:15 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/22/24 19:04, Cédric Le Goater wrote:
> On 7/22/24 18:29, Joao Martins wrote:
>> On 22/07/2024 16:58, Cédric Le Goater wrote:
>>> On 7/22/24 17:42, Joao Martins wrote:
>>>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>>>> On 7/22/24 17:01, Joao Martins wrote:
>>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>>>> tracking is supported.
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>>>> purposes.
>>>>>>>>>>>
>>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the lack of
>>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>>>
>>>>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>>>>> well.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>>>> ---
>>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>>>> **errp);
>>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>>>> uint64_t
>>>>>>>>>>> iova,
>>>>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>>>>> **errp);
>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>>>> *vbasedev)
>>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>>>> }
>>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>> {
>>>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>>>> }
>>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev,
>>>>>>>>>>> Error **errp)
>>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>>>> }
>>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>>>> the IOMMU backend.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>>>> hardware
>>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>>>> 'migration
>>>>>>>>> is supported'.
>>>>>>>>>
>>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>>>> false.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>>>
>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>> VFIOContainerBase
>>>>>>>> *bcontainer,
>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>> **errp);
>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t
>>>>>>>> iova,
>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>> **errp);
>>>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>> +#else
>>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>> +{
>>>>>>>> + return false;
>>>>>>>> +}
>>>>>>>> +#endif
>>>>>>>>
>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>
>>>>>>>
>>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>>>> something like :
>>>>>>>
>>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>>>
>>>>>>> Then, introduce an helper routine to check the capability :
>>>>>>>
>>>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>>>
>>>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>>>
>>>>>>
>>>>>> Funny you mention it, because that's what I did in v3:
>>>>>>
>>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>>>
>>>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>>>
>>>>> my bad if I did :/
>>>>>
>>>>
>>>> No worries it is all part of review -- I think Zhenzhong proposed with good
>>>> intentions, and I probably didn't think too hard about the consequences on
>>>> layering with the HIOD.
>>>>
>>>>> we will need an helper such as :
>>>>>
>>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>>>> {
>>>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>
>>>>> return hiodc->get_cap &&
>>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>>>> == 1;
>>>>> }
>>>>>
>>>>> and something like,
>>>>>
>>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>>>> Error **errp)
>>>>> {
>>>>> switch (cap) {
>>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>>>> default:
>>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
>>>>> return -EINVAL;
>>>>> }
>>>>> }
>>>>>
>>>>> Feel free to propose your own implementation,
>>>>>
>>>>
>>>> Actually it's close to what I had in v3 link, except the new helper (the name
>>>> vfio_device_dirty_tracking is a bit misleading I would call it
>>>> vfio_device_iommu_dirty_tracking)
>>>
>>> Let's call it vfio_device_iommu_dirty_tracking.
>>>
>>
>> I thinking about this and I am not that sure it makes sense. That is the
>> .get_cap() stuff.
>>
>> Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
>> that matters for patch 12 is after the device is attached ... hence we gotta
>> look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
>> in the device pagetable.
>>
>> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
>> just as hacky given that I am testing its enablement of the hardware pagetable
>> (HWPT), and not asking a HIOD capability.
>
> arf. yes.
>
>> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
>> attach_device() flow[*], but not for vfio_migration_realize() flow.
>>
>> [*] though feels unneeded as we only have a local callsite, not external user so
>> far.
>>
>> Which would technically make v5.1 patch a more correct right check, perhaps with
>> better layering/naming.
>
> The quick fix (plan B if needed) would be :
>
> @@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice *
> }
>
> if ((!vbasedev->dirty_pages_supported ||
> - vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> - !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
> + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF)
> +#ifdef CONFIG_IOMMUFD
> + && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)
> +#endif
> + ) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device and "
>
> I would prefer to avoid the common component to reference IOMMUFD
> directly. The only exception today is the use of the vbasedev->iommufd
> pointer which is treated as opaque.
>
> I guess a simple approach would be to store the result of
> iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute
'iommu_dirty_tracking' may be. 'dirty_tracking' is already used to
track ongoing logging.
> of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ?
>
> if not, let's merge v5 (with more acks) and the fix of plan B.
>
>
>>>> I can follow-up with this improvement in case this gets merged as is,
>>>
>>> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
>>> Which means I would prefer a v6 please.
>>>
>>
>> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
>> compilation issue and all that remained were acks.
>
> v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone.
>
>>>> or include
>>>> it in the next version if you prefer to adjourn this series into 9.2 (given the
>>>> lack of time to get everything right).
>>>
>>> There aren't many open questions left.
>>>
>>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
>>> Eric acked the changes.
>>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
>>> I think it's minor. I would also feel more confortable if ZhenZhong
>>> acked the changes.
>>
>> I guess you meant patch 6 and not 9.
>
> yes.
>
> Thanks,
>
> C.
>
>
>
>>
>>> * PATCH 12 needs the fix we have been talking about.
>>> * PATCH 13 is for dev/debug.
>>>
>>>
>>> What's important is to avoid introducing regressions in the current behavior,
>>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
>>
>> OK
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 17:15 ` Cédric Le Goater
@ 2024-07-22 18:08 ` Joao Martins
0 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-22 18:08 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 18:15, Cédric Le Goater wrote:
> On 7/22/24 19:04, Cédric Le Goater wrote:
>> On 7/22/24 18:29, Joao Martins wrote:
>>> On 22/07/2024 16:58, Cédric Le Goater wrote:
>>>> On 7/22/24 17:42, Joao Martins wrote:
>>>>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>>>>> On 7/22/24 17:01, Joao Martins wrote:
>>>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>>>>> tracking is supported.
>>>>>>>>>>>>
>>>>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>>>>> purposes.
>>>>>>>>>>>>
>>>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the
>>>>>>>>>>>> lack of
>>>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>>>>
>>>>>>>>>>>> While at it change the error messages to mention IOMMU dirty
>>>>>>>>>>>> tracking as
>>>>>>>>>>>> well.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> b/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size,
>>>>>>>>>>>> Error
>>>>>>>>>>>> **errp);
>>>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>>>>> uint64_t
>>>>>>>>>>>> iova,
>>>>>>>>>>>> uint64_t size, ram_addr_t ram_addr,
>>>>>>>>>>>> Error
>>>>>>>>>>>> **errp);
>>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>>>>> *vbasedev)
>>>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>>>>> }
>>>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>>> {
>>>>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>>>>> }
>>>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice
>>>>>>>>>>>> *vbasedev,
>>>>>>>>>>>> Error **errp)
>>>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>>>>> }
>>>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>>>>> the IOMMU backend.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>>>>> hardware
>>>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>>>>> 'migration
>>>>>>>>>> is supported'.
>>>>>>>>>>
>>>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>>>>> false.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally
>>>>>>>>> didn't
>>>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to
>>>>>>>>> pass my
>>>>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>>>>
>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>> VFIOContainerBase
>>>>>>>>> *bcontainer,
>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>> **errp);
>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>> uint64_t
>>>>>>>>> iova,
>>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>>> **errp);
>>>>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>> +#else
>>>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>> +{
>>>>>>>>> + return false;
>>>>>>>>> +}
>>>>>>>>> +#endif
>>>>>>>>>
>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>
>>>>>>>>
>>>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>>>>> something like :
>>>>>>>>
>>>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>>>>
>>>>>>>> Then, introduce an helper routine to check the capability :
>>>>>>>>
>>>>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>>>>
>>>>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>>>>
>>>>>>>
>>>>>>> Funny you mention it, because that's what I did in v3:
>>>>>>>
>>>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>>>>
>>>>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>>>>
>>>>>> my bad if I did :/
>>>>>>
>>>>>
>>>>> No worries it is all part of review -- I think Zhenzhong proposed with good
>>>>> intentions, and I probably didn't think too hard about the consequences on
>>>>> layering with the HIOD.
>>>>>
>>>>>> we will need an helper such as :
>>>>>>
>>>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>>>>> {
>>>>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>>
>>>>>> return hiodc->get_cap &&
>>>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>>>>> == 1;
>>>>>> }
>>>>>>
>>>>>> and something like,
>>>>>>
>>>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>>>>> Error **errp)
>>>>>> {
>>>>>> switch (cap) {
>>>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>>>>> default:
>>>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name,
>>>>>> cap);
>>>>>> return -EINVAL;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Feel free to propose your own implementation,
>>>>>>
>>>>>
>>>>> Actually it's close to what I had in v3 link, except the new helper (the name
>>>>> vfio_device_dirty_tracking is a bit misleading I would call it
>>>>> vfio_device_iommu_dirty_tracking)
>>>>
>>>> Let's call it vfio_device_iommu_dirty_tracking.
>>>>
>>>
>>> I thinking about this and I am not that sure it makes sense. That is the
>>> .get_cap() stuff.
>>>
>>> Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
>>> that matters for patch 12 is after the device is attached ... hence we gotta
>>> look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
>>> in the device pagetable.
>>>
>>> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
>>> just as hacky given that I am testing its enablement of the hardware pagetable
>>> (HWPT), and not asking a HIOD capability.
>>
>> arf. yes.
>>
>>> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
>>> attach_device() flow[*], but not for vfio_migration_realize() flow.
>>>
>>> [*] though feels unneeded as we only have a local callsite, not external user so
>>> far.
>>>
>>> Which would technically make v5.1 patch a more correct right check, perhaps with
>>> better layering/naming.
>>
>> The quick fix (plan B if needed) would be :
>>
>> @@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice *
>> }
>>
>> if ((!vbasedev->dirty_pages_supported ||
>> - vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>> - !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>> + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF)
>> +#ifdef CONFIG_IOMMUFD
>> + && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)
>> +#endif
>> + ) {
>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>> error_setg(&err,
>> "%s: VFIO device doesn't support device and "
>>
>> I would prefer to avoid the common component to reference IOMMUFD
>> directly. The only exception today is the use of the vbasedev->iommufd
>> pointer which is treated as opaque.
>>
>> I guess a simple approach would be to store the result of
>> iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute
>
> 'iommu_dirty_tracking' may be. 'dirty_tracking' is already used to
> track ongoing logging.
>
I can consolidate all that into a new VFIODevice attribute, and drop the
hwpt_flags it that helps.
I'll try to restructure and try to submit a new version before Zhenzhong wakes up.
>
>
>
>> of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ?
>>
>> if not, let's merge v5 (with more acks) and the fix of plan B.
>>
>>
>>>>> I can follow-up with this improvement in case this gets merged as is,
>>>>
>>>> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
>>>> Which means I would prefer a v6 please.
>>>>
>>>
>>> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
>>> compilation issue and all that remained were acks.
>>
>> v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone.
>>
>>>>> or include
>>>>> it in the next version if you prefer to adjourn this series into 9.2 (given
>>>>> the
>>>>> lack of time to get everything right).
>>>>
>>>> There aren't many open questions left.
>>>>
>>>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
>>>> Eric acked the changes.
>>>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
>>>> I think it's minor. I would also feel more confortable if ZhenZhong
>>>> acked the changes.
>>>
>>> I guess you meant patch 6 and not 9.
>>
>> yes.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>>
>>>> * PATCH 12 needs the fix we have been talking about.
>>>> * PATCH 13 is for dev/debug.
>>>>
>>>>
>>>> What's important is to avoid introducing regressions in the current behavior,
>>>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
>>>
>>> OK
>>>
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 17:04 ` Cédric Le Goater
2024-07-22 17:15 ` Cédric Le Goater
@ 2024-07-22 18:01 ` Joao Martins
2024-07-23 6:38 ` Cédric Le Goater
1 sibling, 1 reply; 53+ messages in thread
From: Joao Martins @ 2024-07-22 18:01 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 22/07/2024 18:04, Cédric Le Goater wrote:
> On 7/22/24 18:29, Joao Martins wrote:
>> On 22/07/2024 16:58, Cédric Le Goater wrote:
>>> On 7/22/24 17:42, Joao Martins wrote:
>>>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>>>> On 7/22/24 17:01, Joao Martins wrote:
>>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>>>> tracking is supported.
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>>>> purposes.
>>>>>>>>>>>
>>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the
>>>>>>>>>>> lack of
>>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>>>
>>>>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>>>>> well.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>>>> ---
>>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>>>>>>>> b/include/hw/vfio/vfio-common.h
>>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>>>> **errp);
>>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>>>> uint64_t
>>>>>>>>>>> iova,
>>>>>>>>>>> uint64_t size, ram_addr_t ram_addr,
>>>>>>>>>>> Error
>>>>>>>>>>> **errp);
>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>>>> *vbasedev)
>>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>>>> }
>>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>> {
>>>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>>>> }
>>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice
>>>>>>>>>>> *vbasedev,
>>>>>>>>>>> Error **errp)
>>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>>>> }
>>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>>>> the IOMMU backend.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>>>> hardware
>>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>>>> 'migration
>>>>>>>>> is supported'.
>>>>>>>>>
>>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>>>> false.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>>>
>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>> VFIOContainerBase
>>>>>>>> *bcontainer,
>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>> **errp);
>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>> uint64_t
>>>>>>>> iova,
>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>> **errp);
>>>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>> +#else
>>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>> +{
>>>>>>>> + return false;
>>>>>>>> +}
>>>>>>>> +#endif
>>>>>>>>
>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>
>>>>>>>
>>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>>>> something like :
>>>>>>>
>>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>>>
>>>>>>> Then, introduce an helper routine to check the capability :
>>>>>>>
>>>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>>>
>>>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>>>
>>>>>>
>>>>>> Funny you mention it, because that's what I did in v3:
>>>>>>
>>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>>>
>>>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>>>
>>>>> my bad if I did :/
>>>>>
>>>>
>>>> No worries it is all part of review -- I think Zhenzhong proposed with good
>>>> intentions, and I probably didn't think too hard about the consequences on
>>>> layering with the HIOD.
>>>>
>>>>> we will need an helper such as :
>>>>>
>>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>>>> {
>>>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>
>>>>> return hiodc->get_cap &&
>>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>>>> == 1;
>>>>> }
>>>>>
>>>>> and something like,
>>>>>
>>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>>>> Error **errp)
>>>>> {
>>>>> switch (cap) {
>>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>>>> default:
>>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name,
>>>>> cap);
>>>>> return -EINVAL;
>>>>> }
>>>>> }
>>>>>
>>>>> Feel free to propose your own implementation,
>>>>>
>>>>
>>>> Actually it's close to what I had in v3 link, except the new helper (the name
>>>> vfio_device_dirty_tracking is a bit misleading I would call it
>>>> vfio_device_iommu_dirty_tracking)
>>>
>>> Let's call it vfio_device_iommu_dirty_tracking.
>>>
>>
>> I thinking about this and I am not that sure it makes sense. That is the
>> .get_cap() stuff.
>>
>> Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
>> that matters for patch 12 is after the device is attached ... hence we gotta
>> look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
>> in the device pagetable.
>>
>> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
>> just as hacky given that I am testing its enablement of the hardware pagetable
>> (HWPT), and not asking a HIOD capability.
>
> arf. yes.
>
>> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
>> attach_device() flow[*], but not for vfio_migration_realize() flow.
>>
>> [*] though feels unneeded as we only have a local callsite, not external user so
>> far.
>>
>> Which would technically make v5.1 patch a more correct right check, perhaps with
>> better layering/naming.
>
> The quick fix (plan B if needed) would be :
>
> @@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice *
> }
>
> if ((!vbasedev->dirty_pages_supported ||
> - vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> - !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
> + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF)
> +#ifdef CONFIG_IOMMUFD
> + && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)
> +#endif
> + ) {
> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
> error_setg(&err,
> "%s: VFIO device doesn't support device and "
>
> I would prefer to avoid the common component to reference IOMMUFD
> directly. The only exception today is the use of the vbasedev->iommufd
> pointer which is treated as opaque.
>
> I guess a simple approach would be to store the result of
> iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute
> of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ?
>
> if not, let's merge v5 (with more acks) and the fix of plan B.
>
>
>>>> I can follow-up with this improvement in case this gets merged as is,
>>>
>>> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
>>> Which means I would prefer a v6 please.
>>>
>>
>> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
>> compilation issue and all that remained were acks.
>
> v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone.
>
hmmm, ok, that's strage. It does look quite common in Qemu? e.g. We even have
CONFIG_LINUX in the vfio-common.h header file.
>>>> or include
>>>> it in the next version if you prefer to adjourn this series into 9.2 (given the
>>>> lack of time to get everything right).
>>>
>>> There aren't many open questions left.
>>>
>>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
>>> Eric acked the changes.
>>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
>>> I think it's minor. I would also feel more confortable if ZhenZhong
>>> acked the changes.
>>
>> I guess you meant patch 6 and not 9.
>
> yes.
>
> Thanks,
>
> C.
>
>
>
>>
>>> * PATCH 12 needs the fix we have been talking about.
>>> * PATCH 13 is for dev/debug.
>>>
>>>
>>> What's important is to avoid introducing regressions in the current behavior,
>>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
>>
>> OK
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-22 18:01 ` Joao Martins
@ 2024-07-23 6:38 ` Cédric Le Goater
0 siblings, 0 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-23 6:38 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/22/24 20:01, Joao Martins wrote:
> On 22/07/2024 18:04, Cédric Le Goater wrote:
>> On 7/22/24 18:29, Joao Martins wrote:
>>> On 22/07/2024 16:58, Cédric Le Goater wrote:
>>>> On 7/22/24 17:42, Joao Martins wrote:
>>>>> On 22/07/2024 16:13, Cédric Le Goater wrote:
>>>>>> On 7/22/24 17:01, Joao Martins wrote:
>>>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote:
>>>>>>>> On 7/19/24 19:26, Joao Martins wrote:
>>>>>>>>> On 19/07/2024 15:24, Joao Martins wrote:
>>>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote:
>>>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote:
>>>>>>>>>>>> By default VFIO migration is set to auto, which will support live
>>>>>>>>>>>> migration if the migration capability is set *and* also dirty page
>>>>>>>>>>>> tracking is supported.
>>>>>>>>>>>>
>>>>>>>>>>>> For testing purposes one can force enable without dirty page tracking
>>>>>>>>>>>> via enable-migration=on, but that option is generally left for testing
>>>>>>>>>>>> purposes.
>>>>>>>>>>>>
>>>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the
>>>>>>>>>>>> lack of
>>>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for
>>>>>>>>>>>> migration and thus enabling migration by default for those too.
>>>>>>>>>>>>
>>>>>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking as
>>>>>>>>>>>> well.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 +
>>>>>>>>>>>> hw/vfio/iommufd.c | 2 +-
>>>>>>>>>>>> hw/vfio/migration.c | 11 ++++++-----
>>>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> b/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644
>>>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>>>>> VFIOContainerBase *bcontainer,
>>>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>>>>> **errp);
>>>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>>>>> uint64_t
>>>>>>>>>>>> iova,
>>>>>>>>>>>> uint64_t size, ram_addr_t ram_addr,
>>>>>>>>>>>> Error
>>>>>>>>>>>> **errp);
>>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644
>>>>>>>>>>>> --- a/hw/vfio/iommufd.c
>>>>>>>>>>>> +++ b/hw/vfio/iommufd.c
>>>>>>>>>>>> @@ -111,7 +111,7 @@ static void
>>>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice
>>>>>>>>>>>> *vbasedev)
>>>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd);
>>>>>>>>>>>> }
>>>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>>>>> {
>>>>>>>>>>>> return hwpt && hwpt->hwpt_flags &
>>>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>>>>>>>>> }
>>>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644
>>>>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice
>>>>>>>>>>>> *vbasedev,
>>>>>>>>>>>> Error **errp)
>>>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp);
>>>>>>>>>>>> }
>>>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) {
>>>>>>>>>>>> + if (!vbasedev->dirty_pages_supported &&
>>>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need
>>>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in
>>>>>>>>>>> the IOMMU backend.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of
>>>>>>>>>> hardware
>>>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to
>>>>>>>>>> 'migration
>>>>>>>>>> is supported'.
>>>>>>>>>>
>>>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return
>>>>>>>>>> false.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally didn't
>>>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass my
>>>>>>>>> build with CONFIG_IOMMUFD=n
>>>>>>>>>
>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644
>>>>>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const
>>>>>>>>> VFIOContainerBase
>>>>>>>>> *bcontainer,
>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error
>>>>>>>>> **errp);
>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>>>>>>> uint64_t
>>>>>>>>> iova,
>>>>>>>>> uint64_t size, ram_addr_t ram_addr, Error
>>>>>>>>> **errp);
>>>>>>>>> +#ifdef CONFIG_IOMMUFD
>>>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
>>>>>>>>> +#else
>>>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
>>>>>>>>> +{
>>>>>>>>> + return false;
>>>>>>>>> +}
>>>>>>>>> +#endif
>>>>>>>>>
>>>>>>>>> /* Returns 0 on success, or a negative errno. */
>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>>>>>>>>
>>>>>>>>
>>>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability,
>>>>>>>> something like :
>>>>>>>>
>>>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING,
>>>>>>>>
>>>>>>>> Then, introduce an helper routine to check the capability :
>>>>>>>>
>>>>>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...)
>>>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it.
>>>>>>>>
>>>>>>>> Yeah I know, it's cumbersome but it's cleaner !
>>>>>>>>
>>>>>>>
>>>>>>> Funny you mention it, because that's what I did in v3:
>>>>>>>
>>>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.martins@oracle.com/
>>>>>>>
>>>>>>> But it was suggested to drop (I am assuming to avoid complexity)
>>>>>>
>>>>>> my bad if I did :/
>>>>>>
>>>>>
>>>>> No worries it is all part of review -- I think Zhenzhong proposed with good
>>>>> intentions, and I probably didn't think too hard about the consequences on
>>>>> layering with the HIOD.
>>>>>
>>>>>> we will need an helper such as :
>>>>>>
>>>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev)
>>>>>> {
>>>>>> HostIOMMUDevice *hiod = vbasedev->hiod ;
>>>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>>
>>>>>> return hiodc->get_cap &&
>>>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL)
>>>>>> == 1;
>>>>>> }
>>>>>>
>>>>>> and something like,
>>>>>>
>>>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
>>>>>> Error **errp)
>>>>>> {
>>>>>> switch (cap) {
>>>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING:
>>>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING);
>>>>>> default:
>>>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name,
>>>>>> cap);
>>>>>> return -EINVAL;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Feel free to propose your own implementation,
>>>>>>
>>>>>
>>>>> Actually it's close to what I had in v3 link, except the new helper (the name
>>>>> vfio_device_dirty_tracking is a bit misleading I would call it
>>>>> vfio_device_iommu_dirty_tracking)
>>>>
>>>> Let's call it vfio_device_iommu_dirty_tracking.
>>>>
>>>
>>> I thinking about this and I am not that sure it makes sense. That is the
>>> .get_cap() stuff.
>>>
>>> Using the hw_caps is only useful when choosing hwpt_flags, then the only thing
>>> that matters for patch 12 is after the device is attached ... hence we gotta
>>> look at hwpt_flags. That ultimately is what tells if dirty tracking can be done
>>> in the device pagetable.
>>>
>>> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels
>>> just as hacky given that I am testing its enablement of the hardware pagetable
>>> (HWPT), and not asking a HIOD capability.
>>
>> arf. yes.
>>
>>> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the
>>> attach_device() flow[*], but not for vfio_migration_realize() flow.
>>>
>>> [*] though feels unneeded as we only have a local callsite, not external user so
>>> far.
>>>
>>> Which would technically make v5.1 patch a more correct right check, perhaps with
>>> better layering/naming.
>>
>> The quick fix (plan B if needed) would be :
>>
>> @@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice *
>> }
>>
>> if ((!vbasedev->dirty_pages_supported ||
>> - vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>> - !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
>> + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF)
>> +#ifdef CONFIG_IOMMUFD
>> + && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)
>> +#endif
>> + ) {
>> if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>> error_setg(&err,
>> "%s: VFIO device doesn't support device and "
>>
>> I would prefer to avoid the common component to reference IOMMUFD
>> directly. The only exception today is the use of the vbasedev->iommufd
>> pointer which is treated as opaque.
>>
>> I guess a simple approach would be to store the result of
>> iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute
>> of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ?
>>
>> if not, let's merge v5 (with more acks) and the fix of plan B.
>>
>>
>>>>> I can follow-up with this improvement in case this gets merged as is,
>>>>
>>>> I can't merge as is since it break compiles (I am excluding the v5.1 patch).
>>>> Which means I would prefer a v6 please.
>>>>
>>>
>>> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the
>>> compilation issue and all that remained were acks.
>>
>> v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone.
>>
>
> hmmm, ok, that's strage. It does look quite common in Qemu? e.g. We even have
> CONFIG_LINUX in the vfio-common.h header file.
Yes. there are some high level CONFIG options like LINUX, TCG, KVM,
etc in header files. It's different for device CONFIG options, you
first need to include CONFIG_DEVICES.
Thanks,
C.
>
>>>>> or include
>>>>> it in the next version if you prefer to adjourn this series into 9.2 (given the
>>>>> lack of time to get everything right).
>>>>
>>>> There aren't many open questions left.
>>>>
>>>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or
>>>> Eric acked the changes.
>>>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits().
>>>> I think it's minor. I would also feel more confortable if ZhenZhong
>>>> acked the changes.
>>>
>>> I guess you meant patch 6 and not 9.
>>
>> yes.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>>
>>>> * PATCH 12 needs the fix we have been talking about.
>>>> * PATCH 13 is for dev/debug.
>>>>
>>>>
>>>> What's important is to avoid introducing regressions in the current behavior,
>>>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT.
>>>
>>> OK
>>>
>>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 13/13] vfio/common: Allow disabling device dirty page tracking
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (11 preceding siblings ...)
2024-07-19 12:05 ` [PATCH v5 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-19 12:05 ` Joao Martins
2024-07-19 12:13 ` [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (2 subsequent siblings)
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:05 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.
Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/common.c | 3 +++
hw/vfio/migration.c | 3 ++-
hw/vfio/pci.c | 3 +++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 00b9e933449e..61dd48e79b71 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -138,6 +138,7 @@ typedef struct VFIODevice {
VFIOMigration *migration;
Error *migration_blocker;
OnOffAuto pre_copy_dirty_page_tracking;
+ OnOffAuto device_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
HostIOMMUDevice *hiod;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cc14f0e3fe24..070a4a2df020 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -199,6 +199,9 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
VFIODevice *vbasedev;
QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+ if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+ return false;
+ }
if (!vbasedev->dirty_pages_supported) {
return false;
}
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 63ffa46c9652..88fdc6efbcc9 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,7 +1036,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported &&
+ if ((!vbasedev->dirty_pages_supported ||
+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
!iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 265d3cb82ffc..2407720c3530 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3361,6 +3361,9 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
vbasedev.pre_copy_dirty_page_tracking,
ON_OFF_AUTO_ON),
+ DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking", VFIOPCIDevice,
+ vbasedev.device_dirty_page_tracking,
+ ON_OFF_AUTO_ON),
DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
display, ON_OFF_AUTO_OFF),
DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
--
2.17.2
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (12 preceding siblings ...)
2024-07-19 12:05 ` [PATCH v5 13/13] vfio/common: Allow disabling device dirty page tracking Joao Martins
@ 2024-07-19 12:13 ` Joao Martins
2024-07-19 22:19 ` [PATCH v5.1 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
2024-07-22 13:51 ` [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Cédric Le Goater
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 12:13 UTC (permalink / raw)
To: Cedric Le Goater
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon, qemu-devel
Hey Cedric,
On 19/07/2024 13:04, Joao Martins wrote:
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
>
A couple notes with respect to this series:
- The first 2 patches I think are needed because it addresses a regression
- This paragraph is meant to state that this doesn't support vIOMMU, and so I
haven't changed in that area. I am assuming that when the vIOMMU will get resent
for 9.2 where I would address the IOMMUFD counterpart.
In case this be merged, the next Qemu stages after this series are:
1) Mixed mode of VF with IOMMUFD dirty tracking
2) vIOMMU support (with the relaxing vIOMMU patch and IOMMUFD support)
It's not listed in terms of priority, just wanted to make it obvious on the gaps
that are on my list.
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5.1 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (13 preceding siblings ...)
2024-07-19 12:13 ` [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
@ 2024-07-19 22:19 ` Joao Martins
2024-07-22 13:51 ` [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Cédric Le Goater
15 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2024-07-19 22:19 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Jason Gunthorpe, Avihai Horon, Joao Martins
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.
For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.
So starting with IOMMU dirty tracking it can use to accomodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those too.
While at it change the error messages to mention IOMMU dirty tracking as
well.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Same patch as v5, but fixes builds that have CONFIG_IOMMUFD=n
Sending just this one as it doesn't justify sending the whole series
again.
---
hw/vfio/iommufd.c | 2 +-
hw/vfio/migration.c | 11 ++++++-----
include/hw/vfio/vfio-common.h | 8 ++++++++
3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7dd5d43ce06a..a998e8578552 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -111,7 +111,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
iommufd_backend_disconnect(vbasedev->iommufd);
}
-static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
{
return hwpt && hwpt->hwpt_flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34d4be2ce1b1..63ffa46c9652 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return !vfio_block_migration(vbasedev, err, errp);
}
- if (!vbasedev->dirty_pages_supported) {
+ if (!vbasedev->dirty_pages_supported &&
+ !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) {
if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
error_setg(&err,
- "%s: VFIO device doesn't support device dirty tracking",
- vbasedev->name);
+ "%s: VFIO device doesn't support device and "
+ "IOMMU dirty tracking", vbasedev->name);
goto add_blocker;
}
- warn_report("%s: VFIO device doesn't support device dirty tracking",
- vbasedev->name);
+ warn_report("%s: VFIO device doesn't support device and "
+ "IOMMU dirty tracking", vbasedev->name);
}
ret = vfio_block_multiple_devices_migration(vbasedev, errp);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7e530c7869dc..333cabbf4362 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -299,6 +299,14 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
uint64_t size, ram_addr_t ram_addr, Error **errp);
+#ifdef CONFIG_IOMMUFD
+bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt);
+#else
+static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt)
+{
+ return false;
+}
+#endif
/* Returns 0 on success, or a negative errno. */
bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
--
2.39.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking
2024-07-19 12:04 [PATCH v5 00/13] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
` (14 preceding siblings ...)
2024-07-19 22:19 ` [PATCH v5.1 12/13] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-07-22 13:51 ` Cédric Le Goater
15 siblings, 0 replies; 53+ messages in thread
From: Cédric Le Goater @ 2024-07-22 13:51 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Jason Gunthorpe, Avihai Horon
On 7/19/24 14:04, Joao Martins wrote:
> This small series adds support for IOMMU dirty tracking support via the
> IOMMUFD backend. The hardware capability is available on most recent x86
> hardware. The series is divided organized as follows:
>
> * Patch 1-2: Fixes a regression into mdev support with IOMMUFD. This
> one is independent of the series but happened to cross it
> while testing mdev with this series
>
> * Patch 3: Adds a support to iommufd_get_device_info() for capabilities
>
> * Patches 4 - 11: IOMMUFD backend support for dirty tracking;
>
> Introduce auto domains -- Patch 5 goes into more detail, but the gist is that
> we will find and attach a device to a compatible IOMMU domain, or allocate a new
> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). Afterwards the
> workflow is relatively simple:
>
> 1) Probe device and allow dirty tracking in the HWPT
> 2) Toggling dirty tracking on/off
> 3) Read-and-clear of Dirty IOVAs
>
> The heuristics selected for (1) were to always request the HWPT for
> dirty tracking if supported, or rely on device dirty page tracking. This
> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
> tracking even if we ask during hwpt allocation.
>
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
>
> * Patches 12-13: Don't block live migration where there's no VF dirty
> tracker, considering that we have IOMMU dirty tracking.
>
> Comments and feedback appreciated. Thanks for the review so far and
> apologies in advance if I missed any comment.
>
> Cheers,
> Joao
>
> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's
> some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>
> Changes since v4[6]:
> * Add various Reviewed-by in patches 2,3,4,6,8,11
> * Change error messages to mention IOMMU (Zhenzhong)
> * Better improve the checking of dirty page tracking in
> vfio_migration_realize() to detect per-device IOMMU instead of using
> container dirty_page_supported().
> * Improve/Cleanup various commit messages to be clear (Eric)
> * Extract the caps::hw_caps into its own patch as it was miosleading to
> be hidden in another patch (new patch 7)
> * Restructure patch 1 helper to be vfio_device_is_mdev() and use
> vfio::mdev directly in rest of patches (Cedric)
> * Improve error messages of set,query dirty tracking (Cedric)
> * Add missing casts to uintptr and uint64_t* (Cedric)
> * Improve terciary check in set_dirty_Tracking (Cedric)
> * Add missing commens to struct doc from aw_bits removal (and hw_caps
> addition) (Eric)
> * Fix the detach flow in auto domains (Eric)
> * Add new helper vfio_device_hiod_realize() and use it in backends
> * (Cedric)
> * Move introduction of iommufd_hwpt_dirty_tracking() in the predecessor
> * patch (Cedric)
> * Set hwpt to NULL on detach (Eric)
> * Spurious line (Eric)
>
> Changes since v3[5]:
> * Skip HostIOMMUDevice::realize for mdev, and introduce a helper to check if the VFIO
> device is mdev. (Zhenzhong)
> * Skip setting IOMMU device for mdev (Zhenzhong)
> * Add Zhenzhong review tag in patch 3
> * Utilize vbasedev::bcontainer::dirty_pages_supported instead of introducing
> a new HostIOMMUDevice capability and thus remove the cap patch from the series (Zhenzhong)
> * Move the HostIOMMUDevice::realize() to be part of VFIODevice initialization in attach_device()
> while skipping it all together for mdev. (Cedric)
> * Due to the previous item, had to remove aw_bits because it depends on device attach being
> finished, instead defer it to when get_cap() gets called.
> * Skip auto domains for mdev instead of purposedly erroring out (Zhenzhong)
> * Pass errp in all cases, and instead just free the error in case of -EINVAL
> in most of all patches, and also pass Error* in iommufd_backend_alloc_hwpt() amd
> set/query dirty. This is made better thanks in part to skipping auto domains for mdev (Cedric)
>
> Changes since RFCv2[4]:
> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
> we end up not actually toggling dirty tracking. (Avihai)
> * Fix error handling widely in auto domains logic and all patches (Avihai)
> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
> * New patches 1 and 2 taking into consideration previous comments.
> * Store hwpt::flags to know if we have dirty tracking (Avihai)
> * New patch 8, that allows to query dirty tracking support after
> provisioning. This is a cleaner way to check IOMMU dirty tracking support
> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps.
> device caps way is still used because at vfio attach we aren't yet with
> a fully initialized migration state.
> * Adopt error propagation in query,set dirty tracking
> * Misc improvements overall broadly and Avihai
> * Drop hugepages as it's a bit unrelated; I can pursue that patch
> * separately. The main motivation is to provide a way to test
> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
> does.
>
> Changes since RFCv1[2]:
> * Remove intel/amd dirty tracking emulation enabling
> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
> [Will pursue these two in separate series]
> * Introduce auto domains support
> * Enforce dirty tracking following the IOMMUFD UAPI for this
> * Add support for toggling hugepages in IOMMUFD
> * Auto enable support when VF supports migration to use IOMMU
> when it doesn't have VF dirty tracking
> * Add a parameter to toggle VF dirty tracking
>
> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
> [4] https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
> [5] https://lore.kernel.org/qemu-devel/20240708143420.16953-1-joao.m.martins@oracle.com/
> [6] https://lore.kernel.org/qemu-devel/20240712114704.8708-1-joao.m.martins@oracle.com/#t
>
> Joao Martins (13):
> vfio/pci: Extract mdev check into an helper
> vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
> backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW
> capabilities
> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
> vfio/iommufd: Introduce auto domain creation
> vfio/{iommufd,container}: Remove caps::aw_bits
> vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
> vfio/{iommufd,container}: Invoke HostIOMMUDevice::realize() during
> attach_device()
> vfio/iommufd: Probe and request hwpt dirty tracking capability
> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
> vfio/migration: Don't block migration device dirty tracking is
> unsupported
> vfio/common: Allow disabling device dirty page tracking
>
> include/hw/vfio/vfio-common.h | 15 +++
> include/sysemu/host_iommu_device.h | 5 +-
> include/sysemu/iommufd.h | 13 ++-
> backends/iommufd.c | 89 +++++++++++++-
> hw/vfio/common.c | 17 +--
> hw/vfio/container.c | 9 +-
> hw/vfio/helpers.c | 25 ++++
> hw/vfio/iommufd.c | 181 ++++++++++++++++++++++++++++-
> hw/vfio/migration.c | 12 +-
> hw/vfio/pci.c | 26 +++--
> backends/trace-events | 3 +
> 11 files changed, 356 insertions(+), 39 deletions(-)
Applied 1-4 to vfio-next.
Still looking at the rest. We have ~24h for the last reviews.
Thanks,
C.
^ permalink raw reply [flat|nested] 53+ messages in thread